tp.loader
Class DataCleaner

java.lang.Object
  extended by tp.loader.DataCleaner

public class DataCleaner
extends java.lang.Object

Class for cleaning loaded text data.


Constructor Summary
DataCleaner()
           
 
Method Summary
 java.lang.String cleanContentFile(java.lang.String content)
          Removes certain special symbols that occur in the contents of Reuters documents.
 java.lang.String cleanPlainText(java.lang.String str)
          Removes certain special symbols that occur in plain text documents.
 java.lang.String cleanSGMLReuters(java.lang.String str)
          Removes certain special symbols that occur in the Reuters dataset.
 java.lang.String cleanTopic(java.lang.String topic)
          Removes certain special symbols that occur in the topics of Reuters documents.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataCleaner

public DataCleaner()

Method Detail

cleanSGMLReuters

public java.lang.String cleanSGMLReuters(java.lang.String str)
Removes certain special symbols that occur in the Reuters dataset.

Parameters:
str - original string
Returns:
cleaned string
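
The page does not list which symbols are removed. As a rough illustration only, the sketch below assumes that the cleaning targets numeric character references (such as the `&#3;` marker found in Reuters-21578 SGML files) and a few basic entities. The class CleanerSketch and the method cleanSketch are hypothetical stand-ins, not part of tp.loader, and the actual behavior of cleanSGMLReuters may differ.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in; NOT the tp.loader.DataCleaner implementation.
final class CleanerSketch {
    // Assumption: numeric character references such as "&#3;" count as special symbols.
    private static final Pattern NUMERIC_REF = Pattern.compile("&#\\d+;");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String cleanSketch(String str) {
        // Replace numeric references with a space so adjacent words stay separated.
        String s = NUMERIC_REF.matcher(str).replaceAll(" ");
        // Decode a few basic entities; "&amp;" goes last to avoid decoding twice.
        s = s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
        // Collapse whitespace runs (including line breaks) and trim the ends.
        return WHITESPACE_RUN.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanSketch("Shr 12 cts vs 10 cts\n Reuter\n&#3;"));
    }
}
```

With the real class, the corresponding call would presumably be `new DataCleaner().cleanSGMLReuters(str)`, matching the signature documented above.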

cleanTopic

public java.lang.String cleanTopic(java.lang.String topic)
Removes certain special symbols that occur in the topics of Reuters documents.

Parameters:
topic - original string
Returns:
cleaned string

cleanContentFile

public java.lang.String cleanContentFile(java.lang.String content)
Removes certain special symbols that occur in the contents of Reuters documents.

Parameters:
content - original string
Returns:
cleaned string

cleanPlainText

public java.lang.String cleanPlainText(java.lang.String str)
Removes certain special symbols that occur in plain text documents.

Parameters:
str - original string
Returns:
cleaned string