tp.builder
Class BuilderWord

java.lang.Object
  extended by 
      extended by tp.builder.BuilderWord
All Implemented Interfaces:
BuilderInterface

public class BuilderWord
extends
implements BuilderInterface

Class that builds a word-based representation of text data.


Constructor Summary
BuilderWord(DocsRepresentDB.RepresentationModel model, DocsRepresentDB.Preprocessing preprocessing)
          Creates a builder that uses the given representation model and preprocessing options.
 
Method Summary
 DocsRepresentDB buildRepresentation(DocumentsDatabase database)
          Gets text documents from a database and creates their word representation.
 DocsRepresentDB buildRepresentation(DocumentsDatabase database, int depth)
          Creates a representation only if depth equals 1, because this class builds word representations (a greater depth is possible only for N-gram representation).
protected  java.lang.Void doInBackground()
           
 void setMinNGramOccur(int min_ngram_occur)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface tp.builder.BuilderInterface
addPropertyChangeListener
 

Constructor Detail

BuilderWord

public BuilderWord(DocsRepresentDB.RepresentationModel model,
                   DocsRepresentDB.Preprocessing preprocessing)
Creates a builder that uses the given representation model and preprocessing options.

Parameters:
model - model used for representation (binary, TF or TF/IDF)
preprocessing - preprocessing options selected by the user
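
Example (illustrative only): the three models named for the model parameter can be sketched as below. This is not the code of BuilderWord or DocsRepresentDB. The class WeightingSketch, the enum constants BINARY, TF and TF_IDF, and the method weigh are hypothetical names, and the TF/IDF formula used (raw term count multiplied by the natural logarithm of N / document frequency) is one common variant, assumed here because the documentation does not state which variant the library applies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of tp.builder.
final class WeightingSketch {

    // Assumed names for the three models mentioned in the parameter description.
    enum Model { BINARY, TF, TF_IDF }

    // Takes already tokenized documents and returns one word-to-weight map per document.
    static List<Map<String, Double>> weigh(List<List<String>> docs, Model model) {
        // Document frequency: in how many documents each word occurs.
        Map<String, Integer> docFreq = new TreeMap<>();
        for (List<String> doc : docs) {
            for (String word : new HashSet<>(doc)) {
                docFreq.merge(word, 1, Integer::sum);
            }
        }

        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw term counts for this document.
            Map<String, Double> counts = new TreeMap<>();
            for (String word : doc) {
                counts.merge(word, 1.0, Double::sum);
            }

            Map<String, Double> weights = new TreeMap<>();
            for (Map.Entry<String, Double> e : counts.entrySet()) {
                double tf = e.getValue();
                double value;
                switch (model) {
                    case BINARY:
                        value = 1.0;            // the word is present
                        break;
                    case TF:
                        value = tf;             // raw count
                        break;
                    default:
                        // Assumed TF/IDF variant: tf * ln(N / df).
                        value = tf * Math.log((double) docs.size() / docFreq.get(e.getKey()));
                        break;
                }
                weights.put(e.getKey(), value);
            }
            result.add(weights);
        }
        return result;
    }
}
```

With this variant, a word that occurs in every document receives a TF/IDF weight of zero, which is the usual motivation for preferring TF/IDF over raw counts.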
Method Detail

buildRepresentation

public DocsRepresentDB buildRepresentation(DocumentsDatabase database)
Gets text documents from a database and creates their word representation.

Specified by:
buildRepresentation in interface BuilderInterface
Parameters:
database - input database of text documents
Returns:
database of all document representations
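
Example (illustrative only): the idea of a word representation can be sketched as splitting a document's text into words and counting them. The class WordCountSketch and the method countWords are hypothetical and do not use DocumentsDatabase or DocsRepresentDB. Lowercasing and splitting on characters that are neither letters nor digits are assumed preprocessing steps; the actual options are the ones passed to the constructor and are not described here.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of tp.builder.
final class WordCountSketch {

    // Returns how often each word occurs in the given text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumed preprocessing: lowercase, then split on runs of non-letter, non-digit characters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String token : tokens) {
            // split() yields an empty first token when the text starts with a delimiter.
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling WordCountSketch.countWords on each document's text gives one word-to-count map per document, which is the raw material for any of the representation models.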

buildRepresentation

public DocsRepresentDB buildRepresentation(DocumentsDatabase database,
                                           int depth)
Creates a representation only if depth equals 1, because this class builds word representations (a greater depth is possible only for N-gram representation).

Specified by:
buildRepresentation in interface BuilderInterface
Parameters:
database - database with all documents
depth - feature length; must be 1 for word representation
Returns:
created representation
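
Example (illustrative only): a depth check of the kind described above might look like the following. The class DepthGuardSketch and the method wordFeatures are hypothetical. This page does not say what BuilderWord does when depth differs from 1; throwing IllegalArgumentException is an assumption made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of tp.builder.
final class DepthGuardSketch {

    // Returns single-word features; accepts only depth 1.
    static List<String> wordFeatures(List<String> tokens, int depth) {
        if (depth != 1) {
            // Assumed behavior: reject depths that only make sense for N-grams.
            throw new IllegalArgumentException(
                "word representation supports only depth 1, got " + depth);
        }
        // At depth 1 every token is a feature on its own.
        return new ArrayList<>(tokens);
    }
}
```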

doInBackground

protected java.lang.Void doInBackground()
                                 throws java.lang.Exception
Throws:
java.lang.Exception

setMinNGramOccur

public void setMinNGramOccur(int min_ngram_occur)
Specified by:
setMinNGramOccur in interface BuilderInterface