tp.builder
Class BuilderNGram

java.lang.Object
  extended by 
      extended by tp.builder.BuilderNGram
All Implemented Interfaces:
BuilderInterface

public class BuilderNGram
extends
implements BuilderInterface

Class that builds an N-gram representation of text data.


Constructor Summary
BuilderNGram(DocsRepresentDB.RepresentationModel model, DocsRepresentDB.Preprocessing preprocessing)
          Creates a builder for the given representation model and preprocessing options.
 
Method Summary
 DocsRepresentDB buildRepresentation(DocumentsDatabase database)
          Creates a word representation (used when the user has not set a depth parameter).
 DocsRepresentDB buildRepresentation(DocumentsDatabase database, int depth)
          Gets text documents from a database and creates their N-gram representation.
protected  java.lang.Void doInBackground()
           
 void setMinNGramOccur(int min_ngram_occur)
          Sets the minimum N-gram occurrence parameter.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface tp.builder.BuilderInterface
addPropertyChangeListener
 

Constructor Detail

BuilderNGram

public BuilderNGram(DocsRepresentDB.RepresentationModel model,
                    DocsRepresentDB.Preprocessing preprocessing)
Creates a builder for the given representation model and preprocessing options.

Parameters:
model - model used for representation (binary, TF or TF/IDF)
preprocessing - preprocessing options selected by the user
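
The three representation models named for the model parameter (binary, TF, TF/IDF) can be illustrated with a minimal sketch. The enum constants below are hypothetical; the actual constants of DocsRepresentDB.RepresentationModel are not shown on this page. The TF/IDF variant used here (raw count times the natural log of the inverse document frequency) is likewise an assumption and may differ from what the library computes.

```java
// Illustrative sketch only: the names and the TF/IDF formula are assumptions,
// not the library's confirmed implementation.
final class WeightSketch {

    /** Hypothetical stand-in for DocsRepresentDB.RepresentationModel. */
    enum Model { BINARY, TF, TF_IDF }

    /**
     * Computes the weight of one term in one document.
     *
     * @param termCount    occurrences of the term in the document
     * @param docCount     total number of documents in the collection
     * @param docsWithTerm number of documents that contain the term
     */
    static double weight(Model model, int termCount, int docCount, int docsWithTerm) {
        if (termCount == 0) {
            return 0.0; // an absent term has zero weight under every model
        }
        switch (model) {
            case BINARY:
                return 1.0; // presence only
            case TF:
                return termCount; // raw term frequency
            case TF_IDF:
                // raw frequency scaled by the log inverse document frequency
                return termCount * Math.log((double) docCount / docsWithTerm);
            default:
                throw new AssertionError(model);
        }
    }

    public static void main(String[] args) {
        System.out.println(weight(Model.BINARY, 3, 10, 2)); // prints 1.0
        System.out.println(weight(Model.TF, 3, 10, 2));     // prints 3.0
    }
}
```

Under this sketch a term that occurs in every document gets a TF/IDF weight of zero, which is the usual motivation for choosing TF/IDF over plain TF.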
Method Detail

buildRepresentation

public DocsRepresentDB buildRepresentation(DocumentsDatabase database,
                                           int depth)
Gets text documents from a database and creates their N-gram representation.

Specified by:
buildRepresentation in interface BuilderInterface
Parameters:
database - input database of text documents
depth - the length of N-grams
Returns:
database of all document representations
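
To make the role of depth concrete, here is a minimal, self-contained sketch of N-gram extraction. It assumes word-level N-grams over whitespace-separated tokens; the page does not say whether BuilderNGram works on words or characters, and the class and method names below (NGramSketch, wordNGrams) are hypothetical, not part of tp.builder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: assumes word-level N-grams split on whitespace.
final class NGramSketch {

    /** Returns all contiguous word N-grams of length depth, in order of appearance. */
    static List<String> wordNGrams(String text, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result; // no tokens, so no N-grams
        }
        String[] tokens = trimmed.split("\\s+");
        // Slide a window of depth tokens across the token sequence.
        for (int i = 0; i + depth <= tokens.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + depth)));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [the quick, quick brown, brown fox]
        System.out.println(wordNGrams("the quick brown fox", 2));
    }
}
```

With depth set to 1 the sketch degenerates to a plain list of words, which matches the idea of a word representation when no depth is given.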

setMinNGramOccur

public void setMinNGramOccur(int min_ngram_occur)
Sets the minimum N-gram occurrence parameter.

Specified by:
setMinNGramOccur in interface BuilderInterface
Parameters:
min_ngram_occur - minimum N-gram occurrence
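
A minimum-occurrence threshold is typically used to drop rare N-grams before the representation is built. The sketch below shows one plausible reading, assuming the threshold is inclusive and applied to counts over a single list of N-grams; whether BuilderNGram counts per document or across the whole database is not stated here. The names MinOccurSketch and filterByMinOccur are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: assumes an inclusive threshold on raw counts.
final class MinOccurSketch {

    /** Counts each N-gram and keeps those seen at least minNGramOccur times. */
    static Map<String, Integer> filterByMinOccur(List<String> ngrams, int minNGramOccur) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String ngram : ngrams) {
            counts.merge(ngram, 1, Integer::sum); // increment, starting from 1
        }
        // Removing through the values view also removes the matching keys.
        counts.values().removeIf(count -> count < minNGramOccur);
        return counts;
    }

    public static void main(String[] args) {
        // prints {a b=2}
        System.out.println(filterByMinOccur(List.of("a b", "b c", "a b"), 2));
    }
}
```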

buildRepresentation

public DocsRepresentDB buildRepresentation(DocumentsDatabase database)
Creates a word representation (used when the user has not set a depth parameter).

Specified by:
buildRepresentation in interface BuilderInterface
Parameters:
database - database of all documents
Returns:
created representation

doInBackground

protected java.lang.Void doInBackground()
                                 throws java.lang.Exception
Throws:
java.lang.Exception