tp.model
Class DocsRepresentDB

java.lang.Object
  extended by tp.model.DocsRepresentDB

public class DocsRepresentDB
extends java.lang.Object

Represents a collection of documents, each stored as a DocRepresentTable.


Nested Class Summary
 class DocsRepresentDB.FeatureDF
          Class to store frequencies of features in documents.
static class DocsRepresentDB.Preprocessing
           
static class DocsRepresentDB.RepresentationFeature
           
static class DocsRepresentDB.RepresentationModel
           
 
Constructor Summary
DocsRepresentDB(DocsRepresentDB.RepresentationModel model, DocsRepresentDB.RepresentationFeature feature, DocsRepresentDB.Preprocessing preprocessing)
          Class constructor.
 
Method Summary
 void addDocument(DocRepresentTable document)
          Adds a new document representation.
 double getAvgCountFeatures()
          Gets the average count of features in documents.
 double getAvgFrequentFeature()
          Gets the average feature frequency in documents.
 double getAvgTFFeature()
          Gets the average value of term frequency among all documents.
 double getAvgTFidfFeature()
          Gets the average value of TF-IDF weight among all documents.
 int getCountDocuments()
          Returns the count of document representations in the database.
 int getCountFeatures()
          Gets the count of all features in documents.
 DocRepresentTable getDocumentAt(int index)
          Returns the document representation at the given position.
 DocRepresentTable getDocumentSingly()
          Returns the next document relative to the current cursor.
 DocsRepresentDB.RepresentationFeature getFeature()
          Returns a feature type.
 int getFeatureFreqInCorpus(java.lang.String feature)
          Returns an IDF value of a feature for the whole dataset.
 java.util.ArrayList<DocsRepresentDB.FeatureDF> getFeaturesDocumentFreq()
          Returns a list of document frequencies for all features.
 int getMaxCountFeatures()
          Gets the maximum count of features in one document.
 int getMaxFrequentFeature()
          Gets the maximum frequency of a feature among all documents.
 double getMaxTFFeature()
          Gets the maximum value of term frequency among all documents.
 double getMaxTFidfFeature()
          Gets the maximum value of TF-IDF weight among all documents.
 int getMinNGramOccur()
          Gets the minimum N-gram occurrence in documents.
 DocsRepresentDB.RepresentationModel getModel()
          Returns the current representation model.
 int getNGramDepth()
          Gets the N-gram depth setting (number of words in one N-gram).
 DocsRepresentDB.Preprocessing getPreprocessing()
          Returns the current pre-processing options.
 void setAvgCountFeatures(double avg_count_features)
          Sets the average count of features in documents.
 void setAvgFrequentFeature(double avg_frequent_feature)
          Sets the average feature frequency in documents.
 void setAvgTFFeature(double avg_tf_feature)
          Sets the average value of term frequency among all documents.
 void setAvgTFidfFeature(double avg_tfidf_feature)
          Sets the average value of TF-IDF weight among all documents.
 void setCountFeatures(int count_features)
          Sets the count of all features in documents.
 void setMaxCountFeatures(int max_count_features)
          Sets the maximum count of features in one document.
 void setMaxFrequentFeature(int max_frequent_feature)
          Sets the maximum frequency of a feature among all documents.
 void setMaxTFFeature(double max_tf_feature)
          Sets the maximum value of term frequency among all documents.
 void setMaxTFidfFeature(double max_tfidf_feature)
          Sets the maximum value of TF-IDF weight among all documents.
 void setMinNGramOccur(int min_ngram_occur)
          Sets the minimum N-gram occurrence in documents.
 void setNGramDepth(int ngram_depth)
          Sets the N-gram depth setting (number of words in one N-gram).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocsRepresentDB

public DocsRepresentDB(DocsRepresentDB.RepresentationModel model,
                       DocsRepresentDB.RepresentationFeature feature,
                       DocsRepresentDB.Preprocessing preprocessing)
Class constructor.

Parameters:
model - a model for document representation
feature - type of feature used in this representation
preprocessing - pre-processing options
Method Detail

addDocument

public void addDocument(DocRepresentTable document)
Adds a new document representation.

Parameters:
document - document to be added

getDocumentSingly

public DocRepresentTable getDocumentSingly()
Returns the next document relative to the current cursor.

Returns:
one document representation

getDocumentAt

public DocRepresentTable getDocumentAt(int index)
Returns the document representation at the given position.

Parameters:
index - a position of a required document
Returns:
one document representation

getCountDocuments

public int getCountDocuments()
Returns the count of document representations in the database.

Returns:
count of all documents
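
The two access styles described above (indexed access with getDocumentAt and getCountDocuments, and cursor access with getDocumentSingly) can be sketched as follows. This is only an illustration under stated assumptions: MiniDB is a hypothetical stand-in that stores plain strings instead of DocRepresentTable objects, and the cursor behaviour (advance by one per call, return null when exhausted) is assumed, since this page does not specify it.

```java
import java.util.ArrayList;
import java.util.List;

public class DocIterationSketch {

    /** Hypothetical stand-in for DocsRepresentDB; documents are plain strings. */
    public static class MiniDB {
        private final List<String> docs = new ArrayList<>();
        private int cursor = 0; // assumed: index of the next document to hand out

        public void addDocument(String doc) { docs.add(doc); }
        public int getCountDocuments() { return docs.size(); }
        public String getDocumentAt(int index) { return docs.get(index); }

        // Assumption: returns the document at the cursor, then advances;
        // returns null once every document has been returned.
        public String getDocumentSingly() {
            return cursor < docs.size() ? docs.get(cursor++) : null;
        }
    }

    /** Reads every document by position. */
    public static List<String> readByIndex(MiniDB db) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < db.getCountDocuments(); i++) {
            out.add(db.getDocumentAt(i));
        }
        return out;
    }

    /** Reads every remaining document through the cursor. */
    public static List<String> readByCursor(MiniDB db) {
        List<String> out = new ArrayList<>();
        String doc;
        while ((doc = db.getDocumentSingly()) != null) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        MiniDB db = new MiniDB();
        db.addDocument("first");
        db.addDocument("second");
        System.out.println(readByIndex(db));
        System.out.println(readByCursor(db));
    }
}
```

Indexed access is repeatable, whereas the cursor in this sketch is consumed as it is read; whether the real class resets its cursor is not stated here.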

getFeatureFreqInCorpus

public int getFeatureFreqInCorpus(java.lang.String feature)
Returns an IDF value of a feature for the whole dataset.

Parameters:
feature - a feature (word)
Returns:
IDF value

getFeaturesDocumentFreq

public java.util.ArrayList<DocsRepresentDB.FeatureDF> getFeaturesDocumentFreq()
Returns a list of document frequencies for all features.

Returns:
array of document frequencies
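
As background for the two methods above, the sketch below shows how document frequencies and an IDF value are commonly derived from a corpus. It is an assumption-laden illustration, not this class's implementation: documents are modelled as lists of feature strings, the class and method names are hypothetical, and ln(N / df) is only one common IDF variant (the formula used by DocsRepresentDB is not given on this page).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DocumentFrequencySketch {

    /** Counts, for each feature, the number of documents that contain it. */
    public static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> doc : docs) {
            // A set ensures each feature is counted at most once per document.
            for (String feature : new HashSet<>(doc)) {
                df.merge(feature, 1, Integer::sum);
            }
        }
        return df;
    }

    /** One common IDF variant: natural log of (total documents / document frequency). */
    public static double idf(int totalDocs, int docFreq) {
        if (docFreq <= 0) {
            throw new IllegalArgumentException("docFreq must be positive");
        }
        return Math.log((double) totalDocs / docFreq);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("a", "b", "a"),
                List.of("b", "c"),
                List.of("b"));
        Map<String, Integer> df = documentFrequencies(docs);
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            System.out.println(e.getKey() + " df=" + e.getValue()
                    + " idf=" + idf(docs.size(), e.getValue()));
        }
    }
}
```

A feature that occurs in every document gets an IDF of zero under this variant, which is why smoothed forms are often preferred in practice.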

getFeature

public DocsRepresentDB.RepresentationFeature getFeature()
Returns a feature type.

Returns:
feature type

getModel

public DocsRepresentDB.RepresentationModel getModel()
Returns the current representation model.

Returns:
a model

getPreprocessing

public DocsRepresentDB.Preprocessing getPreprocessing()
Returns the current pre-processing options.

Returns:
object representing pre-processing options

getAvgCountFeatures

public double getAvgCountFeatures()
Gets the average count of features in documents.

Returns:
average count of features

setAvgCountFeatures

public void setAvgCountFeatures(double avg_count_features)
Sets the average count of features in documents.

Parameters:
avg_count_features - average count of features

getAvgFrequentFeature

public double getAvgFrequentFeature()
Gets the average feature frequency in documents.

Returns:
average feature frequency value

setAvgFrequentFeature

public void setAvgFrequentFeature(double avg_frequent_feature)
Sets the average feature frequency in documents.

Parameters:
avg_frequent_feature - average feature frequency value

getCountFeatures

public int getCountFeatures()
Gets the count of all features in documents.

Returns:
count of all features

setCountFeatures

public void setCountFeatures(int count_features)
Sets the count of all features in documents.

Parameters:
count_features - count of all features

getMaxCountFeatures

public int getMaxCountFeatures()
Gets the maximum count of features in one document.

Returns:
maximum count of features

setMaxCountFeatures

public void setMaxCountFeatures(int max_count_features)
Sets the maximum count of features in one document.

Parameters:
max_count_features - maximum count of features

getMaxFrequentFeature

public int getMaxFrequentFeature()
Gets the maximum frequency of a feature among all documents.

Returns:
maximum feature frequency

setMaxFrequentFeature

public void setMaxFrequentFeature(int max_frequent_feature)
Sets the maximum frequency of a feature among all documents.

Parameters:
max_frequent_feature - maximum feature frequency

getAvgTFFeature

public double getAvgTFFeature()
Gets the average value of term frequency among all documents.

Returns:
average value of TF

setAvgTFFeature

public void setAvgTFFeature(double avg_tf_feature)
Sets the average value of term frequency among all documents.

Parameters:
avg_tf_feature - average value of TF

getAvgTFidfFeature

public double getAvgTFidfFeature()
Gets the average value of TF-IDF weight among all documents.

Returns:
average value of TF-IDF

setAvgTFidfFeature

public void setAvgTFidfFeature(double avg_tfidf_feature)
Sets the average value of TF-IDF weight among all documents.

Parameters:
avg_tfidf_feature - average value of TF-IDF

getMaxTFFeature

public double getMaxTFFeature()
Gets the maximum value of term frequency among all documents.

Returns:
maximum value of TF

setMaxTFFeature

public void setMaxTFFeature(double max_tf_feature)
Sets the maximum value of term frequency among all documents.

Parameters:
max_tf_feature - maximum value of TF

getMaxTFidfFeature

public double getMaxTFidfFeature()
Gets the maximum value of TF-IDF weight among all documents.

Returns:
maximum value of TF-IDF

setMaxTFidfFeature

public void setMaxTFidfFeature(double max_tfidf_feature)
Sets the maximum value of TF-IDF weight among all documents.

Parameters:
max_tfidf_feature - maximum value of TF-IDF
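
The average and maximum TF-IDF values stored through the setters above have to be computed somewhere; one plausible way is sketched below. Everything in it is an assumption made for illustration: the class and method names are hypothetical, TF is taken as the raw count divided by the document length, IDF as ln(N / df), and the average runs over every (document, distinct feature) pair. The weighting actually used by this package is not stated on this page.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TfIdfStatsSketch {

    /** Returns {average, maximum} TF-IDF over every (document, distinct feature) pair. */
    public static double[] avgAndMaxTfIdf(List<List<String>> docs) {
        // Document frequency: number of documents containing each feature.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String feature : new HashSet<>(doc)) {
                df.merge(feature, 1, Integer::sum);
            }
        }

        double sum = 0.0;
        double max = 0.0; // weights are non-negative because df never exceeds N
        int pairs = 0;
        for (List<String> doc : docs) {
            Map<String, Integer> counts = new HashMap<>();
            for (String feature : doc) {
                counts.merge(feature, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double tf = (double) e.getValue() / doc.size();
                double idf = Math.log((double) docs.size() / df.get(e.getKey()));
                double weight = tf * idf;
                sum += weight;
                max = Math.max(max, weight);
                pairs++;
            }
        }
        return new double[] { pairs == 0 ? 0.0 : sum / pairs, max };
    }

    public static void main(String[] args) {
        double[] stats = avgAndMaxTfIdf(List.of(
                List.of("a", "a", "b"),
                List.of("b")));
        System.out.println("avg=" + stats[0] + " max=" + stats[1]);
    }
}
```

In a caller, the two results would then be passed to setAvgTFidfFeature and setMaxTFidfFeature respectively.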

getMinNGramOccur

public int getMinNGramOccur()
Gets the minimum N-gram occurrence in documents.

Returns:
minimum N-gram occurrence

setMinNGramOccur

public void setMinNGramOccur(int min_ngram_occur)
Sets the minimum N-gram occurrence in documents.

Parameters:
min_ngram_occur - minimum N-gram occurrence

getNGramDepth

public int getNGramDepth()
Gets the N-gram depth setting (number of words in one N-gram).

Returns:
N-gram depth

setNGramDepth

public void setNGramDepth(int ngram_depth)
Sets the N-gram depth setting (number of words in one N-gram).

Parameters:
ngram_depth - N-gram depth
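
To make the two N-gram settings concrete, the sketch below builds word N-grams of a given depth and then keeps only those that occur at least a minimum number of times. It is a minimal illustration under assumptions: the class and method names are hypothetical, occurrences are counted within a single token list, and treating the minimum occurrence as a keep/discard threshold is an interpretation of the descriptions above rather than documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramSketch {

    /** Counts word N-grams made of {@code depth} consecutive words. */
    public static Map<String, Integer> countNGrams(List<String> words, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be at least 1");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + depth <= words.size(); i++) {
            String gram = String.join(" ", words.subList(i, i + depth));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    /** Keeps only the N-grams seen at least {@code minOccur} times (assumed threshold semantics). */
    public static Map<String, Integer> filterByMinOccur(Map<String, Integer> counts, int minOccur) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minOccur) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("to", "be", "or", "not", "to", "be");
        Map<String, Integer> bigrams = countNGrams(words, 2);
        System.out.println(bigrams);
        System.out.println(filterByMinOccur(bigrams, 2));
    }
}
```

With a depth of 2 and a minimum occurrence of 2, only the bigram "to be" survives in this sample input.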