public class Predzpracovani
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
boolean | storno |
Modifier and Type | Method and Description |
---|---|
void | beh(): Main entry point for pre-processing. Records the execution time and progress information and runs all pre-processing steps. |
void | dokumentParse(Dokument dokument): Parses a document into a set of terms and applies all required pre-processing steps. |
void | dokumentVahuj(Dokument dokument): Calls the appropriate weighting method for the given document. |
long | getCas(): Returns the running time of pre-processing. |
int | getCelkem(): Returns the number of documents. |
Data | getData(): Returns the dataset used for pre-processing. |
static Predzpracovani | getInstance() |
int | getPostup(): Computes and returns the overall pre-processing progress. |
java.lang.String | getPostupText(): Returns the current progress information for pre-processing. |
boolean | isZmena(): Returns whether anything in the class has changed. |
boolean | jeCislo(java.lang.String slovo): Returns false if number removal is disabled; otherwise returns whether the word is a number. |
boolean | jeStopSlovo(java.lang.String slovo): Returns false if stop-word removal is disabled; otherwise returns whether the word is a stop word. |
void | odstanJednoDokumentoveTermy(): Removes all terms that occur in only one document. |
void | opravID(): Reassigns the term identifiers so that they can be used as array indices. |
void | setData(Data data): Sets the dataset used for pre-processing. |
void | setZmena(boolean zmena): Sets the "change" flag. |
java.lang.String | stemm(java.lang.String slovo): If stemming is enabled, runs the Porter stemmer on the word and returns the result. |
void | testSouboru() |
void | vahujIDF(Dokument dokument): Computes the IDF weights for the given document, including normalization. |
void | vahujTF(Dokument dokument): Computes the TF weights for the given document, including normalization. |
void | vahujTFIDF(Dokument dokument): Computes the TF-IDF weights for the given document, including normalization. |
void | vyberNejcastejsiTermy(): Selects the most frequent terms. |
void | ziskejFrekvence(): Determines the maximum frequency of a term in the whole text collection. |
void | ziskejKolekciDatZeSlozky(): Reads the document files in a folder and stores them as documents (including title, category and text). |
void | ziskejKolekciDatZeSouboru(): Reads the document information from a file. |
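The three weighting methods are documented only as computing TF, IDF and TF-IDF weights "including normalization"; the exact formulas are not stated. The sketch below shows one common formulation and is an assumption throughout: raw term counts as TF, `ln(N / df)` as IDF, and cosine (unit-length) normalization. The class `WeightingSketch` and its methods are hypothetical and not part of Predzpracovani.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating TF-IDF weighting; not part of Predzpracovani. */
final class WeightingSketch {

    /** Assumed IDF variant: ln(N / df), where df is the number of documents containing the term. */
    static double idf(int totalDocs, int docFrequency) {
        return Math.log((double) totalDocs / docFrequency);
    }

    /**
     * Computes count * idf for every term of one document, then scales the
     * resulting vector to unit length (cosine normalization, an assumption).
     * Every term in termCounts must have an entry in docFrequencies.
     */
    static Map<String, Double> tfIdf(Map<String, Integer> termCounts,
                                     Map<String, Integer> docFrequencies,
                                     int totalDocs) {
        Map<String, Double> weights = new HashMap<>();
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            double w = e.getValue() * idf(totalDocs, docFrequencies.get(e.getKey()));
            weights.put(e.getKey(), w);
            sumOfSquares += w * w;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm > 0.0) {
            // Replacing values does not change the map structure, so this is safe while iterating.
            for (Map.Entry<String, Double> e : weights.entrySet()) {
                e.setValue(e.getValue() / norm);
            }
        }
        return weights;
    }
}
```

With this IDF variant, a term that occurs in every document receives weight 0, so it contributes nothing to the normalized vector.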
public static Predzpracovani getInstance()
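A static `getInstance()` returning the class's own type usually indicates a singleton accessor, although this page does not say so explicitly. A minimal sketch of that pattern, using a hypothetical stand-in class `PreprocessorSketch` (its fields and accessors are illustrative only):

```java
/** Hypothetical stand-in for a class that exposes a static getInstance(). */
final class PreprocessorSketch {
    // Eagerly created single instance; class initialization by the JVM is thread-safe.
    private static final PreprocessorSketch INSTANCE = new PreprocessorSketch();

    // Illustrative state, loosely modelled on the isZmena/setZmena pair.
    private boolean changed;

    // Private constructor prevents other code from creating further instances.
    private PreprocessorSketch() { }

    static PreprocessorSketch getInstance() {
        return INSTANCE;
    }

    boolean isChanged() {
        return changed;
    }

    void setChanged(boolean changed) {
        this.changed = changed;
    }
}
```

Under this pattern every caller shares the same object, so state set through one reference is visible through any other.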
public int getCelkem()
public long getCas()
public int getPostup()
public java.lang.String getPostupText()
public boolean isZmena()
public void setZmena(boolean zmena)
Parameters: zmena - boolean value
public Data getData()
public void setData(Data data)
Parameters: data - the input dataset
public void testSouboru()
public void beh() throws java.lang.Exception
Throws: java.lang.Exception
public void vyberNejcastejsiTermy()
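The summary says only that this method selects the most frequent terms; how many are kept and how ties are resolved is not documented. A sketch under stated assumptions: frequencies are given as a map, the caller supplies a limit `k`, and ties are broken alphabetically. `TopTermsSketch` and `mostFrequent` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating top-k term selection; not part of Predzpracovani. */
final class TopTermsSketch {

    /** Returns up to k terms with the highest frequency; ties are broken alphabetically. */
    static List<String> mostFrequent(Map<String, Integer> frequencies, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(frequencies.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending frequency
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```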
public void dokumentVahuj(Dokument dokument)
Parameters: dokument - an input document
public void vahujTF(Dokument dokument)
Parameters: dokument - an input document
public void vahujIDF(Dokument dokument)
Parameters: dokument - an input document
public void vahujTFIDF(Dokument dokument)
Parameters: dokument - an input document
public void odstanJednoDokumentoveTermy()
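According to the summary, odstanJednoDokumentoveTermy() removes all terms contained in only one document. The idea can be sketched as a two-pass filter: count the document frequency of each term, then drop the terms whose count is 1. The representation of documents as sets of strings and the class `SingleDocTermFilterSketch` are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; documents are modelled as mutable sets of terms. */
final class SingleDocTermFilterSketch {

    /** Removes, in place, every term that appears in exactly one document. */
    static void removeSingleDocumentTerms(List<Set<String>> documents) {
        // Pass 1: document frequency, i.e. in how many documents each term occurs.
        Map<String, Integer> docFrequency = new HashMap<>();
        for (Set<String> doc : documents) {
            for (String term : doc) {
                docFrequency.merge(term, 1, Integer::sum);
            }
        }
        // Pass 2: drop the terms that were seen in only one document.
        for (Set<String> doc : documents) {
            doc.removeIf(term -> docFrequency.get(term) == 1);
        }
    }
}
```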
public java.lang.String stemm(java.lang.String slovo)
Parameters: slovo - a word to be stemmed
public boolean jeStopSlovo(java.lang.String slovo)
Parameters: slovo - an input word
public boolean jeCislo(java.lang.String slovo)
Parameters: slovo - an input word
public void ziskejFrekvence()
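The summary describes ziskejFrekvence() as obtaining the maximum frequency of a term in the whole text collection. One possible reading, assumed here, is that each term's counts are summed over all documents and the largest total is kept. `MaxFrequencySketch` is a hypothetical name, and documents are modelled as term-to-count maps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; each document is modelled as a map from term to positive count. */
final class MaxFrequencySketch {

    /** Sums each term's count over all documents and returns the largest total (0 if empty). */
    static int maxCollectionFrequency(List<Map<String, Integer>> documents) {
        Map<String, Integer> totals = new HashMap<>();
        int max = 0;
        for (Map<String, Integer> doc : documents) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                // merge returns the updated running total for this term
                int total = totals.merge(e.getKey(), e.getValue(), Integer::sum);
                max = Math.max(max, total);
            }
        }
        return max;
    }
}
```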
public void opravID()
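The summary states that opravID() sets the term identifiers so that they can be used as array indices. Presumably this means renumbering sparse identifiers (for example after terms have been removed) into the contiguous range 0..n-1. A minimal sketch of such a renumbering follows; `TermIdSketch` and `compactIds` are hypothetical, and preserving ascending order is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper illustrating identifier compaction; not part of Predzpracovani. */
final class TermIdSketch {

    /** Maps each distinct old identifier to a new one in 0..n-1, preserving ascending order. */
    static Map<Integer, Integer> compactIds(Collection<Integer> oldIds) {
        Map<Integer, Integer> mapping = new LinkedHashMap<>();
        // TreeSet removes duplicates and iterates in ascending order.
        for (Integer oldId : new TreeSet<>(oldIds)) {
            mapping.put(oldId, mapping.size()); // next free index
        }
        return mapping;
    }
}
```

After such a renumbering, a weight vector can be stored in a plain `double[]` of length n and indexed directly by term identifier.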
public void ziskejKolekciDatZeSouboru() throws java.io.IOException
Throws: java.io.IOException
public void ziskejKolekciDatZeSlozky() throws java.lang.Exception
Throws: java.lang.Exception
public void dokumentParse(Dokument dokument) throws java.io.IOException
Parameters: dokument - an input document
Throws: java.io.IOException
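The summary describes dokumentParse as splitting a document into terms and applying the pre-processing steps, with jeStopSlovo and jeCislo returning false whenever the corresponding removal is disabled. The sketch below illustrates that contract on plain strings. The tokenization rule, the digits-only definition of a number, and the class `ParseSketch` with its constructor flags are all assumptions; stemming is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical parser illustrating toggled stop-word and number removal. */
final class ParseSketch {
    private final Set<String> stopWords;
    private final boolean removeStopWords;
    private final boolean removeNumbers;

    ParseSketch(Set<String> stopWords, boolean removeStopWords, boolean removeNumbers) {
        this.stopWords = stopWords;
        this.removeStopWords = removeStopWords;
        this.removeNumbers = removeNumbers;
    }

    /** Returns false when stop-word removal is disabled, as documented for jeStopSlovo. */
    boolean isStopWord(String word) {
        return removeStopWords && stopWords.contains(word);
    }

    /** Returns false when number removal is disabled; "number" means ASCII digits only here. */
    boolean isNumber(String word) {
        return removeNumbers && word.matches("\\d+");
    }

    /** Lower-cases the text, splits on non-alphanumeric characters, and filters the tokens. */
    List<String> parse(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || isStopWord(token) || isNumber(token)) {
                continue;
            }
            terms.add(token);
        }
        return terms;
    }
}
```

Because the filter predicates return false when their switch is off, the parsing loop needs no separate configuration checks.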