
Pre-processing options
The application supports the following methods for pre-processing text documents:
- Stop list
The removal of stop words, i.e. words that are very frequent and do not influence the representation of a document's content (e.g. "the", "a", "in"). An arbitrary stop list stored in a text file can be imported. In this file, the words must be separated by semicolons.
- Stemming
Stemming reduces words to their stems. For example, the words "stemmer", "stemming" and "stemmed" are reduced to "stem". These words can then be represented as a single feature.
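The two pre-processing steps described above can be illustrated with a minimal Python sketch. It is not the application's own implementation. The function names (parse_stop_list, remove_stop_words, toy_stem, preprocess) are hypothetical, and the stemmer is a deliberately simple suffix stripper that handles only the three example words' endings; it is an assumption made for illustration, not the stemming algorithm the application uses. The stop list is passed as a string here; in practice it would be the content read from the imported text file.

```python
import re


def parse_stop_list(text):
    """Parse a semicolon-separated stop list into a set of lowercase words."""
    return {w.strip().lower() for w in text.split(";") if w.strip()}


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t.lower() not in stop_words]


# Assumption: a toy suffix list, far smaller than any real stemmer's rule set.
SUFFIXES = ("ing", "ed", "er")


def toy_stem(word):
    """Strip one known suffix, then undo consonant doubling ("stemm" -> "stem")."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


def preprocess(text, stop_words):
    """Tokenize, remove stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in remove_stop_words(tokens, stop_words)]


stop_words = parse_stop_list("the; a; in; and")
features = preprocess("The stemmer stemmed a word in the stemming demo", stop_words)
print(features)
```

After stemming, "stemmer", "stemmed" and "stemming" all map to the same token, so they would count as one feature in the document representation. A real stemmer applies many more rules than this sketch does.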