Help panel

Pre-processing options

The application supports the following methods for pre-processing text documents:
  1. Stop list
    Removes stop words, i.e. words that are very frequent and have no influence on the representation of a document's content (e.g. "the", "a", "in" ...). An arbitrary stop list stored in a text file can be imported. In this file, the words must be separated by semicolons.
  2. Stemming
    Stemming methods reduce words to their stems. For example, the words "stemmer", "stemming" and "stemmed" are reduced to "stem". These words can then be represented as a single feature.
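
The two steps above can be illustrated with a minimal Python sketch. It is not the application's own code: the function names (parse_stop_list, remove_stop_words, naive_stem) are hypothetical, and the suffix stripper is a deliberately simplified stand-in for whatever stemming algorithm the application actually uses. The stop list is assumed to be plain text with words separated by semicolons, as described above.

```python
def parse_stop_list(text):
    """Parse a semicolon-separated stop list into a set of lowercase words."""
    return {w.strip().lower() for w in text.split(";") if w.strip()}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


def naive_stem(word):
    """Toy suffix stripper for illustration only (assumption, not a real stemmer)."""
    for suffix in ("ing", "ed", "er"):
        # Strip the suffix only if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[:-len(suffix)]
            # Undo a doubled final letter, e.g. "stemm" -> "stem".
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word


# Contents of a hypothetical stop-list file: words separated by semicolons.
stop_words = parse_stop_list("the; a; in")

tokens = "the stemmer stemmed a word in the text".split()
kept = remove_stop_words(tokens, stop_words)
features = [naive_stem(t) for t in kept]

print(kept)
print(features)
```

In this sketch, "stemmer" and "stemmed" both map to "stem" and so collapse into a single feature, while the stop words "the", "a" and "in" are dropped before stemming. A real stemmer applies many more rules and may produce different stems for some words.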