Authors: Doležal Jan, Dytrych Jaroslav, Karásek Miroslav, Kouřil Jan, Otrusina Lubomír, Smrž Pavel
Keywords: corpora, processing, indexing
A set of programs for processing large text corpora. The programs convert data from HTML into vertical text, annotate it at several levels, and index it in MG4J and Elastic.
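
For readers unfamiliar with the term, vertical text usually means one token per line, with structural markup such as document boundaries on lines of their own. The sketch below illustrates that general idea using only the Python standard library. It is not the code of the programs described here: the names `TextExtractor` and `html_to_vertical`, the `<doc id="...">` wrapper, and the simple regular-expression tokenizer are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_vertical(html, doc_id="doc1"):
    """Return the text of `html` as one token per line inside a <doc> element.

    Assumption: a token is a run of word characters or a single
    punctuation character; real corpus tokenizers are more elaborate.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    tokens = re.findall(r"\w+|[^\w\s]", text)
    lines = ['<doc id="%s">' % doc_id] + tokens + ["</doc>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(html_to_vertical("<p>Hello, world!</p><script>var x = 1;</script>"))
```

In a layout like this, annotation layers would typically be added as further tab-separated columns next to each token, but the exact column set used by these programs is not specified here.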
