Publication Details

Nalezení slovních kořenů v češtině

CHMELAŘ Petr, HELLEBRAND David, HRUŠECKÝ Michal and BARTÍK Vladimír. Nalezení slovních kořenů v češtině. In: Znalosti 2011: Sborník příspěvků 10. ročníku konference. Stará Lesná: VŠB-Technical University of Ostrava, 2011, pp. 66-77. ISBN 978-80-248-2369-0.
English title
Czech Stemming Algorithm
Type
conference paper
Language
Czech
Authors
Chmelař Petr, Ing. (DIFS FIT BUT)
Hellebrand David, Ing. (FIT BUT)
Hrušecký Michal (MFF CUNI)
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT)
Keywords

Lemmatization, stemming, Snowball, Czech language, grammar.

Abstract

The goal was to create a stemming algorithm for the Czech language based on grammatical rules, as a complement to vocabulary-based methods for retrieval and mining of Czech texts. The article covers the basics of Czech word formation for different word classes, describes the problems involved, and presents several stemming and lemmatization algorithms. The main contribution of this work is an implementation of a Snowball stemming algorithm for the Czech language, based on complete sets of the prefixes and suffixes that may occur in Czech words.

Published
2011
Pages
66-77
Proceedings
Znalosti 2011: Sborník příspěvků 10. ročníku konference
Conference
Znalosti 2011, Hotel Academia Stará Lesná, SK
ISBN
978-80-248-2369-0
Publisher
VŠB-Technical University of Ostrava
Place
Stará Lesná, SK
BibTeX
@INPROCEEDINGS{FITPUB9473,
   author = "Petr Chmela\v{r} and David Hellebrand and Michal Hru\v{s}eck\'{y} and Vladim\'{i}r Bart\'{i}k",
   title = "Nalezen\'{i} slovn\'{i}ch ko\v{r}en\r{u} v \v{c}e\v{s}tin\v{e}",
   pages = "66--77",
   booktitle = "Znalosti 2011: Sborn\'{i}k p\v{r}\'{i}sp\v{e}vk\r{u} 10. ro\v{c}n\'{i}ku konference",
   year = 2011,
   location = "Star\'{a} Lesn\'{a}, SK",
   publisher = "V\v{S}B-Technical University of Ostrava",
   ISBN = "978-80-248-2369-0",
   language = "czech",
   url = "https://www.fit.vut.cz/research/publication/9473"
}