Doc. Dr. Ing. Jan Černocký

Oparin, I., Glembek, O., Burget, L., Černocký, J.: Morphological random forests for language modeling of inflectional languages, In: Proc. 2008 IEEE Workshop on Spoken Language Technology, Goa, IN, IEEESP, 2008, 4 pp., ISBN 978-1-4244-3472-5
Publication language: English
Publication title: Morphological random forests for language modeling of inflectional languages
Title (cs): Morfologické náhodné lesy pro jazykové modelování ohebných jazyků
Pages: 4
Proceedings: Proc. 2008 IEEE Workshop on Spoken Language Technology
Conference: 2008 IEEE Workshop on Spoken Language Technology
Place of publication: Goa, IN
Year: 2008
ISBN: 978-1-4244-3472-5
Publisher: IEEE Signal Processing Society
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2008/Oparin_SLT2008.pdf [PDF]
Keywords
speech recognition, language modeling
Annotation
The paper is about morphological random forests for language modeling of inflectional languages.
Abstract
In this paper, we are concerned with using decision trees (DT) and random forests (RF) in language modeling for Czech LVCSR. We show that the RF approach can be successfully implemented for language modeling of an inflectional language. Performance of word-based and morphological DTs and RFs was evaluated on a lecture recognition task. We show that while DTs perform worse than conventional trigram language models (LM), RFs of both kinds outperform the latter. WER (up to 3.4% relative) and perplexity (10%) reduction over the trigram model can be gained with morphological RFs. Further improvement is obtained after interpolation of DT and RF LMs with the trigram one (up to 15.6% perplexity and 4.8% WER relative reduction). In this paper we also investigate the distribution of morphological feature types chosen for splitting data at different levels of DTs.
BibTeX:
@INPROCEEDINGS{
   author = {Ilya Oparin and Ondřej Glembek and Lukáš Burget and Jan
	Černocký},
   title = {Morphological random forests for language modeling of
	inflectional languages},
   pages = {4},
   booktitle = {Proc. 2008 IEEE Workshop on Spoken Language Technology},
   year = {2008},
   location = {Goa, IN},
   publisher = {IEEE Signal Processing Society},
   ISBN = {978-1-4244-3472-5},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=8844}
}
