Doc. Dr. Ing. Jan Černocký
| Oparin, I., Glembek, O., Burget, L., Černocký, J.: Morphological random forests for language modeling of inflectional languages, In: Proc. 2008 IEEE Workshop on Spoken Language Technology, Goa, IN, IEEESP, 2008, 4 pp., ISBN 978-1-4244-3472-5 | | Publication language: | English |
|---|
| Publication title: | Morphological random forests for language modeling of inflectional languages |
|---|
| Title (cs): | Morfologické náhodné lesy pro jazykové modelování ohebných jazyků |
|---|
| Pages: | 4 |
|---|
| Proceedings: | Proc. 2008 IEEE Workshop on Spoken Language Technology |
|---|
| Conference: | 2008 IEEE Workshop on Spoken Language Technology |
|---|
| Place of publication: | Goa, IN |
|---|
| Year: | 2008 |
|---|
| ISBN: | 978-1-4244-3472-5 |
|---|
| Publisher: | IEEE Signal Processing Society |
|---|
| URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2008/Oparin_SLT2008.pdf [PDF] |
|---|
| Keywords |
|---|
speech recognition, language modeling
|
| Annotation |
|---|
The paper deals with morphological random forests for language modeling of inflectional languages.
|
| Abstract |
|---|
| In this paper, we are concerned with using decision trees (DT)
and random forests (RF) in language modeling for Czech
LVCSR. We show that the RF approach can be successfully
implemented for language modeling of an inflectional language.
Performance of word-based and morphological DTs
and RFs was evaluated on a lecture recognition task. We show
that while DTs perform worse than conventional trigram language
models (LM), RFs of both kinds outperform the latter.
WER (up to 3.4% relative) and perplexity (10%) reduction
over the trigram model can be gained with morphological
RFs. Further improvement is obtained after interpolation of
DT and RF LMs with the trigram one (up to 15.6% perplexity
and 4.8% WER relative reduction). In this paper we also investigate
the distribution of morphological feature types chosen
for splitting data at different levels of DTs. |
| BibTeX: |
|---|
@INPROCEEDINGS{
author = {Ilya Oparin and Ondřej Glembek and Lukáš Burget and Jan
Černocký},
title = {Morphological random forests for language modeling of
inflectional languages},
pages = {4},
booktitle = {Proc. 2008 IEEE Workshop on Spoken Language Technology},
year = {2008},
location = {Goa, IN},
publisher = {IEEE Signal Processing Society},
ISBN = {978-1-4244-3472-5},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=8844}
} |
|