Conference paper

MIKOLOV Tomáš. LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION OF CZECH LECTURES. In: Proc. STUDENT EEICT 2008. Brno: Faculty of Electrical Engineering and Communication BUT, 2008, pp. 1-5. ISBN 978-80-214-3617-6.
Publication language:English
Original title:Language models for automatic speech recognition of Czech lectures
Title (cs):Jazykové modely pro rozpoznávání českých přednášek
Pages:1-5
Proceedings:Proc. STUDENT EEICT 2008
Conference:Student EEICT 2008
Place:Brno, CZ
Year:2008
ISBN:978-80-214-3617-6
Publisher:Faculty of Electrical Engineering and Communication BUT
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2008/mikolov_eeict2008.pdf [PDF]
Keywords
language modeling
Annotation
The paper deals with language models for automatic speech recognition of Czech lectures.
Abstract
This paper describes improvements in Automatic Speech Recognition (ASR) of Czech lectures obtained by enhancing the language models. Our baseline is a statistical trigram language model with Good-Turing smoothing, trained on half a billion words from newspapers, books, etc. The overall improvement from adding more training data is over 10% absolute in accuracy, while advanced language modeling techniques, mainly neural networks, yield another 3%. Perplexity improvements and the reduction of the out-of-vocabulary (OOV) rate are also discussed.
BibTeX:
@INPROCEEDINGS{
   author = {Tom{\'{a}}{\v{s}} Mikolov},
   title = {Language models for automatic speech recognition of Czech lectures},
   pages = {1--5},
   booktitle = {Proc. STUDENT EEICT 2008},
   year = {2008},
   location = {Brno, CZ},
   publisher = {Faculty of Electrical Engineering and Communication BUT},
   ISBN = {978-80-214-3617-6},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=8749}
}
