Publication Details

Language models for automatic speech recognition of Czech lectures

MIKOLOV Tomáš. Language models for automatic speech recognition of Czech lectures. In: Proc. STUDENT EEICT 2008. Brno: Faculty of Electrical Engineering and Communication BUT, 2008, pp. 1-5. ISBN 978-80-214-3617-6.

Czech title

Jazykové modely pro rozpoznávání českých přednášek

Type

conference paper

Language

English

Authors

Mikolov Tomáš, Ing. (DCGM FIT BUT)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2008/mikolov_eeict2008.pdf

Keywords

language modeling

Abstract

This paper addresses language models for automatic speech recognition of Czech lectures.

Annotation

This paper describes improvements in Automatic Speech Recognition (ASR) of Czech lectures obtained by enhancing language models. Our baseline is a statistical trigram language model with Good-Turing smoothing, trained on half a billion words from newspapers, books, and other sources. Adding more training data improves accuracy by more than 10% absolute overall, while advanced language modeling techniques, mainly neural networks, yield another 3%. Perplexity improvements and the reduction in out-of-vocabulary (OOV) words are also discussed.
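
For readers unfamiliar with the two evaluation measures named in the annotation, the following is a minimal sketch of how OOV rate and perplexity are commonly computed. It is not taken from the paper: the function names, the toy vocabulary, and the probability values are all assumptions made for illustration, and the per-word probabilities would in practice come from a language model such as the trigram baseline.

```python
import math

def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    oov = sum(1 for tok in test_tokens if tok not in vocabulary)
    return oov / len(test_tokens)

def perplexity(word_probs):
    """Perplexity from per-word probabilities P(w_i | history).

    Computed as exp of the negative mean log-probability.
    """
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# Toy data, invented for illustration only (hypothetical values).
vocab = {"dnes", "budeme", "mluvit", "o", "modelech"}
test = ["dnes", "budeme", "mluvit", "o", "jazykovych", "modelech"]

rate = oov_rate(test, vocab)                 # one of six tokens is unknown
ppl = perplexity([0.25, 0.25, 0.25, 0.25])   # uniform 1/4 per word
print(f"OOV rate: {rate:.3f}, perplexity: {ppl:.2f}")
```

A lower OOV rate means fewer words the recognizer cannot output at all, and a lower perplexity means the model assigns higher probability to the test text on average.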

Published

2008

Pages

1-5

Proceedings

Proc. STUDENT EEICT 2008

Conference

Student EEICT 2008, Brno, CZ

ISBN

978-80-214-3617-6

Publisher

Faculty of Electrical Engineering and Communication BUT

Place

Brno, CZ

BibTeX

@INPROCEEDINGS{FITPUB8749,
   author = "Tom\'{a}\v{s} Mikolov",
   title = "Language models for automatic speech recognition of Czech lectures",
   pages = "1--5",
   booktitle = "Proc. STUDENT EEICT 2008",
   year = 2008,
   location = "Brno, CZ",
   publisher = "Faculty of Electrical Engineering and Communication BUT",
   ISBN = "978-80-214-3617-6",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/8749"
}