Conference paper

KOMBRINK Stefan, MIKOLOV Tomáš, KARAFIÁT Martin and BURGET Lukáš. Improving Language Models for ASR Using Translated In-domain Data. In: Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012, pp. 4405-4408. ISBN 978-1-4673-0044-5.
Publication language:english
Original title:Improving Language Models for ASR Using Translated In-domain Data
Title (cs):Vylepšení jazykových modelů pro rozpoznávání řeči pomocí přeložených dat z cílové oblasti
Pages:4405-4408
Proceedings:Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing
Conference:The 37th International Conference on Acoustics, Speech, and Signal Processing
Place:Kyoto, JP
Year:2012
ISBN:978-1-4673-0044-5
Publisher:IEEE Signal Processing Society
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2012/kombrink_icassp2012_0004405.pdf
Keywords
Low Resource ASR, Language Modeling, Machine Translation
Annotation
This paper describes how to acquire in-domain training data for the purpose of building speech recognition systems for under-resourced languages.
Abstract
Acquiring in-domain training data to build speech recognition systems for under-resourced languages can be a costly, time-consuming and tedious process. In this work, we propose using machine translation to translate English transcripts of telephone speech into Czech in order to improve a Czech conversational telephone speech (CTS) recognition system. The translated transcripts are used as additional language model training data in a scenario where the baseline language model is trained only on off-domain and close-domain data. We report perplexities, OOV rates and word error rates, and examine how suitable different data sets and translators are for this task.
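The abstract does not state how the translated transcripts are combined with the baseline training data. One common option is to train a separate language model on the extra data and linearly interpolate it with the baseline model. The sketch below assumes that approach and, for brevity, uses add-one smoothed unigram models instead of the n-gram models a real CTS system would use. All function names and token lists are hypothetical toy examples, not data or code from the paper. It shows how the two reported LM metrics, perplexity and OOV rate, can be computed.

```python
import math
from collections import Counter

UNK = "<unk>"

def train_unigram(tokens, vocab):
    """Add-one smoothed unigram LM over a fixed vocabulary plus an <unk> class."""
    counts = Counter(t if t in vocab else UNK for t in tokens)
    total = sum(counts.values())
    size = len(vocab) + 1  # +1 for the <unk> class
    return lambda w: (counts[w] + 1) / (total + size)

def interpolate(p_base, p_extra, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_extra(w)."""
    return lambda w: lam * p_base(w) + (1 - lam) * p_extra(w)

def oov_rate(tokens, vocab):
    """Fraction of tokens not covered by the vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)

def perplexity(tokens, prob, vocab):
    """Per-token perplexity; OOV tokens are scored as <unk>."""
    log_sum = sum(math.log(prob(t if t in vocab else UNK)) for t in tokens)
    return math.exp(-log_sum / len(tokens))

# Hypothetical toy data: off-domain baseline text, "translated" in-domain text, dev set
baseline_text = "zpravy dnes trh vzrostl vlada rekla dnes".split()
translated_text = "ahoj jak se mas mam se dobre ahoj".split()
dev_text = "ahoj jak se mas dnes diky".split()

# Adding the translated data also extends the vocabulary, which lowers the OOV rate
vocab = set(baseline_text) | set(translated_text)
p_base = train_unigram(baseline_text, vocab)
p_trans = train_unigram(translated_text, vocab)
p_mix = interpolate(p_base, p_trans, lam=0.5)

print(f"OOV rate, baseline vocab: {oov_rate(dev_text, set(baseline_text)):.3f}")
print(f"OOV rate, combined vocab: {oov_rate(dev_text, vocab):.3f}")
print(f"PPL baseline:     {perplexity(dev_text, p_base, vocab):.2f}")
print(f"PPL interpolated: {perplexity(dev_text, p_mix, vocab):.2f}")
```

Both perplexities are computed over the same vocabulary so that they are comparable. In practice the interpolation weight `lam` would be tuned on held-out in-domain data rather than fixed at 0.5.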
BibTeX:
@INPROCEEDINGS{kombrink2012improving,
   author = {Stefan Kombrink and Tom{\'{a}}{\v{s}} Mikolov and Martin
	Karafi{\'{a}}t and Luk{\'{a}}{\v{s}} Burget},
   title = {Improving Language Models for ASR Using Translated In-domain
	Data},
   pages = {4405--4408},
   booktitle = {Proceedings of 2012 IEEE International Conference on
	Acoustics, Speech and Signal Processing},
   year = {2012},
   location = {Kyoto, JP},
   publisher = {IEEE Signal Processing Society},
   ISBN = {978-1-4673-0044-5},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso-8859-2?id=9927}
}
