| Kombrink, S., Mikolov, T., Karafiát, M., Burget, L.: Improving Language Models for ASR Using Translated In-domain Data. In: Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, JP, IEEE Signal Processing Society, 2012, pp. 4405-4408, ISBN 978-1-4673-0044-5 | | Publication language: | english |
|---|
| Original title: | Improving Language Models for ASR Using Translated In-domain Data |
|---|
| Title (cs): | Vylepšení jazykových modelů pro rozpoznávání řeči pomocí přeložených dat z cílové oblasti |
|---|
| Pages: | 4405-4408 |
|---|
| Proceedings: | Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing |
|---|
| Conference: | The 37th International Conference on Acoustics, Speech, and Signal Processing |
|---|
| Place: | Kyoto, JP |
|---|
| Year: | 2012 |
|---|
| ISBN: | 978-1-4673-0044-5 |
|---|
| Publisher: | IEEE Signal Processing Society |
|---|
| URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2012/kombrink_icassp2012_0004405.pdf |
|---|
| Keywords |
|---|
| Low-Resource ASR, Language Modeling, Machine Translation |
| Annotation |
|---|
| This paper describes how machine translation can be used to acquire in-domain training data for building speech recognition systems for under-resourced languages. |
| Abstract |
|---|
| Acquiring in-domain training data to build speech recognition
systems for under-resourced languages can be a costly,
time-consuming and tedious process. In this work, we propose
using machine translation to translate English transcripts
of telephone speech into Czech in order to improve a Czech
speech recognition system for conversational telephone speech (CTS).
The translated transcripts are used as additional language model
training data in a scenario where the baseline language model is
trained on off-domain and close-domain data only. We report perplexities,
out-of-vocabulary (OOV) rates and word error rates, and examine
the suitability of different data sets and translators for the
described task. |
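The abstract evaluates the augmented language models by perplexity, OOV rate and word error rate. The Python sketch below shows the standard textbook definitions of these three metrics; it is not taken from the paper. The function names, the token-based (rather than type-based) OOV definition, and the toy Czech sentences are illustrative assumptions. Perplexity is computed from per-word probabilities assumed to come from some language model, and WER uses a word-level Levenshtein alignment.

```python
import math
from typing import Sequence


def oov_rate(tokens: Sequence[str], vocabulary: set[str]) -> float:
    """Fraction of running test tokens not covered by the LM vocabulary (token-based, assumed)."""
    if not tokens:
        raise ValueError("empty token list")
    misses = sum(1 for t in tokens if t not in vocabulary)
    return misses / len(tokens)


def perplexity(word_probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_i ln P(w_i | history)), given per-word LM probabilities."""
    if not word_probs:
        raise ValueError("empty probability list")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference), via edit distance on words."""
    if not reference:
        raise ValueError("empty reference")
    r, h = len(reference), len(hypothesis)
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (h + 1) for _ in range(r + 1)]
    for i in range(r + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(h + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, r + 1):
        for j in range(1, h + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[r][h] / r


if __name__ == "__main__":
    # Hypothetical toy data, not from the paper's test sets.
    ref = "dobrý den jak se máte".split()
    hyp = "dobrý den jak se mate".split()
    vocab = {"dobrý", "den", "jak", "se"}
    print(word_error_rate(ref, hyp))  # one substitution out of five words -> 0.2
    print(oov_rate(ref, vocab))       # "máte" is out of vocabulary -> 0.2
    print(perplexity([0.5, 0.5]))
```

Adding translated in-domain text to the LM training data would be expected to lower the OOV rate (more in-domain words enter the vocabulary) and the perplexity on in-domain test data; the paper reports how far this carries over to word error rate.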
| BibTeX: |
|---|
@INPROCEEDINGS{
author = {Stefan Kombrink and Tomáš Mikolov and Martin Karafiát and
Lukáš Burget},
title = {Improving Language Models for {ASR} Using Translated In-domain
Data},
pages = {4405--4408},
booktitle = {Proceedings of 2012 IEEE International Conference on
Acoustics, Speech and Signal Processing},
year = {2012},
location = {Kyoto, JP},
publisher = {IEEE Signal Processing Society},
ISBN = {978-1-4673-0044-5},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=9927}
} |