Ing. Lukáš Burget, Ph.D.
| Kombrink, S., Mikolov, T., Karafiát, M., Burget, L.: Improving Language Models for ASR Using Translated In-domain Data, In: Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, JP, IEEESP, 2012, pp. 4405-4408, ISBN 978-1-4673-0044-5 | | Publication language: | English |
|---|
| Publication title: | Improving Language Models for ASR Using Translated In-domain Data |
|---|
| Title (cs): | Vylepšení jazykových modelů pro rozpoznávání řeči pomocí přeložených dat z cílové oblasti |
|---|
| Pages: | 4405-4408 |
|---|
| Proceedings: | Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing |
|---|
| Conference: | The 37th International Conference on Acoustics, Speech, and Signal Processing |
|---|
| Place of publication: | Kyoto, JP |
|---|
| Year: | 2012 |
|---|
| ISBN: | 978-1-4673-0044-5 |
|---|
| Publisher: | IEEE Signal Processing Society |
|---|
| URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2012/kombrink_icassp2012_0004405.pdf [PDF] |
|---|
| Keywords |
|---|
| Low Resource ASR, Language Modeling, Machine Translation |
| Annotation |
|---|
| This paper deals with improving language models for speech recognition using translated in-domain data. |
| Abstract |
|---|
| Acquisition of in-domain training data to build speech recognition
systems for under-resourced languages can be a costly,
time-demanding and tedious process. In this work, we propose
the use of machine translation to translate English transcripts
of telephone speech into Czech language in order to
improve a Czech CTS speech recognition system. The translated
transcripts are used as additional language model training
data in a scenario where the baseline language model is
trained on off- and close-domain data only. We report perplexities,
OOV and word error rates and examine different
data sets and translators on their suitability for the described
task. |
| BibTeX: |
|---|
@INPROCEEDINGS{
author = {Stefan Kombrink and Tomáš Mikolov and Martin Karafiát and
Lukáš Burget},
title = {Improving Language Models for ASR Using Translated In-domain
Data},
pages = {4405--4408},
booktitle = {Proceedings of 2012 IEEE International Conference on
Acoustics, Speech and Signal Processing},
year = {2012},
location = {Kyoto, JP},
publisher = {IEEE Signal Processing Society},
ISBN = {978-1-4673-0044-5},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=9927}
} |