Conference paper

KARAFIÁT Martin, BASKAR Murali K., MATĚJKA Pavel, VESELÝ Karel, GRÉZL František, BURGET Lukáš and ČERNOCKÝ Jan. 2016 BUT Babel system: Multilingual BLSTM acoustic model with i-vector based adaptation. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 719-723. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1775.PDF

Publication language: english
Original title: 2016 BUT Babel system: Multilingual BLSTM acoustic model with i-vector based adaptation
Title (cs): 2016 systém VUT pro Babel: Multilingvální BLSTM akustický model s adaptací založenou na i-vektorech
Pages: 719-723
Proceedings: Proceedings of Interspeech 2017
Conference: Interspeech 2017
Place: Stockholm, SE
Year: 2017
URL: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1775.PDF
Journal: Proceedings of Interspeech, Vol. 2017, No. 08, FR
ISSN: 1990-9772
DOI: 10.21437/Interspeech.2017-1775
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/karafiat_interspeech2017_IS171775.pdf [PDF]

Keywords: Automatic speech recognition, Multilingual neural networks, Bidirectional Long Short Term Memory, i-vector
Annotation: This paper describes the 2016 BUT Babel system: a multilingual BLSTM acoustic model with i-vector based adaptation.
Abstract: The paper provides an analysis of the BUT automatic speech recognition (ASR) systems built for the 2016 IARPA Babel evaluation. The IARPA Babel program concentrates on building ASR systems for many low-resource languages, where only a limited amount of transcribed speech is available for each language. In such a scenario, we found it essential to train the ASR systems in a multilingual fashion. In this work, we report superior results obtained with pre-trained multilingual BLSTM acoustic models, where we used multi-task training with a separate classification layer for each language. The results reported on three Babel Year 4 languages show over 3% absolute WER reductions obtained from such multilingual pre-training. Experiments with different input features show that the multilingual BLSTM performs best with simple log-Mel filter-bank outputs, which makes our previously successful multilingual stacked bottleneck features with CMLLR adaptation obsolete. Finally, we experiment with different configurations of i-vector based speaker adaptation in the mono- and multilingual BLSTM architectures. This results in additional WER reductions of over 1% absolute.
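To make the architecture sketched in the abstract concrete, the block below gives a minimal, hypothetical PyTorch illustration of a multilingual BLSTM acoustic model: a shared bidirectional LSTM body over log-Mel filter-bank frames with a per-utterance i-vector appended to each frame, and a separate classification layer (multi-task head) per language. This is not the authors' implementation; the layer sizes, language names, and target counts are assumptions chosen only for illustration.

import torch
import torch.nn as nn

class MultilingualBLSTM(nn.Module):
    def __init__(self, n_fbank=40, ivector_dim=100, hidden=512,
                 n_layers=3, targets_per_lang=None):
        super().__init__()
        # Hypothetical defaults; the paper's actual dimensions may differ.
        targets_per_lang = targets_per_lang or {"lang_a": 3000, "lang_b": 3000}
        # Shared bidirectional LSTM body trained on all languages.
        self.blstm = nn.LSTM(input_size=n_fbank + ivector_dim,
                             hidden_size=hidden, num_layers=n_layers,
                             bidirectional=True, batch_first=True)
        # One classification layer per language: the multi-task heads.
        self.heads = nn.ModuleDict({
            lang: nn.Linear(2 * hidden, n_targets)
            for lang, n_targets in targets_per_lang.items()
        })

    def forward(self, fbank, ivector, lang):
        # fbank: (batch, frames, n_fbank); ivector: (batch, ivector_dim).
        # The per-utterance i-vector is appended to every frame.
        ivec = ivector.unsqueeze(1).expand(-1, fbank.size(1), -1)
        x = torch.cat([fbank, ivec], dim=-1)
        h, _ = self.blstm(x)
        # Frame-level logits from the head of the requested language.
        return self.heads[lang](h)

# Hypothetical usage: during multilingual pre-training, each mini-batch is routed
# through the head of its own language; the shared body can later be fine-tuned
# on the target language alone.
model = MultilingualBLSTM(targets_per_lang={"swahili": 4000, "tagalog": 4000})
logits = model(torch.randn(2, 200, 40), torch.randn(2, 100), lang="swahili")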
BibTeX:

@INPROCEEDINGS{Karafiat2017ButBabel,
author = {Martin Karafi{\'{a}}t and K. Murali Baskar and
Pavel Mat{\v{e}}jka and Karel Vesel{\'{y}} and
Franti{\v{s}}ek Gr{\'{e}}zl and Luk{\'{a}}{\v{s}}
Burget and Jan {\v{C}}ernock{\'{y}}},
title = {2016 BUT Babel system: Multilingual BLSTM acoustic
model with i-vector based adaptation},
pages = {719--723},
booktitle = {Proceedings of Interspeech 2017},
journal = {Proceedings of Interspeech},
volume = {2017},
number = {08},
year = {2017},
location = {Stockholm, SE},
publisher = {International Speech Communication Association},
ISSN = {1990-9772},
doi = {10.21437/Interspeech.2017-1775},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php.en?id=11579}
}
|