Conference paper

KARAFIÁT Martin, BASKAR Murali K., MATĚJKA Pavel, VESELÝ Karel, GRÉZL František, BURGET Lukáš and ČERNOCKÝ Jan. 2016 BUT Babel system: Multilingual BLSTM acoustic model with i-vector based adaptation. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 719-723. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1775.PDF

Publication language: english
Original title: 2016 BUT Babel system: Multilingual BLSTM acoustic model with i-vector based adaptation
Title (cs): 2016 systém VUT pro Babel: Multilingvální BLSTM akustický model s adaptací založenou na i-vektorech
Pages: 719-723
Proceedings: Proceedings of Interspeech 2017
Conference: Interspeech 2017
Place: Stockholm, SE
Year: 2017
URL: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1775.PDF
Journal: Proceedings of Interspeech, Vol. 2017, No. 08, FR
ISSN: 1990-9772
DOI: 10.21437/Interspeech.2017-1775
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/karafiat_interspeech2017_IS171775.pdf [PDF]

Keywords: Automatic speech recognition, Multilingual neural networks, Bidirectional Long Short Term Memory, i-vector
Annotation: This paper describes the 2016 BUT Babel system: a multilingual BLSTM acoustic model with i-vector based adaptation.
Abstract: The paper provides an analysis of the BUT automatic speech recognition (ASR) systems built for the 2016 IARPA Babel evaluation. The IARPA Babel program concentrates on building ASR systems for many low-resource languages, where only a limited amount of transcribed speech is available for each language. In such a scenario, we found it essential to train the ASR systems in a multilingual fashion. In this work, we report superior results obtained with pre-trained multilingual BLSTM acoustic models, where we used multi-task training with a separate classification layer for each language. The results reported on three Babel Year 4 languages show over 3% absolute WER reductions obtained from such multilingual pre-training. Experiments with different input features show that the multilingual BLSTM performs best with simple log-Mel filter-bank outputs, which makes our previously successful multilingual stacked bottleneck features with CMLLR adaptation obsolete. Finally, we experiment with different configurations of i-vector based speaker adaptation in the mono- and multilingual BLSTM architectures. This results in additional WER reductions of over 1% absolute.
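To make the architecture sketched in the abstract concrete, the block below gives a minimal, hypothetical PyTorch illustration of a multilingual BLSTM acoustic model: a shared bidirectional LSTM body over log-Mel filter-bank frames with a per-utterance i-vector appended to each frame, and a separate classification layer (multi-task head) per language. This is not the authors' implementation; the layer sizes, language names, and target counts are assumptions chosen only for illustration.

import torch
import torch.nn as nn

class MultilingualBLSTM(nn.Module):
    def __init__(self, n_fbank=40, ivector_dim=100, hidden=512,
                 n_layers=3, targets_per_lang=None):
        super().__init__()
        # Hypothetical defaults; the paper's actual dimensions may differ.
        targets_per_lang = targets_per_lang or {"lang_a": 3000, "lang_b": 3000}
        # Shared bidirectional LSTM body trained on all languages.
        self.blstm = nn.LSTM(input_size=n_fbank + ivector_dim,
                             hidden_size=hidden, num_layers=n_layers,
                             bidirectional=True, batch_first=True)
        # One classification layer per language: the multi-task heads.
        self.heads = nn.ModuleDict({
            lang: nn.Linear(2 * hidden, n_targets)
            for lang, n_targets in targets_per_lang.items()
        })

    def forward(self, fbank, ivector, lang):
        # fbank: (batch, frames, n_fbank); ivector: (batch, ivector_dim).
        # The per-utterance i-vector is appended to every frame.
        ivec = ivector.unsqueeze(1).expand(-1, fbank.size(1), -1)
        x = torch.cat([fbank, ivec], dim=-1)
        h, _ = self.blstm(x)
        # Frame-level logits from the head of the requested language.
        return self.heads[lang](h)

# Hypothetical usage: during multilingual pre-training, each mini-batch is routed
# through the head of its own language; the shared body can later be fine-tuned
# on the target language alone.
model = MultilingualBLSTM(targets_per_lang={"swahili": 4000, "tagalog": 4000})
logits = model(torch.randn(2, 200, 40), torch.randn(2, 100), lang="swahili")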
BibTeX:

@INPROCEEDINGS{Karafiat2017ButBabel,
author = {Martin Karafi{\'{a}}t and K. Murali Baskar and
Pavel Mat{\v{e}}jka and Karel Vesel{\'{y}} and
Franti{\v{s}}ek Gr{\'{e}}zl and Luk{\'{a}}{\v{s}}
Burget and Jan {\v{C}}ernock{\'{y}}},
title = {2016 BUT Babel system: Multilingual BLSTM acoustic
model with i-vector based adaptation},
pages = {719--723},
booktitle = {Proceedings of Interspeech 2017},
journal = {Proceedings of Interspeech},
volume = {2017},
number = {08},
year = {2017},
location = {Stockholm, SE},
publisher = {International Speech Communication Association},
ISSN = {1990-9772},
doi = {10.21437/Interspeech.2017-1775},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php.en?id=11579}
}
|