Conference paper
VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised DNN training with word selection for ASR. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 3687-3691. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF
Publication language: | english |
---|
Original title: | Semi-supervised DNN training with word selection for ASR |
---|
Title (cs): | Částečně kontrolované trénování DNN s výběrem slov pro ASR |
---|
Pages: | 3687-3691 |
---|
Proceedings: | Proceedings of Interspeech 2017 |
---|
Conference: | Interspeech 2017 |
---|
Place: | Stockholm, SE |
---|
Year: | 2017 |
---|
URL: | http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF |
---|
Journal: | Proceedings of Interspeech, Vol. 2017, No. 08, FR |
---|
ISSN: | 1990-9772 |
---|
DOI: | 10.21437/Interspeech.2017-1385 |
---|
Publisher: | International Speech Communication Association |
---|
URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2017/vesely_interspeech2017_IS171385.pdf |
---|
Keywords |
---|
semi-supervised training, DNN, word selection,
granularity of confidences |
Annotation |
---|
The article is about semi-supervised DNN training with word selection for Automatic Speech Recognition (ASR). |
Abstract |
---|
Not all questions related to the semi-supervised training of
a hybrid ASR system with a DNN acoustic model have yet been
investigated in depth. In this paper, we focus on the question
of the granularity of confidences (per-sentence, per-word, per-frame)
and on the question of how the data should be used (data selection
by masks, or mini-batch SGD with confidences as
weights). Then, we propose to re-tune the system on the manually
transcribed data, with both frame-level cross-entropy (CE) training and
state-level minimum Bayes risk (sMBR) training.
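The two ways of using confidences named above (masks vs. weights) and two of the granularities can be contrasted in a small sketch. This is not the authors' implementation: it assumes that word confidences and word time spans (in frames) are already available, for example from lattice posteriors, and the function names `expand_to_frames`, `frame_ce`, `masked_ce` and `weighted_ce` are hypothetical. Frames outside any word get confidence 0, the per-sentence confidence is assumed to be the mean word confidence, and both losses are normalized by the kept mask or weight mass; all of these are assumptions. Per-frame confidences would instead be read directly from frame posteriors and are not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating confidence granularity and the
# "mask" vs "weight" ways of using confidences in frame-level CE training.

def expand_to_frames(word_spans: Sequence[Tuple[int, int]],
                     word_conf: Sequence[float],
                     n_frames: int,
                     granularity: str = "word") -> List[float]:
    """Map word confidences onto frames.

    granularity="word": each frame gets the confidence of the word covering it.
    granularity="sentence": every word frame gets the mean word confidence
    (an assumed way of deriving a per-sentence confidence).
    Frames not covered by any word (e.g. silence) get 0.0 in this sketch.
    """
    if granularity == "sentence":
        mean = sum(word_conf) / len(word_conf) if word_conf else 0.0
        word_conf = [mean] * len(word_conf)
    elif granularity != "word":
        raise ValueError(f"unsupported granularity: {granularity}")
    frame_conf = [0.0] * n_frames
    for (start, end), conf in zip(word_spans, word_conf):
        for t in range(start, end):  # end index is exclusive
            frame_conf[t] = conf
    return frame_conf

def frame_ce(log_post: Sequence[Sequence[float]], targets: Sequence[int]) -> List[float]:
    """Per-frame cross-entropy -log p(target_t | x_t) from log-posteriors."""
    return [-lp[k] for lp, k in zip(log_post, targets)]

def masked_ce(ce: Sequence[float], frame_conf: Sequence[float], threshold: float) -> float:
    """Data selection by a binary mask: average CE over frames with confidence >= threshold."""
    kept = [l for l, c in zip(ce, frame_conf) if c >= threshold]
    return sum(kept) / len(kept) if kept else 0.0

def weighted_ce(ce: Sequence[float], frame_conf: Sequence[float]) -> float:
    """Confidences as weights: confidence-weighted average of the frame CE."""
    total = sum(frame_conf)
    return sum(c * l for l, c in zip(ce, frame_conf)) / total if total else 0.0

if __name__ == "__main__":
    spans = [(0, 2), (2, 4)]   # two words, two frames each
    conf = [0.9, 0.3]          # their word confidences
    log_post = [[-1.0, -2.0]] * 2 + [[-3.0, -0.1]] * 2
    ce = frame_ce(log_post, [0, 0, 0, 0])    # [1.0, 1.0, 3.0, 3.0]
    fc = expand_to_frames(spans, conf, n_frames=4)
    print(masked_ce(ce, fc, threshold=0.5))  # only the confident word is kept: 1.0
    print(weighted_ce(ce, fc))               # 3.6 / 2.4, approximately 1.5
```

The mask discards the low-confidence word entirely, while the weights still let it contribute in proportion to its confidence; which of the two works better is one of the questions the paper studies.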
Our preferred semi-supervised recipe, which is both simple
and efficient, is as follows: we select words according to the
word accuracy obtained on the development set. This recipe,
which does not rely on a grid search over the training hyperparameter,
generalized well to Babel Vietnamese (11 h transcribed,
74 h untranscribed), Babel Bengali (11 h transcribed, 58 h untranscribed)
and our custom Switchboard setup (14 h transcribed,
95 h untranscribed). We obtained absolute word error rate (WER) improvements of
2.5% for Vietnamese, 2.3% for Bengali and 3.2%
for Switchboard. |
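The word-selection recipe can be sketched under one possible reading of the abstract, which is an assumption and not taken from the paper: word accuracy (1 - WER) is measured on the transcribed development set, and the same fraction of the most confident words in the automatic transcripts of the untranscribed data is kept for training. The function names `word_accuracy` and `select_words` are hypothetical. The point of the sketch is that the selection threshold follows from a measured quantity rather than from a grid search.

```python
from typing import List, Sequence

def word_accuracy(n_ref: int, subs: int, dels: int, ins: int) -> float:
    """Word accuracy on the dev set: (N - S - D - I) / N, i.e. 1 - WER."""
    return (n_ref - subs - dels - ins) / n_ref

def select_words(word_conf: Sequence[float], dev_accuracy: float) -> List[bool]:
    """Keep the most confident words; the kept fraction equals the dev-set word
    accuracy (assumed reading of the recipe)."""
    # Clamp at 0: accuracy can be negative when there are many insertions.
    n_keep = max(0, round(dev_accuracy * len(word_conf)))
    # Word indices sorted from most to least confident.
    order = sorted(range(len(word_conf)), key=lambda i: word_conf[i], reverse=True)
    keep = [False] * len(word_conf)
    for i in order[:n_keep]:
        keep[i] = True
    return keep

if __name__ == "__main__":
    acc = word_accuracy(n_ref=100, subs=20, dels=10, ins=5)  # 0.65
    conf = [0.2, 0.95, 0.6, 0.8, 0.1]
    print(select_words(conf, dev_accuracy=acc))  # [False, True, True, True, False]
```

Per-word keep flags like these can then be expanded to per-frame masks using the word time spans, so the selection plugs directly into frame-level CE training.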
BibTeX: |
---|
@INPROCEEDINGS{
author = {Karel Vesel{\'{y}} and Luk{\'{a}}{\v{s}} Burget
and Jan {\v{C}}ernock{\'{y}}},
title = {Semi-supervised DNN training with word selection
for ASR},
pages = {3687--3691},
booktitle = {Proceedings of Interspeech 2017},
journal = {Proceedings of Interspeech},
volume = {2017},
number = {08},
year = {2017},
location = {Stockholm, SE},
publisher = {International Speech Communication Association},
ISSN = {1990-9772},
doi = {10.21437/Interspeech.2017-1385},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=11584}
} |