Conference paper

VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised DNN training with word selection for ASR. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 3687-3691. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF
Publication language:english
Original title:Semi-supervised DNN training with word selection for ASR
Title (cs):Částečně kontrolované trénování DNN s výběrem slov pro ASR
Pages:3687-3691
Proceedings:Proceedings of Interspeech 2017
Conference:Interspeech 2017
Place:Stockholm, SE
Year:2017
URL:http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF
Journal:Proceedings of Interspeech, Vol. 2017, No. 08, FR
ISSN:1990-9772
DOI:10.21437/Interspeech.2017-1385
Publisher:International Speech Communication Association
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2017/vesely_interspeech2017_IS171385.pdf [PDF]
Keywords
semi-supervised training, DNN, word selection, granularity of confidences
Annotation
The article is about semi-supervised DNN training with word selection for automatic speech recognition (ASR).
Abstract
Not all questions related to the semi-supervised training of hybrid ASR systems with a DNN acoustic model have been investigated in depth. In this paper, we focus on the granularity of confidences (per-sentence, per-word, per-frame) and on how the data should be used (data selection by masks, or in mini-batch SGD with confidences as weights). We then propose to re-tune the system with the manually transcribed data, both with frame CE training and with sMBR training. Our preferred semi-supervised recipe, which is both simple and efficient, is the following: we select words according to the word accuracy we obtain on the development set. This recipe, which does not rely on a grid search of the training hyperparameter, generalized well for Babel Vietnamese (transcribed 11h, untranscribed 74h), Babel Bengali (transcribed 11h, untranscribed 58h) and our custom Switchboard setup (transcribed 14h, untranscribed 95h). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
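The two ways of using untranscribed data named in the abstract (selection by masks versus confidences as weights) can be illustrated with a minimal Python sketch. It rests on an assumption that the abstract does not spell out: that "selecting words according to the word accuracy" means keeping the most confident words, with the kept fraction equal to the development-set word accuracy. The function names and the `(confidence, n_frames)` word representation are hypothetical and are not taken from the paper or from any toolkit.

```python
from typing import List, Tuple

# Hypothetical representation of one automatically transcribed word:
# (confidence in [0, 1], number of acoustic frames the word spans).
Word = Tuple[float, int]


def select_words(words: List[Word], dev_word_accuracy: float) -> List[bool]:
    """Mark the most confident words as selected.

    Assumption (not stated in the abstract): the fraction of words kept
    equals the word accuracy measured on the development set.
    """
    if not 0.0 <= dev_word_accuracy <= 1.0:
        raise ValueError("dev_word_accuracy must lie in [0, 1]")
    n_keep = int(round(dev_word_accuracy * len(words)))
    # Rank word indices by confidence, highest first.
    order = sorted(range(len(words)), key=lambda i: words[i][0], reverse=True)
    keep = [False] * len(words)
    for i in order[:n_keep]:
        keep[i] = True
    return keep


def frame_mask(words: List[Word], keep: List[bool]) -> List[float]:
    """Data selection by masks: 1.0 for frames of selected words, else 0.0."""
    mask: List[float] = []
    for (_, n_frames), selected in zip(words, keep):
        mask.extend([1.0 if selected else 0.0] * n_frames)
    return mask


def frame_weights(words: List[Word]) -> List[float]:
    """Alternative: use each word's confidence as a per-frame weight in SGD."""
    weights: List[float] = []
    for conf, n_frames in words:
        weights.extend([conf] * n_frames)
    return weights


if __name__ == "__main__":
    words = [(0.9, 3), (0.2, 2), (0.7, 1), (0.4, 2)]
    keep = select_words(words, dev_word_accuracy=0.5)
    print(keep)
    print(frame_mask(words, keep))
    print(frame_weights(words))
```

In either variant, the resulting per-frame values would multiply the per-frame training loss (or gradient) on the untranscribed data. With a mask, the frames of rejected words contribute nothing. With weights, every frame contributes in proportion to its confidence.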
BibTeX:
@INPROCEEDINGS{
   author = {Karel Vesel{\'{y}} and Luk{\'{a}}{\v{s}} Burget and Jan
	{\v{C}}ernock{\'{y}}},
   title = {Semi-supervised DNN training with word selection for ASR},
   pages = {3687--3691},
   booktitle = {Proceedings of Interspeech 2017},
   journal = {Proceedings of Interspeech},
   volume = {2017},
   number = {08},
   year = {2017},
   location = {Stockholm, SE},
   publisher = {International Speech Communication Association},
   ISSN = {1990-9772},
   doi = {10.21437/Interspeech.2017-1385},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.en?id=11584}
}
