Conference paper

VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised DNN training with word selection for ASR. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 3687-3691. ISSN 1990-9772. Available from:
Publication language:English
Original title:Semi-supervised DNN training with word selection for ASR
Title (cs):Částečně kontrolované trénování DNN s výběrem slov pro ASR
Proceedings:Proceedings of Interspeech 2017
Conference:Interspeech 2017
Place:Stockholm, SE
Journal:Proceedings of Interspeech, Vol. 2017, No. 08, FR
Publisher:International Speech Communication Association
Keywords:semi-supervised training, DNN, word selection, granularity of confidences
The article is about semi-supervised DNN training with word selection for Automatic Speech Recognition (ASR).
Not all questions related to the semi-supervised training of hybrid ASR systems with a DNN acoustic model have been investigated in depth. In this paper, we focus on the granularity of confidences (per-sentence, per-word, per-frame) and on how the data should be used (data selection by masks, or mini-batch SGD with confidences as weights). We then propose re-tuning the system with the manually transcribed data, both with frame CE training and sMBR training. Our preferred semi-supervised recipe, which is both simple and efficient, is the following: we select words according to the word accuracy obtained on the development set. This recipe, which does not rely on a grid search over a training hyperparameter, generalized well to Babel Vietnamese (transcribed 11h, untranscribed 74h), Babel Bengali (transcribed 11h, untranscribed 58h) and our custom Switchboard setup (transcribed 14h, untranscribed 95h). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
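The two ways of using the untranscribed data named in the abstract (data selection by masks, and confidences as weights) can be illustrated with a small sketch. It is not the authors' implementation. It assumes one plausible reading of the recipe, namely that the fraction of highest-confidence words kept equals the word accuracy measured on the development set. The word tuple layout and the function names select_words, frame_mask and frame_weights are hypothetical, and the selection is done within a single utterance for brevity, whereas a real setup would more likely rank words over the whole untranscribed set.

```python
from typing import List, Tuple

# Hypothetical layout of one decoded word:
# (label, start_frame, end_frame_exclusive, confidence)
Word = Tuple[str, int, int, float]


def select_words(words: List[Word], dev_word_accuracy: float) -> List[bool]:
    """Keep the highest-confidence words; the kept fraction equals
    dev_word_accuracy (assumed reading of the recipe)."""
    n_keep = round(len(words) * dev_word_accuracy)
    # Indices of words ranked by confidence, highest first.
    ranked = sorted(range(len(words)), key=lambda i: words[i][3], reverse=True)
    keep = [False] * len(words)
    for i in ranked[:n_keep]:
        keep[i] = True
    return keep


def frame_mask(words: List[Word], keep: List[bool], num_frames: int) -> List[float]:
    """Data selection by masks: 1.0 for frames of kept words, 0.0 elsewhere."""
    mask = [0.0] * num_frames
    for (_, start, end, _), is_kept in zip(words, keep):
        if is_kept:
            for t in range(start, end):
                mask[t] = 1.0
    return mask


def frame_weights(words: List[Word], num_frames: int) -> List[float]:
    """Alternative: spread each word confidence over its frames, to be used
    as per-frame weights on the gradient in mini-batch SGD."""
    weights = [0.0] * num_frames
    for _, start, end, conf in words:
        for t in range(start, end):
            weights[t] = conf
    return weights


if __name__ == "__main__":
    # Toy utterance of 10 frames with made-up confidences.
    utt = [("hello", 0, 3, 0.9), ("there", 3, 5, 0.4),
           ("world", 5, 8, 0.7), ("now", 8, 10, 0.2)]
    kept = select_words(utt, dev_word_accuracy=0.5)
    print(kept)
    print(frame_mask(utt, kept, 10))
    print(frame_weights(utt, 10))
```

With the mask variant, frames of rejected words contribute nothing to the training objective. With the weight variant, every frame contributes in proportion to its word confidence.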
   author = {Karel Vesel{\'{y}} and Luk{\'{a}}{\v{s}} Burget and Jan {\v{C}}ernock{\'{y}}},
   title = {Semi-supervised DNN training with word selection for ASR},
   pages = {3687--3691},
   booktitle = {Proceedings of Interspeech 2017},
   journal = {Proceedings of Interspeech},
   volume = 2017,
   number = 08,
   year = 2017,
   location = {Stockholm, SE},
   publisher = {International Speech Communication Association},
   ISSN = {1990-9772},
   doi = {10.21437/Interspeech.2017-1385},
   language = {english},
   url = {}
