Conference paper

VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised DNN training with word selection for ASR. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 3687-3691. ISSN 1990-9772. Available at: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF
Publication language: English
Publication title: Semi-supervised DNN training with word selection for ASR
Title (cs): Částečně kontrolované trénování DNN s výběrem slov pro ASR
Pages: 3687-3691
Proceedings: Proceedings of Interspeech 2017
Conference: Interspeech 2017
Place of publication: Stockholm, SE
Year: 2017
URL: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF
Journal: Proceedings of Interspeech, vol. 2017, no. 08, FR
ISSN: 1990-9772
DOI: 10.21437/Interspeech.2017-1385
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/vesely_interspeech2017_IS171385.pdf [PDF]
Keywords
semi-supervised training, DNN, word selection, granularity of confidences
Annotation
The paper deals with semi-supervised DNN training with word selection for automatic speech recognition (ASR).
Abstract
Not all the questions related to semi-supervised training of hybrid ASR systems with DNN acoustic models have been deeply investigated yet. In this paper, we focus on the question of the granularity of confidences (per-sentence, per-word, per-frame) and on the question of how the data should be used (data selection by masks, or mini-batch SGD with confidences as weights). We then propose to re-tune the system with the manually transcribed data, both with frame CE training and sMBR training. Our preferred semi-supervised recipe, which is both simple and efficient, is the following: we select words according to the word accuracy we obtain on the development set. This recipe, which does not rely on a grid search of a training hyper-parameter, generalized well for Babel Vietnamese (11h transcribed, 74h untranscribed), Babel Bengali (11h transcribed, 58h untranscribed) and our custom Switchboard setup (14h transcribed, 95h untranscribed). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
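The core of the recipe described above is selecting hypothesized words by confidence, with the retained fraction tied to the word accuracy measured on the development set. A minimal sketch of that idea follows; the data layout and function name are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of confidence-based word selection for
# semi-supervised training. Assumption: each decoded word carries a
# confidence score; we keep the top fraction of words, with the
# fraction set to the word accuracy observed on the dev set.

def select_words(words, dev_word_accuracy):
    """Return the ids of words retained for semi-supervised training.

    Words are ranked by confidence, and the retained fraction is
    matched to the dev-set word accuracy (no grid search of a
    confidence-threshold hyper-parameter is needed).
    """
    ranked = sorted(words, key=lambda w: w["conf"], reverse=True)
    n_keep = round(dev_word_accuracy * len(ranked))
    return {w["id"] for w in ranked[:n_keep]}

# Usage: a toy decoded hypothesis with per-word confidences.
hyp = [
    {"id": 0, "word": "hello", "conf": 0.95},
    {"id": 1, "word": "word",  "conf": 0.40},
    {"id": 2, "word": "there", "conf": 0.88},
]
mask = select_words(hyp, dev_word_accuracy=0.66)
# With dev accuracy 0.66, the two most confident words are kept.
```

The retained word ids would then define a mask over the untranscribed data (the per-word granularity and mask-based selection variant discussed in the paper).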
BibTeX:
@INPROCEEDINGS{vesely2017semisupervised,
   author = {Karel Vesel{\'{y}} and Luk{\'{a}}{\v{s}} Burget
	and Jan {\v{C}}ernock{\'{y}}},
   title = {Semi-supervised DNN training with word selection
	for ASR},
   pages = {3687--3691},
   booktitle = {Proceedings of Interspeech 2017},
   journal = {Proceedings of Interspeech},
   volume = {2017},
   number = {08},
   year = {2017},
   location = {Stockholm, SE},
   publisher = {International Speech Communication Association},
   ISSN = {1990-9772},
   doi = {10.21437/Interspeech.2017-1385},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.cs.iso-8859-2?id=11584}
}
