Conference paper

VESELÝ Karel, GHOSHAL Arnab, BURGET Lukáš and POVEY Daniel. Sequence-discriminative Training of Deep Neural Networks. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 2345-2349. ISBN 978-1-62993-443-3. ISSN 2308-457X.
Publication language:english
Original title:Sequence-discriminative Training of Deep Neural Networks
Title (cs):Sekvenční diskriminativní trénování hlubokých neuronových sítí
Pages:2345-2349
Proceedings:Proceedings of Interspeech 2013
Conference:Interspeech 2013
Place:Lyon, FR
Year:2013
ISBN:978-1-62993-443-3
Journal:Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), No. 8, Lyon, FR
ISSN:2308-457X
Publisher:International Speech Communication Association
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2013/vesely_interspeech2013_IS131333.pdf [PDF]
Keywords
speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research
Annotation
This article presents experiments with DNN-HMM hybrid systems trained using the frame-based cross-entropy criterion and several sequence-discriminative criteria on the 300-hour Switchboard conversational telephone speech task.
Abstract
Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300-hour American English conversational telephone speech task. Different sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI). Two heuristics are investigated to improve the performance of DNNs trained with sequence-based criteria: lattices are regenerated after the first iteration of training, and, for MMI and BMMI, frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on average. Little difference is observed between the sequence-based criteria investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
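The frame-rejection heuristic for MMI can be illustrated with a short sketch. It assumes the numerator is a single forced alignment (one reference state per frame) and takes the per-frame MMI gradient with respect to the pre-softmax activations in the usual form kappa * (delta(s = s_ref) - gamma_den(s)). The function name, its parameters, and the input layout are hypothetical and are not the Kaldi API. Under these assumptions, the numerator and denominator are disjoint at a frame exactly when the denominator posterior of the reference state is zero, and that frame's gradient is zeroed.

```python
from typing import List, Sequence


def mmi_frame_gradients(
    ref_states: Sequence[int],
    den_posteriors: Sequence[Sequence[float]],
    kappa: float = 1.0,
    drop_frames: bool = True,
    eps: float = 0.0,
) -> List[List[float]]:
    """Hypothetical sketch: per-frame MMI gradient w.r.t. pre-softmax activations.

    Assumptions (not taken from the paper's code):
      * ref_states[t] is the single reference state of a forced alignment,
        so the numerator occupancy is a one-hot vector at each frame;
      * den_posteriors[t][s] is the denominator-lattice state posterior.
    The returned values are the gradient of the objective to be maximised,
    kappa * (one_hot(ref) - gamma_den).
    """
    grads: List[List[float]] = []
    for ref, den in zip(ref_states, den_posteriors):
        if drop_frames and den[ref] <= eps:
            # The reference state never occurs in the denominator lattice,
            # i.e. numerator and denominator are disjoint: reject the frame.
            grads.append([0.0] * len(den))
            continue
        # Denominator term: -kappa * gamma_den(s) for every state s.
        g = [-kappa * p for p in den]
        # Numerator term: +kappa on the reference state only.
        g[ref] += kappa
        grads.append(g)
    return grads
```

With two frames where the second reference state is absent from the denominator lattice, the second frame contributes no gradient unless drop_frames is disabled. When the denominator posteriors of a kept frame sum to one, its gradient entries sum to zero.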
BibTeX:
@INPROCEEDINGS{FITPUB10422,
   author = {Karel Vesel{\'{y}} and Arnab Ghoshal and Luk{\'{a}}{\v{s}}
	Burget and Daniel Povey},
   title = {Sequence-discriminative Training of Deep Neural Networks},
   pages = {2345--2349},
   booktitle = {Proceedings of Interspeech 2013},
   journal = {Proceedings of the 14th Annual Conference of the
	International Speech Communication Association (Interspeech
	2013)},
   number = {8},
   year = {2013},
   location = {Lyon, FR},
   publisher = {International Speech Communication Association},
   ISBN = {978-1-62993-443-3},
   ISSN = {2308-457X},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso-8859-2?id=10422}
}