Publication Details

Sequence-discriminative Training of Deep Neural Networks

VESELÝ Karel, GHOSHAL Arnab, BURGET Lukáš and POVEY Daniel. Sequence-discriminative Training of Deep Neural Networks. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 2345-2349. ISBN 978-1-62993-443-3. ISSN 2308-457X.
Czech title
Sekvenční diskriminativní trénování hlubokých neuronových sítí
Type
conference paper
Language
English
Authors
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Ghoshal Arnab (UEDIN)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Povey Daniel (JHU)
Keywords

speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research

Abstract

This article presents experiments with DNN-HMM hybrid systems trained using frame-based cross-entropy and several sequence-discriminative criteria on the 300-hour Switchboard conversational telephone speech task.

Annotation

Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300-hour American English conversational telephone speech task. Different sequence-discriminative criteria - maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI) - are compared. Two heuristics are investigated to improve the performance of DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training; and, for MMI and BMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on average. Little difference is observed between the sequence-based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
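
The frame-rejection heuristic mentioned in the annotation can be illustrated with a minimal sketch. It is not the Kaldi implementation. It assumes the standard form of the MMI gradient with respect to the state log-likelihoods, kappa * (1[s = reference state] - denominator occupancy of s), and it assumes that "disjoint" means the reference (numerator) state has zero occupancy in the denominator lattice at that frame. The function name, the argument names, and the dictionary-based data layout are hypothetical and chosen only for illustration.

```python
def mmi_frame_gradients(den_posteriors, ref_states, kappa=1.0, drop_disjoint=True):
    """Per-frame gradient of the MMI objective w.r.t. state log-likelihoods.

    den_posteriors: list with one dict per frame, mapping a state id to its
        occupancy in the denominator lattice (hypothetical layout).
    ref_states: list with one reference (numerator) state id per frame.
    kappa: acoustic scale.
    drop_disjoint: if True, frames whose reference state does not occur in
        the denominator lattice contribute no gradient (frame rejection).
    """
    grads = []
    for gamma_den, ref in zip(den_posteriors, ref_states):
        if drop_disjoint and gamma_den.get(ref, 0.0) == 0.0:
            # Numerator and denominator are disjoint here: reject the frame.
            grads.append({})
            continue
        # Denominator term: -kappa * occupancy for every state in the lattice.
        grad = {s: -kappa * p for s, p in gamma_den.items()}
        # Numerator term: +kappa for the reference state.
        grad[ref] = grad.get(ref, 0.0) + kappa
        grads.append(grad)
    return grads


# Toy example with made-up occupancies: the second frame's reference state (2)
# is absent from the denominator lattice, so that frame is rejected.
den = [{0: 0.7, 1: 0.3}, {1: 1.0}]
ref = [0, 2]
grads = mmi_frame_gradients(den, ref)
grads_no_rejection = mmi_frame_gradients(den, ref, drop_disjoint=False)
```

Without rejection, the second frame in this toy example would receive a gradient of +kappa on the reference state and -kappa on the competing state. Those are the frames the heuristic excludes.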

Published
2013
Pages
2345-2349
Journal
Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), no. 8, ISSN 2308-457X
Proceedings
Proceedings of Interspeech 2013
Conference
Interspeech Conference, Lyon, FR
ISBN
978-1-62993-443-3
Publisher
International Speech Communication Association
Place
Lyon, FR
BibTeX
@INPROCEEDINGS{FITPUB10422,
   author = "Karel Vesel\'{y} and Arnab Ghoshal and Luk\'{a}\v{s} Burget and Daniel Povey",
   title = "Sequence-discriminative Training of Deep Neural Networks",
   pages = "2345--2349",
   booktitle = "Proceedings of Interspeech 2013",
   journal = "Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013).",
   number = 8,
   year = 2013,
   location = "Lyon, FR",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-62993-443-3",
   ISSN = "2308-457X",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10422"
}