Publication Details

Sequence-discriminative Training of Deep Neural Networks

VESELÝ Karel, GHOSHAL Arnab, BURGET Lukáš and POVEY Daniel. Sequence-discriminative Training of Deep Neural Networks. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 2345-2349. ISBN 978-1-62993-443-3. ISSN 2308-457X.

Czech title

Sekvenční diskriminativní trénování hlubokých neuronových sítí

Type

conference paper

Language

English

Authors

Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Ghoshal Arnab (UEDIN)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Povey Daniel (JHU)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2013/vesely_interspeech2013_IS131333.pdf

Keywords

speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research

Abstract

This article presents experiments with DNN-HMM hybrid systems trained using frame-based cross-entropy and different sequence-discriminative criteria on the 300-hour Switchboard conversational telephone speech task.

Annotation

Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300-hour American English conversational telephone speech task. Different sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI). Two heuristics are investigated to improve the performance of the DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training; and, for MMI and BMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on average. Little difference is observed between the sequence-based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.

Published

2013

Pages

2345-2349

Journal

Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), no. 8, ISSN 2308-457X

Proceedings

Proceedings of Interspeech 2013

Conference

Interspeech Conference, Lyon, FR

ISBN

978-1-62993-443-3

Publisher

International Speech Communication Association

Place

Lyon, FR

BibTeX

@INPROCEEDINGS{FITPUB10422,
   author = "Karel Vesel\'{y} and Arnab Ghoshal and Luk\'{a}\v{s} Burget and Daniel Povey",
   title = "Sequence-discriminative Training of Deep Neural Networks",
   pages = "2345--2349",
   booktitle = "Proceedings of Interspeech 2013",
   journal = "Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013).",
   number = 8,
   year = 2013,
   location = "Lyon, FR",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-62993-443-3",
   ISSN = "2308-457X",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10422"
}