Publication Details

The Kaldi Speech Recognition Toolkit

POVEY Daniel, GHOSHAL Arnab, BOULIANNE Gilles, BURGET Lukáš, GLEMBEK Ondřej, GOEL Nagendra K., HANNEMANN Mirko, MOTLÍČEK Petr, QIAN Yanmin, SCHWARZ Petr, SILOVSKÝ Jan, STEMMER Georg and VESELÝ Karel. The Kaldi Speech Recognition Toolkit. In: Proceedings of ASRU 2011. Hilton Waikoloa Village Resort, Hawaii: IEEE Signal Processing Society, 2011, pp. 1-4. ISBN 978-1-4673-0366-8.
Czech title
KALDI Toolkit pro rozpoznávání řeči (KALDI Toolkit for Speech Recognition)
Type
conference paper
Language
english
Authors
Povey Daniel (JHU)
Ghoshal Arnab (UEDIN)
Boulianne Gilles (CRIM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Goel Nagendra K. (GOVIVACE)
Hannemann Mirko, Dipl.-Ing. (DCGM FIT BUT)
Motlíček Petr, Ing., Ph.D. (IDIAP)
Qian Yanmin (SJTU)
Schwarz Petr, Ing., Ph.D. (DCGM FIT BUT)
Silovský Jan (TUL)
Stemmer Georg (SVOX)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
URL
Keywords

speech recognition, toolkit

Abstract

We describe the design of Kaldi, a free and open-source speech recognition toolkit. The toolkit currently supports modelling of context-dependent phones of arbitrary context lengths, as well as all commonly used techniques that can be estimated using maximum likelihood. It also supports the recently proposed subspace Gaussian mixture models (SGMMs). Development of Kaldi is continuing, and we are working on using large language models in the FST framework, lattice generation, and discriminative training.

Annotation

We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state transducers (using the freely available OpenFst), together with detailed documentation and scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, and all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

Published
2011
Pages
1-4
Proceedings
Proceedings of ASRU 2011
Conference
IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, Hilton Waikoloa Village Resort, Big Island, Hawaii, US
ISBN
978-1-4673-0366-8
Publisher
IEEE Signal Processing Society
Place
Hilton Waikoloa Village Resort, Hawaii, US
BibTeX
@INPROCEEDINGS{FITPUB11196,
   author = "Daniel Povey and Arnab Ghoshal and Gilles Boulianne and Luk\'{a}\v{s} Burget and Ond\v{r}ej Glembek and Nagendra K. Goel and Mirko Hannemann and Petr Motl\'{i}\v{c}ek and Yanmin Qian and Petr Schwarz and Jan Silovsk\'{y} and Georg Stemmer and Karel Vesel\'{y}",
   title = "The Kaldi Speech Recognition Toolkit",
   pages = "1--4",
   booktitle = "Proceedings of ASRU 2011",
   year = 2011,
   location = "Hilton Waikoloa Village Resort, Hawaii, US",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4673-0366-8",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11196"
}