RNNLM Toolkit

by Tomas Mikolov, 2010-2012

The project has been moved to http://rnnlm.org

Introduction

Neural network based language models are nowadays among the most successful techniques for statistical language modeling. They can be easily applied in a wide range of tasks, including automatic speech recognition and machine translation, and provide significant improvements over classic backoff n-gram models. The 'rnnlm' toolkit can be used to train, evaluate and use such models.
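A typical session looks roughly like this (a minimal sketch: the corpus file names and parameter values are placeholders, and the available options can differ between toolkit versions, so run ./rnnlm without arguments to see what your build supports):

    # train a model with a 100-neuron hidden layer, 100 output classes and BPTT
    ./rnnlm -train train.txt -valid valid.txt -rnnlm model.rnn -hidden 100 -class 100 -bptt 4 -bptt-block 10

    # evaluate the trained model (reports perplexity on the test data)
    ./rnnlm -rnnlm model.rnn -test test.txt

The basic examples linked in the Download section below walk through this workflow (including hyperparameter selection and n-best rescoring) in more detail.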

The goal of this toolkit is to speed up research progress in language modeling. First, it provides a useful implementation that demonstrates some of the underlying principles. Second, it can be used for empirical experiments in speech recognition and other applications. And third, it provides strong state-of-the-art baseline results against which future research that aims to "beat state of the art techniques" can be compared.

Download

rnnlm-0.1h - an older version of the toolkit

rnnlm-0.2b

rnnlm-0.2c

rnnlm-0.3b

rnnlm-0.3c

rnnlm-0.3d

rnnlm-0.3e - the latest version of the toolkit

Basic examples - very useful for a quick introduction (training, evaluation, hyperparameter selection, simple n-best list rescoring, etc.) - 35MB

Advanced examples - includes large-scale experiments with speech lattices (n-best list rescoring, ...) - 235MB, by Stefan Kombrink

Slides from my presentation at Google - pdf

RNNLM is now integrated into the Kaldi toolkit! Check this.

Example of data generated by a 4-gram language model, by an RNN model and by an RNNME model (all models trained on Broadcast News data, 400M/320M words) - check which generated sentences are easier to read! (A note on generating such samples yourself follows the list below.)

Word projections from RNN-80 and RNN-640 models trained on Broadcast News data + a tool for computing the closest words. (Extra-large 1600-dimensional features from 3 models are here.)
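The generated-data samples above come from sampling the trained models. To produce similar samples from your own model, the toolkit can generate text directly; a minimal sketch, assuming your build supports the -gen option and that model.rnn is a previously trained model:

    # sample 1000 words from the trained model and save them
    ./rnnlm -rnnlm model.rnn -gen 1000 > sampled_text.txt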

Frequently asked questions

FAQ archive

Contact

Tomas Mikolov - tmikolov@gmail.com

Stefan Kombrink - kombrink@fit.vutbr.cz

Acknowledgements

We would like to thank all who have helped us with the development of this toolkit, either by providing advice or by testing it. Special thanks to Anoop Deoras, Sanjeev Khudanpur, Scott Novotney, Stefan Kombrink, Dan Povey, YongZhe Shi and Geoff Zweig.

References

Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.
All the details that did not make it into the papers, plus more results on additional tasks.

Mikolov Tomáš, Sutskever Ilya, Deoras Anoop, Le Hai-Son, Kombrink Stefan, Černocký Jan: Subword Language Modeling with Neural Networks. Not published (rejected from ICASSP 2012).
Using subwords as basic units for RNNLMs has several advantages: zero OOV rate, smaller model size and better speed. Simply split the infrequent words into subword units.

Mikolov Tomáš, Deoras Anoop, Povey Daniel, Burget Lukáš, Černocký Jan: Strategies for Training Large Scale Neural Network Language Models, In: Proceedings of ASRU 2011
How to train an RNN LM on a single core on 400M words in a few days, with 1% absolute improvement in WER on a state-of-the-art setup.

Mikolov Tomáš, Kombrink Stefan, Deoras Anoop, Burget Lukáš, Černocký Jan: RNNLM - Recurrent Neural Network Language Modeling Toolkit, In: ASRU 2011 Demo Session
A brief description of the RNN LM toolkit that is available on this website.

Mikolov Tomáš, Deoras Anoop, Kombrink Stefan, Burget Lukáš, Černocký Jan: Empirical Evaluation and Combination of Advanced Language Modeling Techniques, In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT
A comparison to other LMs shows that RNN LMs are state of the art by a large margin. Improvements increase with more training data.

Kombrink Stefan, Mikolov Tomáš, Karafiát Martin, Burget Lukáš: Recurrent Neural Network based Language Modeling in Meeting Recognition, In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT
An easy way to adapt an RNN LM, plus speedup tricks for rescoring (can run faster than 0.05 RT).

Deoras Anoop, Mikolov Tomáš, Kombrink Stefan, Karafiát Martin, Khudanpur Sanjeev: Variational Approximation of Long-span Language Models for LVCSR, In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ
An RNN LM can be approximated by an n-gram model and used directly in the decoder at no additional computational cost.

Mikolov Tomáš, Kombrink Stefan, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Extensions of Recurrent Neural Network Language Model, In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ
Better results by using backpropagation through time, and better speed by using classes in the output layer (a short sketch of the class-based factorization follows the reference list).

Mikolov Tomáš, Karafiát Martin, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Recurrent neural network based language model, In: Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), Makuhari, Chiba, JP
We show that an RNN LM can be trained with simple backpropagation, despite popular beliefs to the contrary.
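As a side note to the ICASSP 2011 extensions paper above, the class-based speedup can be summarized in plain notation (a sketch of the general idea, not a verbatim copy of the paper's equations): each word w is assigned to a class c(w), and the output distribution is factored as

    P(w | history) = P(c(w) | history) * P(w | c(w), history)

so the softmax is normalized first over the classes and then only over the words within one class, reducing the cost of the output layer from |V| to roughly |C| + |V|/|C| terms per predicted word.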

Copyright