RNNLM - nbest rescoring in Kaldi


by Stefan Kombrink, 2011

Kaldi is a new all-purpose speech toolkit developed by volunteers under the leadership of Daniel Povey (Microsoft) and made available under the Apache license. It aims to help researchers share their experimental setups by providing a unified set of tools to do so. Recently, rescoring using RNN language models was implemented.

RNN rescoring in Kaldi is provided by a tool named rnn-rescore-kaldi within the Kaldi toolkit. The advantage is that all other Kaldi tools can be used easily (e.g. for decoding, lattice generation, ...) and file formats are shared. This comes at the price of several requirements, but luckily the Kaldi installation scripts should mostly take care of those themselves.

Kaldi installation (Linux)

# if you have kaldi already installed, just run
svn up
make clean
make depend
make
# and skip the rest of this section

# To install kaldi from scratch, do the following:
# create a dir to put the Kaldi+RNN experiment in

# install subversion on your Linux machine, then execute
svn co https://kaldi.svn.sourceforge.net/svnroot/kaldi/trunk kaldi-trunk

# compile the required tools
cd kaldi-trunk/tools
make

# configure and compile the source
cd ../src
./configure

# if configure failed, it probably could not find libatlas. Make sure its libs are in the build/install/lib dir!
# if they are not, it usually helps to run the following:
cd ATLAS/build
rm -rf install
make install DESTDIR=`pwd`/install
cd ../..

# then edit kaldi.mk: you may want to remove -g and replace -O0 by -O2, since it would run really slowly otherwise!
make depend
make

WSJ (EVAL92) experiment

Please note that this is a different experimental setup with completely different results from those reported in previous papers, so don't compare it with the numbers mentioned there!
The demo script first downloads an archive containing a Kaldi RNN model and reference transcripts for scoring (thanks to LDC for permission to distribute the transcriptions of the evaluation set!). Then it extracts n-best lists from the given lattices and rescores them. It combines the old LM scores with the RNN LM scores using several interpolation weights and extracts a new one-best. The WER of the new one-best is then computed using the evaluation transcripts.
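The rescoring idea above can be sketched in a few lines. The following is a minimal, self-contained Python illustration, not Kaldi's actual implementation: every hypothesis in an n-best list carries an acoustic score plus an n-gram and an RNN LM log-probability; the two LM probabilities are linearly interpolated with weight L, and the hypothesis with the best combined score becomes the new one-best. All names, scores, and the lm_scale value are made up for illustration.

```python
import math

# One n-best entry: (words, acoustic log-score, n-gram LM log-prob, RNN LM log-prob).
# All numbers here are invented for illustration only.
nbest = [
    ("the cat sat", -120.0, math.log(1e-4), math.log(5e-4)),
    ("the cat sad", -119.5, math.log(2e-5), math.log(1e-6)),
    ("a cat sat",   -121.0, math.log(8e-5), math.log(3e-4)),
]

def rescore(nbest, L, lm_scale=10.0):
    """Interpolate n-gram and RNN LM probabilities with weight L,
    combine with the acoustic score, and return the best hypothesis."""
    best, best_score = None, float("-inf")
    for words, ac, lm_ngram, lm_rnn in nbest:
        # interpolate in the probability domain: P = L*P_rnn + (1-L)*P_ngram
        lm = math.log(L * math.exp(lm_rnn) + (1.0 - L) * math.exp(lm_ngram))
        score = ac + lm_scale * lm
        if score > best_score:
            best, best_score = words, score
    return best

print(rescore(nbest, L=0.5))  # → the cat sat
```

With L=0 this reduces to the original n-gram one-best, with L=1 to pure RNN rescoring; sweeping L between 0 and 1 is what the demo script's interpolation weights do.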

cd rnn
# the rnn-rescore-kaldi tool lets you rescore lattices with an RNN model
# there's also an example script that runs n-best list rescoring. Look at it, then execute it.

You can modify the script to extract 100-best or 1000-best lists and to change the interpolation parameters. The results (WER in %) for L=0.5 are:

baseline (Kneser-Ney smoothed 5-gram): 12.2
10-best: 10.8
100-best: 10.3
1000-best: 10.2
10000-best: 10.2
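The WER numbers above are word error rates: the minimum number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. A minimal Python sketch of that computation (not Kaldi's actual scoring tool, which also handles alignment details and multiple utterances):

```python
def wer(ref, hyp):
    """Word error rate: word-level edit distance, normalized by reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return 100.0 * d[len(r)][len(h)] / len(r)

# 1 substitution + 1 deletion on 6 reference words
print(wer("the cat sat on the mat", "the cat sad on mat"))
```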