Conference proceedings paper | |
| Deoras, A., Mikolov, T., Kombrink, S., Karafiát, M., Khudanpur, S.: Variational Approximation of Long-span Language Models for LVCSR, In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Praha, CZ, IEEESP, 2011, pp. 5532-5535, ISBN 978-1-4577-0537-3 | | Publication language: | English |
|---|
| Publication title: | Variational Approximation of Long-span Language Models for LVCSR |
|---|
| Title (cs): | Variační aproximace jazykových modelů s dlouhým kontextem pro LVCSR |
|---|
| Pages: | 5532-5535 |
|---|
| Proceedings: | Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 |
|---|
| Conference: | International Conference on Acoustics, Speech and Signal Processing 2011 |
|---|
| Place of publication: | Praha, CZ |
|---|
| Year: | 2011 |
|---|
| ISBN: | 978-1-4577-0537-3 |
|---|
| Publisher: | IEEE Signal Processing Society |
|---|
| URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2011/deoras_icassp2011_5532.pdf [PDF] |
|---|
| Keywords |
|---|
| Recurrent Neural Network, Language Model, Variational Inference |
| Annotation |
|---|
| The authors present a variational approximation of long-span language models for LVCSR. |
| Abstract |
|---|
| Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition systems due to the prohibitive search space of sentence hypotheses. Instead, an N-best list of hypotheses is created using tractable n-gram models, and rescored using the long-span models. It is shown in this paper that computationally tractable variational approximations of the long-span models are a better choice than standard n-gram models for first pass decoding. They not only result in a better first pass output, but also produce a lattice with a lower oracle word error rate, and rescoring the N-best list from such lattices with the long-span models requires a smaller N to attain the same accuracy. Empirical results on the WSJ, MIT Lectures, NIST 2007 Meeting Recognition and NIST 2001 Conversational Telephone Recognition data sets are presented to support these claims. |
| BibTeX: |
|---|
@INPROCEEDINGS{
  author = {Anoop Deoras and Tomáš Mikolov and Stefan Kombrink and
            Martin Karafiát and Sanjeev Khudanpur},
  title = {Variational Approximation of Long-span Language Models for
           LVCSR},
  pages = {5532--5535},
  booktitle = {Proceedings of the 2011 IEEE International Conference on
               Acoustics, Speech, and Signal Processing, ICASSP 2011},
  year = {2011},
  location = {Praha, CZ},
  publisher = {IEEE Signal Processing Society},
  ISBN = {978-1-4577-0537-3},
  language = {english},
  url = {http://www.fit.vutbr.cz/research/view_pub.php?id=9659}
}