Conference paper
ONDEL Lucas, BURGET Lukáš, ČERNOCKÝ Jan and KESIRAJU Santosh. Bayesian phonotactic language model for acoustic unit discovery. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 5750-5754. ISBN 978-1-5090-4117-6.
Publication language: | English |
---|
Original title: | Bayesian phonotactic language model for Acoustic Unit Discovery |
---|
Title (cs): | Bayesovský fonotaktický jazykový model pro automatické hledání řečových jednotek |
---|
Pages: | 5750-5754 |
---|
Proceedings: | Proceedings of ICASSP 2017 |
---|
Conference: | 42nd IEEE International Conference on Acoustics, Speech and Signal Processing |
---|
Place: | New Orleans, US |
---|
Year: | 2017 |
---|
ISBN: | 978-1-5090-4117-6 |
---|
DOI: | 10.1109/ICASSP.2017.7953258 |
---|
Publisher: | IEEE Signal Processing Society |
---|
URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2017/ondel_icassp2017_0005750.pdf [PDF] |
---|
Keywords |
---|
Bayesian non-parametric, Variational Bayes, acoustic unit discovery |
Annotation |
---|
This article presents a Bayesian phonotactic language model for acoustic unit discovery (AUD). It builds on recent AUD work that led to the development of a non-parametric Bayesian phone-loop model.
|
Abstract |
---|
Recent work on Acoustic Unit Discovery (AUD) has led to the
development of a non-parametric Bayesian phone-loop model
where the prior over the probability of the phone-like units is
assumed to be sampled from a Dirichlet Process (DP). In this
work, we propose to improve this model by incorporating a
Hierarchical Pitman-Yor based bigram Language Model on
top of the unit transitions. This new model makes use of the
phonotactic context information but assumes a fixed number
of units. To remedy this limitation, we first train a DP phone-loop
model to infer the number of units; the bigram
phone-loop is then initialized from the DP phone-loop and trained
until convergence of its parameters. Results show an absolute
improvement of 1-2% on the Normalized Mutual Information
(NMI) metric. Furthermore, we show that, combined with
Multilingual Bottleneck (MBN) features, the model yields an
NMI equal to or higher than that of an English phone recogniser trained
on TIMIT. |
BibTeX: |
---|
@INPROCEEDINGS{
author = {Lucas Ondel and Luk{\'{a}}{\v{s}} Burget and Jan
{\v{C}}ernock{\'{y}} and Santosh Kesiraju},
title = {Bayesian phonotactic language model for Acoustic
Unit Discovery},
pages = {5750--5754},
booktitle = {Proceedings of ICASSP 2017},
year = {2017},
location = {New Orleans, US},
publisher = {IEEE Signal Processing Society},
ISBN = {978-1-5090-4117-6},
doi = {10.1109/ICASSP.2017.7953258},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso-8859-2?id=11472}
} |
|