Conference paper

HANNEMANN Mirko, TRMAL Jan, ONDEL Lucas, KESIRAJU Santosh and BURGET Lukáš. Bayesian joint-sequence models for grapheme-to-phoneme conversion. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 2836-2840. ISBN 978-1-5090-4117-6.
Publication language: English
Original title: Bayesian joint-sequence models for grapheme-to-phoneme conversion
Title (cs): Bayesovské modelování sdružených sekvencí pro převod grafémů na fonémy
Pages: 2836-2840
Proceedings: Proceedings of ICASSP 2017
Conference: 42nd IEEE International Conference on Acoustics, Speech and Signal Processing
Place: New Orleans, US
Year: 2017
ISBN: 978-1-5090-4117-6
Publisher: IEEE Signal Processing Society
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/hannemann_icassp2017_0002836.pdf [PDF]
Keywords
Bayesian approach, joint-sequence models, weighted finite state transducers, letter-to-sound, grapheme-to-phoneme conversion, hierarchical Pitman-Yor process
Annotation
This article presents a fully Bayesian approach to grapheme-to-phoneme conversion based on the joint-sequence model (JSM), replacing the usual smoothed n-gram language model with a hierarchical Pitman-Yor-process language model.
Abstract
We describe a fully Bayesian approach to grapheme-to-phoneme conversion based on the joint-sequence model (JSM). Usually, standard smoothed n-gram language models (LM, e.g. Kneser-Ney) are used with JSMs to model graphone sequences (joint grapheme-phoneme pairs). However, we take a Bayesian approach using a hierarchical Pitman-Yor-process LM. This provides an elegant alternative to using smoothing techniques to avoid over-training. No held-out sets or complex parameter tuning are necessary, and several convergence problems encountered in the discounted Expectation-Maximization (as used in the smoothed JSMs) are avoided. Every step is modeled by weighted finite state transducers and implemented with standard operations from the OpenFST toolkit. We evaluate our model on a standard data set (CMUdict), where it gives comparable results to the previously reported smoothed JSMs in terms of phoneme-error rate while requiring a much smaller training/testing time. Most importantly, our model can be used in a Bayesian framework and for (partly) unsupervised training.
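The two central objects in the abstract, graphones and the Pitman-Yor-process LM, can be illustrated with a small sketch. The code below is a hypothetical simplification, not the authors' implementation: it builds graphone tokens from a fixed grapheme-phoneme alignment and evaluates a Pitman-Yor predictive probability under the common one-table-per-type approximation; all names, the example alignment, and the parameter values are assumptions.

```python
from collections import Counter

def graphones(alignment):
    """Turn an aligned list of (grapheme-chunk, phoneme-chunk) pairs into
    graphone tokens, e.g. ('ph', 'F') in 'phone' -> /F OW N/."""
    return [f"{g}:{p}" for g, p in alignment]

def pyp_predictive(counts, symbol, d, theta, base_prob):
    """Predictive probability of `symbol` under a Pitman-Yor process.

    Simplification (an assumption of this sketch): one table per observed
    type, so the table count equals the number of distinct symbols seen.
    counts: Counter of observed graphones; d: discount; theta: strength;
    base_prob: base-measure probability P0(symbol).
    """
    total = sum(counts.values())
    types = len(counts)
    if total == 0:
        return base_prob
    return (max(counts[symbol] - d, 0.0)
            + (theta + d * types) * base_prob) / (theta + total)

# Hypothetical alignment of 'phone' -> /F OW N/ ('_' marks a silent letter).
tokens = graphones([("ph", "F"), ("o", "OW"), ("n", "N"), ("e", "_")])
counts = Counter(tokens)
p = pyp_predictive(counts, "ph:F", d=0.5, theta=1.0, base_prob=0.25)
```

With d=0 the formula reduces to ordinary Dirichlet smoothing; the discount d is what gives the Pitman-Yor process its power-law behavior, which is why it suits graphone vocabularies.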
BibTeX:
@INPROCEEDINGS{hannemann_icassp2017,
   author = {Mirko Hannemann and Jan Trmal and Lucas Ondel and Santosh
	Kesiraju and Luk{\'{a}}{\v{s}} Burget},
   title = {Bayesian joint-sequence models for grapheme-to-phoneme
	conversion},
   pages = {2836--2840},
   booktitle = {Proceedings of ICASSP 2017},
   year = {2017},
   location = {New Orleans, US},
   publisher = {IEEE Signal Processing Society},
   ISBN = {978-1-5090-4117-6},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=11469}
}
