Conference paper

MOTLÍČEK Petr, POVEY Daniel and KARAFIÁT Martin. Feature And Score Level Combination Of Subspace Gaussians In LVCSR Task. In: Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013, pp. 7604-7608. ISBN 978-1-4799-0355-9.
Publication language:English
Original title:Feature And Score Level Combination Of Subspace Gaussians In LVCSR Task
Title (cs):Kombinace výstupů Gaussovek v podprostorech na úrovni parametrů a skóre v LVCSR úloze
Pages:7604-7608
Proceedings:Proceedings of ICASSP 2013
Conference:38th International Conference on Acoustics, Speech, and Signal Processing
Place:Vancouver, CA
Year:2013
ISBN:978-1-4799-0355-9
Publisher:IEEE Signal Processing Society
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2013/motlicek_icassp2013_0007604.pdf [PDF]
Keywords
Automatic Speech Recognition, Discriminative features, System combination
Annotation
We have demonstrated that the SGMM framework is an efficient approach in the LVCSR task. Overall evaluations of SGMMs exploiting powerful but complex PLP-BN features yield results similar to those obtained by conventional HMM/GMMs. Nevertheless, the total number of SGMM parameters is about three times smaller than in the HMM/GMM framework. Evaluation results also indicate different properties of the examined acoustic modeling techniques. Although SGMMs consistently outperform HMM/GMMs when built over individual features, HMM/GMMs can benefit much more from the feature-level combination than SGMMs. Nevertheless, based on an analysis measuring the complementarity of individual recognition systems, we show that SGMM-based recognizers produce heterogeneous outputs (scores), and thus subsequent score-level combination can bring additional improvement.
Abstract
In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex features estimated using a neural network, combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outputs (word sequences) from individual recognizers trained using different features are also combined at the score level using ROVER for both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (an average relative improvement of about 6% in terms of WER) when both techniques are applied to single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) than HMM/GMMs (4% relative improvement), which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined at the score level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17%, respectively, compared to a standard ASR baseline.
BibTeX:
@INPROCEEDINGS{
   author = {Petr Motl{\'{i}}{\v{c}}ek and Daniel Povey and Martin
	Karafi{\'{a}}t},
   title = {Feature And Score Level Combination Of Subspace Gaussians In
	LVCSR Task},
   pages = {7604--7608},
   booktitle = {Proceedings of ICASSP 2013},
   year = {2013},
   location = {Vancouver, CA},
   publisher = {IEEE Signal Processing Society},
   ISBN = {978-1-4799-0355-9},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=10376}
}
