Conference paper

RATH Shakti P., BURGET Lukáš, KARAFIÁT Martin, GLEMBEK Ondřej and ČERNOCKÝ Jan. A Region-specific Feature-space Transformation for Speaker Adaptation and Singularity Analysis of Jacobian Matrix. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 1228-1232. ISBN 978-1-62993-443-3. ISSN 2308-457X.
Publication language:English
Original title:A Region-specific Feature-space Transformation for Speaker Adaptation and Singularity Analysis of Jacobian Matrix
Title (cs):Regionálně specifická transformace příznakového prostoru pro adaptaci mluvčího a analýza singularity Jakobiánu
Pages:1228-1232
Proceedings:Proceedings of Interspeech 2013
Conference:Interspeech 2013
Place:Lyon, FR
Year:2013
ISBN:978-1-62993-443-3
Journal:Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), No. 8, Lyon, FR
ISSN:2308-457X
Publisher:International Speech Communication Association
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2013/rath_interspeech2013_IS130146.pdf [PDF]
Keywords
speaker recognition, speaker adaptation, feature-space transformation, speech recognition

Annotation
This paper describes the difficulties associated with soft R-FMLLR. An analysis of the Jacobian matrix leads to the conclusion that the transformation is most likely non-invertible, in which case maximum likelihood (ML) estimation adversely affects recognition performance. A new transformation, hard R-FMLLR, is presented. The proposed method is shown to perform better than soft R-FMLLR while being computationally more efficient.
Abstract
In this paper, we present an in-depth analysis of a recently proposed method for speaker adaptation. The method involves a region-specific feature-space transformation, which we refer to as soft R-FMLLR. We argue that the method has certain difficulties, the most significant being that the transformation is non-invertible. We present an analysis of the singularity of the Jacobian matrix, from which we note that the matrix becomes near-singular at certain points in the feature space, indicating that the transformation is non-invertible. We observe that in this case maximum likelihood estimation adversely affects speech recognition performance. Moreover, sufficient statistics do not exist, which makes the estimation procedure computationally very expensive. These concerns render the method unattractive. We propose a simple yet important modification, hard R-FMLLR, and show that the associated Jacobian matrix is guaranteed to be full-rank and that the method is computationally efficient. On a large vocabulary continuous speech recognition task, the proposed method is shown to perform better than soft R-FMLLR. Further, its performance is comparable to the widely used CMLLR with regression classes, especially when a higher number of transforms is used.
BibTeX:
@INPROCEEDINGS{
   author = {Shakti P. Rath and Luk{\'{a}}{\v{s}} Burget and Martin
	Karafi{\'{a}}t and Ond{\v{r}}ej Glembek and Jan
	{\v{C}}ernock{\'{y}}},
   title = {A Region-specific Feature-space Transformation for Speaker
	Adaptation and Singularity Analysis of Jacobian Matrix},
   pages = {1228--1232},
   booktitle = {Proceedings of Interspeech 2013},
   journal = {Proceedings of the 14th Annual Conference of the
	International Speech Communication Association (Interspeech
	2013)},
   number = {8},
   year = {2013},
   location = {Lyon, FR},
   publisher = {International Speech Communication Association},
   ISBN = {978-1-62993-443-3},
   ISSN = {2308-457X},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=10431}
}
