PhD. Theses


Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition

Dissertation: 2014
Student: Plchot Oldřich, Ing.
Supervisor: Burget Lukáš, doc. Ing., Ph.D.
Department: Department of Computer Graphics and Multimedia FIT BUT
Status: defended
Date: 2014-12-17
Keywords
Speaker Recognition, Gaussian Mixture Model, Subspace Modeling, i-vector, Probabilistic Linear Discriminant Analysis, Discriminative Training
Abstract
This thesis deals with probabilistic models for automatic speaker verification. In particular, the Probabilistic Linear Discriminant Analysis (PLDA) model, which models the i-vector representation of speech utterances, is analyzed in detail. The thesis proposes extensions to the standard state-of-the-art PLDA model. The newly proposed Full Posterior Distribution PLDA models the uncertainty associated with the i-vector generation process. A new discriminative approach to training a speaker verification system based on the PLDA model is also proposed.
When the original PLDA is compared with the model extended to account for i-vector uncertainty, the extended model shows up to 20% relative improvement on tests with short segments of speech. As the test segments get longer (more than one minute), the performance gain of the extended model decreases, but it is never worse than the baseline. Training data, however, are usually available as sufficiently long segments, and in such cases there is no gain from using the extended model for training. Instead, training can be performed with the original PLDA model, and the extended model can be used when the task is to test on short segments.
The discriminative classifier is based on classifying pairs of i-vectors into two classes representing target and non-target trials. The functional form for obtaining the score for every i-vector pair is derived from the PLDA model, and training is based on logistic regression, minimizing the cross-entropy error function between the correct labeling of all trials and the probabilistic labeling proposed by the system. The results obtained with the discriminatively trained system are similar to those obtained with the generative baseline, but the discriminative approach is able to output better calibrated scores. This property leads to better actual verification performance on an unseen evaluation set, which is an important feature for real-use scenarios.
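The discriminative classifier described in the abstract can be illustrated with a minimal sketch. The abstract does not state the score function, so the code below assumes the symmetric quadratic form that is commonly derived from two-covariance PLDA in the literature; the parameter names `Lam`, `Gam`, `c`, `k` and both function names are hypothetical and are not taken from the thesis.

```python
import numpy as np

def plda_pair_score(x1, x2, Lam, Gam, c, k):
    """Score an i-vector pair with an assumed PLDA-derived quadratic form.

    The expression is symmetric in (x1, x2), so swapping the two
    i-vectors of a trial does not change its score.
    """
    cross = x1 @ Lam @ x2 + x2 @ Lam @ x1   # interaction between the two i-vectors
    within = x1 @ Gam @ x1 + x2 @ Gam @ x2  # per-i-vector quadratic terms
    linear = (x1 + x2) @ c                  # linear term
    return cross + within + linear + k      # k is a scalar offset

def cross_entropy(scores, labels):
    """Mean logistic-regression loss over trials.

    labels: +1 for target trials, -1 for non-target trials.
    Uses log(1 + exp(-label * score)) in a numerically stable way.
    """
    return np.mean(np.logaddexp(0.0, -labels * scores))
```

In a training loop under these assumptions, `Lam`, `Gam`, `c` and `k` would be the parameters adjusted to reduce `cross_entropy` over all training trials, with the scores interpreted as log-likelihood ratios.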
ISO 690 Citation
PLCHOT, Oldřich. Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition. Brno, 2014. Available from: http://www.fit.vutbr.cz/study/DP/PD.php?id=347. PhD. Thesis. Brno University of Technology, Faculty of Information Technology. 2014-12-17. Supervisor Burget Lukáš.
BibTeX
@PHDTHESIS{
    author = {Old{\v{r}}ich Plchot},
    title = {Extensions to Probabilistic Linear Discriminant
             Analysis for Speaker Recognition},
    school = {Brno University of Technology,
              Faculty of Information Technology},
    year = {2014},
    location = {Brno, CZ},
    url = {http://www.fit.vutbr.cz/study/DP/PD.php?id=347}
}
