PhD. Theses
Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition
Dissertation:  2014

Student:  Plchot Oldřich, Ing. 

Supervisor:  Burget Lukáš, doc. Ing., Ph.D. 
Department:  Department of Computer Graphics and Multimedia FIT BUT 
Status:  defended 

Date:  2014-12-17

Keywords 
Speaker Recognition, Gaussian Mixture Model, Subspace Modeling, i-vector, Probabilistic Linear Discriminant Analysis, Discriminative Training

Abstract


This thesis deals with probabilistic models for automatic speaker verification. In particular, the Probabilistic Linear Discriminant Analysis (PLDA) model, which models the i-vector representation of speech utterances, is analyzed in detail. The thesis proposes extensions to the standard state-of-the-art PLDA model. The newly proposed Full Posterior Distribution PLDA models the uncertainty associated with the i-vector generation process. A new discriminative approach to training the speaker verification system based on the PLDA model is also proposed.

When the original PLDA is compared with the model extended to account for i-vector uncertainty, the extended model shows up to 20% relative improvement on tests with short segments of speech. As the test segments get longer (more than one minute), the performance gain of the extended model is lower, but it is never worse than the baseline. Training data, however, are usually available as sufficiently long segments, so in such cases there is no gain from using the extended model for training. Instead, training can be performed with the original PLDA model, and the extended model can be used when the task is to test on short segments.

The discriminative classifier is based on classifying pairs of i-vectors into two classes representing target and non-target trials. The functional form for obtaining the score for every i-vector pair is derived from the PLDA model, and training is based on logistic regression minimizing the cross-entropy error function between the correct labeling of all trials and the probabilistic labeling proposed by the system. The results obtained with the discriminatively trained system are similar to those obtained with the generative baseline, but the discriminative approach shows the ability to output better calibrated scores.
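
The pairwise discriminative training described in the abstract can be illustrated with a minimal sketch. It assumes the score is a symmetric quadratic function of the two i-vectors, s = x1'Λx2 + x2'Λx1 + x1'Γx1 + x2'Γx2 + (x1 + x2)'c + k, a form that can be derived from a PLDA log-likelihood ratio under Gaussian assumptions. The parameter names (Λ, Γ, c, k), all function names, the plain gradient-descent optimizer, and the synthetic "i-vectors" are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def pair_features(x1, x2):
    """Expand an i-vector pair so the quadratic score is linear in the weights."""
    cross = np.outer(x1, x2) + np.outer(x2, x1)    # multiplies Lambda
    within = np.outer(x1, x1) + np.outer(x2, x2)   # multiplies Gamma
    return np.concatenate([cross.ravel(), within.ravel(), x1 + x2, [1.0]])

def score(w, x1, x2):
    """Verification score, read as the log-odds of a target (same-speaker) trial."""
    return float(w @ pair_features(x1, x2))

def cross_entropy(w, F, y):
    """Mean cross-entropy between labels y (1 = target) and sigmoid(F @ w)."""
    z = F @ w
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def train(pairs, labels, steps=300, l2=1e-3):
    """L2-regularised logistic regression fitted by plain gradient descent."""
    F = np.stack([pair_features(a, b) for a, b in pairs])
    y = np.asarray(labels, dtype=float)
    w = np.zeros(F.shape[1])
    # Step size 1/L, where L is an upper bound on the curvature of the objective.
    lr = 1.0 / (0.25 * np.mean(np.sum(F * F, axis=1)) + l2)
    for _ in range(steps):
        p = 0.5 * (1.0 + np.tanh(0.5 * (F @ w)))   # numerically stable sigmoid
        w -= lr * (F.T @ (p - y) / len(y) + l2 * w)
    return w, F, y

def make_trials(rng, n_speakers=20, dim=4, noise=0.3):
    """Synthetic stand-in for i-vectors: speaker mean plus session noise."""
    means = rng.normal(size=(n_speakers, dim))

    def draw(s):
        return means[s] + noise * rng.normal(size=dim)

    pairs, labels = [], []
    for s in range(n_speakers):
        pairs.append((draw(s), draw(s)))                     # target trial
        labels.append(1)
        pairs.append((draw(s), draw((s + 1) % n_speakers)))  # non-target trial
        labels.append(0)
    return pairs, labels

rng = np.random.default_rng(0)
pairs, labels = make_trials(rng)
w, F, y = train(pairs, labels)
print("cross-entropy before:", cross_entropy(np.zeros_like(w), F, y))
print("cross-entropy after: ", cross_entropy(w, F, y))
```

Because the feature expansion is symmetric in its two arguments, the score does not depend on the order of the i-vectors in a trial. Treating the score as a log-odds is what ties the cross-entropy objective to calibration in this sketch.
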
Better calibrated scores lead to better actual verification performance on an unseen evaluation set, which is an important feature for real-use scenarios.

ISO 690 Citation

PLCHOT, Oldřich. Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition. Brno, 2014. Available from: http://www.fit.vutbr.cz/study/DP/PD.php?id=347. PhD. Thesis. Brno University of Technology, Faculty of Information Technology. 2014-12-17. Supervisor Burget Lukáš.
BibTeX 

@PHDTHESIS{
author = {Old{\v{r}}ich Plchot},
title = {Extensions to Probabilistic Linear Discriminant
Analysis for Speaker Recognition},
school = {Brno University of Technology,
Faculty of Information Technology},
year = {2014},
location = {Brno, CZ},
url = {http://www.fit.vutbr.cz/study/DP/PD.php?id=347}
}

