Thesis Details

Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition

Ph.D. Thesis
Student: Plchot Oldřich
Academic Year: 2014/2015
Supervisor: Burget Lukáš, doc. Ing., Ph.D.
Czech title
Rozšíření pro pravděpodobnostní lineární diskriminační analýzu v rozpoznávání mluvčího
Language
English
Abstract

This thesis deals with probabilistic models for automatic speaker verification. In particular, the Probabilistic Linear Discriminant Analysis (PLDA) model, which models i-vector representations of speech utterances, is analyzed in detail. The thesis proposes extensions to the standard state-of-the-art PLDA model. The newly proposed Full Posterior Distribution PLDA models the uncertainty associated with the i-vector generation process. A new discriminative approach to training the speaker verification system based on the PLDA model is also proposed.

When the original PLDA is compared with the model extended to account for i-vector uncertainty, the extended model shows up to 20% relative improvement on tests with short segments of speech. As the test segments get longer (more than one minute), the performance gain of the extended model decreases, but the model is never worse than the baseline. Training data, however, are usually available as sufficiently long segments, and in such cases there is no gain from using the extended model for training. Instead, training can be performed with the original PLDA model, and the extended model can be used when the task is to test on short segments.

The discriminative classifier is based on classifying pairs of i-vectors into two classes representing target and non-target trials. The functional form for obtaining the score for every i-vector pair is derived from the PLDA model, and training is based on logistic regression minimizing the cross-entropy error function between the correct labeling of all trials and the probabilistic labeling proposed by the system. The results obtained with the discriminatively trained system are similar to those obtained with the generative baseline, but the discriminative approach shows the ability to output better calibrated scores. This property leads to better actual verification performance on an unseen evaluation set, which is an important feature for real use scenarios.
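
The pairwise scoring and cross-entropy objective described above can be illustrated with a minimal sketch. It assumes the quadratic, symmetric pair score commonly associated with two-covariance PLDA; the names `pair_score`, `cross_entropy`, `Lam`, `Gam`, `c`, and `k` are hypothetical and not taken from the thesis, and the unweighted mean over trials is likewise an assumption.

```python
import numpy as np

def pair_score(x1, x2, Lam, Gam, c, k):
    """Score one i-vector pair with an assumed quadratic form.

    Lam, Gam (matrices), c (vector) and k (scalar) are the trainable
    parameters; in a PLDA-derived scorer they would be initialized
    from the generative model. The expression is symmetric in x1, x2.
    """
    return (x1 @ Lam @ x2 + x2 @ Lam @ x1      # cross terms
            + x1 @ Gam @ x1 + x2 @ Gam @ x2    # per-vector quadratic terms
            + (x1 + x2) @ c + k)               # linear term and offset

def cross_entropy(scores, labels):
    """Mean logistic-regression cross-entropy over trials.

    labels: 1 for target trials, 0 for non-target trials.
    Scores are treated as log-odds; logaddexp keeps this numerically stable.
    """
    scores = np.asarray(scores, dtype=float)
    signs = 2.0 * np.asarray(labels, dtype=float) - 1.0  # +1 target, -1 non-target
    return float(np.mean(np.logaddexp(0.0, -signs * scores)))

# Toy data (random, purely illustrative; not real i-vectors).
rng = np.random.default_rng(0)
dim = 4
x1, x2 = rng.standard_normal(dim), rng.standard_normal(dim)
Lam = rng.standard_normal((dim, dim))
Gam = rng.standard_normal((dim, dim))
c = rng.standard_normal(dim)
k = 0.5

s12 = pair_score(x1, x2, Lam, Gam, c, k)
s21 = pair_score(x2, x1, Lam, Gam, c, k)
loss_uninformative = cross_entropy([0.0, 0.0], [1, 0])
```

In a sketch like this, discriminative training would amount to minimizing `cross_entropy` over all training trials with respect to `Lam`, `Gam`, `c`, and `k` using a gradient-based optimizer; the optimizer and any regularization are left out here.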

Keywords

Speaker Recognition, Gaussian Mixture Model, Subspace Modeling, i-vector, Probabilistic Linear Discriminant Analysis, Discriminative Training

Degree Programme
Computer Science and Engineering, Field of Study Computer Science and Engineering
Status
defended
Date
17 December 2014
Citation
PLCHOT, Oldřich. Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition. Brno, 2014. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2014-12-17. Supervised by Burget Lukáš. Available from: https://www.fit.vut.cz/study/phd-thesis/347/
BibTeX
@phdthesis{FITPT347,
    author = "Old\v{r}ich Plchot",
    type = "Ph.D. thesis",
    title = "Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2014,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/phd-thesis/347/"
}