Publication Details

On the use of i-vector posterior distributions in Probabilistic Linear Discriminant Analysis

CUMANI Sandro, LAFACE Pietro and PLCHOT Oldřich. On the use of i-vector posterior distributions in Probabilistic Linear Discriminant Analysis. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 22, no. 4, 2014, pp. 846-857. ISSN 2329-9290. Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06748853&tag=1
Czech title
Využití posteriorních rozložení i-vektorů v pravděpodobnostní lineární diskriminativní analýze
Type
journal article
Language
English
Authors
Cumani Sandro (POLITO)
Laface Pietro, prof. (POLITO)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
URL
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06748853&tag=1
Keywords

I-vector extraction, I-vectors, probabilistic linear discriminant analysis, speaker recognition

Abstract

A PLDA model that exploits the uncertainty of the i-vector extraction process is presented. We derive the likelihood of a Gaussian PLDA model based on the i-vector posterior distribution and illustrate a new PLDA model in which the inter-speaker variability is assumed to have a segment-dependent distribution, showing that the standard PLDA framework can be reused simply by replacing the likelihood definition.
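To make the idea of plugging a per-segment uncertainty into the PLDA likelihood concrete, the Python sketch below scores a verification trial under a one-dimensional Gaussian PLDA model. It is an illustration under stated assumptions, not the paper's derivation: the scalar model phi = m + u*y + e, the helper names `plda_llr_1d` and `_log_gauss_2d`, and the rule of adding each i-vector's posterior variance to the residual variance are all assumptions chosen for simplicity. Setting both posterior variances to zero recovers a standard Gaussian PLDA score.

```python
import math


def _log_gauss_2d(x1, x2, a, b, d):
    """Log-density of a zero-mean 2-D Gaussian with covariance [[a, b], [b, d]]."""
    det = a * d - b * b
    # Quadratic form x^T C^{-1} x via the closed-form 2x2 inverse.
    quad = (d * x1 * x1 - 2.0 * b * x1 * x2 + a * x2 * x2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def plda_llr_1d(phi1, phi2, m, u, sigma2, cov1=0.0, cov2=0.0):
    """Log-likelihood ratio for a toy scalar PLDA model phi = m + u*y + e.

    y ~ N(0, 1) is the speaker factor and e ~ N(0, sigma2) the residual.
    cov1 and cov2 are the posterior variances of the two i-vectors; here
    they are simply added to the residual variance (an assumed
    uncertainty-propagation rule, used only for illustration).
    """
    x1, x2 = phi1 - m, phi2 - m
    b = u * u                 # covariance induced by a shared speaker factor y
    a = b + sigma2 + cov1     # total variance of segment 1
    d = b + sigma2 + cov2     # total variance of segment 2
    same = _log_gauss_2d(x1, x2, a, b, d)    # same speaker: y is shared
    diff = _log_gauss_2d(x1, x2, a, 0.0, d)  # different speakers: independent y
    return same - diff


if __name__ == "__main__":
    print(plda_llr_1d(1.0, 1.0, 0.0, 1.0, 0.5))              # certain i-vectors
    print(plda_llr_1d(1.0, 1.0, 0.0, 1.0, 0.5, 100.0, 100.0))  # very uncertain ones
```

With these hypothetical parameters, a matching pair scores about 0.56 when the i-vectors are treated as exact and close to zero when both posterior variances are large, so uncertain i-vectors pull the score toward an uninformative value instead of being trusted as point estimates.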

Annotation

The i-vector extraction process is affected by several factors, such as the noise level, the acoustic content of the observed features, the channel mismatch between the training conditions and the test data, and the duration of the analyzed speech segment. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance. This paper presents a new PLDA model that, unlike the standard one, exploits the intrinsic i-vector uncertainty. Since recognition accuracy is known to decrease for short speech segments, and segment length is one of the main factors affecting the i-vector covariance, we designed a set of experiments aimed at comparing the standard and the new PLDA models on short speech cuts of variable duration, randomly extracted from both the interviews and the telephone conversations of the NIST SRE 2010 extended dataset. Our results on the NIST SRE 2010 evaluation data show that, in several conditions, the new model outperforms the standard PLDA by more than 10% relative when tested on short segments with duration mismatch, while preserving the accuracy of the standard model on sufficiently long segments. This technique was also successfully tested in the NIST SRE 2012 evaluation.
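The link between segment duration and i-vector uncertainty follows from the usual i-vector posterior: with a standard-normal prior, the posterior precision is the identity plus a sum over UBM components of the zeroth-order (soft frame count) statistics weighted by the loadings, so it grows with the number of frames. The Python sketch below reduces this to a scalar i-vector; the function name `ivector_posterior_variance`, the loadings, the variances and the even split of frames between two components are all hypothetical values chosen for illustration.

```python
def ivector_posterior_variance(frame_counts, loadings, variances):
    """Posterior variance of a scalar i-vector in a toy total-variability model.

    frame_counts[c]: zeroth-order statistic N_c (soft frame count) of component c
    loadings[c]:     scalar loading T_c of component c (hypothetical values)
    variances[c]:    UBM variance Sigma_c of component c
    With a standard-normal prior, the posterior precision is
    1 + sum_c N_c * T_c**2 / Sigma_c, and the variance is its inverse.
    """
    precision = 1.0 + sum(
        n * t * t / s for n, t, s in zip(frame_counts, loadings, variances)
    )
    return 1.0 / precision


if __name__ == "__main__":
    loadings = [1.0, 0.5]
    variances = [1.0, 1.0]
    for frames in (8, 80, 800):
        # Assume the frames are split evenly between the two components.
        counts = [frames / 2, frames / 2]
        print(frames, ivector_posterior_variance(counts, loadings, variances))
```

For 8, 80 and 800 frames the sketch gives a posterior variance of 1/6, about 0.0196 and about 0.002, which is why short cuts yield much less reliable i-vectors than long segments.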

Published
2014
Pages
846-857
Journal
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 22, no. 4, ISSN 2329-9290
Publisher
IEEE Signal Processing Society
DOI
10.1109/TASLP.2014.2308473
UT WoS
000333330900008
BibTeX
@ARTICLE{FITPUB10636,
   author = "Sandro Cumani and Pietro Laface and Old\v{r}ich Plchot",
   title = "On the use of i-vector posterior distributions in Probabilistic Linear Discriminant Analysis",
   pages = "846--857",
   journal = "IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING",
   volume = 22,
   number = 4,
   year = 2014,
   ISSN = "2329-9290",
   doi = "10.1109/TASLP.2014.2308473",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10636"
}