Conference paper

LOZANO-DIEZ Alicia, SILNOVA Anna, MATĚJKA Pavel, GLEMBEK Ondřej, PLCHOT Oldřich, PEŠÁN Jan, BURGET Lukáš and GONZALEZ-RODRIGUEZ Joaquin. Analysis and Optimization of Bottleneck Features for Speaker Recognition. In: Proceedings of Odyssey 2016. Bilbao: International Speech Communication Association, 2016, pp. 352-357. ISSN 2312-2846. Available from: http://www.odyssey2016.org/papers/pdfs_stamped/54.pdf

Publication language: english
Original title: Analysis and Optimization of Bottleneck Features for Speaker Recognition
Title (cs): Analýza a optimalizace bottle-neck parametrů pro rozpoznávání mluvčího (Analysis and optimization of bottleneck features for speaker recognition)
Pages: 352-357
Proceedings: Proceedings of Odyssey 2016
Conference: Odyssey 2016
Place: Bilbao, ES
Year: 2016
URL: http://www.odyssey2016.org/papers/pdfs_stamped/54.pdf
Journal: Proceedings of Odyssey: The Speaker and Language Recognition Workshop, Vol. 2016, No. 06, 4 Rue des Fauvettes - Lous Tourils, F-66390 BAIXAS, FR
ISSN: 2312-2846
DOI: 10.21437/Odyssey.2016-51
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2016/lozano-diez_odyssey2016_stamped_54.pdf [PDF]

Keywords

bottleneck features, speaker recognition
Annotation

In this work, we studied whether networks trained for ASR, but not fully optimized, could provide better bottleneck features for speaker recognition. We then analyzed the influence of different aspects (input features, short-term mean and variance normalization, "under-trained" DNNs) when training DNNs so as to optimize the performance of speaker recognition systems based on bottleneck features. We evaluated the performance of the resulting bottleneck features on the NIST SRE10, condition 5, female task.
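One of the analyzed aspects is short-term mean and variance normalization of the frame-level features. The following is a minimal sketch of such sliding-window normalization in Python/NumPy; the window length and the implementation details are illustrative assumptions, not the exact settings used in the paper.

import numpy as np

def short_term_mvn(features, window=301):
    """Sliding-window (short-term) mean and variance normalization.

    features : (num_frames, feat_dim) array of frame-level features.
    window   : odd window length in frames; 301 frames (about 3 s at a
               10 ms frame shift) is an assumed value, not the paper's setting.
    """
    features = np.asarray(features, dtype=np.float64)
    half = window // 2
    out = np.empty_like(features)
    for t in range(len(features)):
        lo, hi = max(0, t - half), min(len(features), t + half + 1)
        chunk = features[lo:hi]
        mean = chunk.mean(axis=0)
        std = chunk.std(axis=0) + 1e-8  # guard against zero variance
        out[t] = (features[t] - mean) / std
    return out

A routine like this can be applied to the DNN input features, to the resulting bottleneck features, or both, which is the comparison the paper reports.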
Abstract |
---|
Recently, Deep Neural Network (DNN) based bottleneck features
proved to be very effective in i-vector based speaker recognition.
However, the bottleneck feature extraction is usually
fully optimized for speech rather than speaker recognition task.
In this paper, we explore whether DNNs suboptimal for speech
recognition can provide better bottleneck features for speaker
recognition. We experiment with different features optimized
for speech or speaker recognition as input to the DNN. We also
experiment with under-trained DNN, where the training was interrupted
before the full convergence of the speech recognition
objective. Moreover, we analyze the effect of normalizing the
features at the input and/or at the output of bottleneck features
extraction to see how it affects the final speaker recognition system
performance. We evaluated the systems in the SRE10,
condition 5, female task. Results show that the best configuration
of the DNN in terms of phone accuracy does not necessary
imply better performance of the final speaker recognition
system. Finally, we compare the performance of bottleneck features
and the standard MFCC features in i-vector/PLDA speaker
recognition system. The best bottleneck features yield up to
37% of relative improvement in terms of EER. |
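To make the bottleneck idea concrete, below is a minimal sketch, not the paper's implementation, of how frame-level bottleneck features are typically read out from a DNN trained for phone classification: they are the activations of one narrow hidden layer, taken instead of the phone posteriors. Layer sizes, the non-linearity, and the layer index are assumptions for illustration.

import numpy as np

def extract_bottleneck(frames, weights, biases, bn_layer):
    """Forward pass that returns the activations of the narrow
    (bottleneck) hidden layer instead of the phone posteriors.

    frames  : (num_frames, input_dim) contextualized input features
    weights : list of per-layer weight matrices
    biases  : list of matching bias vectors
    bn_layer: index of the bottleneck layer whose output is returned
    """
    h = np.asarray(frames, dtype=np.float64)
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i == bn_layer:
            return h                # linear bottleneck outputs; softmax never applied
        h = np.maximum(h, 0.0)      # hidden non-linearity (ReLU assumed here)
    return h

These per-frame bottleneck features then replace (or complement) MFCCs in a standard i-vector/PLDA pipeline. The "up to 37% relative improvement" is understood as the usual relative EER reduction with respect to the baseline, (EER_baseline - EER_bottleneck) / EER_baseline.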
BibTeX:
@INPROCEEDINGS{Lozano-Diez_Odyssey2016,
author = {Alicia Lozano-Diez and Anna Silnova and Pavel
Mat{\v{e}}jka and Ond{\v{r}}ej Glembek and
Old{\v{r}}ich Plchot and Jan Pe{\v{s}}{\'{a}}n and
Luk{\'{a}}{\v{s}} Burget and Joaquin
Gonzalez-Rodriguez},
title = {Analysis and Optimization of Bottleneck Features
for Speaker Recognition},
pages = {352--357},
booktitle = {Proceedings of Odyssey 2016},
journal = {Proceedings of Odyssey: The Speaker and Language Recognition
Workshop},
volume = {2016},
number = {06},
year = {2016},
location = {Bilbao, ES},
publisher = {International Speech Communication Association},
ISSN = {2312-2846},
doi = {10.21437/Odyssey.2016-51},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=11219}
}
|