Conference paper

LOZANO-DIEZ Alicia, SILNOVA Anna, MATĚJKA Pavel, GLEMBEK Ondřej, PLCHOT Oldřich, PEŠÁN Jan, BURGET Lukáš and GONZALEZ-RODRIGUEZ Joaquin. Analysis and Optimization of Bottleneck Features for Speaker Recognition. In: Proceedings of Odyssey 2016. Bilbao: International Speech Communication Association, 2016, pp. 352-357. ISSN 2312-2846. Available from: http://www.odyssey2016.org/papers/pdfs_stamped/54.pdf

Publication language: english
Original title: Analysis and Optimization of Bottleneck Features for Speaker Recognition
Title (cs): Analýza a optimalizace bottle-neck parametrů pro rozpoznávání mluvčího (Analysis and optimization of bottleneck features for speaker recognition)
Pages: 352-357
Proceedings: Proceedings of Odyssey 2016
Conference: Odyssey 2016
Place: Bilbao, ES
Year: 2016
URL: http://www.odyssey2016.org/papers/pdfs_stamped/54.pdf
Journal: Proceedings of Odyssey: The Speaker and Language Recognition Workshop, Vol. 2016, No. 06, 4 Rue des Fauvettes - Lous Tourils, F-66390 BAIXAS, FR
ISSN: 2312-2846
DOI: 10.21437/Odyssey.2016-51
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2016/lozano-diez_odyssey2016_stamped_54.pdf [PDF]

Keywords

bottleneck features, speaker recognition
Annotation

In this work, we studied whether networks trained for ASR, but not fully optimized, could provide better bottleneck features for speaker recognition. We then analyzed the influence of different aspects (input features, short-term mean and variance normalization, "under-trained" DNNs) when training DNNs so as to optimize the performance of speaker recognition systems based on bottleneck features. We evaluated the performance of the resulting bottleneck features on the NIST SRE10, condition 5, female task.
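One of the analyzed aspects is short-term mean and variance normalization of the frame-level features. The following is a minimal sketch of such sliding-window normalization in Python/NumPy; the window length and the implementation details are illustrative assumptions, not the exact settings used in the paper.

import numpy as np

def short_term_mvn(features, window=301):
    """Sliding-window (short-term) mean and variance normalization.

    features : (num_frames, feat_dim) array of frame-level features.
    window   : odd window length in frames; 301 frames (about 3 s at a
               10 ms frame shift) is an assumed value, not the paper's setting.
    """
    features = np.asarray(features, dtype=np.float64)
    half = window // 2
    out = np.empty_like(features)
    for t in range(len(features)):
        lo, hi = max(0, t - half), min(len(features), t + half + 1)
        chunk = features[lo:hi]
        mean = chunk.mean(axis=0)
        std = chunk.std(axis=0) + 1e-8  # guard against zero variance
        out[t] = (features[t] - mean) / std
    return out

A routine like this can be applied to the DNN input features, to the resulting bottleneck features, or both, which is the comparison the paper reports.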
Abstract |
---|
Recently, Deep Neural Network (DNN) based bottleneck features
proved to be very effective in i-vector based speaker recognition.
However, the bottleneck feature extraction is usually
fully optimized for speech rather than speaker recognition task.
In this paper, we explore whether DNNs suboptimal for speech
recognition can provide better bottleneck features for speaker
recognition. We experiment with different features optimized
for speech or speaker recognition as input to the DNN. We also
experiment with under-trained DNN, where the training was interrupted
before the full convergence of the speech recognition
objective. Moreover, we analyze the effect of normalizing the
features at the input and/or at the output of bottleneck features
extraction to see how it affects the final speaker recognition system
performance. We evaluated the systems in the SRE10,
condition 5, female task. Results show that the best configuration
of the DNN in terms of phone accuracy does not necessary
imply better performance of the final speaker recognition
system. Finally, we compare the performance of bottleneck features
and the standard MFCC features in i-vector/PLDA speaker
recognition system. The best bottleneck features yield up to
37% of relative improvement in terms of EER. |
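To make the bottleneck idea concrete, below is a minimal sketch, not the paper's implementation, of how frame-level bottleneck features are typically read out from a DNN trained for phone classification: they are the activations of one narrow hidden layer, taken instead of the phone posteriors. Layer sizes, the non-linearity, and the layer index are assumptions for illustration.

import numpy as np

def extract_bottleneck(frames, weights, biases, bn_layer):
    """Forward pass that returns the activations of the narrow
    (bottleneck) hidden layer instead of the phone posteriors.

    frames  : (num_frames, input_dim) contextualized input features
    weights : list of per-layer weight matrices
    biases  : list of matching bias vectors
    bn_layer: index of the bottleneck layer whose output is returned
    """
    h = np.asarray(frames, dtype=np.float64)
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i == bn_layer:
            return h                # linear bottleneck outputs; softmax never applied
        h = np.maximum(h, 0.0)      # hidden non-linearity (ReLU assumed here)
    return h

These per-frame bottleneck features then replace (or complement) MFCCs in a standard i-vector/PLDA pipeline. The "up to 37% relative improvement" is understood as the usual relative EER reduction with respect to the baseline, (EER_baseline - EER_bottleneck) / EER_baseline.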
BibTeX:
@INPROCEEDINGS{Lozano-Diez_Odyssey2016,
author = {Alicia Lozano-Diez and Anna Silnova and Pavel
Mat{\v{e}}jka and Ond{\v{r}}ej Glembek and
Old{\v{r}}ich Plchot and Jan Pe{\v{s}}{\'{a}}n and
Luk{\'{a}}{\v{s}} Burget and Joaquin
Gonzalez-Rodriguez},
title = {Analysis and Optimization of Bottleneck Features
for Speaker Recognition},
pages = {352--357},
booktitle = {Proceedings of Odyssey 2016},
journal = {Proceedings of Odyssey: The Speaker and Language Recognition
Workshop},
volume = {2016},
number = {06},
year = {2016},
location = {Bilbao, ES},
publisher = {International Speech Communication Association},
ISSN = {2312-2846},
doi = {10.21437/Odyssey.2016-51},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=11219}
}
|