Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge

Czech title

Analýza systémů pro ověřování mluvčího v realistických podmínkách SITW 2016 Challenge

Type

conference paper

Language

English

Authors

Novotný Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

Keywords

speaker recognition, SRE systems, diarization

Abstract

In this paper, we summarize our efforts for the Speakers In The Wild (SITW) challenge and present our findings on this new dataset for speaker recognition. Apart from the standard comparison of different SRE systems, we analyze the use of diarization for dealing with audio segments containing multiple speakers, since diarization is a necessary system component in part of the newly introduced enrollment and test protocols. Our state-of-the-art systems use both cepstral and DNN-based bottleneck features and are based on i-vectors followed by a Probabilistic Linear Discriminant Analysis (PLDA) classifier and logistic regression calibration/fusion. We present both narrow-band (8 kHz) and wide-band (16 kHz) systems together with their fusions.

Published

2016

Pages

828-832

Proceedings

Proceedings of Interspeech 2016

Conference

Interspeech Conference, San Francisco, US

ISBN

978-1-5108-3313-5

Publisher

International Speech Communication Association

Place

San Francisco, US

DOI

10.21437/Interspeech.2016-981

UT WoS

000409394400173

EID Scopus

2-s2.0-84994201390

BibTeX

@INPROCEEDINGS{FITPUB11270,
   author = "Ond\v{r}ej Novotn\'{y} and Pavel Mat\v{e}jka and Old\v{r}ich Plchot and Ond\v{r}ej Glembek and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge",
   pages = "828--832",
   booktitle = "Proceedings of Interspeech 2016",
   year = 2016,
   location = "San Francisco, US",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-5108-3313-5",
   doi = "10.21437/Interspeech.2016-981",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11270"
}