Conference paper | NOVOTNÝ Ondřej, PLCHOT Oldřich, MATĚJKA Pavel, MOŠNER Ladislav and GLEMBEK Ondřej. On the use of X-vectors for Robust Speaker Recognition. In: Proceedings of Odyssey 2018. Les Sables d'Olonne: International Speech Communication Association, 2018, pp. 168-175. ISSN 2312-2846. |
---|
Publication language: | english |
---|
Original title: | On the use of X-vectors for Robust Speaker Recognition |
---|
Title (cs): | K použití x-vektorů pro robustní rozpoznávání mluvčího |
---|
Pages: | 168-175 |
---|
Proceedings: | Proceedings of Odyssey 2018 |
---|
Conference: | Odyssey 2018 |
---|
Place: | Les Sables d'Olonne, FR |
---|
Year: | 2018 |
---|
Journal: | Proceedings of Odyssey: The Speaker and Language Recognition Workshop, Vol. 2018, No. 6, 4 Rue des Fauvettes - Lous Tourils, F-66390 BAIXAS, FR |
---|
ISSN: | 2312-2846 |
---|
DOI: | 10.21437/Odyssey.2018-24 |
---|
Publisher: | International Speech Communication Association |
---|
URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2018/novotny_odyssey2018_54.pdf [PDF] |
---|
Keywords |
---|
Speaker Recognition, Embedding, X-vectors, DNN |
Abstract |
---|
Text-independent speaker verification (SV) is currently in the
process of embracing DNN modeling in every stage of the SV system.
DNN-based approaches, such as end-to-end modeling and systems
based on DNN embeddings, are slowly becoming competitive even in
the challenging and diverse channel conditions of recent NIST SREs.
Domain adaptation and the need for large amounts of training data
remain challenges for current discriminative systems, and (unlike
with generative models) we see significant gains from data
augmentation, simulation, and other techniques designed to overcome
the lack of training data. We present an analysis of an SV system
based on DNN embeddings (x-vectors) and focus on its robustness
across diverse data domains, such as standard telephone and
microphone conversations in clean, noisy, and reverberant
environments. We also evaluate the system on challenging far-field
data created by re-transmitting a subset of NIST SRE 2008 and 2010
microphone interviews. We compare our results with the
state-of-the-art i-vector system. In general, we were able to
achieve better performance with the DNN-based systems, but, most
importantly, we have confirmed the robustness of such systems
across multiple data domains. |
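The abstract credits data augmentation and simulation for much of the robustness of discriminative systems. As a generic illustration only, not the authors' recipe (which this record does not describe), the sketch below shows two common corruptions: adding noise at a target signal-to-noise ratio (SNR) and simulating reverberation by convolving with a room impulse response (RIR). The function names `power`, `add_noise_at_snr` and `reverberate`, the test tone, the Gaussian noise and the 5-tap impulse response are all hypothetical stand-ins; real pipelines use recorded noise corpora and measured or simulated RIRs.

```python
import math
import random


def power(signal):
    """Mean power of a signal given as a list of float samples."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise with the requested SNR in dB.

    The noise is tiled (repeated) or truncated to the speech length, then
    scaled so that 10 * log10(P_speech / P_scaled_noise) == snr_db.
    """
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def reverberate(speech, rir):
    """Convolve speech with a room impulse response (RIR).

    Only the first len(speech) output samples are kept, so the
    reverberant copy stays time-aligned with the original.
    """
    out = []
    for t in range(len(speech)):
        acc = 0.0
        for k in range(min(len(rir), t + 1)):
            acc += rir[k] * speech[t - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins: a 1 s, 200 Hz tone at 8 kHz instead of speech,
    # Gaussian noise instead of a noise corpus, and a toy 5-tap RIR.
    speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    rir = [1.0, 0.0, 0.4, 0.0, 0.15]

    noisy = add_noise_at_snr(speech, noise, snr_db=5.0)
    reverberant = reverberate(speech, rir)

    # Measure the SNR actually achieved by the noisy mixture.
    residual = [y - s for y, s in zip(noisy, speech)]
    print(round(10 * math.log10(power(speech) / power(residual)), 2))  # 5.0
    print(len(noisy) == len(reverberant) == len(speech))  # True
```

Corrupted copies like these are usually added to the training set alongside the clean recordings; the specific noise types, SNR ranges and impulse responses used in the paper are not stated in this record.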
BibTeX: |
---|
@INPROCEEDINGS{
author = {Ond{\v{r}}ej Novotn{\'{y}} and Old{\v{r}}ich
Plchot and Pavel Mat{\v{e}}jka and Ladislav
Mo{\v{s}}ner and Ond{\v{r}}ej Glembek},
title = {On the use of X-vectors for Robust Speaker
Recognition},
pages = {168--175},
booktitle = {Proceedings of Odyssey 2018},
journal = {Proceedings of Odyssey: The Speaker and Language Recognition
Workshop},
volume = {2018},
number = {6},
year = {2018},
location = {Les Sables d'Olonne, FR},
publisher = {International Speech Communication Association},
ISSN = {2312-2846},
doi = {10.21437/Odyssey.2018-24},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=11787}
} |