Doc. Ing. Lukáš Burget, Ph.D.
GLEMBEK Ondřej, BURGET Lukáš, KENNY Patrick, KARAFIÁT Martin and MATĚJKA Pavel. Simplification and optimization of I-Vector Extraction. In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011. Praha: IEEE Signal Processing Society, 2011, pp. 4516-4519. ISBN 9781457705373.

Publication language:  English 

Original title:  Simplification and optimization of I-Vector Extraction 

Title (cs):  Zjednodušení a optimalisace extrakce i-vektorů 

Pages:  4516-4519 

Proceedings:  Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 

Conference:  International Conference on Acoustics, Speech and Signal Processing 2011 

Place:  Praha, CZ 

Year:  2011 

ISBN:  9781457705373 

Publisher:  IEEE Signal Processing Society 

URL:  http://www.fit.vutbr.cz/research/groups/speech/publi/2011/glembek_icassp2011_4516.pdf [PDF] 

Keywords 

speaker recognition, i-vectors, Joint Factor Analysis, PCA, HLDA 
Annotation 

We reduced the memory requirements and processing time of i-vector extractor training, so that higher dimensions can now be used while retaining recognition accuracy. For i-vector extraction itself, we reduced the complexity of the algorithm while sacrificing little recognition accuracy, which makes the technique usable on small-scale devices. 
Abstract 

This paper introduces some simplifications to i-vector speaker recognition systems. I-vector extraction, as well as training of the i-vector extractor, can be expensive in terms of both memory and speed. Under certain assumptions, the formulas for i-vector extraction (also used in i-vector extractor training) can be simplified, leading to faster and more memory-efficient code. The first assumption is that the GMM component alignment is constant across utterances and is given by the UBM GMM weights. The second assumption is that the i-vector extractor matrix can be linearly transformed so that its per-Gaussian components are orthogonal. We use PCA and HLDA to estimate this transform. 
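
The effect of the first assumption can be illustrated with the standard i-vector point estimate. The sketch below follows the usual textbook formulation rather than the paper's own equations; all symbols ($T_c$, $\Sigma_c$, $N_c$, $f_c$, $\omega_c$, $n$, $W$, $Q$, $\Lambda$) are notation assumed here for illustration.

```latex
% Standard i-vector point estimate for one utterance (assumed notation):
%   T_c        block of the i-vector extractor matrix for Gaussian c
%   \Sigma_c   UBM covariance of Gaussian c
%   N_c, f_c   zero-order and centered first-order statistics of the utterance
\[
  w = \Bigl( I + \sum_{c} N_c \, T_c^{\top} \Sigma_c^{-1} T_c \Bigr)^{-1}
      \sum_{c} T_c^{\top} \Sigma_c^{-1} f_c
\]

% Assumption 1: the alignment is constant and given by the UBM weights \omega_c,
% so N_c \approx n \, \omega_c, where n = \sum_c N_c is the frame count.
\[
  w \approx \bigl( I + n W \bigr)^{-1} \sum_{c} T_c^{\top} \Sigma_c^{-1} f_c ,
  \qquad
  W = \sum_{c} \omega_c \, T_c^{\top} \Sigma_c^{-1} T_c
\]

% W does not depend on the utterance, so its eigendecomposition
% W = Q \Lambda Q^{\top} (Q orthogonal, \Lambda diagonal) can be precomputed:
\[
  \bigl( I + n W \bigr)^{-1} = Q \, \bigl( I + n \Lambda \bigr)^{-1} Q^{\top}
\]
```

Under this approximation, the only per-utterance inversion is that of the diagonal matrix $I + n\Lambda$, which suggests where savings in time and memory could come from.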
BibTeX: 

@INPROCEEDINGS{
author = {Ond{\v{r}}ej Glembek and Luk{\'{a}}{\v{s}} Burget and
Patrick Kenny and Martin Karafi{\'{a}}t and Pavel
Mat{\v{e}}jka},
title = {Simplification and optimization of I-Vector Extraction},
pages = {4516--4519},
booktitle = {Proceedings of the 2011 IEEE International Conference on
Acoustics, Speech, and Signal Processing, ICASSP 2011},
year = {2011},
location = {Praha, CZ},
publisher = {IEEE Signal Processing Society},
ISBN = {9781457705373},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=9655}
} 
