Conference paper

MCLAREN Mitchell, ABRASH Victor, GRACIARENA Martin, LEI Yun and PEŠÁN Jan. Improving Robustness to Compressed Speech in Speaker Recognition. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 3698-3702. ISBN 978-1-62993-443-3. Available from: http://www.isca-speech.org/archive/interspeech_2013/i13_3698.html
Publication language:English
Original title:Improving Robustness to Compressed Speech in Speaker Recognition
Title (cs):Zlepšení spolehlivosti rozpoznávání mluvčího na komprimované řeči
Pages:3698-3702
Proceedings:Proceedings of Interspeech 2013
Conference:Interspeech 2013
Place:Lyon, FR
Year:2013
URL:http://www.isca-speech.org/archive/interspeech_2013/i13_3698.html
ISBN:978-1-62993-443-3
Publisher:International Speech Communication Association
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2013/mcLaren_interspeech2013_IS131394.pdf [PDF]
Keywords
speaker recognition, speech coding, codec degradation, speaker verification.
Annotation
We analyzed the impact of codec-degraded speech on a state-of-the-art PLDA-based speaker identification system and proposed mitigation techniques.
Abstract
The goal of this paper is to analyze the impact of codec-degraded speech on a state-of-the-art speaker recognition system and propose mitigation techniques. Several acoustic features are analyzed, including the standard Mel filterbank cepstral coefficients (MFCC), as well as the noise-robust medium duration modulation cepstrum (MDMC) and power normalized cepstral coefficients (PNCC), to determine whether robustness to noise generalizes to audio compression. Using a speaker recognition system based on i-vectors and probabilistic linear discriminant analysis (PLDA), we compared four PLDA training scenarios. The first trains PLDA on clean data, the second adds noisy and reverberant speech, the third introduces transcoded data matched to the evaluation conditions, and the fourth uses codec-degraded speech mismatched to the evaluation conditions. We found that robustness to compressed speech was marginally improved by exposing PLDA to noisy and reverberant speech, with little improvement from using transcoded speech in PLDA based on codecs mismatched to the evaluation conditions. Noise-robust features offered a degree of robustness to compressed speech, while more significant improvements occurred when PLDA had observed the codec matching the evaluation conditions. Finally, we tested i-vector fusion from the different features, which increased overall system performance but did not improve robustness to codec-degraded speech.
BibTeX:
@INPROCEEDINGS{
   author = {Mitchell McLaren and Victor Abrash and Martin Graciarena and
	Yun Lei and Jan Pe{\v{s}}{\'{a}}n},
   title = {Improving Robustness to Compressed Speech in Speaker
	Recognition},
   pages = {3698--3702},
   booktitle = {Proceedings of Interspeech 2013},
   year = {2013},
   location = {Lyon, FR},
   publisher = {International Speech Communication Association},
   ISBN = {978-1-62993-443-3},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso-8859-2?id=10630}
}
