GLEMBEK Ondřej, MATĚJKA Pavel, BURGET Lukáš, SCHWARZ Petr, PEŠÁN Jan and PLCHOT Oldřich. Voice-print transformation for migration between automatic speaker identification systems. Abstract book of the 7th European Academy of Forensic Science Conference. Praha: Criminal Police Department Prague, 2015. ISBN 978-80-260-8659-8.
Publication language: English
Original title: Voice-print transformation for migration between automatic speaker identification systems
Book: Abstract book of the 7th European Academy of Forensic Science Conference
Conference: 7th European Academy of Forensic Science Conference
Place: Praha, CZ
Publisher: Criminal Police Department Prague
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2015/glembek_eafs_2015-09-10.pdf [PDF]
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2015/glembek_eafs2015_O331_abstrakt.pdf [PDF]
Keywords: speaker recognition, i-vector transformation
This presentation discusses the scenario of migrating from one forensic automatic speaker identification system (FASIS) to another. In a FASIS, an audio recording of a reference speaker is used to train a speaker model. This model is then compared with the model of the tested speaker, and a comparison score (in the form of a log-likelihood ratio) is computed. System migration is usually motivated by the need to improve recognition accuracy, typically through a technological upgrade, or by the necessity of processing new kinds of data. Unfortunately, such migration usually makes the old and new speaker models incompatible, so that two models can no longer be compared. The solution would be to re-train the speaker models and rebuild the model database; however, the original audio files will most likely be unavailable, e.g. due to legal issues. This work introduces a technique for transforming the original speaker models so that, with a slight loss in accuracy, they are compatible with the new FASIS models. We present results on the NIST SRE 2010 evaluation tasks. Our system is based on the i-vector framework, which converts an arbitrarily long audio waveform to a fixed-length low-dimensional vector that serves as a speaker model. In this context, the i-vector is sometimes referred to as a voice-print. We use artificial neural networks to restore the original speaker models by mapping them to the new domain. We show that there is an approximately 20% relative increase in error rates when substituting the new test speaker models with the restored ones. Normally, the incompatibility of the original speaker models, combined with the unavailability of the audio files, would make such a task impossible.
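
The idea of mapping old voice-prints into the new system's domain can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the training data, or the scoring back-end. The sketch assumes that paired vectors are available for a development set processed by both systems, uses a one-hidden-layer network trained by plain gradient descent, replaces real i-vectors with synthetic data, and scores with cosine similarity instead of a log-likelihood ratio. All names (`train_mapping`, `apply_mapping`, `cosine_score`) and hyperparameters are hypothetical.

```python
import numpy as np


def train_mapping(x_old, y_new, hidden=16, lr=0.02, epochs=1000, seed=0):
    """Fit a one-hidden-layer tanh network that maps old-system vectors
    to new-system vectors by minimising the mean squared error."""
    rng = np.random.default_rng(seed)
    n, d_in = x_old.shape
    d_out = y_new.shape[1]
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(x_old @ w1 + b1)
        pred = h @ w2 + b2
        # Backward pass for the loss 1/(2n) * sum of squared errors.
        err = (pred - y_new) / n
        grad_w2 = h.T @ err
        grad_b2 = err.sum(axis=0)
        dh = (err @ w2.T) * (1.0 - h ** 2)
        grad_w1 = x_old.T @ dh
        grad_b1 = dh.sum(axis=0)
        # Full-batch gradient descent step.
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return w1, b1, w2, b2


def apply_mapping(params, x_old):
    """Transform old-system vectors into the new system's domain."""
    w1, b1, w2, b2 = params
    return np.tanh(x_old @ w1 + b1) @ w2 + b2


def cosine_score(a, b):
    """Cosine similarity, used here as a simple stand-in comparison score."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in for a paired development set (assumption: the same
# recordings were processed by both the old and the new system).
rng = np.random.default_rng(1)
old_dev = rng.normal(size=(200, 10))               # "old" voice-prints
true_map = rng.normal(size=(10, 8)) / np.sqrt(10)  # unknown relation
new_dev = np.tanh(old_dev @ true_map)              # "new" voice-prints

params = train_mapping(old_dev, new_dev)

# Restore one old voice-print and compare it with its new-system counterpart.
restored = apply_mapping(params, old_dev[:1])[0]
print(cosine_score(restored, new_dev[0]))
```

Once trained, such a mapping would be applied to archived voice-prints whose audio is no longer accessible, so that they can be compared against models produced by the new system.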
