Department of Computer Graphics and Multimedia
Robust SPEAKER DIariazation systems using Bayesian inferenCE and deep learning methods
|Czech title:||Robustní diarizace mluvčích pomocí Bayesovské inference a hlubokého učení|
|Research leader:||Diez Sánchez Mireia|
|Team leaders:||Mošner Ladislav (FIT VUT)|
|Agency:||European Commission EU - Horizon 2020|
|Keywords:||Machine learning, statistical data processing and applications using signal processing, Numerical analysis, simulation, optimisation, modelling tools, data mining, Ontologies, neural networks, genetic programming, fuzzy logic, Cognitive science, human computer interaction, natural language processing, Complexity and cryptography, electronic security, privacy, biometrics, Speaker Diarization, Speaker Recognition, Variational Bayes Inference, Deep Neural Networks, Speech Data Mining|
|The proposed project deals with Speaker Diarization (SD) which is commonly defined as the task of answering the question
"who spoke when?" in a speech recording. The first objective of the proposal is to optimize the Bayesian approach to SD,
which has been shown to be promising for this task. For Variational Bayes (VB) inference, which is very sensitive to initialization,
we will develop new, fast ways of obtaining a good starting point. We will also explore alternative inference methods, such as
collapsed VB or collapsed Gibbs sampling, and investigate alternative priors similar to those introduced for Bayesian
speaker recognition models.
The second part of the proposal is motivated by the huge performance gains that, in recent years, have been brought to
other recognition tasks by Deep Neural Networks (DNNs). In the context of SD, DNNs have been used in the computation of
i-vectors, but their potential has not been explored for other stages of SD. We will study ways of integrating DNNs into the
different stages of SD systems.
The objectives of the proposal will be achieved by theoretical work, implementation, and careful testing on real speech data.
The outcomes of the project are intended not only for scientific publications, but are also eagerly awaited by the European speech data
mining industry (for example, Czech Phonexia or Spanish Agnitio).
The project is proposed by an excellent female researcher, Dr. Mireia Diez, who completed her thesis in the GTTS group of the
University of the Basque Country, one of the most important European labs dealing with speaker recognition and diarization.
The proposed host is the Speech@FIT group of Brno University of Technology, with a 20-year track record of top speech data
mining research. The proposed research training and the combination of skills of Dr. Diez and the host institution have the potential
to advance the state of the art in speaker diarization, provide the applicant with improved career opportunities and benefit
|2018||ROHDIN Johan A., SILNOVA Anna, DIEZ Sánchez Mireia, PLCHOT Oldřich, MATĚJKA Pavel and BURGET Lukáš. End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA. In: Proceedings of ICASSP. Calgary: IEEE Signal Processing Society, 2018, pp. 4874-4878. ISBN 978-1-5386-4658-8.|
|2017||MATĚJKA Pavel, NOVOTNÝ Ondřej, PLCHOT Oldřich, BURGET Lukáš, DIEZ Sánchez Mireia and ČERNOCKÝ Jan. Analysis of Score Normalization in Multilingual Speaker Recognition. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 1567-1571. ISSN 1990-9772.|
| ||MATĚJKA Pavel, PLCHOT Oldřich, NOVOTNÝ Ondřej, CUMANI Sandro, LOZANO-DIEZ Alicia, SLAVÍČEK Josef, DIEZ Sánchez Mireia, GRÉZL František, GLEMBEK Ondřej, KAMSALI Veera Mounika, SILNOVA Anna, BURGET Lukáš, ONDEL Lucas, KESIRAJU Santosh and ROHDIN Johan A. BUT-PT System Description for NIST LRE 2017. In: Proceedings of NIST Language Recognition Workshop 2017. Orlando, Florida: National Institute of Standards and Technology, 2017, pp. 1-6.|
| ||PLCHOT Oldřich, MATĚJKA Pavel, SILNOVA Anna, NOVOTNÝ Ondřej, DIEZ Sánchez Mireia, ROHDIN Johan A., GLEMBEK Ondřej, BRÜMMER Niko, SWART Albert du Preez, PRIETO Jesús J., GARCIA Perera Leibny Paola, BUERA Luis, KENNY Patrick, ALAM Jahangir and BHATTACHARYA Gautam. Analysis and Description of ABC Submission to NIST SRE 2016. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 1348-1352. ISSN 1990-9772.|
| ||VESELÝ Karel, BASKAR Murali K., DIEZ Sánchez Mireia and BENEŠ Karel. MGB-3 BUT System: Low-resource ASR on Egyptian YOUTUBE data. In: Proceedings of ASRU 2017. Okinawa: IEEE Signal Processing Society, 2017, pp. 368-373. ISBN 978-1-5090-4788-8.|