Department of Computer Graphics and Multimedia
Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods
|Czech title:||Robustní diarizace mluvčích pomocí Bayesovské inference a hlubokého učení|
|Research leader:||Diez Sánchez Mireia|
|Team leaders:||Mošner Ladislav (FIT VUT)|
|Agency:||European Commission EU - Horizon 2020|
|Keywords:||Machine learning, statistical data processing and applications using signal processing, Numerical analysis, simulation, optimisation, modelling tools, data mining, Ontologies, neural networks, genetic programming, fuzzy logic, Cognitive science, human computer interaction, natural language processing, Complexity and cryptography, electronic security, privacy, biometrics, Speaker Diarization, Speaker Recognition, Variational Bayes Inference, Deep Neural Networks, Speech Data Mining|
|The proposed project deals with Speaker Diarization (SD), which is commonly defined as the task of answering the question
"who spoke when?" in a speech recording. The first objective of the proposal is to optimize the Bayesian approach to SD,
which has been shown to be promising for this task. For Variational Bayes (VB) inference, which is very sensitive to initialization,
we will develop new, fast ways of obtaining a good starting point. We will also explore alternative inference methods, such as
collapsed VB or collapsed Gibbs sampling, and investigate alternative priors similar to those introduced for Bayesian
speaker recognition models.
The second part of the proposal is motivated by the huge performance gains that, in recent years, have been brought to
other recognition tasks by Deep Neural Networks (DNNs). In the context of SD, DNNs have been used in the computation of
i-vectors, but their potential has not yet been explored for other stages of SD. We will study ways of integrating DNNs into the
different stages of SD systems.
The objectives of the proposal will be achieved by theoretical work, implementation, and careful testing on real speech data.
The outcomes of the project are intended not only for scientific publications, but are also eagerly awaited by the European speech data
mining industry (for example, Czech Phonexia or Spanish Agnitio).
The project is proposed by an excellent female researcher, Dr. Mireia Diez, who completed her thesis in the GTTS group of
the University of the Basque Country, one of the most important European labs dealing with speaker recognition and diarization.
The proposed host is the Speech@FIT group of Brno University of Technology, with a 20-year track record of top speech data
mining research. The proposed research training and the combined skills of Dr. Diez and the host institution have the potential
to advance the state of the art in speaker diarization, provide the applicant with improved career opportunities and benefit