Speech enhancement front-end for robust automatic speech recognition with large amount of training data

Czech title:Parametrizace s obohacováním řeči pro robustní automatické rozpoznávání řeči s velkým objemem trénovacích dat
Research leader:Žmolíková Kateřina
Team leaders:Černocký Jan
Agency:NTT Corporation
Start:2017-10-01
End:2018-09-30
Keywords:speech recognition, robustness, large data, DNN embeddings
Annotation:
The purpose of this joint research is to develop a speech enhancement front-end for robust automatic speech recognition with a large amount of training data, through cooperation between NTT and BUT. The work relies on embeddings produced by neural networks at various points in the processing chain.

Publications

2018 DELCROIX Marc, ŽMOLÍKOVÁ Kateřina, KINOSHITA Keisuke, OGAWA Atsunori and NAKATANI Tomohiro. Single Channel Target Speaker Extraction and Recognition with Speaker Beam. In: Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018, pp. 5554-5558. ISBN 978-1-5386-4658-8.
 ROHDIN Johan A., SILNOVA Anna, DIEZ Sánchez Mireia, PLCHOT Oldřich, MATĚJKA Pavel and BURGET Lukáš. End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA. In: Proceedings of ICASSP. Calgary: IEEE Signal Processing Society, 2018, pp. 4874-4878. ISBN 978-1-5386-4658-8.
2017 ŽMOLÍKOVÁ Kateřina. Summary report of project "Speech enhancement front-end for robust automatic speech recognition with large amount of training data" for Year 2017. Brno: NTT Corporation, 2017.
 ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, KINOSHITA Keisuke, HIGUCHI Takuya, OGAWA Atsunori and NAKATANI Tomohiro. Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction. In: Proceedings of ASRU 2017. Okinawa: IEEE Signal Processing Society, 2017, pp. 8-15. ISBN 978-1-5090-4788-8.
