Department of Computer Graphics and Multimedia


Technologies of speech processing for efficient human-machine communication

Czech title: Technologie zpracování řeči pro efektivní komunikaci člověk-počítač
Research leader: Černocký Jan
Team leaders: Hannemann Mirko, Heřmanský Hynek, Szőke Igor, Žižka Josef
Team members: Karafiát Martin, Ondel Lucas
Agency: Technology Agency of the Czech Republic
Code: TA01011328
Start: 2011-01-01
End: 2014-12-31
Keywords: speech recognition, electronic dictionaries, defense and security, mobile devices, dialogue systems, CRM, eLearning
Annotation:
The project aims to develop advanced speech recognition techniques and deploy them in working applications: search in electronic dictionaries on mobile devices, dictation of translations, defense and security, dialogue systems, client-care systems (CRM, helpdesk, etc.), and audio-visual access to teaching materials.

Products

2014 Audiovisual lecture browser, prototype, 2014
Authors: Žižka Josef, Szőke Igor, Fapšo Michal
2012 KALDI speech recognition toolkit, software, 2012
Authors: Povey Daniel, Ghoshal Arnab, Boulianne Gilles, Burget Lukáš, Glembek Ondřej, Goel Nagendra K., Hannemann Mirko, Motlíček Petr, Qian Yanmin, Schwarz Petr, Silovský Jan, Stemmer Georg, Veselý Karel

Publications

2015 ONDEL Lucas, ANGUERA Xavier and LUQUE Jordi. MASK+: Data-Driven Regions Selection for Acoustic Fingerprinting. In: Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, Queensland: IEEE Signal Processing Society, 2015, pp. 335-339. ISBN 978-1-4673-6997-8.
2014 GLEMBEK Ondřej, MA Jeff, MATĚJKA Pavel, ZHANG Bing, PLCHOT Oldřich, BURGET Lukáš and MATSOUKAS Spyros. Domain Adaptation Via Within-class Covariance Correction in I-Vector Based Speaker Recognition Systems. In: Proceedings of ICASSP 2014. Florence: IEEE Signal Processing Society, 2014, pp. 4060-4064. ISBN 978-1-4799-2892-7.
 KARAFIÁT Martin, GRÉZL František, HANNEMANN Mirko and ČERNOCKÝ Jan. BUT Neural Network Features for Spontaneous Vietnamese in BABEL. In: Proceedings of ICASSP 2014. Florence: IEEE Signal Processing Society, 2014, pp. 5659-5663. ISBN 978-1-4799-2892-7.
 KARAFIÁT Martin, GRÉZL František, VESELÝ Karel, HANNEMANN Mirko, SZŐKE Igor and ČERNOCKÝ Jan. BUT 2014 Babel System: Analysis of adaptation in NN based systems. In: Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014, pp. 3002-3006. ISBN 978-1-63439-435-2.
 KARAFIÁT Martin, VESELÝ Karel, SZŐKE Igor, BURGET Lukáš, GRÉZL František, HANNEMANN Mirko and ČERNOCKÝ Jan. BUT ASR System for BABEL Surprise Evaluation 2014. In: Proceedings of 2014 Spoken Language Technology Workshop. South Lake Tahoe, Nevada: IEEE Signal Processing Society, 2014, pp. 501-506. ISBN 978-1-4799-7129-9.
 MARTÍNEZ González David, BURGET Lukáš, STAFYLAKIS Themos, LEI Yun, KENNY Patrick and LLEIDA Eduardo. Unscented Transform For Ivector-based Noisy Speaker Recognition. In: Proceedings of ICASSP 2014. Florence: IEEE Signal Processing Society, 2014, pp. 4070-4074. ISBN 978-1-4799-2892-7.
2013 EGOROVA Ekaterina, VESELÝ Karel, KARAFIÁT Martin, JANDA Miloš and ČERNOCKÝ Jan. Manual and Semi-Automatic Approaches to Building a Multilingual Phoneme Set. In: Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013, pp. 7324-7328. ISBN 978-1-4799-0355-9.
 LEI Yun, BURGET Lukáš and SCHEFFER Nicolas. A Noise Robust I-Vector Extractor Using Vector Taylor Series For Speaker Recognition. In: Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013, pp. 6788-6791. ISBN 978-1-4799-0355-9.
 PLCHOT Oldřich, MATSOUKAS Spyros, MATĚJKA Pavel, DEHAK Najim, MA Jeff, CUMANI Sandro, GLEMBEK Ondřej, HEŘMANSKÝ Hynek, MESGARANI Nima, SOUFIFAR Mehdi Mohammad, THOMAS Samuel, ZHANG Bing and ZHOU Xinhui et al. Developing A Speaker Identification System For The DARPA RATS Project. In: Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013, pp. 6768-6772. ISBN 978-1-4799-0355-9.
 RATH Shakti P., BURGET Lukáš, KARAFIÁT Martin, GLEMBEK Ondřej and ČERNOCKÝ Jan. A Region-specific Feature-space Transformation for Speaker Adaptation and Singularity Analysis of Jacobian Matrix. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 1228-1232. ISBN 978-1-62993-443-3. ISSN 2308-457X.
 RATH Shakti P., POVEY Daniel, VESELÝ Karel and ČERNOCKÝ Jan. Improved Feature Processing for Deep Neural Networks. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 109-113. ISBN 978-1-62993-443-3. ISSN 2308-457X.
2012 CUMANI Sandro, PLCHOT Oldřich and KARAFIÁT Martin. Independent Component Analysis and MLLR Transforms for Speaker Identification. In: Proc. International Conference on Acoustics, Speech, and Signal P. Kyoto: IEEE Signal Processing Society, 2012, pp. 4365-4368. ISBN 978-1-4673-0044-5.
 DEORAS Anoop, MIKOLOV Tomáš, KOMBRINK Stefan and CHURCH Kenneth. Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model. Speech Communication. Amsterdam: Elsevier Science, 2012, vol. 2012, no. 8, pp. 1-16. ISSN 0167-6393.
 KARAFIÁT Martin, JANDA Miloš, ČERNOCKÝ Jan and BURGET Lukáš. Region Dependent Linear Transforms in Multilingual Speech Recognition. In: Proc. International Conference on Acoustics, Speech, and Signal Processing 2012. Kyoto: IEEE Signal Processing Society, 2012, pp. 4885-4888. ISBN 978-1-4673-0044-5.
 KOMBRINK Stefan, MIKOLOV Tomáš, KARAFIÁT Martin and BURGET Lukáš. Improving Language Models for ASR Using Translated In-domain Data. In: Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012, pp. 4405-4408. ISBN 978-1-4673-0044-5.
 POVEY Daniel, HANNEMANN Mirko, BOULIANNE Gilles, BURGET Lukáš, GHOSHAL Arnab, JANDA Miloš, KARAFIÁT Martin, KOMBRINK Stefan, MOTLÍČEK Petr, QIAN Yanmin, RIEDHAMMER Korbinian, VESELÝ Karel and VU Ngoc Thang. Generating Exact Lattices in The WFST Framework. In: Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE Signal Processing Society, 2012, pp. 4213-4216. ISBN 978-1-4673-0044-5.
 RATH Shakti P., KARAFIÁT Martin, GLEMBEK Ondřej and ČERNOCKÝ Jan. A factorized representation of FMLLR transform based on QR-decomposition. In: Proceedings of Interspeech 2012. Portland, Oregon: International Speech Communication Association, 2012, pp. 1-4. ISBN 978-1-62276-759-5. ISSN 1990-9772.
 SOUFIFAR Mehdi Mohammad, CUMANI Sandro, BURGET Lukáš and ČERNOCKÝ Jan. Discriminative Classifiers for Phonotactic Language Recognition with iVectors. In: Proc. International Conference on Acoustics, Speech, and Signal Processing 2012. Kyoto: IEEE Signal Processing Society, 2012, pp. 4853-4856. ISBN 978-1-4673-0044-5.
 SZŐKE Igor, FAPŠO Michal and VESELÝ Karel. BUT2012 Approaches for Spoken Web Search - MediaEval 2012. In: Working Notes Proceedings of the MediaEval 2012 Workshop. Pisa: CEUR-WS.org, 2012, pp. 1-2. ISSN 1613-0073.
 SZŐKE Igor, FAPŠO Michal, ŽIŽKA Josef, BERAN Vítězslav and ČERNOCKÝ Jan. Efektivní přístup ke znalostem v audio-vizuálních záznamech. In: Proceedings of the Annual Database Conference. Praha: The University of Technology Košice, 2012, pp. 57-74. ISBN 978-80-553-1049-7.
 VESELÝ Karel, KARAFIÁT Martin, GRÉZL František, JANDA Miloš and EGOROVA Ekaterina. The Language-Independent Bottleneck Features. In: Proceedings of IEEE 2012 Workshop on Spoken Language Technology. Miami: IEEE Signal Processing Society, 2012, pp. 336-341. ISBN 978-1-4673-5124-9.
2011 DEORAS Anoop, MIKOLOV Tomáš and CHURCH Kenneth. A Fast Re-scoring Strategy to Capture Long-Distance Dependencies. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing July 2011 Edinburgh, Scotland, UK. Edinburgh: Association for Computational Linguistics, 2011, pp. 1116-1127. ISBN 978-1-937284-11-4.
 GRÉZL František and KARAFIÁT Martin. Integrating recent MLP feature extraction techniques into TRAP architecture. In: Proceedings of Interspeech 2011. Florence: International Speech Communication Association, 2011, pp. 1229-1232. ISBN 978-1-61839-270-1. ISSN 1990-9772.
 GRÉZL František. The Role of Neural Network Size in TRAP/HATS Feature Extraction. In: Proceedings Text, Speech and Dialogue 2011. Plzeň: Springer Verlag, 2011, pp. 315-322. ISBN 978-3-642-23537-5. ISSN 0302-9743.
 KARAFIÁT Martin, BURGET Lukáš, MATĚJKA Pavel, GLEMBEK Ondřej and ČERNOCKÝ Jan. iVector-Based Discriminative Adaptation for Automatic Speech Recognition. In: Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011, pp. 152-157. ISBN 978-1-4673-0366-8.
 KOMBRINK Stefan, MIKOLOV Tomáš, KARAFIÁT Martin and BURGET Lukáš. Recurrent Neural Network based Language Modeling in Meeting Recognition. In: Proceedings of Interspeech 2011. Florence: International Speech Communication Association, 2011, pp. 2877-2880. ISBN 978-1-61839-270-1. ISSN 1990-9772.
 MIKOLOV Tomáš, DEORAS Anoop, KOMBRINK Stefan, BURGET Lukáš and ČERNOCKÝ Jan. Empirical Evaluation and Combination of Advanced Language Modeling Techniques. In: Proceedings of Interspeech 2011. Florence: International Speech Communication Association, 2011, pp. 605-608. ISBN 978-1-61839-270-1. ISSN 1990-9772.
 MIKOLOV Tomáš, DEORAS Anoop, POVEY Daniel, BURGET Lukáš and ČERNOCKÝ Jan. Strategies for Training Large Scale Neural Network Language Models. In: Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011, pp. 196-201. ISBN 978-1-4673-0366-8.
 MIKOLOV Tomáš, KOMBRINK Stefan, DEORAS Anoop, BURGET Lukáš and ČERNOCKÝ Jan. RNNLM - Recurrent Neural Network Language Modeling Toolkit. In: Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011, pp. 1-4. ISBN 978-1-4673-0366-8.
 POVEY Daniel, GHOSHAL Arnab, BOULIANNE Gilles, BURGET Lukáš, GLEMBEK Ondřej, GOEL Nagendra K., HANNEMANN Mirko, MOTLÍČEK Petr, QIAN Yanmin, SCHWARZ Petr, SILOVSKÝ Jan, STEMMER Georg and VESELÝ Karel. The Kaldi Speech Recognition Toolkit. In: Proceedings of ASRU 2011. Hilton Waikoloa Village Resort, Hawaii: IEEE Signal Processing Society, 2011, pp. 1-4. ISBN 978-1-4673-0366-8.
 VESELÝ Karel, KARAFIÁT Martin and GRÉZL František. Convolutive Bottleneck Network Features for LVCSR. In: Proceedings of ASRU 2011. Big Island, Hawaii: IEEE Signal Processing Society, 2011, pp. 42-47. ISBN 978-1-4673-0366-8.
