Publication Details

BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge

KOCOUR Martin, UMESH Jahnavi, KARAFIÁT Martin, ŠVEC Ján, LOPEZ Fernando, BENEŠ Karel, DIEZ Sánchez Mireia, SZŐKE Igor, LUQUE Jordi, VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. In: Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022, pp. 276-280. Available from: https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf
Czech title
BCN2BRNO: Fúze ASR systémů pro Albayzin 2022 Speech to Text Challenge
Type
conference paper
Language
english
Authors
Kocour Martin, Ing. (DCGM FIT BUT)
Umesh Jahnavi (FIT BUT)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Švec Ján, Ing. (DCGM FIT BUT)
Lopez Fernando (Telefónica)
Beneš Karel, Ing. (DCGM FIT BUT)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM FIT BUT)
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT)
Luque Jordi (Telefónica)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
URL
Keywords

ASR fusion, end-to-end model, self-supervised learning, automatic speech recognition.

Abstract

This paper describes the development of Automatic Speech Recognition (ASR) systems for the Albayzin 2022 Challenge. We train and evaluate both hybrid systems and systems based on end-to-end models. We also investigate the use of self-supervised learning speech representations from pre-trained models and their impact on ASR performance, as opposed to training models directly from scratch. Additionally, we apply the Whisper model in a zero-shot fashion, post-processing its output to fit the required transcription format. On top of tuning the model architectures and overall training schemes, we improve the robustness of our models by augmenting the training data with noises extracted from the target domain. Moreover, we rescore the N-best hypotheses with an external language model (LM), adjusting each sentence score to pick the single best hypothesis. All these efforts lead to a significant word error rate (WER) reduction. Our single best system and the fusion of selected systems achieved 16.3% and 13.7% WER, respectively, on the RTVE2020 test partition, i.e. the official evaluation partition of the previous Albayzin challenge.
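The N-best rescoring step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which external LM, score weights, or normalization were used, so the toy add-one-smoothed bigram LM, the log-linear score combination, and the names `train_bigram_lm`, `rescore_nbest`, `lm_weight`, and `length_bonus` are all hypothetical. Each hypothesis keeps its first-pass ASR log-score, the weighted LM log-probability is added to it, and the highest combined score wins.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def train_bigram_lm(corpus: List[str]) -> Callable[[str], float]:
    """Toy add-one-smoothed bigram LM (hypothetical stand-in for the external LM)."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        vocab.update(tokens[1:])  # every token that can be predicted
    vocab_size = len(vocab)

    def log_prob(sentence: str) -> float:
        """Natural-log probability of a whitespace-tokenized sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size))
            for prev, cur in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_log_prob: Callable[[str], float],
    lm_weight: float = 0.5,  # assumed value; would be tuned on a dev set
    length_bonus: float = 0.0,  # assumed per-word bonus, off by default
) -> str:
    """Return the hypothesis maximizing ASR score + weighted LM score (+ length bonus)."""

    def combined_score(item: Tuple[str, float]) -> float:
        text, asr_score = item
        return asr_score + lm_weight * lm_log_prob(text) + length_bonus * len(text.split())

    return max(nbest, key=combined_score)[0]


corpus = [
    "el tiempo en madrid",
    "el tiempo en granada",
    "hace buen tiempo en granada",
]
lm = train_bigram_lm(corpus)
# (hypothesis, first-pass ASR log-score); the scores are made up for illustration
nbest = [("el tiempo en gran nada", -3.8), ("el tiempo en granada", -4.0)]
print(rescore_nbest(nbest, lm, lm_weight=0.0))  # → el tiempo en gran nada
print(rescore_nbest(nbest, lm, lm_weight=0.5))  # → el tiempo en granada
```

With `lm_weight=0.0` the first-pass ranking is kept; increasing it lets the LM overturn an acoustically plausible but unlikely word sequence. A real system would use a much stronger LM and tune the weights on a development set.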

Published
2022
Pages
276-280
Proceedings
Proceedings of IberSpeech 2022
Conference
IberSPEECH 2022 Conference, Granada, ES
Publisher
International Speech Communication Association
Place
Granada, ES
DOI
10.21437/IberSPEECH.2022-56
BibTeX
@INPROCEEDINGS{FITPUB12859,
   author = "Martin Kocour and Jahnavi Umesh and Martin Karafi\'{a}t and J\'{a}n \v{S}vec and Fernando Lopez and Karel Bene\v{s} and Mireia {Diez S\'{a}nchez} and Igor Sz\H{o}ke and Jordi Luque and Karel Vesel\'{y} and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge",
   pages = "276--280",
   booktitle = "Proceedings of IberSpeech 2022",
   year = 2022,
   location = "Granada, ES",
   publisher = "International Speech Communication Association",
   doi = "10.21437/IberSPEECH.2022-56",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12859"
}