Conference paper

VESELÝ Karel, BASKAR Murali K., DIEZ Sánchez Mireia and BENEŠ Karel. MGB-3 BUT System: Low-resource ASR on Egyptian YOUTUBE data. In: Proceedings of ASRU 2017. Okinawa: IEEE Signal Processing Society, 2017, pp. 368-373. ISBN 978-1-5090-4788-8.
Publication language: English
Original title: MGB-3 BUT System: Low-resource ASR on Egyptian YOUTUBE data
Title (cs): MGB-3 BUT Systém: egyptské rozpoznávání řeči s omezenými zdroji
Pages: 368-373
Proceedings: Proceedings of ASRU 2017
Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop
Place: Okinawa, JP
Year: 2017
ISBN: 978-1-5090-4788-8
Publisher: IEEE Signal Processing Society
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/vesely_asru2017_mgb3-paper.pdf [PDF]
Files:
vesely_asru2017_mgb3-paper.pdf, 155 KB, last modified 2018-01-04 17:19:41
Keywords
MGB-3, ASR adaptation, low-resource ASR, Egyptian Arabic, diarization
Annotation
This paper describes the adaptation strategies we used in the MGB-3 evaluations. The BUT system performs low-resource ASR on Egyptian YouTube data.
Abstract
This paper presents a series of experiments we performed during our work on the MGB-3 evaluations. We describe both the submitted system and the post-evaluation analysis. Our initial BLSTM-HMM system was trained on 250 hours of MGB-2 data (Al-Jazeera) and then adapted with 5 hours of Egyptian data (YouTube). We used techniques such as diarization, n-gram language model adaptation, speed perturbation of the adaptation data, and the use of all 4 correct references. The 4 references were either used for supervision with a confusion network, or each sentence was included 4 times, once with the transcript from each annotator. It was also helpful to blend the augmented MGB-3 adaptation data with 15 hours of MGB-2 data. Although our single system did not rank among the best teams in the evaluations, we believe that our analysis will be of interest beyond the other MGB-3 challenge participants.
BibTeX:
@INPROCEEDINGS{
   author = {Karel Vesel{\'{y}} and K. Murali Baskar and Mireia
	S{\'{a}}nchez Diez and Karel Bene{\v{s}}},
   title = {MGB-3 BUT System: Low-resource ASR on Egyptian YOUTUBE data},
   pages = {368--373},
   booktitle = {Proceedings of ASRU 2017},
   year = {2017},
   location = {Okinawa, JP},
   publisher = {IEEE Signal Processing Society},
   ISBN = {978-1-5090-4788-8},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=11595}
}
