Ing. Martin Karafiát, Ph.D.

KARAFIÁT Martin, GRÉZL František, BURGET Lukáš, SZŐKE Igor and ČERNOCKÝ Jan. Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge. In: Proceedings of Interspeech 2015. Dresden: International Speech Communication Association, 2015, pp. 2454-2458. ISBN 978-1-5108-1790-6. ISSN 1990-9772.
Publication language: English
Original title: Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge
Title (cs): Tři způsoby adaptace telefonního rozpoznávače pro neviděnou reverberovanou řeč ve VUT systému pro soutěž ASpIRE
Pages: 2454-2458
Proceedings: Proceedings of Interspeech 2015
Conference: INTERSPEECH 2015
Place: Dresden, DE
Year: 2015
ISBN: 978-1-5108-1790-6
Journal: Proceedings of Interspeech, Vol. 2015, No. 09, FR
ISSN: 1990-9772
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2015/karafiat_interspeech2015_IS151376.pdf [PDF]
Files: karafiat_interspeech2015_IS151376.pdf (101 KB, last modified 2017-03-01 18:36:27)
Keywords
speech recognition, reverberation, dereverberation, neural networks, DNN
Annotation
We present our work on ASR of wide-band noisy reverberant speech in the ASpIRE challenge. To solve this task, we started by augmenting Fisher data with artificially noised and reverberated versions.
Abstract
This paper describes several strategies tested in BUT’s submission to the IARPA ASpIRE challenge. The ASpIRE task was to develop an automatic speech recognition (ASR) system for wide-band noisy reverberant speech, while only clean CTS (Fisher) data was allowed for ASR training. To solve this task, we started by augmenting the Fisher data with artificially noised and reverberated versions. The most obvious adaptation was (1) to re-train the whole GMM/HMM-based ASR system. Then, two techniques were designed and tested to make the adaptation easier and to avoid retraining the whole ASR system on a large amount of speech: (2) we trained a speech enhancement DNN (also called a de-noising auto-encoder), and (3) we adapted the feature extraction based on stacked bottle-neck networks (SBN). While re-training the whole system works best, only slightly inferior results were obtained with auto-encoder de-noising followed by retraining of the first layers of the SBN hierarchy, leaving most of the ASR system trained on clean Fisher data unchanged. This shows a promising, efficient and fast way to port ASR systems to new conditions.
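The augmentation step mentioned in the abstract (adding artificially noised and reverberated versions of clean speech) can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not state which room impulse responses, noise sources, or SNR levels were used, so the toy impulse response, the Gaussian noise, the 10 dB SNR, and all function names below are assumptions made purely for illustration.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (pure Python, O(N*M))."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean power of a sequence."""
    return sum(v * v for v in x) / len(x)


def synthetic_rir(length, decay, rng):
    """Hypothetical toy room impulse response: a unit direct path
    followed by an exponentially decaying random tail."""
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0
    return rir


def augment(clean, rir, noise, snr_db):
    """Reverberate `clean` with `rir`, then add `noise` scaled to `snr_db`."""
    # Truncate the convolution so the output keeps the input length.
    reverberated = convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberated)]
    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    target_noise_power = power(reverberated) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(reverberated, noise)]


# Assumed stand-ins: a sine wave for clean speech, Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 5.0 * t / 200.0) for t in range(200)]
rir = synthetic_rir(length=40, decay=0.15, rng=rng)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
noisy = augment(clean, rir, noise, snr_db=10.0)
```

In a realistic setup the clean signal would be a Fisher utterance and the impulse responses and noises would come from measured or simulated recordings; the sketch only shows the order of operations (convolve first, then mix noise at a chosen SNR).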
BibTeX:
@INPROCEEDINGS{
   author = {Martin Karafi{\'{a}}t and Franti{\v{s}}ek Gr{\'{e}}zl and
	Luk{\'{a}}{\v{s}} Burget and Igor Sz{\H{o}}ke and Jan
	{\v{C}}ernock{\'{y}}},
   title = {Three ways to adapt a CTS recognizer to unseen reverberated
	speech in BUT system for the ASpIRE challenge},
   pages = {2454--2458},
   booktitle = {Proceedings of Interspeech 2015},
   journal = {Proceedings of Interspeech},
   volume = {2015},
   number = {09},
   year = {2015},
   location = {Dresden, DE},
   publisher = {International Speech Communication Association},
   ISBN = {978-1-5108-1790-6},
   ISSN = {1990-9772},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=10972}
}
