Conference proceedings paper

DAS Amit, HASEGAWA-JOHNSON Mark and VESELÝ Karel. Deep Auto-encoder Based Multi-task Learning Using Probabilistic Transcriptions. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 2073-2077. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0582.PDF
Publication language: English
Publication title: Deep Auto-encoder Based Multi-task Learning Using Probabilistic Transcriptions
Title (cs): Multi-task trénování s pravděpodobnostními přepisy založené na hlubokém autoenkodéru
Proceedings: Proceedings of Interspeech 2017
Conference: Interspeech 2017
Place of publication: Stockholm, SE
Journal: Proceedings of Interspeech, vol. 2017, no. 08, FR
Publisher: International Speech Communication Association
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2017/das_interspeech2017_IS170582.pdf [PDF]
Keywords
cross-lingual speech recognition, probabilistic transcription, deep neural networks, multi-task learning
The paper deals with deep auto-encoder based multi-task learning using probabilistic transcriptions.
We examine a scenario where we have no access to native transcribers in the target language. This is typical of language communities that are under-resourced. However, turkers (online crowd workers) available in online marketplaces can serve as valuable alternative resources for providing transcripts in the target language. We assume that the turkers neither speak nor have any familiarity with the target language. Thus, they are unable to distinguish all phone pairs in the target language; their transcripts therefore specify, at best, a probability distribution called a probabilistic transcript (PT). Standard deep neural network (DNN) training using PTs does not necessarily improve error rates. Previously reported results have demonstrated some success by adopting the multi-task learning (MTL) approach. In this study, we report further improvements by introducing a deep auto-encoder based MTL. This method leverages large amounts of untranscribed data in the target language in addition to the PTs obtained from turkers. Furthermore, to encourage transfer learning in the feature space, we also examine the effect of using monophones from transcripts in well-resourced languages. We report consistent improvement in phone error rates (PER) for Swahili, Amharic, Dinka, and Mandarin.
   author = {Amit Das and Mark Hasegawa-Johnson and Karel
   title = {Deep Auto-encoder Based Multi-task Learning Using
	Probabilistic Transcriptions},
   pages = {2073--2077},
   booktitle = {Proceedings of Interspeech 2017},
   journal = {Proceedings of Interspeech},
   volume = 2017,
   number = 08,
   year = 2017,
   location = {Stockholm, SE},
   publisher = {International Speech Communication Association},
   ISSN = {1990-9772},
   doi = {10.21437/Interspeech.2017-582},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.cs.iso-8859-2?id=11585}
