Exploiting Language Information for Situational Awareness (ELISA)
|Czech title:||Využití jazykových informací pro informování v různých situacích (ELISA)|
|Research leader:||Burget Lukáš|
|Team leaders:||Černocký Jan, Matějka Pavel, Szőke Igor|
|Team members:||Beneš Karel, Fér Radek, Glembek Ondřej, Ondel Lucas, Skácel Miroslav, Žmolíková Kateřina|
|Agency:||University of Southern California|
|Keywords:||Speech processing, language, speech mining|
|Speech processing in our proposal will be addressed by low-resource or language-agnostic technologies. Rather than concentrating on mining the content (for which standard resources such as acoustic models, language models, or pronunciation dictionaries will obviously be lacking), speech data will be handled by a multitude of "speech miners" that make minimal use of resources of the target language.
The processing will begin with reliable voice activity detection (VAD) capable of segmenting the signal into useful and useless portions. Although often regarded as "not rocket science", a good VAD is crucial for the correct functioning of the subsequent blocks and for human processing of the speech input. Our work will improve on existing DNN-based VAD that proved its efficiency in the difficult RATS setting [Ng2012]. Processing with several phone posterior estimators, using either mono-lingual or multilingual phoneme sets [Schwarz2009], will follow to provide the "miners" with a coherent low-dimensional representation.
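As an illustration only (not part of the project's actual toolchain), the back end of such a VAD can be sketched as follows: assuming per-frame speech posteriors are already available from a trained DNN, they are median-smoothed and thresholded into speech segments. The window size, threshold, and 10 ms frame shift are hypothetical defaults.

```python
# Sketch: turning per-frame DNN speech posteriors into speech segments.
# The posteriors themselves would come from a trained DNN; here they are
# assumed to be already available as a list of floats in [0, 1].

def smooth(posteriors, win=5):
    """Median-smooth the posterior track to suppress spurious flips."""
    half = win // 2
    out = []
    for i in range(len(posteriors)):
        window = sorted(posteriors[max(0, i - half):i + half + 1])
        out.append(window[len(window) // 2])
    return out

def vad_segments(posteriors, threshold=0.5, frame_shift=0.01):
    """Return (start_sec, end_sec) speech segments from frame posteriors."""
    smoothed = smooth(posteriors)
    segments, start = [], None
    for i, p in enumerate(smoothed):
        if p >= threshold and start is None:
            start = i                      # speech onset
        elif p < threshold and start is not None:
            segments.append((start * frame_shift, i * frame_shift))
            start = None                   # speech offset
    if start is not None:                  # speech runs to the end
        segments.append((start * frame_shift, len(smoothed) * frame_shift))
    return segments
```

For example, 10 non-speech frames, 20 speech frames, and 10 non-speech frames yield a single segment of roughly 0.10–0.30 s.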
The first real "miner" will be language identification (LID) with a significant set of target languages (>60). Even though the target language may not be in this set, LID will allow us to detect segments in English, or possibly in other languages for which we have ASR technology. We will follow our recent development of LID based on features derived from phone posteriors [Plchot2013] as well as on DNNs. We will also work on enrollment of a new language with very little data (down to one utterance). Another "miner" will perform basic speaking-style recognition, allowing read speech to be separated from spontaneous speech. Finally, speaker recognition (SRE) or clustering will gather information about speakers (in case they were previously enrolled), or at least perform coarse speaker clustering, as for the analyst, the information on who is speaking can be as important as what is said. Here, we will build on our strong track record in iVector-based SRE and will mainly work on automatic adaptation and calibration on unlabeled data sets [Brummer2014]|
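To make the coarse speaker clustering concrete, here is a minimal, hypothetical sketch (not the project's actual system): utterance-level embeddings such as i-vectors, assumed to be already extracted, are greedily merged by cosine similarity against running cluster centroids. The similarity threshold and the running-mean centroid update are illustrative simplifications.

```python
# Sketch: coarse speaker clustering of utterance embeddings (e.g. i-vectors).
# The embeddings would come from a trained extractor; here they are assumed
# to be plain lists of floats.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_speakers(embeddings, threshold=0.8):
    """Greedy single-pass clustering: assign each utterance to the most
    similar cluster centroid above the threshold, else open a new cluster."""
    centroids, labels = [], []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))     # start a new speaker cluster
            labels.append(len(centroids) - 1)
        else:
            # update the centroid as a simple running mean (simplified)
            centroids[best] = [(x + y) / 2 for x, y in zip(centroids[best], emb)]
            labels.append(best)
    return labels
```

Two pairs of mutually similar 2-D embeddings, e.g. `[[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]`, are grouped into two clusters with labels `[0, 0, 1, 1]`.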
|2017||HANNEMANN Mirko, TRMAL Jan, ONDEL Lucas, KESIRAJU Santosh and BURGET Lukáš. Bayesian joint-sequence models for grapheme-to-phoneme conversion. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 2836-2840. ISBN 978-1-5090-4117-6.|
| ||KESIRAJU Santosh, PAPPAGARI Raghavendra, ONDEL Lucas, BURGET Lukáš, DEHAK Najim, KHUDANPUR Sanjeev, ČERNOCKÝ Jan and GANGASHETTY Suryakanth V. Topic identification of spoken documents using unsupervised acoustic unit discovery. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 5745-5749. ISBN 978-1-5090-4117-6.|
| ||LIU Chunxi, YANG Jinyi, SUN Ming, KESIRAJU Santosh, ROTT Alena, ONDEL Lucas, GHAHREMANI Pegah, DEHAK Najim, BURGET Lukáš and KHUDANPUR Sanjeev. An empirical evaluation of zero resource acoustic unit discovery. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 5305-5309. ISBN 978-1-5090-4117-6.|
| ||ONDEL Lucas, BURGET Lukáš, ČERNOCKÝ Jan and KESIRAJU Santosh. Bayesian phonotactic language model for acoustic unit discovery. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 5750-5754. ISBN 978-1-5090-4117-6.|
| ||PAPADOPOULOS Pavlos, TRAVADI Ruchir, VAZ Colin, MALANDRAKIS Nikolaos, HERMJAKOB Ulf, POURDAMGHANI Nima, PUST Michael, ZHANG Boliang, PAN Xiaoman, LU Di, LIN Ying, GLEMBEK Ondřej, BASKAR Murali K., KARAFIÁT Martin, BURGET Lukáš, HASEGAWA-JOHNSON Mark, JI Heng, MAY Jonathan, KNIGHT Kevin and NARAYANAN Shrikanth. Team ELISA System for DARPA LORELEI Speech Evaluation 2016. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 2053-2057. ISSN 1990-9772.|
|2016||KESIRAJU Santosh, BURGET Lukáš, SZŐKE Igor and ČERNOCKÝ Jan. Learning document representations using subspace multinomial model. In: Proceedings of Interspeech 2016. San Francisco: International Speech Communication Association, 2016, pp. 700-704. ISBN 978-1-5108-3313-5.|