Journal article

LOPEZ-MORENO Ignacio, GONZALEZ-DOMINGUEZ Javier, MARTÍNEZ González David, PLCHOT Oldřich, GONZALEZ-RODRIGUEZ Joaquin and MORENO Pedro. On the use of deep feedforward neural networks for automatic language identification. Computer Speech and Language. Amsterdam: Elsevier Science, 2016, vol. 40, pp. 46-59. ISSN 0885-2308. Available from: http://www.sciencedirect.com/science/article/pii/S088523081530036X
Publication language: English
Original title: On the use of deep feedforward neural networks for automatic language identification
Title (cs): Využití hlubokých dopředných neuronových sítí pro automatickou identifikaci jazyka
Pages: 46-59
Place: Amsterdam, NL
Year: 2016
URL: http://www.sciencedirect.com/science/article/pii/S088523081530036X
Journal: Computer Speech and Language, Vol. 40, Amsterdam, NL
ISSN: 0885-2308
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2016/plchot_csl2016_1-s2.0-S088523081530036X-main.pdf [PDF]
Files: lopez-moreno_csl2016_za FIT_plchot.pdf (1.58 MB, last modified 2017-03-06 11:08:55)
Keywords
LID; DNN; Bottleneck; i-vectors
Annotation
In this work, we presented an extensive study of the use of deep neural networks for LID. Guided by the success of DNNs for acoustic modelling, we explored their capability to learn discriminative language information from speech signals.
Abstract
In this work, we present a comprehensive study on the use of deep neural networks (DNNs) for automatic language identification (LID). Motivated by the recent success of using DNNs in acoustic modeling for speech recognition, we adapt DNNs to the problem of identifying the language in a given utterance from its short-term acoustic features. We propose two different DNN-based approaches. In the first one, the DNN acts as an end-to-end LID classifier, receiving as input the speech features and providing as output the estimated probabilities of the target languages. In the second approach, the DNN is used to extract bottleneck features that are then used as inputs for a state-of-the-art i-vector system. Experiments are conducted in two different scenarios: the complete NIST Language Recognition Evaluation dataset 2009 (LRE'09) and a subset of the Voice of America (VOA) data from LRE'09, in which all languages have the same amount of training data. Results for both datasets demonstrate that the DNN-based systems significantly outperform a state-of-the-art i-vector system when dealing with short-duration utterances. Furthermore, the combination of the DNN-based and the classical i-vector system leads to additional performance improvements (up to a 45% relative improvement in both EER and Cavg on the 3s and 10s conditions, respectively).
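Illustration (not part of the original record): the first, end-to-end approach described in the abstract can be sketched as a feedforward network that maps stacked short-term acoustic features of each frame to per-language posteriors, with frame-level log posteriors averaged over the utterance to obtain one score per language. The layer sizes, context width, feature dimensionality and number of target languages below are illustrative assumptions, not the exact configuration reported in the paper.

# Minimal sketch of an end-to-end feedforward DNN LID classifier (assumed
# hyperparameters; frame scores averaged over the utterance as in the abstract).
import torch
import torch.nn as nn

class FeedforwardLID(nn.Module):
    def __init__(self, feat_dim=39, context=21, hidden_dim=1024, n_languages=8):
        super().__init__()
        # Fully connected layers over a window of stacked acoustic frames.
        self.net = nn.Sequential(
            nn.Linear(feat_dim * context, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_languages),  # frame-level language logits
        )

    def forward(self, stacked_frames):
        # stacked_frames: (n_frames, feat_dim * context) for one utterance.
        return self.net(stacked_frames)

    def utterance_log_posteriors(self, stacked_frames):
        # Average frame-level log posteriors to score the whole utterance.
        log_post = torch.log_softmax(self.forward(stacked_frames), dim=-1)
        return log_post.mean(dim=0)

# Usage with placeholder data: score a 300-frame utterance of random features.
model = FeedforwardLID()
utterance = torch.randn(300, 39 * 21)
scores = model.utterance_log_posteriors(utterance)
predicted_language = scores.argmax().item()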
BibTeX:
@ARTICLE{lopez-moreno2016deep,
   author = {Ignacio Lopez-Moreno and Javier Gonzalez-Dominguez and David
      Gonz{\'{a}}lez Mart{\'{i}}nez and Old{\v{r}}ich Plchot and
      Joaquin Gonzalez-Rodriguez and Pedro Moreno},
   title = {On the use of deep feedforward neural networks for automatic
      language identification},
   pages = {46--59},
   journal = {Computer Speech and Language},
   volume = {40},
   year = {2016},
   ISSN = {0885-2308},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php?id=11180}
}
