###########################################################
#              Brno University of Technology              #
#                     Czech Republic                      #
###########################################################
#          Language Recognition Evaluation 2005           #
###########################################################

General comments:
  - Hardware for REAL-TIME FACTORS: P4 2.4GHz, RAM 500MB.
  - Realtime factors for dialect recognition are an additional
    computation cost on top of LID.
  - The language scores of all combined systems were merged by
    linear merging.

###########################################################
1) PRIMARY SYSTEM:

The primary system was tuned to the primary condition. It
consists of two subsystems: PPRLM and GMM.

PPRLM
The PPRLM consists of three PRLM systems with hybrid HMM/ANN
phoneme recognizers trained on the Hungarian, Russian and Czech
parts of the SpeechDat-E databases [4]. The phoneme recognizers
use the split temporal context approach with 3 neural nets [1].
They first generate phoneme lattices using only acoustic scores;
these lattices are then expanded by the corresponding language
models (LM) as in [2].
Language modeling is done by 3-gram backoff models with
Witten-Bell discounting. The 3-gram counts are estimated from
lattices and weighted by posteriors derived from the lattices.
Counts for testing are derived from lattices, too.
Two language models are used for each language: the target
language model, trained on the target language, and an anti-model
to the target language, trained on non-target data weighted by
the posteriors of the data producing false alarms (inspired by
[3], precise description in [5]). The final score is a linear
combination of the log LM score and the log anti-LM score.
Target language models were trained on the Callfriend Corpus.
The Hindi-English model was trained on the Foreign Accented
English (FAE) database from OGI. Anti-language models were
trained on the Callfriend Corpus, FAE, OGI multilanguage and
OGI 22 languages.
In the anti-model training, only the segments unseen during the
training of the target LMs were used.

GMM
A GMM with 256 Gaussian components per language was trained with
the MMI (Maximum Mutual Information) criterion [8] using the STK
toolkit [6]. The features are shifted delta cepstra [7] (7,1,3,7)
derived from 7 cepstral coefficients and C0 + direct cepstral
coefficients (together 56 dimensions). VTLN was applied.
GMMs were trained on the Callfriend Corpus, on Hindi and Tamil
English from FAE, on OGI multilanguage and on OGI 22 languages.

REALTIME FACTOR:
  [3xPRLM(lattice generation,lattice expansion) + GMM(VTLN,GMM)]
  3x(0.28+0.4) + (0.1+0.1) = 2.25

DIALECT MAN: 3xPRLM+GMM trained on Callfriend.
DIALECT ENG: 3xPRLM+GMM trained on Callfriend, Hindi and Tamil
             from FAE.

NOTE: This system is not designed to reject out-of-set languages.

###########################################################
2) SECONDARY SYSTEM:

Similar to the primary system, but rejects out-of-set languages.
6 background models are trained on non-target languages from
Callfriend, OGI multilanguage and OGI 22 languages. The best
score out of the 6 is taken as the background model score.
Background models have their anti-models, too.

PPRLM + GMM:
Trained on the Callfriend Corpus, the Foreign Accented English
database, and the OGI multilanguage and OGI 22 languages
databases.

REALTIME FACTOR:
  [3xPRLM(lattice generation,lattice expansion) + GMM(VTLN,GMM)]
  3x(0.28+0.7) + (0.1+0.1) = 3.2

DIALECTS: the same as in the primary system.

NOTE: If the system detects that the segment was spoken in an
out-of-set language, "False" is assigned to all languages.

###########################################################
3) PPRLM

Subsystem of the primary system.

REALTIME FACTOR:
  [3xPRLM(lattice generation + lattice expansion)]
  3x(0.28+0.4) = 2.1

DIALECT MAN:
  - PPRLM, LM trained on Callfriend.
  - REALTIME FACTOR: [3xPRLM(lattice expansion)] 3x(0.11)=0.3

DIALECT ENG:
  - PPRLM, LM trained on Callfriend, Hindi and Tamil from FAE.
  - REALTIME FACTOR: [3xPRLM(lattice expansion)] 3x(0.11)=0.3

###########################################################
4) GMM256/MMI

Subsystem of the primary system.

REALTIME FACTOR:
  GMM(segmentation,VTLN,GMM)
  0.25+0.1+0.1 = 0.45

DIALECT MAN:
  - GMM256/MMI trained on Callfriend.
  - REALTIME FACTOR: GMM = 0.02

DIALECT ENG:
  - GMM128/MMI trained on Callfriend, Hindi and Tamil from FAE.
  - REALTIME FACTOR: GMM = 0.01

###########################################################
5) FAST (LID = PRLM - Hungarian, DIALECTS = GMM-MMI)

Subsystem of the primary system: the PPRLM from the primary
system with only the Hungarian phoneme recognizer.
The difference from the primary system is that the lattices are
not expanded by language models during testing, so the 3-gram
counts derived from the lattices are based only on acoustic
scores.

REALTIME FACTOR: PRLM 0.3

DIALECTS: see 4)

###########################################################
[1] P. Matejka, P. Schwarz, J. Cernocky, and P. Chytil,
    "Phonotactic Language Identification using High Quality
    Phoneme Recognition", Eurospeech 2005.
[2] J.L. Gauvain, A. Messaoudi, and H. Schwenk, "Language
    recognition using phoneme lattices", ICSLP 2004.
[3] A. Stolcke, et al., "The SRI March 2000 Hub-5 conversational
    speech transcription system", in Proceedings NIST Speech
    Transcription Workshop, 2000.
[4] SpeechDat-E: http://www.fee.vutbr.cz/SPEECHDAT-E
[5] P. Matejka, L. Burget, P. Schwarz, and J. Cernocky, "Use of
    anti-models to further improve state-of-the-art PRLM Language
    Recognition System", submitted to ICASSP 2006, Toulouse,
    France, 2006.
[6] STK: http://www.fit.vutbr.cz/speech/sw/stk.html
[7] P. A. Torres-Carrasquillo, E. Singer, M. A. Kohler,
    R. J. Greene, D. A. Reynolds, and J. R. Deller Jr.,
    "Approaches to Language Identification Using Gaussian Mixture
    Models and Shifted Delta Cepstral Features", in Proc. ICSLP,
    Denver, CO, September 2002.
[8] L. Burget, P. Matejka, and J. Cernocky, "Discriminative
    training techniques for acoustic language identification",
    submitted to ICASSP 2006, Toulouse, France, 2006.
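
###########################################################
ILLUSTRATION: shifted delta cepstra (7,1,3,7)

The GMM description in 1) gives the feature layout only as
"(7,1,3,7)" and "together 56 dimensions". The sketch below is
one possible reading of that layout, following the usual N-d-P-k
convention from [7]: N=7 coefficients per frame, delta spread
d=1, block shift P=3, and k=7 stacked blocks, giving 7 direct
values plus 7x7 deltas = 56. The function name, the toy input
and the edge handling (clamping to the first/last frame) are
assumptions made for illustration; this is not the feature
extraction code used in the submission.

```python
def shifted_delta_cepstra(frames, d=1, p=3, k=7):
    """Return direct coefficients followed by k shifted delta blocks.

    frames: list of per-frame coefficient lists (all the same length N).
    Each output vector has N + k*N values.
    """
    n_frames = len(frames)

    def frame_at(t):
        # Assumption: indices past either end reuse the first/last frame.
        return frames[min(max(t, 0), n_frames - 1)]

    features = []
    for t in range(n_frames):
        vector = list(frames[t])  # direct cepstral coefficients
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            # Block i: c(t + i*P + d) - c(t + i*P - d), per coefficient.
            vector.extend(a - b for a, b in zip(ahead, behind))
        features.append(vector)
    return features


# Toy input (hypothetical): 30 frames of 7 synthetic values each.
toy_frames = [[float(t * (c + 1)) for c in range(7)] for t in range(30)]
sdc = shifted_delta_cepstra(toy_frames)
print(len(sdc), len(sdc[0]))  # 30 56
```

With N=7 and k=7 the vector length is 7 + 7*7 = 56, which matches
the dimensionality quoted in 1).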