###########################################################
#           Brno University of Technology                 #
#                  Czech Republic                         #
###########################################################
#         Language Recognition Evaluation 2005            #
###########################################################


General comments: 

- Hardware for REAL-TIME FACTORS: P4 2.4GHz, RAM 500MB.

- Real-time factors given for dialect recognition are the additional
  computational cost on top of LID.

- The language scores of all combined systems were merged linearly.
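
A minimal sketch of linear merging, assuming it means a weighted sum of
per-language scores across subsystems. The function name, the weights
and the example scores are hypothetical and are not taken from the
submitted systems.

```python
def merge_scores(system_scores, weights):
    """Weighted sum of per-language scores across subsystems.

    system_scores: list of dicts mapping language -> score.
    weights: one (hypothetical) weight per subsystem.
    """
    languages = system_scores[0].keys()
    return {
        lang: sum(w * scores[lang] for w, scores in zip(weights, system_scores))
        for lang in languages
    }


# Illustrative scores only; real scores come from the PPRLM and GMM subsystems.
pprlm = {"english": -1.2, "mandarin": -3.4}
gmm = {"english": -0.8, "mandarin": -2.0}
merged = merge_scores([pprlm, gmm], weights=[0.6, 0.4])
```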

###########################################################

1) PRIMARY SYSTEM:

The Primary system was tuned to the primary condition.
This system consists of two subsystems: PPRLM and GMM.

PPRLM

The PPRLM consists of three PRLM systems with hybrid HMM/ANN phoneme
recognizers trained on the Hungarian, Russian and Czech parts of the
SpeechDat-E databases [4]. The phoneme recognizers use the split
temporal context approach with 3 neural nets [1]. The phoneme
recognizers first generate phoneme lattices using only acoustic
scores; these lattices are then expanded by the corresponding language
models (LM) as in [2].

Language modeling uses 3-gram backoff models with Witten-Bell
discounting. The 3-gram counts are estimated from the lattices and
weighted by posteriors derived from the lattices. Counts for testing
are derived from lattices too.
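
The posterior-weighted counting can be sketched as a forward-backward
pass over a small lattice. This is only an illustration: the edge
format, the integer node numbering in topological order and the use of
plain probabilities instead of log scores are assumptions, and the
function name is hypothetical.

```python
from collections import defaultdict


def expected_trigram_counts(edges, start, end):
    """Posterior-weighted 3-gram counts from a lattice.

    edges: list of (src, dst, phoneme, prob); node ids are integers
    numbered in topological order (an assumption of this sketch).
    """
    out_edges = defaultdict(list)
    for edge in edges:
        out_edges[edge[0]].append(edge)
    nodes = sorted({n for edge in edges for n in edge[:2]})

    # Forward pass: total probability of reaching each node from start.
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for n in nodes:
        for (_, dst, _, p) in out_edges[n]:
            alpha[dst] += alpha[n] * p

    # Backward pass: total probability of reaching end from each node.
    beta = defaultdict(float)
    beta[end] = 1.0
    for n in reversed(nodes):
        for (_, dst, _, p) in out_edges[n]:
            beta[n] += p * beta[dst]

    total = alpha[end]
    counts = defaultdict(float)
    # Each chain of three consecutive edges contributes its posterior.
    for (s1, m1, a, p1) in edges:
        for (_, m2, b, p2) in out_edges[m1]:
            for (_, d3, c, p3) in out_edges[m2]:
                counts[(a, b, c)] += alpha[s1] * p1 * p2 * p3 * beta[d3] / total
    return dict(counts)


# Toy lattice with two competing middle phonemes (unnormalized scores).
toy_edges = [(0, 1, "a", 1.0), (1, 2, "b", 0.6), (1, 2, "p", 0.2), (2, 3, "c", 1.0)]
toy_counts = expected_trigram_counts(toy_edges, start=0, end=3)
```

In the toy lattice the two paths share the total mass in the ratio of
their scores, so the two trigram counts sum to one.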

Two language models are used for each language: a target language
model, trained on the target language, and an anti-model of the target
language, trained on non-target data weighted by the posteriors of the
data producing false alarms (inspired by [3], precise description
in [5]). The final score is a linear combination of the log LM score
and the log anti-LM score.
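
The score combination can be sketched as follows. The weights are not
given in this description, so the defaults below (plus one for the
target model, minus one for the anti-model) are purely an assumption
for illustration, as is the function name.

```python
def language_score(log_lm, log_anti_lm, w_target=1.0, w_anti=-1.0):
    """Linear combination of target-LM and anti-LM log scores.

    The default weights are hypothetical; the weights actually used
    are not stated in this description.
    """
    return w_target * log_lm + w_anti * log_anti_lm


# Illustrative log scores for one test segment and one target language.
example_score = language_score(log_lm=-120.0, log_anti_lm=-135.0)
```

With these assumed weights the score is a log-likelihood ratio: it
grows when the target model explains the segment better than the
anti-model.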

Target language models were trained on the Callfriend Corpus. The
Hindi-English model was trained on the Foreign Accented English (FAE)
database from OGI. Anti-language models were trained on the Callfriend
Corpus, FAE, OGI multilanguage and OGI 22 languages. In the
anti-model training, only segments unseen during the training of the
target LMs were used.

GMM

A GMM with 256 Gaussian components per language was trained using the
MMI (Maximum Mutual Information) criterion [8] with the STK toolkit
[6]. The features are shifted delta cepstra [7] (7,1,3,7) derived from
7 cepstral coefficients and C0 + direct cepstral coefficients
(together 56 dimensions). VTLN was applied.
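
A sketch of how the 56-dimensional vector could be assembled with the
(N,d,P,k) = (7,1,3,7) configuration: 7 blocks of 7 delta coefficients
plus the 7 direct coefficients. It assumes 7 static coefficients per
frame and clamps frame indices at the segment edges; both the edge
handling and the function name are assumptions of this sketch, not a
description of the submitted front end.

```python
def sdc_features(cepstra, d=1, P=3, k=7):
    """Shifted delta cepstra with direct coefficients prepended.

    cepstra: list of frames, each a list of N static coefficients.
    Block i of frame t is c[t + i*P + d] - c[t + i*P - d].
    """
    num_frames = len(cepstra)

    def clamp(t):
        # Edge handling is an assumption: repeat the first/last frame.
        return max(0, min(num_frames - 1, t))

    features = []
    for t in range(num_frames):
        vec = list(cepstra[t])  # direct cepstral coefficients
        for i in range(k):
            ahead = cepstra[clamp(t + i * P + d)]
            behind = cepstra[clamp(t + i * P - d)]
            vec.extend(a - b for a, b in zip(ahead, behind))
        features.append(vec)
    return features


# Synthetic input: 30 frames of 7 coefficients that grow by 1 per frame.
frames = [[float(t + n) for n in range(7)] for t in range(30)]
feats = sdc_features(frames)
```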

GMMs were trained on the Callfriend Corpus; Hindi and Tamil English
from the FAE, OGI multilanguage and OGI 22 languages.

REALTIME FACTOR: 
[3xPRLM(lattice generation,lattice expansion) + GMM(VTLN,GMM)]
    3x(0.28+0.4) + (0.1+0.1) = 2.25
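
The bracketed formula can be evaluated directly, as in the sketch
below (the function and parameter names are hypothetical). With the
component values as listed, the sum comes out at 2.24; the small
difference from the reported total is presumably due to rounding of
the individual components.

```python
def pprlm_gmm_rt(n_prlm, lattice_generation, lattice_expansion, vtln, gmm):
    """Real-time factor of n_prlm PRLM branches plus one GMM branch."""
    return n_prlm * (lattice_generation + lattice_expansion) + (vtln + gmm)


# Component values as listed for the primary system.
primary_rt = pprlm_gmm_rt(3, 0.28, 0.4, 0.1, 0.1)
```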

DIALECT MAN: 3xPRLM+GMM trained on Callfriend.
DIALECT ENG: 3xPRLM+GMM trained on Callfriend, Hindi and Tamil from FAE.

NOTE: This system is not designed to reject languages.

###########################################################

2) SECONDARY SYSTEM:

Similar to the primary system, but rejects out-of-set languages. Six
background models are trained on non-target languages from Callfriend,
OGI multilanguage and OGI 22 languages. The best score out of the six
is taken as the background model score. The background models have
their own anti-models, too.

PPRLM + GMM:

Trained on Callfriend Corpus, Foreign Accented English database, OGI
multilanguage and OGI 22 languages databases. 

REALTIME FACTOR: 
[3xPRLM(lattice generation,lattice expansion) + GMM(VTLN,GMM)]
 3x(0.28+0.7) + (0.1+0.1) = 3.2

DIALECTS: the same as in the primary system.

NOTE: In case the system detects that the segment was spoken in an
out-of-set language, "False" is assigned to all languages. 
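
A sketch of the rejection logic. The exact detection rule is not
stated in this description; the sketch assumes a segment is treated as
out-of-set when the best background score exceeds every target score,
and otherwise makes per-language decisions against a hypothetical
threshold. All names and numbers are illustrative.

```python
def decide(target_scores, background_scores, threshold=0.0):
    """Return a True/False decision per target language.

    target_scores: dict mapping language -> score.
    background_scores: scores of the background models; the best one
    is taken as the background model score.
    """
    background = max(background_scores)
    if background > max(target_scores.values()):
        # Assumed out-of-set rule: reject every target language.
        return {lang: False for lang in target_scores}
    # Hypothetical in-set rule: threshold each language independently.
    return {lang: score > threshold for lang, score in target_scores.items()}


targets = {"english": 1.5, "mandarin": -0.5}
rejected = decide(targets, [0.2, 2.1, -1.0, 0.0, 0.3, 1.0])
accepted = decide(targets, [0.2, 0.9, -1.0, 0.0, 0.3, 1.0])
```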

###########################################################

3) PPRLM

Subsystem of the primary system

REALTIME FACTOR: 
[3xPRLM(lattice generation + lattice expansion)] 
3x(0.28 +0.4) = 2.1

DIALECT MAN:
- PPRLM, LM trained on Callfriend.
- REALTIME FACTOR: 
  [3xPRLM(lattice expansion)] 
  3x(0.11)=0.3

DIALECT ENG:
- PPRLM, LM trained on Callfriend, Hindi and Tamil from FAE.
- REALTIME FACTOR: 
  [3xPRLM(lattice expansion)] 
  3x(0.11)=0.3


###########################################################

4) GMM256/MMI

Subsystem of the primary system

REALTIME FACTOR: 
GMM(segmentation,VTLN,GMM)  
0.25+0.1+0.1=0.45

DIALECT MAN:
- GMM256/MMI trained on Callfriend.
- REALTIME FACTOR: 
  GMM =  0.02

DIALECT ENG:
- GMM128/MMI trained on Callfriend, Hindi and Tamil from FAE.
- REALTIME FACTOR: 
  GMM =  0.01


###########################################################

5) FAST ( LID = PRLM - Hungarian, DIALECTS=GMM-MMI)

A subsystem of the PPRLM from the primary system that uses only the
Hungarian phoneme recognizer. The difference from the primary system
is that the lattices are not expanded by language models during
testing, so the 3-gram counts derived from the lattices are based only
on acoustic scores.

REALTIME FACTOR: 
PRLM 0.3

DIALECTS: see 4)

###########################################################

[1] Pavel Matejka, Petr Schwarz, Jan Cernocky, Pavel Chytil,
"Phonotactic Language Identification using High Quality Phoneme
Recognition", Eurospeech 2005.

[2] J.L. Gauvain, A. Messaoudi, and H. Schwenk, "Language recognition
using phoneme lattices", ICSLP 2004. 

[3] A. Stolcke, et al. , "The SRI March 2000 Hub-5 conversational
speech transcription system", in Proceedings NIST Speech Transcription
Workshop, 2000.

[4] SpeechDat-E: http://www.fee.vutbr.cz/SPEECHDAT-E

[5] P. Matejka, L. Burget, P. Schwarz, and J. Cernocky: "Use of
anti-models to further improve state-of-the-art PRLM Language
Recognition System", submitted to ICASSP 2006, Toulouse, France,
2006. 

[6] STK: http://www.fit.vutbr.cz/speech/sw/stk.html

[7] P. A. Torres-Carrasquillo, E. Singer, M. A. Kohler, R. J. Greene,
D. A. Reynolds, and J. R. Deller Jr.: "Approaches to Language
Identification Using Gaussian Mixture Models and Shifted Delta
Cepstral Features", in Proc. ICSLP, Denver, CO, September 2002.

[8] L. Burget, P. Matejka, and J. Cernocky, "Discriminative training
techniques for acoustic language identification", submitted to ICASSP  
2006, Toulouse, France, 2006.