Publication Details

Temporal processing for feature extraction in speech recognition, shortened version of habilitation thesis

ČERNOCKÝ Jan. Temporal processing for feature extraction in speech recognition. Vědecké spisy VUT. Edice Habilitační a inaugurační spisy, sv. 112. Brno: Publishing house of Brno University of Technology VUTIUM, 2003, pp. 1-30. ISBN 80-214-2395-1.
Czech title
Časové zpracování pro výpočet příznaků v rozpoznávání řeči
Type
book chapter
Language
English
Authors
Jan Černocký
URL
Keywords

automatic speech processing, speech recognition, features for speech recognition, temporal filtering, neural networks, data-driven techniques

Abstract

Temporal processing for feature extraction in speech recognition

Annotation

Speech recognition is a booming research field with a large number of applications in telecommunications (especially mobile), the automobile industry, consumer electronics, military and security, etc. Speech recognition systems are classically built from three basic blocks: feature extraction, acoustic matching and language modeling. While the last two are trained on data (annotated databases for acoustics and large speech corpora for the LM), the feature extraction block is often neglected, and mel-frequency cepstral coefficients (MFCC) are used most often. This work concentrates on two techniques intended to improve feature extraction. The first is temporal filtering of feature trajectories with filters designed from data using Linear Discriminant Analysis (LDA). This technique is shown to improve the recognition accuracy of isolated Czech words, confirming previous results on US English obtained by our colleagues from OGI Portland. The second part of the work concentrates on a more revolutionary approach to feature extraction using TRAPs (temporal patterns), whose fundamentals were also laid at OGI. Several experiments were conducted on three databases during the author's stay at OGI. Although we have shown that TRAPs are comparable to MFCCs only on a small-vocabulary recognition task, we believe that the combination of frequency-band processing and neural nets will become very important in the next decade, and that they will become standard blocks of feature extraction.

Published
2003
Pages
1-30
Book
Vědecké spisy VUT
Series
Edice Habilitační a inaugurační spisy, sv. 112
ISBN
80-214-2395-1
Publisher
Publishing house of Brno University of Technology VUTIUM
Place
Brno, CZ
BibTeX
@INBOOK{FITPUB7240,
   author = "Jan \v{C}ernock\'{y}",
   title = "Temporal processing for feature extraction in speech recognition, shortened version of habilitation thesis",
   pages = "1--30",
   booktitle = "V\v{e}deck\'{e} spisy VUT",
   series = "Edice Habilita\v{c}n\'{i} a inaugura\v{c}n\'{i} spisy, sv. 112",
   year = 2003,
   location = "Brno, CZ",
   publisher = "Publishing house of Brno University of Technology VUTIUM",
   ISBN = "80-214-2395-1",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/7240"
}