| Kockmann, M., Burget, L., Černocký, J.: Application of speaker- and language identification state-of-the-art techniques for emotion recognition, In: Speech Communication, vol. 53, no. 9, 2011, Amsterdam, NL, pp. 1172-1185, ISSN 0167-6393 | | Publication language: | English |
|---|
| Publication title: | Application of speaker- and language identification state-of-the-art techniques for emotion recognition |
|---|
| Title (Czech): | Použití aktuálních technik pro identifikaci řečníka a jazyka v rozpoznávání emocí |
|---|
| Pages: | 1172-1185 |
|---|
| Book: | Speech Communication |
|---|
| Place of publication: | NL |
|---|
| Year: | 2011 |
|---|
| URL: | http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271578&_user=640830&_pii=S0167639311000082&_check=y&_origin=search&_zone=rslt_list_item&_coverDate=2011-12-31&wchp=dGLbVlS-zSkWz&md5=2a79c3d171cd13a3689408115666e2ef/1-s2.0-S0167639311000082-main |
|---|
| Journal: | Speech Communication, vol. 53, no. 9, Amsterdam, NL |
|---|
| ISSN: | 0167-6393 |
|---|
| Publisher: | Elsevier Science |
|---|
| URL: | http://www.fit.vutbr.cz/research/groups/speech/publi/2011/kockman_article_speech%20communication_53_elsevier2011.pdf [PDF] |
|---|
| Keywords |
|---|
| Emotion recognition; Gaussian mixture models; Maximum-mutual-information; Intersession variability compensation; Score-level fusion |
| Annotation |
|---|
The authors of this article show that the feature extraction and statistical modeling methods commonly used in speaker and language recognition can also be applied successfully to emotion recognition.
|
| Abstract |
|---|
| This article describes our efforts to transfer feature extraction and statistical modeling techniques from the fields of speaker and
language identification to the related field of emotion recognition. We give detailed insight into our acoustic and prosodic feature extraction
and show how to apply Gaussian Mixture Modeling techniques on top of it. We focus on different flavors of Gaussian Mixture
Models (GMMs), including more sophisticated approaches such as discriminative training using the Maximum-Mutual-Information (MMI) criterion
and InterSession Variability (ISV) compensation. Both techniques show superior performance in language and speaker identification.
Furthermore, we combine multiple system outputs by score-level fusion to exploit the complementary information in diverse
systems. Our proposal is evaluated with several experiments on the FAU Aibo Emotion Corpus containing non-acted spontaneous emotional
speech. Within the Interspeech 2009 Emotion Challenge we achieved the best results for the 5-class task of the Open Performance
Sub-Challenge with an unweighted average recall of 41.7%. Additional experiments on the acted Berlin Database of
Emotional Speech show the capability of intersession variability compensation for emotion recognition. |
| BibTeX: |
|---|
@ARTICLE{
author = {Marcel Kockmann and Lukáš Burget and Jan Černocký},
title = {Application of speaker- and language identification
state-of-the-art techniques for emotion recognition},
pages = {1172--1185},
booktitle = {Speech Communication},
journal = {Speech Communication},
volume = {53},
number = {9},
year = {2011},
publisher = {Elsevier Science},
ISSN = {0167-6393},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=9676}
} |
|