Conference proceedings paper

KOMBRINK Stefan, HANNEMANN Mirko, BURGET Lukáš and HEŘMANSKÝ Hynek. Recovery of Rare Words in Lecture Speech. In: Proc. Text, Speech and Dialogue 2010. Brno: Springer Verlag, 2010, pp. 330-337. ISBN 978-3-642-15759-2. ISSN 0302-9743.
Publication language:English
Publication title:Recovery of Rare Words in Lecture Speech
Title (cs):Obnova řídkých slov v rozpoznávání řeči z přednášek
Pages:330-337
Proceedings:Proc. Text, Speech and Dialogue 2010
Conference:13th International Conference on Text, Speech and Dialogue, TSD 2010
Place of publication:Brno, CZ
Year:2010
ISBN:978-3-642-15759-2
Journal:Lecture Notes in Computer Science, vol. 2010, no. 9, DE
ISSN:0302-9743
Publisher:Springer Verlag
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2010/kombrink_TSD_2010_330.pdf [PDF]
Keywords
speech, rare words, recognizer, detect OOV words, sub-words, lectures
Annotation
The paper deals with the recovery of rare words in the recognition of lecture speech. We use a hybrid word/sub-word recognizer to detect OOV words occurring in English talks.
Abstract
The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, each of which might appear a few times only in particular documents. In most documents, however, these rare words are not seen at all. The first type of words is typically included in the language model of an automatic speech recognizer (ASR) and is thus widely referred to as in-vocabulary (IV). Words of the second type are missing in the language model and thus are called out-of-vocabulary (OOV). However, these words usually carry important information. We use a hybrid word/sub-word recognizer to detect OOV words occurring in English talks and describe them as sequences of sub-words. We detected about one third of all OOV words, and were able to recover the correct spelling for 26.2% of all detections by using a phoneme-to-grapheme (P2G) conversion trained on the recognition dictionary. By omitting detections corresponding to recovered IV words, we were able to increase the precision of the OOV detection substantially.
BibTeX:
@INPROCEEDINGS{
   author = {Stefan Kombrink and Mirko Hannemann and Lukáš Burget and
	Hynek Heřmanský},
   title = {Recovery of Rare Words in Lecture Speech},
   pages = {330--337},
   booktitle = {Proc. Text, Speech and Dialogue 2010},
   journal = {Lecture Notes in Computer Science},
   volume = {2010},
   number = {9},
   year = {2010},
   location = {Brno, CZ},
   publisher = {Springer Verlag},
   ISBN = {978-3-642-15759-2},
   ISSN = {0302-9743},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.cs?id=9323}
}
