Conference proceedings paper

ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, KINOSHITA Keisuke, HIGUCHI Takuya, OGAWA Atsunori and NAKATANI Tomohiro. Speaker-aware neural network based beamformer for speaker extraction in speech mixtures. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 2655-2659. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0667.PDF
Publication language: English
Publication title: Speaker-aware neural network based beamformer for speaker extraction in speech mixtures
Title (cs): Směrovač paprsku založený na neuronové síti poučené o řečníkovi pro extrakci řečníka ze směsi řečových signálů
Pages: 2655-2659
Proceedings: Proceedings of Interspeech 2017
Conference: Interspeech 2017
Place of publication: Stockholm, SE
Year: 2017
URL: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0667.PDF
Journal: Proceedings of Interspeech, vol. 2017, no. 08, FR
ISSN: 1990-9772
DOI: 10.21437/Interspeech.2017-667
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_interspeech2017_IS170667.pdf [PDF]
Keywords
speaker extraction, speaker-aware neural network, beamforming, mask estimation
Annotation
The paper presents a beamformer based on a neural network that is informed about the target speaker and is used to extract that speaker from a mixture of speech signals.
Abstract
In this work, we address the problem of extracting one target speaker from a multichannel mixture of speech. We use a neural network to estimate masks to extract the target speaker and derive beamformer filters using these masks, in a similar way to the recently proposed approach for extraction of speech in the presence of noise. To overcome the permutation ambiguity of neural network mask estimation, which arises in the presence of multiple speakers, we propose to inform the neural network about the target speaker so that it learns to follow the speaker characteristics through the utterance. We investigate and compare different methods of passing the speaker information to the network, such as making one layer of the network dependent on speaker characteristics. Experiments on mixtures of two speakers demonstrate that the proposed scheme can track and extract a target speaker for both closed and open speaker set cases.
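The step of deriving beamformer filters from time-frequency masks can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which beamformer is used, so an MVDR beamformer in its reference-channel formulation is assumed here, and the function names, array layouts and the `diag_load` parameter are hypothetical. The masks are taken as given; in the paper's setting they would come from the speaker-aware neural network.

```python
import numpy as np


def spatial_covariance(Y, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    Y:    complex STFT of the mixture, shape (F, T, C)
    mask: real-valued mask in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    num = np.einsum("ft,ftc,ftd->fcd", mask, Y, Y.conj())
    den = np.maximum(mask.sum(axis=1), 1e-10)
    return num / den[:, None, None]


def mvdr_filters(phi_target, phi_interf, ref_channel=0, diag_load=1e-6):
    """MVDR filters (assumed beamformer type), shape (F, C).

    w = (Phi_n^{-1} Phi_s) u / trace(Phi_n^{-1} Phi_s),
    where u selects the reference channel.
    """
    n_channels = phi_target.shape[-1]
    # Diagonal loading keeps the interference covariance well conditioned.
    load = diag_load * np.trace(phi_interf, axis1=1, axis2=2).real / n_channels
    phi_n = phi_interf + load[:, None, None] * np.eye(n_channels)
    numer = np.linalg.solve(phi_n, phi_target)  # Phi_n^{-1} Phi_s, (F, C, C)
    tr = np.trace(numer, axis1=1, axis2=2)
    return numer[:, :, ref_channel] / tr[:, None]


def apply_beamformer(w, Y):
    """Filter-and-sum: X_hat[f, t] = w[f]^H Y[f, t]. Returns shape (F, T)."""
    return np.einsum("fc,ftc->ft", w.conj(), Y)
```

Under these assumptions, a target mask and its complement (or a separate interference mask) would be passed to `spatial_covariance` to obtain the two covariance matrices, after which `mvdr_filters` and `apply_beamformer` produce the single-channel estimate of the target speaker.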
BibTeX:
@INPROCEEDINGS{
   author = {Kate{\v{r}}ina {\v{Z}}mol{\'{i}}kov{\'{a}} and
	Marc Delcroix and Keisuke Kinoshita and Takuya
	Higuchi and Atsunori Ogawa and Tomohiro Nakatani},
   title = {Speaker-aware neural network based beamformer for
	speaker extraction in speech mixtures},
   pages = {2655--2659},
   booktitle = {Proceedings of Interspeech 2017},
   journal = {Proceedings of Interspeech},
   volume = 2017,
   number = 08,
   year = 2017,
   location = {Stockholm, SE},
   publisher = {International Speech Communication Association},
   ISSN = {1990-9772},
   doi = {10.21437/Interspeech.2017-667},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.cs.iso-8859-2?id=11587}
}
