Conference proceedings paper

HIGUCHI Takuya, KINOSHITA Keisuke, DELCROIX Marc, ŽMOLÍKOVÁ Kateřina and NAKATANI Tomohiro. Deep clustering-based beamforming for separation with unknown number of sources. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 1183-1187. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0721.PDF
Publication language: English
Publication title: Deep clustering-based beamforming for separation with unknown number of sources
Title (cs): Směrování paprsku založené na hlubokém shlukování s neznámým počtem zdrojů
Pages: 1183-1187
Proceedings: Proceedings of Interspeech 2017
Conference: Interspeech 2017
Place of publication: Stockholm, SE
Year: 2017
URL: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0721.PDF
Journal: Proceedings of Interspeech, vol. 2017, no. 08, FR
ISSN: 1990-9772
DOI: 10.21437/Interspeech.2017-721
Publisher: International Speech Communication Association
URL: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/higuchi_interspeech2017_IS170721.pdf [PDF]
Keywords
source separation, source counting, time-frequency masking, beamforming
Annotation
This paper deals with deep clustering-based beamforming for separation with an unknown number of sources.
Abstract
This paper extends a deep clustering algorithm for use with
time-frequency masking-based beamforming and performs separation
with an unknown number of sources. Deep clustering
is a recently proposed single-channel source separation algorithm,
which projects inputs into the embedding space and
performs clustering in the embedding domain. In deep clustering,
bi-directional long short-term memory (BLSTM) recurrent
neural networks are trained to make embedding vectors
orthogonal for different speakers and concurrent for the same
speaker. Then, by clustering the embedding vectors at test time,
we can estimate time-frequency masks for separation. In this
paper, we extend the deep clustering algorithm to a multiple
microphone setup and incorporate deep clustering-based time-frequency
mask estimation into masking-based beamforming,
which has been shown to be more effective than masking for
automatic speech recognition. Moreover, we perform source
counting by computing the rank of the covariance matrix of the
embedding vectors. With our proposed approach, we can perform
masking-based beamforming in a multiple-speaker case
without knowing the number of speakers. Experimental results
show that our proposed deep clustering-based beamformer
achieves comparable source separation performance to that obtained
with a complex Gaussian mixture model-based beamformer,
which requires the number of sources in advance for
mask estimation.
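
The two steps the abstract names, source counting from the rank of the embedding covariance matrix and mask estimation by clustering the embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embeddings are synthetic rather than BLSTM outputs, the covariance is taken uncentered, the relative eigenvalue threshold `rel_threshold` is a hypothetical choice, and plain k-means with a farthest-point initialisation stands in for whatever clustering the paper uses. All function names are illustrative.

```python
import numpy as np


def make_toy_embeddings(num_sources=3, bins_per_source=200, embed_dim=8,
                        noise=0.05, seed=0):
    """Synthetic stand-in for network output: one near-orthogonal direction
    per source, one unit-norm embedding per time-frequency bin."""
    rng = np.random.default_rng(seed)
    basis = np.eye(embed_dim)[:num_sources]
    clean = np.repeat(basis, bins_per_source, axis=0)
    noisy = clean + noise * rng.standard_normal(clean.shape)
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)


def count_sources(embeddings, rel_threshold=0.1):
    """Estimate the source count as the effective rank of the uncentered
    covariance of the embeddings (assumption: eigenvalues above
    rel_threshold times the largest one count as signal)."""
    cov = embeddings.T @ embeddings / embeddings.shape[0]
    eigvals = np.linalg.eigvalsh(cov)  # ascending order for symmetric input
    return int(np.sum(eigvals > rel_threshold * eigvals[-1]))


def kmeans_masks(embeddings, num_sources, num_iters=20):
    """Cluster embeddings with plain k-means and return binary masks of
    shape (num_sources, num_tf_bins)."""
    # Deterministic farthest-point initialisation.
    centers = [embeddings[0]]
    for _ in range(1, num_sources):
        d = np.min([((embeddings - c) ** 2).sum(axis=1) for c in centers],
                   axis=0)
        centers.append(embeddings[d.argmax()])
    centers = np.array(centers)

    for _ in range(num_iters):
        dists = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for k in range(num_sources):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = embeddings[labels == k].mean(axis=0)
    return np.stack([labels == k for k in range(num_sources)]).astype(float)


if __name__ == "__main__":
    emb = make_toy_embeddings()
    k = count_sources(emb)
    masks = kmeans_masks(emb, k)
    print("estimated sources:", k, "mask shape:", masks.shape)
```

In a masking-based beamformer, masks of this kind would then weight the multichannel observations when estimating the spatial covariance matrix of each source; that step is omitted here.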
BibTeX:
@INPROCEEDINGS{
   author = {Takuya Higuchi and Keisuke Kinoshita and Marc
	Delcroix and Kate{\v{r}}ina
	{\v{Z}}mol{\'{i}}kov{\'{a}} and Tomohiro Nakatani},
   title = {Deep clustering-based beamforming for separation
	with unknown number of sources},
   pages = {1183--1187},
   booktitle = {Proceedings of Interspeech 2017},
   journal = {Proceedings of Interspeech},
   volume = {2017},
   number = {08},
   year = {2017},
   location = {Stockholm, SE},
   publisher = {International Speech Communication Association},
   ISSN = {1990-9772},
   doi = {10.21437/Interspeech.2017-721},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.cs.iso-8859-2?id=11586}
}
