Conference paper

ZEINALI Hossein, BURGET Lukáš and ČERNOCKÝ Jan. Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge. In: Proceedings of DCASE 2018 Workshop. Surrey: Tampere University of Technology, 2018, pp. 1-5. ISBN 978-952-15-4262-6. Available from: http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Zeinali_149.pdf
Publication language:English
Original title:Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge
Title (cs):Konvoluční neuronová síť a X-vektor embedding pro DCASE2018 soutěž v klasifikaci akustického prostředí
Pages:1-5
Proceedings:Proceedings of DCASE 2018 Workshop
Conference:Detection and Classification of Acoustic Scenes and Events
Place:Surrey, GB
Year:2018
URL:http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Zeinali_149.pdf
ISBN:978-952-15-4262-6
Publisher:Tampere University of Technology
URL:http://www.fit.vutbr.cz/research/groups/speech/publi/2018/zeinali_dcase2018_149.pdf [PDF]
Keywords
Audio scene classification, Convolutional neural networks, Deep learning, x-vectors, Regularized LDA
Annotation
This paper describes the Brno University of Technology (BUT) team's submissions to Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge and analyzes different methods on the leaderboard set. The proposed approach fuses two different Convolutional Neural Network (CNN) topologies. The first is a common two-dimensional CNN of the kind mainly used in image classification. The second is a one-dimensional CNN that extracts fixed-length audio segment embeddings, so-called x-vectors, an approach that has also been used in speech processing, especially for speaker recognition. In addition to the different topologies, two types of features were tested: log mel-spectrogram and constant-Q transform (CQT) features. Finally, in the best-performing system, the outputs of the different systems are fused by simple output averaging. Our submissions ranked third among 24 teams in ASC sub-task A (task1a).
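The output-averaging fusion mentioned in the annotation can be illustrated with a minimal sketch. It is not the authors' implementation: the annotation does not say which outputs are averaged (posteriors, logits, or something else), so the sketch assumes per-class scores on a common scale. The function names `average_fusion` and `predict` and the toy scores are hypothetical.

```python
from typing import List, Sequence


def average_fusion(
    system_outputs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average per-class scores across systems.

    system_outputs[s][n][c] is the score system s assigns to class c
    for audio clip n. All systems are assumed to score the same clips
    and classes in the same order (an assumption of this sketch).
    """
    if not system_outputs:
        raise ValueError("need at least one system")
    n_systems = len(system_outputs)
    n_clips = len(system_outputs[0])
    fused = []
    for n in range(n_clips):
        # Collect clip n's score vector from every system.
        rows = [outputs[n] for outputs in system_outputs]
        # Average class by class across systems.
        fused.append([sum(col) / n_systems for col in zip(*rows)])
    return fused


def predict(fused: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest fused score for each clip."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Toy scores for two clips and three scene classes (made-up numbers).
cnn2d_scores = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]  # e.g. a 2-D CNN system
xvec_scores = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # e.g. an x-vector system

fused = average_fusion([cnn2d_scores, xvec_scores])
labels = predict(fused)
print(labels)
```

In this toy case the two systems disagree on the second clip, and the averaged scores decide the final label.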
BibTeX:
@INPROCEEDINGS{
   author = {Hossein Zeinali and Luk{\'{a}}{\v{s}} Burget and
	Jan {\v{C}}ernock{\'{y}}},
   title = {Convolutional Neural Networks and X-Vector
	Embedding for DCASE2018 Acoustic Scene
	Classification Challenge},
   pages = {1--5},
   booktitle = {Proceedings of DCASE 2018 Workshop},
   year = {2018},
   location = {Surrey, GB},
   publisher = {Tampere University of Technology},
   ISBN = {978-952-15-4262-6},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso-8859-2?id=11882}
}
