Publication Details

Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge

ZEINALI Hossein, BURGET Lukáš and ČERNOCKÝ Jan. Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge. In: Proceedings of DCASE 2018 Workshop. Surrey: Tampere University of Technology, 2018, pp. 1-5. ISBN 978-952-15-4262-6. Available from: http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Zeinali_149.pdf
Czech title
Konvoluční neuronová síť a X-vektor embedding pro DCASE2018 soutěž v klasifikaci akustického prostředí
Type
conference paper
Language
English
Authors
Hossein Zeinali, Lukáš Burget, Jan Černocký
Keywords

Audio scene classification, Convolutional neural networks, Deep learning, x-vectors, Regularized LDA

Abstract

This paper describes the Brno University of Technology (BUT) team submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge and provides an analysis of different methods on the leaderboard set. The proposed approach is a fusion of two different Convolutional Neural Network (CNN) topologies. The first is the common two-dimensional CNN, which is mainly used in image classification. The second is a one-dimensional CNN for extracting fixed-length audio segment embeddings, so-called x-vectors, which have also been used in speech processing, especially for speaker recognition. In addition to the different topologies, two types of features were tested: log mel-spectrogram and constant-Q transform (CQT) features. Finally, in the best-performing system, the outputs of the different systems are fused by simple output averaging. Our submissions ranked third among 24 teams in the ASC sub-task A (Task 1a).
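
The output-averaging fusion mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each system emits one score per scene class for a recording. The function name `average_fusion`, the variable names, and the score values are hypothetical and are not taken from the paper.

```python
def average_fusion(system_outputs):
    """Average per-class scores across systems and pick the top class.

    system_outputs: list of score lists, one list per system,
    each holding one score per scene class (an assumed format).
    """
    if not system_outputs:
        raise ValueError("need at least one system output")
    n_classes = len(system_outputs[0])
    if any(len(scores) != n_classes for scores in system_outputs):
        raise ValueError("all systems must score the same classes")

    n_systems = len(system_outputs)
    # Unweighted mean of the scores each system assigns to a class.
    fused = [
        sum(scores[c] for scores in system_outputs) / n_systems
        for c in range(n_classes)
    ]
    # Index of the class with the highest averaged score.
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical scores for three scene classes from two systems.
cnn_2d_scores = [0.5, 0.25, 0.25]       # illustrative 2-D CNN output
xvector_scores = [0.25, 0.625, 0.125]   # illustrative x-vector system output

fused, predicted = average_fusion([cnn_2d_scores, xvector_scores])
print(fused, predicted)
```

Equal weights keep the fusion free of tunable parameters; a weighted average would need held-out data to set the weights.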

Published
2018
Pages
1-5
Proceedings
Proceedings of DCASE 2018 Workshop
Conference
Detection and Classification of Acoustic Scenes and Events, Surrey, GB
ISBN
978-952-15-4262-6
Publisher
Tampere University of Technology
Place
Surrey, GB
BibTeX
@INPROCEEDINGS{FITPUB11882,
   author = "Hossein Zeinali and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge",
   pages = "1--5",
   booktitle = "Proceedings of DCASE 2018 Workshop",
   year = 2018,
   location = "Surrey, GB",
   publisher = "Tampere University of Technology",
   ISBN = "978-952-15-4262-6",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11882"
}