Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

Sharma, Jivitesh; Granmo, Ole-Christoffer; Goodwin, Morten

dc.contributor.author	Sharma, Jivitesh
dc.contributor.author	Granmo, Ole-Christoffer
dc.contributor.author	Goodwin, Morten
dc.date.accessioned	2023-03-07T13:50:48Z
dc.date.available	2023-03-07T13:50:48Z
dc.date.created	2021-01-07T00:33:35Z
dc.date.issued	2020
dc.identifier.citation	Sharma, J., Granmo, O-C. & Goodwin, M. (2020). Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network. Interspeech, 2020, 1186-1190.	en_US
dc.identifier.issn	2308-457X
dc.identifier.uri	https://hdl.handle.net/11250/3056508
dc.description	Author's accepted manuscript	en_US
dc.description.abstract	In this paper, we propose a model for the Environment Sound Classification Task (ESC) that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN) with Attention mechanism. The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT) and Chromagram. And, we employ a deeper CNN (DCNN) compared to previous models, consisting of spatially separable convolutions working on time and feature domain separately. Alongside, we use attention modules that perform channel and spatial attention together. We use the mix-up data augmentation technique to further boost performance. Our model is able to achieve state-of-the-art performance on three benchmark environment sound classification datasets, i.e. the UrbanSound8K (97.52%), ESC-10 (94.75%) and ESC-50 (87.45%).	en_US
dc.language.iso	eng	en_US
dc.publisher	International Speech Communication Association	en_US
dc.title	Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	© 2020 ISCA	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550	en_US
dc.source.pagenumber	1186-1190	en_US
dc.source.volume	2020	en_US
dc.source.journal	Interspeech	en_US
dc.identifier.doi	https://doi.org/10.21437/Interspeech.2020-1303
dc.identifier.cristin	1866678
cristin.qualitycode	1
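
The abstract mentions mix-up data augmentation without giving details in this record. As a rough illustration only, the general mix-up idea (blending two training examples and their one-hot labels with a Beta-distributed weight) can be sketched as below. The function name `mixup_pair`, the value `alpha=0.2`, and the use of flat Python lists as feature vectors are assumptions made for this sketch, not details taken from the paper.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels (hypothetical helper).

    alpha=0.2 is an assumed default, not a value reported in this record.
    """
    rng = rng or random.Random()
    # Mixing weight drawn from Beta(alpha, alpha); always lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the feature vectors, element by element.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    # The same convex combination applied to the one-hot labels.
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

With two one-hot labels as input, the mixed label still sums to 1, so it can be used directly as a soft target for a cross-entropy loss.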

Associated file(s)

Filename: Article.pdf
Size: 572.3Kb
Format: PDF