CNN-ViT Supported Weakly-Supervised Video Segment Level Anomaly Detection

Sharif, Md Haidar; Lei, Jiao; Omlin, Christian Walter Peter

dc.contributor.author	Sharif, Md Haidar
dc.contributor.author	Lei, Jiao
dc.contributor.author	Omlin, Christian Walter Peter
dc.date.accessioned	2023-11-03T08:50:11Z
dc.date.available	2023-11-03T08:50:11Z
dc.date.created	2023-10-26T09:45:48Z
dc.date.issued	2023
dc.identifier.citation	Sharif, M. H., Lei, J. & Omlin, C. W. P. (2023). CNN-ViT Supported Weakly-Supervised Video Segment Level Anomaly Detection. Sensors, 23 (18).	en_US
dc.identifier.issn	1424-8220
dc.identifier.uri	https://hdl.handle.net/11250/3100420
dc.description.abstract	Video anomaly event detection (VAED) is one of the key technologies in computer vision for smart surveillance systems. With the advent of deep learning, contemporary advances in VAED have achieved substantial success. Recently, weakly supervised VAED (WVAED) has become a popular line of VAED research. WVAED methods do not depend on a supplementary self-supervised substitute task, yet they can assess anomaly scores directly. However, the performance of WVAED methods depends on pretrained feature extractors. In this paper, we first take advantage of two kinds of pretrained feature extractors, CNN-based (e.g., C3D and I3D) and ViT-based (e.g., CLIP), to extract discriminative representations effectively. We then consider long-range and short-range temporal dependencies and identify video snippets of interest by leveraging our proposed temporal self-attention network (TSAN). We design a multiple instance learning (MIL)-based generalized architecture named CNN-ViT-TSAN, which uses CNN- and/or ViT-extracted features together with TSAN to specify a series of models for the WVAED problem. Experimental results on publicly available popular crowd datasets demonstrate the effectiveness of our CNN-ViT-TSAN.	en_US
dc.language.iso	eng	en_US
dc.publisher	MDPI	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	CNN-ViT Supported Weakly-Supervised Video Segment Level Anomaly Detection	en_US
dc.title.alternative	CNN-ViT Supported Weakly-Supervised Video Segment Level Anomaly Detection	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	© 2023 The Author(s)	en_US
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	en_US
dc.source.volume	23	en_US
dc.source.journal	Sensors	en_US
dc.source.issue	18	en_US
dc.identifier.doi	https://doi.org/10.3390/s23187734
dc.identifier.cristin	2188641
cristin.qualitycode	1

Associated file(s)

Filename: Article.pdf
Size: 1.564Mb
Format: PDF

Unless otherwise stated, this item is licensed under Attribution 4.0 International