Exploration and Performance Analysis of Clustering Algorithms for Time-Series Data with Dimension Reduction

Lingstad, Aleksander Markus; Hansen, Daniel Nguyen; Suvatne, Marius

dc.contributor.advisor	Jiao, Lei
dc.contributor.advisor	Omslandseter, Rebekka Olsson
dc.contributor.author	Lingstad, Aleksander Markus
dc.contributor.author	Hansen, Daniel Nguyen
dc.contributor.author	Suvatne, Marius
dc.date.accessioned	2022-09-21T16:24:37Z
dc.date.available	2022-09-21T16:24:37Z
dc.date.issued	2022
dc.identifier	no.uia:inspera:106884834:23178813
dc.identifier.uri	https://hdl.handle.net/11250/3020370
dc.description.abstract	Clustering is an attempt to form groups of similar objects, and it is a powerful tool for discovering valuable underlying patterns in data. When clustering high-dimensional data, algorithms can suffer from the curse of dimensionality. This problem occurs when data becomes sparse due to its many dimensions, and it can lead to poor clustering performance. Dimensionality reduction methods (DRMs) are designed to help alleviate this issue. In a time-series, which is a temporally ordered set of points, each consecutive point in time can be considered a dimension, so time-series are high-dimensional data. Time-Series K-Means (TSK-Means) with Dynamic Time Warping (DTW) is an algorithm that has proven successful for clustering time-series. However, TSK-Means is computationally complex and might require substantial training time due to the potentially high dimensionality of time-series. This thesis studies the clustering of time-series data provided by temperature sensors installed in refrigerators, and tries to make it less computationally complex by using the DRMs Principal Component Analysis (PCA), Time-Series Autoencoder (TSA), and Self-Organizing Maps (SOM). We use these methods in combination with three clustering algorithms, namely K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Agglomerative Hierarchical Clustering (AHC), to potentially find valuable patterns in the provided data. The clusters and patterns were evaluated on a theoretical and practical level with regard to pattern recognition and detection in the domain of refrigerator temperature monitoring and logging. This is an effort to improve refrigerator maintenance, quality assurance, and deviation management, and to potentially reduce food loss.
The results indicate that TSK-Means outperforms every other combination of DRMs and clustering algorithms when it comes to detecting patterns in the data, despite being more computationally complex. Nevertheless, the use of DRMs simplified the clustering of time-series and allowed the K-Means algorithm to detect patterns more efficiently than the TSK-Means algorithm. The clusters and patterns that were discovered seem promising for deviation management and refrigerator quality assurance.
dc.publisher	University of Agder
dc.title	Exploration and Performance Analysis of Clustering Algorithms for Time-Series Data with Dimension Reduction
dc.type	Master thesis
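
The abstract attributes the cost of TSK-Means to Dynamic Time Warping applied to long series. As a minimal sketch of why, the following pure-Python function computes a textbook DTW distance with an absolute-difference cost; it is not taken from the thesis, and the function names and the two short temperature traces are hypothetical values chosen only for illustration. Each pairwise comparison fills a table of size len(a) x len(b), which is the cost that reducing the number of dimensions is meant to lower.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW: O(len(a) * len(b)) time and memory per pair of series."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point distance for equal-length series, shown for comparison."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical temperature traces (degrees Celsius): the same spike,
# shifted by one time step. These are not values from the thesis data.
trace_a = [4.0, 4.0, 8.0, 4.0, 4.0]
trace_b = [4.0, 8.0, 4.0, 4.0, 4.0]

print("DTW:", dtw_distance(trace_a, trace_b))
print("Euclidean:", euclidean_distance(trace_a, trace_b))
```

For these two traces the DTW distance is 0.0, because warping aligns the shifted spike, while the Euclidean distance is about 5.66. A clustering run needs many such pairwise comparisons, so shorter inputs reduce the work roughly quadratically per pair.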

Files in this item

Name: no.uia:inspera:106884834:23178 ...
Size: 33.82Mb
Format: PDF

This item appears in the following Collection(s)

Master's theses in Information and Communication Technology [508]
MM500, IKT590, IKT591
