Exploration and Performance Analysis of Clustering Algorithms for Time-Series Data with Dimension Reduction

Lingstad, Aleksander Markus; Hansen, Daniel Nguyen; Suvatne, Marius

Master thesis

Open

no.uia:inspera:106884834:23178813.pdf (33.82Mb)

Permanent link

https://hdl.handle.net/11250/3020370

Publication date

2022

Collections

Master's theses in Information and Communication Technology [508]

Abstract

Clustering is an attempt to form groups of similar objects, and it is a powerful tool for discovering valuable underlying patterns in data. When clustering high-dimensional data, algorithms can suffer from the curse of dimensionality: data becomes sparse as the number of dimensions grows, which can lead to poor clustering performance. Dimensionality reduction methods (DRMs) are designed to help alleviate this issue. A time-series is a temporally ordered set of points, and each consecutive point in time can be considered a dimension, so time-series are high-dimensional data. Time-Series K-Means (TSK-Means) with Dynamic Time Warping (DTW) is an algorithm that has proven successful for clustering time-series. However, TSK-Means is computationally complex and might require substantial training time due to the potentially high dimensionality of time-series.
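
To illustrate why DTW suits time-series but is costly, the following is a minimal sketch of the textbook DTW recurrence in plain Python. It is not the thesis implementation: the function names, the absolute-difference point cost, and the temperature values are assumptions made for illustration only. The nested loop makes the cost grow with the product of the two series lengths, which is one reason DTW-based clustering can become slow on long series.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption: the point-wise cost is the absolute difference;
    libraries may use a squared cost and a final square root instead.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance for equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical temperature readings: the same spike, shifted by one time step.
reference = [4.0, 4.0, 8.0, 4.0, 4.0, 4.0]
shifted = [4.0, 4.0, 4.0, 8.0, 4.0, 4.0]

dtw_result = dtw_distance(reference, shifted)
euclidean_result = euclidean_distance(reference, shifted)
```

In this made-up example DTW aligns the two spikes with each other and reports a distance of zero, while the Euclidean distance treats the shifted spike as two separate mismatches.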

This thesis studies the clustering of time-series data provided by temperature sensors installed in refrigerators, and tries to make it less computationally complex by using the DRMs Principal Component Analysis (PCA), Time-Series Autoencoder (TSA), and Self-Organizing Maps (SOM). We use these methods in combination with three clustering algorithms, namely K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Agglomerative Hierarchical Clustering (AHC), to potentially find valuable patterns in the provided data. The clusters and patterns were evaluated on a theoretical and practical level regarding the application of pattern recognition and detection in the domain of refrigerator temperature monitoring and logging. This is an effort to improve refrigerator maintenance, quality assurance, and deviation management, and to potentially reduce food loss.

The results indicate that TSK-Means outperforms every other combination of DRMs and clustering algorithms at detecting patterns in the data, despite being more computationally complex. Nevertheless, the use of DRMs simplified the clustering of time-series and allowed the K-Means algorithm to detect patterns more efficiently than the TSK-Means algorithm. The clusters and patterns that were discovered seem promising for the application of deviation management and refrigerator quality assurance.

Publisher

University of Agder