Data mining K-clustering problem
In statistics and data mining, k-means clustering is well known for its efficiency in clustering large data sets. The aim is to group data points into clusters such that similar items fall in the same cluster. In general, given a set of objects together with their attributes, the goal is to divide the objects into k clusters such that objects in the same cluster are as close as possible to each other (homogeneity) and objects in different clusters are further apart from each other. However, the classical k-means clustering algorithm has some flaws. First, the algorithm is sensitive to the choice of initial centroids and can easily be trapped at a local minimum of the measure used in the model (the sum of squared errors). Second, the k-means problem of finding a global minimum of the sum of squared errors is NP-hard even when the number of clusters is 2 or the number of attributes per data point is 2, so finding the optimal clustering is believed to be computationally intractable. In this dissertation, we address the k-means clustering problem by designing variant types of k-means in a multilevel context, and we consider how to derive an optimization model that minimizes the sum of squared errors for a given data set. We introduce the variant type of k-means algorithm to guarantee that the clustering result is more accurate than that obtained by the basic k-means algorithm. We believe this is one type of k-means clustering algorithm that combines theoretical guarantees with positive experimental results.
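To make the sensitivity to initial centroids concrete, the following is a minimal sketch of the basic k-means iteration (Lloyd's algorithm) together with the sum of squared errors. It is not the multilevel variant developed in the thesis. The function names, the four-point data set, and the two sets of starting centroids are assumptions chosen only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members (empty clusters stay put)."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def sum_of_squared_errors(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels))


def k_means(points, initial_centroids, max_iterations=100):
    """Basic k-means: alternate assignment and update until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iterations):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels, sum_of_squared_errors(points, centroids, labels)


# Hypothetical data: the corners of a 4-by-1 rectangle, with k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start with one centroid on each short side: converges to the left/right split.
_, good_labels, good_sse = k_means(points, [(0, 0), (4, 0)])

# Start with centroids between the long sides: stuck in a top/bottom split.
_, bad_labels, bad_sse = k_means(points, [(2, 0), (2, 1)])

print(good_labels, good_sse)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_sse)    # [0, 1, 0, 1] 16.0
```

Both runs stop at a fixed point of the iteration, but the second start yields a sum of squared errors sixteen times larger than the first, which is the local-minimum behaviour described above.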
Master's thesis in Information and Communication Technology IKT590 2012 – Universitetet i Agder, Grimstad