Comparison of initialization methods of K-means clustering for small data

This is a Bachelor's thesis from Uppsala universitet/Statistiska institutionen

Abstract: Clustering observations into groups is a fundamental challenge in both academia and industry. Many clustering algorithms exist, and the most widely used one, K-means, is notably sensitive to the initial allocation of cluster centers. Many heuristics and algorithms have been developed to find the best initial allocation. This experimental study compares initialization methods by measuring how well they perform on small, simulated datasets under various performance criteria. The results show that using the output clusters of a Hierarchical clustering is the best initialization method, whereas the most popular methods, Random partitioning and KMeans++, perform poorly. Although the experimental setup may favour some initialization methods over others, applied researchers are recommended to perform a Hierarchical clustering as an initialization of the K-means algorithm.
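
The recommended approach, seeding K-means with the clusters produced by a hierarchical clustering, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state which linkage, distance, or stopping rule was used, so the centroid linkage, the squared Euclidean distance, the toy data, and all function names below (`hierarchical_centers`, `kmeans`, `wcss`) are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def hierarchical_centers(data, k):
    """Agglomerative clustering cut at k clusters; returns the k cluster means.

    Assumption: centroid linkage (merge the two clusters whose means are
    closest). The source does not say which linkage was used.
    """
    clusters = [[p] for p in data]  # start with every point in its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [centroid(c) for c in clusters]


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))


def kmeans(data, centers, max_iter=100):
    """Lloyd's algorithm started from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in data:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [centroid(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:  # assignments no longer change the means
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in data]
    return centers, labels


def wcss(data, centers, labels):
    """Within-cluster sum of squares, one common performance criterion."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(data, labels))


# Hypothetical small dataset: three well-separated groups of 10 points each.
rng = random.Random(0)
data = [(rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
        for mx, my in [(0, 0), (10, 10), (0, 10)]
        for _ in range(10)]

init = hierarchical_centers(data, k=3)  # hierarchical clustering as the seed
centers, labels = kmeans(data, init)    # K-means refines the seeded centers
print(round(wcss(data, centers, labels), 3))
```

The pairwise search makes the hierarchical step expensive for large inputs, which is acceptable here because the study concerns small datasets. Because the hierarchical step is deterministic, repeated runs on the same data give the same initial centers, unlike Random partitioning or KMeans++.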
