A Fast Deterministic Kmeans Initialization
Omar Kettani and Faical Ramdani. A Fast Deterministic Kmeans Initialization. International Journal of Applied Information Systems 12(2):6-11, May 2017. URL, DOI BibTeX
@article{10.5120/ijais2017451683, author = "Omar Kettani and Faical Ramdani", title = "A Fast Deterministic Kmeans Initialization", journal = "International Journal of Applied Information Systems", issue_date = "May 2017", volume = 12, number = 2, month = "May", year = 2017, issn = "2249-0868", pages = "6-11", url = "http://www.ijais.org/archives/volume12/number2/984-2017451683", doi = "10.5120/ijais2017451683", publisher = "Foundation of Computer Science (FCS), NY, USA", address = "New York, USA" }
Abstract
The k-means algorithm remains one of the most widely used clustering methods, in spite of its sensitivity to the initial settings. This paper explores a simple, computationally low, deterministic method which provides k-means with initial seeds to cluster a given data set. It is simply based on computing the means of k samples with equal parts taken from the given data set. We test and compare this method to the related well know kkz initialization algorithm for k-means, using both simulated and real data, and find it to be more efficient in many cases.
Reference
- Aloise D., Deshpande A., Hansen P., Popat P.: NP-hardness of Euclidean sum-of- squares clustering. Machine Learning, 75, 245 - 249 (2009).
- Lloyd., S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.
- Peña J.M., Lozano J.A., Larrañaga P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20(10), 1027 - 1040 (1999).
- 4.. Forgy E., "Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications". Biometrics, 21, 768 - 769 (1965).
- Arthur D., Vassilvitskii S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th annual ACM-SIAM Symp. on Disc. Alg, pp. 1027 - 1035 (2007).
- Bahmani B., Moseley B., Vattani A., Kumar R., Vassilvitskii S.:Scalable K-means++. In: Proceedings of the VLDB Endowment (2012).
- I. Katsavounidis, C.-C. J. Kuo, Z. Zhang, A New Initialization Technique for Generalized Lloyd Iteration, IEEE Signal Processing Letters 1 (10) (1994) 144–146.
- 8.Asuncion, A. and Newman, D.J. (2007). UCI Machine LearningRepository[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.
- L. Kaufman and P. J. Rousseeuw. Finding groups in Data: “an Introduction to Cluster Analysis”. Wiley, 1990.
Keywords
k-means initialization, kkz