CFP last date
17 June 2024
Reseach Article

A Fast Deterministic Kmeans Initialization

by Omar Kettani, Faical Ramdani
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 2
Year of Publication: 2017
Authors: Omar Kettani, Faical Ramdani
10.5120/ijais2017451683

Omar Kettani, Faical Ramdani . A Fast Deterministic Kmeans Initialization. International Journal of Applied Information Systems. 12, 2 ( May 2017), 6-11. DOI=10.5120/ijais2017451683

@article{ 10.5120/ijais2017451683,
author = { Omar Kettani, Faical Ramdani },
title = { A Fast Deterministic Kmeans Initialization },
journal = { International Journal of Applied Information Systems },
issue_date = { May 2017 },
volume = { 12 },
number = { 2 },
month = { May },
year = { 2017 },
issn = { 2249-0868 },
pages = { 6-11 },
numpages = {9},
url = { https://www.ijais.org/archives/volume12/number2/984-2017451683/ },
doi = { 10.5120/ijais2017451683 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T19:07:56.282663+05:30
%A Omar Kettani
%A Faical Ramdani
%T A Fast Deterministic Kmeans Initialization
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 12
%N 2
%P 6-11
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The k-means algorithm remains one of the most widely used clustering methods, in spite of its sensitivity to the initial settings. This paper explores a simple, computationally low, deterministic method which provides k-means with initial seeds to cluster a given data set. It is simply based on computing the means of k samples with equal parts taken from the given data set. We test and compare this method to the related well know kkz initialization algorithm for k-means, using both simulated and real data, and find it to be more efficient in many cases.

References
  1. Aloise D., Deshpande A., Hansen P., Popat P.: NP-hardness of Euclidean sum-of- squares clustering. Machine Learning, 75, 245 - 249 (2009).
  2. Lloyd., S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.
  3. Peña J.M., Lozano J.A., Larrañaga P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20(10), 1027 - 1040 (1999).
  4. 4.. Forgy E., "Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications". Biometrics, 21, 768 - 769 (1965).
  5. Arthur D., Vassilvitskii S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th annual ACM-SIAM Symp. on Disc. Alg, pp. 1027 - 1035 (2007).
  6. Bahmani B., Moseley B., Vattani A., Kumar R., Vassilvitskii S.:Scalable K-means++. In: Proceedings of the VLDB Endowment (2012).
  7. I. Katsavounidis, C.-C. J. Kuo, Z. Zhang, A New Initialization Technique for Generalized Lloyd Iteration, IEEE Signal Processing Letters 1 (10) (1994) 144–146.
  8. 8.Asuncion, A. and Newman, D.J. (2007). UCI Machine LearningRepository[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.
  9. L. Kaufman and P. J. Rousseeuw. Finding groups in Data: “an Introduction to Cluster Analysis”. Wiley, 1990.
Index Terms

Computer Science
Information Sciences

Keywords

k-means initialization kkz