Google scholar arxiv informatics ads IJAIS publications are indexed with Google Scholar, NASA ADS, Informatics et. al.

Call for Paper

-

July Edition 2023

International Journal of Applied Information Systems solicits high quality original research papers for the July 2023 Edition of the journal. The last date of research paper submission is June 15, 2023.

A Fast Deterministic Kmeans Initialization

Omar Kettani, Faical Ramdani. Published in Information Sciences

International Journal of Applied Information Systems
Year of Publication: 2017
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Omar Kettani, Faical Ramdani
10.5120/ijais2017451683
Download full text
  1. Omar Kettani and Faical Ramdani. A Fast Deterministic Kmeans Initialization. International Journal of Applied Information Systems 12(2):6-11, May 2017. URL, DOI BibTeX

    @article{10.5120/ijais2017451683,
    	author = "Omar Kettani and Faical Ramdani",
    	title = "A Fast Deterministic Kmeans Initialization",
    	journal = "International Journal of Applied Information Systems",
    	issue_date = "May 2017",
    	volume = 12,
    	number = 2,
    	month = "May",
    	year = 2017,
    	issn = "2249-0868",
    	pages = "6-11",
    	url = "http://www.ijais.org/archives/volume12/number2/984-2017451683",
    	doi = "10.5120/ijais2017451683",
    	publisher = "Foundation of Computer Science (FCS), NY, USA",
    	address = "New York, USA"
    }
    

Abstract

The k-means algorithm remains one of the most widely used clustering methods, in spite of its sensitivity to the initial settings. This paper explores a simple, computationally low, deterministic method which provides k-means with initial seeds to cluster a given data set. It is simply based on computing the means of k samples with equal parts taken from the given data set. We test and compare this method to the related well know kkz initialization algorithm for k-means, using both simulated and real data, and find it to be more efficient in many cases.

Reference

  1. Aloise D., Deshpande A., Hansen P., Popat P.: NP-hardness of Euclidean sum-of- squares clustering. Machine Learning, 75, 245 - 249 (2009).
  2. Lloyd., S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.
  3. Peña J.M., Lozano J.A., Larrañaga P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20(10), 1027 - 1040 (1999).
  4. 4.. Forgy E., "Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications". Biometrics, 21, 768 - 769 (1965).
  5. Arthur D., Vassilvitskii S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th annual ACM-SIAM Symp. on Disc. Alg, pp. 1027 - 1035 (2007).
  6. Bahmani B., Moseley B., Vattani A., Kumar R., Vassilvitskii S.:Scalable K-means++. In: Proceedings of the VLDB Endowment (2012).
  7. I. Katsavounidis, C.-C. J. Kuo, Z. Zhang, A New Initialization Technique for Generalized Lloyd Iteration, IEEE Signal Processing Letters 1 (10) (1994) 144–146.
  8. 8.Asuncion, A. and Newman, D.J. (2007). UCI Machine LearningRepository[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.
  9. L. Kaufman and P. J. Rousseeuw. Finding groups in Data: “an Introduction to Cluster Analysis”. Wiley, 1990.

Keywords

k-means initialization, kkz