Research Article

An Improved Agglomerative Clustering Method

by Omar Kettani, Faical Ramdani
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 3
Year of Publication: 2017
Authors: Omar Kettani, Faical Ramdani
10.5120/ijais2017451689

Omar Kettani, Faical Ramdani. An Improved Agglomerative Clustering Method. International Journal of Applied Information Systems. 12, 3 (June 2017), 16-23. DOI=10.5120/ijais2017451689

@article{ 10.5120/ijais2017451689,
author = { Omar Kettani and Faical Ramdani },
title = { An Improved Agglomerative Clustering Method },
journal = { International Journal of Applied Information Systems },
issue_date = { June 2017 },
volume = { 12 },
number = { 3 },
month = { June },
year = { 2017 },
issn = { 2249-0868 },
pages = { 16-23 },
numpages = {9},
url = { https://www.ijais.org/archives/volume12/number3/989-2017451689/ },
doi = { 10.5120/ijais2017451689 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T19:07:58.339252+05:30
%A Omar Kettani
%A Faical Ramdani
%T An Improved Agglomerative Clustering Method
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 12
%N 3
%P 16-23
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is a common and useful exploratory task widely used in data mining. Among the many existing clustering algorithms, the Agglomerative Clustering Method (ACM) introduced by the authors suffers from an obvious drawback: its sensitivity to data ordering. To overcome this issue, we propose in this paper to initialize the ACM using the KKZ seed algorithm. The proposed approach (called KKZ_ACM) has a lower computational time complexity than the well-known k-means algorithm. We evaluated its performance by applying it to various benchmark datasets and comparing it with ACM, k-means++ and KKZ_k-means. Our performance studies demonstrate that the proposed approach is effective in producing consistent clustering results in terms of the average Silhouette index.
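As an illustration of the two ingredients named in the abstract, the sketch below shows KKZ seeding as it is commonly described (first seed: the point with the largest norm; each further seed: the point farthest from its nearest already-chosen seed) and the average Silhouette index used for evaluation. It is not the authors' KKZ_ACM: the abstract does not describe the ACM step, so a plain nearest-seed assignment stands in for it here. The function names `kkz_seeds` and `mean_silhouette` and the toy data are assumptions made for this example.

```python
import math


def kkz_seeds(points, k):
    """Return the indices of k seeds chosen with the KKZ rule."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First seed: the point with the largest Euclidean norm.
    first = max(range(len(points)), key=lambda i: math.hypot(*points[i]))
    chosen = [first]
    # nearest[i] = distance from point i to its closest seed so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # Next seed: the point farthest from its closest seed.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return chosen


def mean_silhouette(points, labels):
    """Average Silhouette index; singleton clusters contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j])
                for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (12, 10)]
seeds = kkz_seeds(points, 2)

# Stand-in for the clustering step: assign each point to its nearest seed.
labels = [min(range(len(seeds)),
              key=lambda s: math.dist(p, points[seeds[s]]))
          for p in points]
score = mean_silhouette(points, labels)
```

Because the seeds are picked deterministically from the geometry of the data rather than at random, repeated runs on the same data yield the same seeds, which is the kind of consistency the abstract is concerned with. Ties are broken by taking the first maximal point.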

References
  1. Kettani, O.; Ramdani, F. & Tadili, B. An Agglomerative Clustering Method for Large Data Sets. International Journal of Computer Applications 92(14):1-7, April 2014. DOI:10.5120/16074-4952
  2. I. Katsavounidis, C.-C. J. Kuo, Z. Zhang, A New Initialization Technique for Generalized Lloyd Iteration, IEEE Signal Processing Letters 1 (10) (1994) 144–146.
  3. Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P. (2009). "NP-hardness of Euclidean sum-of-squares clustering". Machine Learning 75: 245–249. doi:10.1007/s10994-009-5103-0.
  4. Garey, M. R.; Johnson, D. S. "Computers and Intractability: A Guide to the Theory of NP-Completeness". W. H. Freeman & Co., New York, NY, USA, 1979.
  5. E. Forgy, Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classification, Biometrics 21 (1965) 768.
  6. MacQueen, J. B., 1967. Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability (MSP’67), Berkeley, University of California Press, pp. 281-297.
  7. L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 1990.
  8. Lloyd, S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.
  9. D. Arthur, S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, in: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035.
  10. Asuncion, A. and Newman, D. J. (2007). UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.
Index Terms

Computer Science
Information Sciences

Keywords

Clustering k-means k-means++ KKZ