An Improved Agglomerative Clustering Method
Omar Kettani and Faical Ramdani. An Improved Agglomerative Clustering Method. International Journal of Applied Information Systems 12(3):16-23, June 2017.
@article{10.5120/ijais2017451689,
  author = "Omar Kettani and Faical Ramdani",
  title = "An Improved Agglomerative Clustering Method",
  journal = "International Journal of Applied Information Systems",
  issue_date = "June 2017",
  volume = 12,
  number = 3,
  month = "June",
  year = 2017,
  issn = "2249-0868",
  pages = "16-23",
  url = "http://www.ijais.org/archives/volume12/number3/988-2017451689",
  doi = "10.5120/ijais2017451689",
  publisher = "Foundation of Computer Science (FCS), NY, USA",
  address = "New York, USA"
}
Abstract
Clustering is a common and useful exploratory task widely used in data mining. Among the many existing clustering algorithms, the Agglomerative Clustering Method (ACM) introduced by the authors suffers from an obvious drawback: its sensitivity to data ordering. To overcome this issue, we propose in this paper to initialize the ACM using the KKZ seed algorithm. The proposed approach (called KKZ_ACM) has a lower computational time complexity than the well-known k-means algorithm. We evaluated its performance by applying it to various benchmark datasets and comparing it with ACM, k-means++ and KKZ_k-means. Our performance studies demonstrate that the proposed approach is effective in producing consistent clustering results in terms of the average Silhouette index.
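The abstract does not spell out the seeding step, so the following is only a minimal sketch of KKZ seeding as it is commonly described: the first seed is the point with the largest Euclidean norm, and each further seed is the point farthest from its nearest already-chosen seed. The function name `kkz_seeds` and the list-of-tuples input format are assumptions made for illustration; the sketch does not reproduce the ACM step or the authors' exact KKZ_ACM procedure.

```python
import math


def kkz_seeds(points, k):
    """Pick k seeds from `points` (equal-length numeric tuples) by the KKZ rule.

    Illustrative sketch only; not the authors' KKZ_ACM implementation.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    origin = (0.0,) * len(points[0])
    # First seed: the point with the largest Euclidean norm.
    first = max(range(len(points)), key=lambda i: math.dist(points[i], origin))
    chosen = [first]

    # nearest[i] holds the distance from point i to its closest seed so far.
    nearest = [math.dist(p, points[first]) for p in points]

    while len(chosen) < k:
        # Next seed: the point that is farthest from its closest seed.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]

    return [points[i] for i in chosen]
```

Because every choice is a deterministic maximum over the data rather than a random draw or a scan in input order, the selected seeds do not depend on how the points are ordered, except when distances tie. That property is what makes this kind of seeding a natural remedy for the ordering sensitivity the abstract mentions.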
References
- Kettani, O.; Ramdani, F. & Tadili, B. An Agglomerative Clustering Method for Large Data Sets. International Journal of Computer Applications 92(14):1-7, April 2014. DOI:10.5120/16074-4952
- I. Katsavounidis, C.-C. J. Kuo, Z. Zhang, A New Initialization Technique for Generalized Lloyd Iteration, IEEE Signal Processing Letters 1 (10) (1994) 144–146.
- Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P. (2009). "NP-hardness of Euclidean sum-of-squares clustering". Machine Learning 75: 245–249. doi:10.1007/s10994-009-5103-0.
- Garey, M. R.; Johnson, D. S. “Computers and Intractability: A Guide to the Theory of NP-Completeness”. W. H. Freeman & Co., New York, NY, USA, 1979.
- E. Forgy, Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classification, Biometrics 21 (1965) 768.
- MacQueen, J.B., 1967. Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, (MSP’67), Berkeley, University of California Press, pp: 281-297.
- L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 1990.
- Lloyd, S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.
- D. Arthur, S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, in: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035.
- Asuncion, A. and Newman, D.J. (2007). UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.
Keywords
Clustering, k-means, k-means++, KKZ