# Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms

**Year of Publication:** 2014

Ajiboye Adeleke R., Isah-kebbe Hauwau and Oladele Tinuke O. Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms. *International Journal of Applied Information Systems* 7(7):21-26, August 2014.

BibTeX: @article{key:article, author = "Ajiboye Adeleke R. and Isah-kebbe Hauwau and Oladele Tinuke O.", title = "Article: Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms", journal = "International Journal of Applied Information Systems", year = 2014, volume = 7, number = 7, pages = "21-26", month = "August", note = "Published by Foundation of Computer Science, New York, USA" }

### Abstract

Exploring the features of a dataset with clustering algorithms is a viable way to reveal a conceptual description of the data for better understanding, grouping and decision making. Some clustering algorithms, especially partition-based ones, cluster any data presented to them even if similar features are not present. This study explores the accuracy of partitioning-based algorithms and a probabilistic model-based algorithm. Experiments were conducted using k-means, k-medoids and the EM algorithm. The study implements each algorithm using the RapidMiner software, and the results generated were validated for correctness in accordance with the external criteria method. The clusters formed revealed the capabilities and drawbacks of each algorithm on the data points.
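As a rough illustration of the kind of comparison the abstract describes, the sketch below clusters a small set of points with k-means and k-medoids and scores each result against known class labels using the Rand index, one common external criterion. It is not the authors' RapidMiner workflow: the toy data, the deterministic initialisation, the helper names (`assign`, `kmeans`, `kmedoids`, `rand_index`) and the choice of the Rand index are all assumptions made for illustration, and the EM algorithm is left out for brevity.

```python
import math
from itertools import combinations


def assign(points, centers):
    """Label each point with the index of its nearest center (Euclidean)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Plain k-means; initial centers are evenly spaced points (an assumption)."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the cluster.
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def kmedoids(points, k, max_iter=100):
    """Alternating k-medoids: each center must be an actual data point."""
    step = len(points) // k
    medoids = [points[i * step] for i in range(k)]
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New medoid minimises total distance to its cluster members.
                new_medoids.append(min(
                    members,
                    key=lambda m: sum(math.dist(m, q) for q in members)))
            else:
                new_medoids.append(medoids[j])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) ==
                (labels_pred[i] == labels_pred[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy data: two well-separated groups with known classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
          (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]

km = kmeans(points, 2)
kmd = kmedoids(points, 2)
print(rand_index(truth, km), rand_index(truth, kmd))
```

On data this cleanly separated both algorithms recover the two groups. Note that both functions always return `k` clusters, whether or not the data has any real group structure, which is why checking the result against an external reference matters.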

### Keywords

Clustering, Algorithm, K-means, EM-clustering, K-medoids