Research Article

Enhanced Classification via Clustering Techniques using Decision Tree for Feature Selection

by Balogun Abdullateef O., Mabayoje Modinat A., Salihu Shakirat, Arinze Salvation A.
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 9 - Number 6
Year of Publication: 2015
DOI: 10.5120/ijais2015451425

Balogun Abdullateef O., Mabayoje Modinat A., Salihu Shakirat, Arinze Salvation A. Enhanced Classification via Clustering Techniques using Decision Tree for Feature Selection. International Journal of Applied Information Systems. 9, 6 (September 2015), 11-16. DOI=10.5120/ijais2015451425

@article{10.5120/ijais2015451425,
author = {Balogun Abdullateef O. and Mabayoje Modinat A. and Salihu Shakirat and Arinze Salvation A.},
title = {Enhanced Classification via Clustering Techniques using Decision Tree for Feature Selection},
journal = {International Journal of Applied Information Systems},
issue_date = {September 2015},
volume = {9},
number = {6},
month = {September},
year = {2015},
issn = {2249-0868},
pages = {11-16},
numpages = {6},
url = {https://www.ijais.org/archives/volume9/number6/811-2015451425/},
doi = {10.5120/ijais2015451425},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Balogun Abdullateef O.
%A Mabayoje Modinat A.
%A Salihu Shakirat
%A Arinze Salvation A.
%T Enhanced Classification via Clustering Techniques using Decision Tree for Feature Selection
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 9
%N 6
%P 11-16
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Information overload has increased sharply in recent years as a result of advances in storage capacity and data collection. The growth in the number of observations has placed a strain on analytical methods, but the increase in the number of variables associated with each observation has strained them far more severely. The number of variables measured on each observation is referred to as the dimension of the data, and a major problem with high-dimensional datasets is that only a few of the measured variables are “important” for understanding the underlying phenomena of interest. Hence, reducing the dimension of the original data prior to any modeling is a pressing necessity. In this paper, a précis of K-Means, Expectation Maximization, and the J48 decision tree classifier is presented, along with a framework for measuring the performance of base classifiers with and without feature reduction. Performance was evaluated using F-Measure, Precision, Recall, True Positive Rate, False Positive Rate, ROC Area, and the time taken to build the model. The experiment revealed that, after performing classification via clustering, the reduced dataset yielded better results than the full dataset.
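
As a rough illustration of the pipeline the abstract describes (a minimal sketch, not the authors' WEKA experiment), the following Python snippet uses scikit-learn's DecisionTreeClassifier as a stand-in for J48 to select features, performs classification via K-Means clustering on both the full and the reduced feature sets, and compares Precision, Recall, and F-Measure. The dataset, the library, and the majority-vote mapping from clusters to class labels are all illustrative assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Feature selection: keep only the features the fitted tree actually splits on
# (a stand-in for the paper's J48-based selection).
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
selected = tree.feature_importances_ > 0

def classify_via_clustering(X_tr, y_tr, X_te, n_clusters=2):
    # "Classification via clustering": cluster the training features, then
    # label each cluster with the majority class of its training members.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_tr)
    train_clusters = km.predict(X_tr)
    cluster_to_class = {c: np.bincount(y_tr[train_clusters == c]).argmax()
                        for c in range(n_clusters)}
    return np.array([cluster_to_class[c] for c in km.predict(X_te)])

for name, cols in [("full", slice(None)), ("reduced", selected)]:
    pred = classify_via_clustering(X_tr[:, cols], y_tr, X_te[:, cols])
    print(name,
          "Precision:", round(precision_score(y_te, pred), 3),
          "Recall:", round(recall_score(y_te, pred), 3),
          "F-Measure:", round(f1_score(y_te, pred), 3))

Swapping in sklearn.mixture.GaussianMixture (which takes n_components rather than n_clusters) would give the Expectation Maximization variant of the same comparison.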

Index Terms

Computer Science
Information Sciences

Keywords

K-Means (KM), Expectation Maximization (EM), Decision Tree, Feature Selection, Data Mining