Research Article

A Novel Class Imbalance Learning Method using Neural Networks

by K. Nageswara Rao, D. Rajya Lakshmi, T. Venkateswara Rao
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 3 - Number 7
Year of Publication: 2012
10.5120/ijais12-450594

K. Nageswara Rao, D. Rajya Lakshmi, T. Venkateswara Rao. A Novel Class Imbalance Learning Method using Neural Networks. International Journal of Applied Information Systems. 3, 7 (August 2012), 31-38. DOI=10.5120/ijais12-450594

@article{ 10.5120/ijais12-450594,
author = { K. Nageswara Rao and D. Rajya Lakshmi and T. Venkateswara Rao },
title = { A Novel Class Imbalance Learning Method using Neural Networks },
journal = { International Journal of Applied Information Systems },
issue_date = { August 2012 },
volume = { 3 },
number = { 7 },
month = { August },
year = { 2012 },
issn = { 2249-0868 },
pages = { 31-38 },
numpages = {9},
url = { https://www.ijais.org/archives/volume3/number7/245-0594/ },
doi = { 10.5120/ijais12-450594 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T10:46:02.925148+05:30
%A K. Nageswara Rao
%A D. Rajya Lakshmi
%A T. Venkateswara Rao
%T A Novel Class Imbalance Learning Method using Neural Networks
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 3
%N 7
%P 31-38
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining and knowledge discovery aim to uncover hidden, valuable knowledge in data sources. Traditional knowledge discovery algorithms are bottlenecked by the wide range of data sources now available. Class imbalance is one of the problems that arise when a data source provides unequally represented classes, i.e., examples of one class in a training data set vastly outnumber examples of the other class(es). In this paper, we present a new hybrid approach that uses neural networks to improve results on class-imbalanced data. The algorithm provides a simpler and faster alternative by using a multilayer perceptron backpropagation neural network as its base algorithm. We conduct experiments on eleven UCI data sets from various application domains, using four base learners and five evaluation metrics. Experimental results show that our method performs better than many existing class imbalance learning methods in terms of area under the ROC curve (AUC), F-measure, precision, TP rate, and TN rate.
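
The five evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions for a binary problem in which the minority class is the positive class; the abstract does not spell out the paper's exact formulas, and the names `Confusion`, `confusion`, `metrics`, and `auc` are hypothetical helpers introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positives (minority class) predicted positive
    fn: int  # positives predicted negative
    fp: int  # negatives predicted positive
    tn: int  # negatives (majority class) predicted negative


def confusion(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary problem."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return Confusion(tp, fn, fp, tn)


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def metrics(c):
    """TP rate, TN rate, precision, F-measure, plus accuracy for contrast."""
    tp_rate = _ratio(c.tp, c.tp + c.fn)      # recall on the minority class
    tn_rate = _ratio(c.tn, c.tn + c.fp)      # recall on the majority class
    precision = _ratio(c.tp, c.tp + c.fp)
    f_measure = _ratio(2 * precision * tp_rate, precision + tp_rate)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fn + c.fp + c.tn)
    return {
        "tp_rate": tp_rate,
        "tn_rate": tn_rate,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


def auc(y_true, scores, positive=1):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive example gets the higher score; ties
    count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (made up for illustration): 2 minority vs 8 majority examples.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1]
    print(metrics(confusion(y_true, y_pred)))
    print("auc:", auc(y_true, scores))
```

On the toy data, accuracy is 0.8 even though only half of the minority examples are recovered (TP rate 0.5). This gap is why class imbalance studies usually report TP rate, TN rate, F-measure, and AUC rather than accuracy alone.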

Index Terms

Computer Science
Information Sciences

Keywords

Classification, class imbalance, weighted sampling, subset filtering, CILNN