Research Article

An Efficient Software Fault Prediction Model using Cluster based Classification

by Pradeep Singh, Shrish Verma
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 7 - Number 3
Year of Publication: 2014

Pradeep Singh, Shrish Verma. An Efficient Software Fault Prediction Model using Cluster based Classification. International Journal of Applied Information Systems. 7, 3 (May 2014), 35-41. DOI=10.5120/ijais14-451160

@article{ 10.5120/ijais14-451160,
author = { Pradeep Singh and Shrish Verma },
title = { An Efficient Software Fault Prediction Model using Cluster based Classification },
journal = { International Journal of Applied Information Systems },
issue_date = { May 2014 },
volume = { 7 },
number = { 3 },
month = { May },
year = { 2014 },
issn = { 2249-0868 },
pages = { 35-41 },
numpages = {9},
url = { },
doi = { 10.5120/ijais14-451160 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T18:54:51.099956+05:30
%A Pradeep Singh
%A Shrish Verma
%T An Efficient Software Fault Prediction Model using Cluster based Classification
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 7
%N 3
%P 35-41
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Predicting fault-prone software components is an economically important activity because budgets for software testing are limited. In recent years, data mining techniques have been used to predict software faults. In this research, we present a cluster-based fault prediction classifier that increases the probability of detection. A predictor is expected to achieve a very high probability of detection, which leads to more reliable software and more effective testing. In our experiments, we used fault data from mission-critical systems. We use discretization as a preprocessing step and cluster-based classification to predict fault-prone software modules. Cluster-based classification allows the production of comprehensible models of software faults by exploiting symbolic learning algorithms. To evaluate this approach, we perform an extensive comparative analysis against benchmark results of software fault prediction on the same data sets. Our proposed model shows better results than the standard and benchmark approaches for software fault prediction, with a probability of detection (pd) of 83.3% and a balance rate of 68.5%.
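
The abstract reports results as probability of detection (pd) and balance but does not define them. As a minimal sketch, assuming the definitions commonly used in the defect-prediction literature (pd as recall on faulty modules, pf as the false-alarm rate, and balance as one minus the normalized Euclidean distance from the ideal point pd = 1, pf = 0), the metrics could be computed as follows. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def detection_metrics(tp, fn, fp, tn):
    """Return (pd, pf, balance) from confusion-matrix counts.

    Assumed definitions (not stated in the abstract itself):
      pd      = tp / (tp + fn)   probability of detection (recall)
      pf      = fp / (fp + tn)   probability of false alarm
      balance = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Distance from the ideal corner (pf = 0, pd = 1), scaled to [0, 1].
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return pd, pf, balance


# Hypothetical counts, for illustration only (not the paper's data).
pd, pf, bal = detection_metrics(tp=50, fn=10, fp=30, tn=110)
print(f"pd={pd:.3f} pf={pf:.3f} balance={bal:.3f}")
```

Under these definitions balance lies between 0 and 1, so a high pd only yields a high balance when the false-alarm rate stays low.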

Index Terms

Computer Science
Information Sciences


Error Prone Software fault prediction software metrics