Research Article

Ensemble-based Predictive Model for Financial Fraud Detection

by V.O. Olaleye, O.A. Odeniyi, B.K. Alese
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 42
Year of Publication: 2024
Authors: V.O. Olaleye, O.A. Odeniyi, B.K. Alese
10.5120/ijais2024451961

V.O. Olaleye, O.A. Odeniyi, B.K. Alese. Ensemble-based Predictive Model for Financial Fraud Detection. International Journal of Applied Information Systems. 12, 42 (Jan 2024), 54-62. DOI=10.5120/ijais2024451961

@article{ 10.5120/ijais2024451961,
author = { V.O. Olaleye, O.A. Odeniyi, B.K. Alese },
title = { Ensemble-based Predictive Model for Financial Fraud Detection },
journal = { International Journal of Applied Information Systems },
issue_date = { Jan 2024 },
volume = { 12 },
number = { 42 },
month = { Jan },
year = { 2024 },
issn = { 2249-0868 },
pages = { 54-62 },
numpages = {9},
url = { https://www.ijais.org/archives/volume12/number42/ensemble-based-predictive-model-for-financial-fraud-detection/ },
doi = { 10.5120/ijais2024451961 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-01-27T22:32:21.391180+05:30
%A V.O. Olaleye
%A O.A. Odeniyi
%A B.K. Alese
%T Ensemble-based Predictive Model for Financial Fraud Detection
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 12
%N 42
%P 54-62
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The financial industry remains a persistent target for fraudulent activities. Research in this area is hampered by data privacy concerns and the scarcity of publicly available datasets that contain instances of fraud. Researchers and practitioners have proposed various fraud detection techniques, applying diverse algorithms to uncover fraudulent patterns. To help address the data gap, this study introduces a synthetic fraud-related dataset featuring five distinct fraud scenarios and comprising about 2.5 million transactions. The primary objective is to analyze the intricacies of account transaction behaviour in a financial dataset. The authors propose an ensemble of three gradient boosting algorithms: CatBoost, Extreme Gradient Boosting (XGBoost), and LightGBM. The models developed demonstrate promising results, with several achieving an average Area Under the Curve (AUC) exceeding 0.9 and the ensemble achieving a predictive accuracy of 98.60%. Further evaluation through an application programming interface (API) indicates a response time of less than 300 milliseconds and efficient memory usage, making this approach promising for practical use in real-world scenarios.
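The abstract does not say how the three boosted models are combined, so the following is only a minimal sketch under one common assumption: soft voting, where the per-transaction fraud probabilities of the models are averaged. The probability lists are hypothetical stand-ins for the outputs of CatBoost, XGBoost, and LightGBM, not values from the paper, and the helper names `soft_vote` and `roc_auc` are illustrative. The sketch also shows how AUC and accuracy, the two metrics reported above, are computed from the same scores.

```python
from typing import Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> list[float]:
    """Average the per-transaction fraud probabilities across models."""
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (fraud, legitimate) pairs in which the fraud
    case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = fraudulent transaction, 0 = legitimate transaction.
labels = [0, 0, 1, 0, 1, 0]
# Hypothetical predicted fraud probabilities, one list per model.
catboost_p = [0.10, 0.40, 0.80, 0.20, 0.35, 0.05]
xgboost_p = [0.20, 0.30, 0.70, 0.10, 0.60, 0.15]
lightgbm_p = [0.15, 0.50, 0.90, 0.30, 0.45, 0.10]

ensemble_p = soft_vote([catboost_p, xgboost_p, lightgbm_p])
ensemble_auc = roc_auc(labels, ensemble_p)

# Accuracy depends on a decision threshold (0.5 is assumed here); AUC does not.
predictions = [int(p >= 0.5) for p in ensemble_p]
accuracy = sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)

print(f"ensemble AUC = {ensemble_auc:.3f}, accuracy = {accuracy:.3f}")
```

On this toy data the averaged scores rank both fraud cases above every legitimate one, yet one fraud case still falls below the 0.5 threshold. This is why AUC and accuracy are reported as separate figures, particularly on imbalanced fraud data.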

Index Terms

Computer Science
Information Sciences
Data mining
Fraud Detection
Financial Industry

Keywords

Machine Learning
Synthetic Data
Financial Fraud
Ensemble Learning
Gradient Boosting