Research Article

A Multi-Nodal Implementation of Apriori Algorithm for Big Data Analytics using MapReduce Framework

by Terungwa Simon Yange, Ishaya Peni Gambo, Rhoda Ikono, Hettie A. Soriyan
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 31
Year of Publication: 2020
Authors: Terungwa Simon Yange, Ishaya Peni Gambo, Rhoda Ikono, Hettie A. Soriyan
10.5120/ijais2020451868

Terungwa Simon Yange, Ishaya Peni Gambo, Rhoda Ikono, Hettie A. Soriyan. A Multi-Nodal Implementation of Apriori Algorithm for Big Data Analytics using MapReduce Framework. International Journal of Applied Information Systems. 12, 31 (July 2020), 8-28. DOI=10.5120/ijais2020451868

@article{ 10.5120/ijais2020451868,
author = { Terungwa Simon Yange and Ishaya Peni Gambo and Rhoda Ikono and Hettie A. Soriyan },
title = { A Multi-Nodal Implementation of Apriori Algorithm for Big Data Analytics using MapReduce Framework },
journal = { International Journal of Applied Information Systems },
issue_date = { July 2020 },
volume = { 12 },
number = { 31 },
month = { July },
year = { 2020 },
issn = { 2249-0868 },
pages = { 8-28 },
numpages = {9},
url = { https://www.ijais.org/archives/volume12/number31/1089-2020451868/ },
doi = { 10.5120/ijais2020451868 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T19:10:30.530941+05:30
%A Terungwa Simon Yange
%A Ishaya Peni Gambo
%A Rhoda Ikono
%A Hettie A. Soriyan
%T A Multi-Nodal Implementation of Apriori Algorithm for Big Data Analytics using MapReduce Framework
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 12
%N 31
%P 8-28
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper developed a distributed algorithm for Big Data Analytics to address the delay in the processing of big data. To achieve this aim, the study inspected organizational documents, carried out direct observation, and collected existing data from the National Health Insurance Scheme (NHIS) in Nigeria. The algorithm was formulated using Apriori Association Rule Mining and was specified using the enterprise application diagram. The prototype of the algorithm was implemented using MongoDB as the big data storage mechanism for the input, Comma Separated Values (CSV) files as the storage facility for the intermediate results generated during processing, and MySQL as the storage mechanism for the final output. Apache MapReduce served as the big data multi-nodal processing platform and the Java programming language as the implementation technology. The prototype was able to analyze different formats of data (i.e., PDF, Excel, CSV and images) with high volume and velocity. The results showed that the response time was 0.25 seconds and the throughput was 8865.29 records per second. The stability of the prototype was also evaluated using the confidence of the rules generated. In conclusion, this research has shown that unnecessary delays in the processing of big data were due to the lack of an appropriate data analytics tool to enhance the process. This study eliminated these irregularities, which paved the way for quicker disbursement of funds to providers and other stakeholders, as well as a quicker response to requests on enrollment, update and referral.
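
To make the idea behind the abstract concrete, the following is a minimal, single-process sketch of how Apriori frequent-itemset counting can be split into map and reduce steps, and how the confidence of a rule can be derived from the resulting counts. It is not the authors' prototype, which the abstract says was written in Java on MapReduce with MongoDB, CSV and MySQL storage. The function names, the in-memory "partitions" standing in for nodes, and the toy item labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(partition, k, prev_frequent=None):
    """Mapper: emit (candidate, 1) for each k-itemset in each transaction.

    Apriori pruning: a candidate is emitted only if every (k-1)-subset
    was frequent in the previous round (skipped for k == 1).
    """
    for transaction in partition:
        for candidate in combinations(sorted(set(transaction)), k):
            if prev_frequent is None or all(
                sub in prev_frequent for sub in combinations(candidate, k - 1)
            ):
                yield candidate, 1


def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each itemset across partitions."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def apriori(partitions, min_support, max_k=3):
    """Run one map/reduce round per itemset size; min_support is an absolute count."""
    levels = {}
    prev = None
    for k in range(1, max_k + 1):
        pairs = (p for part in partitions for p in map_phase(part, k, prev))
        counts = reduce_phase(pairs)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        if not frequent:
            break
        levels[k] = frequent
        prev = frequent
    return levels


def rule_confidence(levels, antecedent, consequent):
    """confidence(A -> B) = count(A union B) / count(A)."""
    a = tuple(sorted(antecedent))
    ab = tuple(sorted(set(antecedent) | set(consequent)))
    return levels[len(ab)][ab] / levels[len(a)][a]


# Hypothetical toy data: two partitions, each playing the role of one node's input.
partitions = [
    [["malaria", "paracetamol", "act"], ["malaria", "act"]],
    [["malaria", "paracetamol"], ["typhoid", "paracetamol"]],
]
levels = apriori(partitions, min_support=2)
print(levels[2])
print(rule_confidence(levels, ["act"], ["malaria"]))
```

In a real cluster the mapper would run on each node's share of the records and the framework would group the emitted pairs by key before reduction; here both steps run in one process so that the control flow is easy to follow.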

Index Terms

Computer Science
Information Sciences

Keywords

MapReduce, Node, Big Data Analytics, MongoDB, Apriori