Research Article

An Empirical Exploration of the Yarn in Big Data

by Yusuf Perwej, Bedine Kerim, Mohmed Sirelkhtem Adrees, Osama E. Sheta
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 9
Year of Publication: 2017
10.5120/ijais2017451730

Yusuf Perwej, Bedine Kerim, Mohmed Sirelkhtem Adrees, Osama E. Sheta. An Empirical Exploration of the Yarn in Big Data. International Journal of Applied Information Systems. 12, 9 (Dec 2017), 19-29. DOI=10.5120/ijais2017451730

@article{ 10.5120/ijais2017451730,
author = { Yusuf Perwej and Bedine Kerim and Mohmed Sirelkhtem Adrees and Osama E. Sheta },
title = { An Empirical Exploration of the Yarn in Big Data },
journal = { International Journal of Applied Information Systems },
issue_date = { Dec 2017 },
volume = { 12 },
number = { 9 },
month = { Dec },
year = { 2017 },
issn = { 2249-0868 },
pages = { 19-29 },
numpages = {9},
url = { https://www.ijais.org/archives/volume12/number9/1015-2017451730/ },
doi = { 10.5120/ijais2017451730 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T19:07:38.979349+05:30
%A Yusuf Perwej
%A Bedine Kerim
%A Mohmed Sirelkhtem Adrees
%A Osama E. Sheta
%T An Empirical Exploration of the Yarn in Big Data
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 12
%N 9
%P 19-29
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the growth in population and the progression of internet services, data volumes are increasing day by day, and trillions of data files now reside in the cloud in unstructured form. The era of Big Data is rapidly arriving for nearly all industries. Big Data can help transform major business processes through sound and accurate analysis of the available data. Big Data has also played an essential role in crime detection. Hadoop is open-source software in the form of a highly scalable, fault-tolerant distributed system that plays a significant role in data storage and processing. Apache Hadoop Yarn is an open-source framework developed by the Apache Software Foundation. It is used for handling Big Data and provides both storage and processing functionality. In this paper, we take a close look at Yarn. Yarn serves as a common computing fabric that supports MapReduce and other application instances within the same Hadoop cluster. Yarn allows multiple applications to run simultaneously on the same shared cluster and lets applications negotiate resources based on need. Finally, we briefly discuss the design, development, and current state of deployment of Yarn, the next generation of Hadoop's compute platform.

Index Terms

Computer Science
Information Sciences

Keywords

Big Data, Yarn, Hadoop, Yarn Scheduler, MapReduce, Yarn Frameworks