Muhammed Numan İNCE, Joseph LEDET, Melih GÜNAY

Lightweight distributed computing framework for orchestrating high performance computing and big data

In recent years, the need to work remotely, and consequently the need for remote computer-based systems, has grown substantially. This trend accelerated dramatically with the onset of the 2020 pandemic. To cope with the resulting surge in computation and storage demand, locally produced data is often stored and processed in the cloud. Historically, high performance computing (HPC) and big data technologies have been used to store and process large datasets. Both HPC and Hadoop can serve as solutions for analytical workloads, although the differences between them may not be obvious. Both use parallel processing techniques, and both allow data to be stored in either a centralized or a distributed manner. Recent studies have focused on hybrid approaches that combine the two technologies. The gap at the convergence of HPC and big data technologies can therefore be filled by distributed computing machines at this layer. This paper is motivated by the need for a distributed computing framework that can scale from system-on-chip (SoC) boards to desktop computers and servers. To this end, we propose a distributed computing environment that scales across devices with heterogeneous architectures, allowing them to form clusters of resource-limited nodes and run workloads on top of those clusters. The solution can be viewed as a minimalist hybrid of the HPC and big data approaches. In this study, we not only detail the design of the proposed system but also implement its critical modules and subsystems as a proof of concept.
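The abstract does not describe the framework's interfaces or scheduling policy. Purely as an illustration of the idea of heterogeneous, resource-limited devices forming a cluster, the following Python sketch uses hypothetical names (`Node`, `Task`, `Cluster`, `join`, `place`) that are assumptions and do not come from the paper. Any device may join regardless of its architecture, and a task is placed only on a node whose architecture and free resources fit it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A cluster member. All fields are illustrative assumptions."""
    name: str
    arch: str      # e.g. "aarch64" for an SoC board, "x86_64" for a desktop or server
    cpus: int      # free CPU cores
    mem_mb: int    # free memory in megabytes


@dataclass
class Task:
    name: str
    cpus: int
    mem_mb: int
    archs: Optional[List[str]] = None  # None means the task can run on any architecture


@dataclass
class Cluster:
    nodes: Dict[str, Node] = field(default_factory=dict)

    def join(self, node: Node) -> None:
        # Any device may register itself, regardless of its architecture.
        self.nodes[node.name] = node

    def place(self, task: Task) -> Optional[str]:
        """Return the name of a node that can run the task, or None if none fits."""
        candidates = [
            n for n in self.nodes.values()
            if n.cpus >= task.cpus
            and n.mem_mb >= task.mem_mb
            and (task.archs is None or n.arch in task.archs)
        ]
        if not candidates:
            return None
        # Greedy choice (an assumption): prefer the node with the most free memory.
        best = max(candidates, key=lambda n: n.mem_mb)
        # Reserve the task's resources on the chosen node.
        best.cpus -= task.cpus
        best.mem_mb -= task.mem_mb
        return best.name


if __name__ == "__main__":
    cluster = Cluster()
    cluster.join(Node("rpi-1", "aarch64", cpus=4, mem_mb=1024))
    cluster.join(Node("desktop", "x86_64", cpus=8, mem_mb=16384))
    print(cluster.place(Task("etl", cpus=2, mem_mb=512, archs=["aarch64"])))  # rpi-1
    print(cluster.place(Task("train", cpus=6, mem_mb=8192)))                  # desktop
    print(cluster.place(Task("huge", cpus=16, mem_mb=1024)))                  # None
```

The most-free-memory rule is chosen only to keep the sketch short; a real framework for SoC boards would likely also weigh network locality, data placement, and node reliability.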
