Task graph scheduling in the presence of performance fluctuations of computational resources

Task graph scheduling in the presence of performance fluctuations of computational resources

Most of the existing work in the area of task graph scheduling considers resources with fixed processingcapacity. The algorithms in these works rely on an estimation of the execution times of tasks on different resources.However, in practice, due to fluctuations in performance of cloud resources, these algorithms have challenges in theseenvironments. In this paper, we focus on the problem of fault-tolerant scheduling of task graphs in the presence ofperformance fluctuations of computational resources. With the aim of reducing the adverse impacts of both soft errorsand resource performance degradations, we propose an opportunistic task replication scheme that uses idle durationsof resources for replicating tasks. Unlike the previous works, the proposed algorithm does not rely on estimation oftask execution times for finding idle resources. We introduce the notion of concurrency graphs and propose a graphtheory-based algorithm for finding the number of idle resources during the execution of a set of tasks. The appropriateredundancy for each task is chosen with respect to the number of idle resources and the characteristics of the set of tasksthat are being processed concurrently. Simulation experiments show that, in most situations, the proposed algorithmoutperforms the previous algorithms in terms of average execution time and cost.

___

  • [1] Chandrashekar DP. Robust and fault-tolerant scheduling for scientific workflows in cloud computing environments. PhD, University of Melbourne, Melbourne, Australia, 2015.
  • [2] Calheiros RN, Buyya R. Meeting deadlines of scientific workflows in public clouds with tasks replication. IEEE T Parall Distr 2014; 25: 1787-1796.
  • [3] Pacheco-Sanchez S, Casale G, Scotney B, McClean S, Parr G, Dawson S. Markovian workload characterization for QoS prediction in the cloud. In: 2011 IEEE International Conference on CLOUD; 4–9 July 2011; Washington, DC, USA. pp. 147-154.
  • [4] Abrishami S, Naghibzadeh M, Epema DHJ. Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds. Future Gener Comp Sy 2013; 29: 158-169.
  • [5] Jackson KR, Ramakrishnan L, Muriki K, Canon S, Cholia S, Shalf J, Wasserman HJ, Wright NJ. Performance analysis of high performance computing applications on the Amazon web services cloud. In: 2010 IEEE Second International Conference on Cloud Computing Technology and Science; 30 November–3 December 2010; Indianapolis, IN, USA. pp. 159-168.
  • [6] Plankensteiner K, Prodan R. Meeting soft deadlines in scientific workflows using resubmission impact. IEEE T Parall Distr 2012; 23: 890-901.
  • [7] Benoit A, Hakem M, Robert Y. Fault tolerant scheduling of precedence task graphs on heterogeneous platforms. In: IEEE International Symposium on Parallel and Distributed Processing; 14–18 April 2008; Miami, FL, USA. pp. 1-8.
  • [8] Zheng Q, Veeravalli B. On the design of communication-aware fault-tolerant scheduling algorithms for precedence constrained tasks in grid computing systems with dedicated communication devices. J Parallel Distrib Comput 2009; 69: 282–294.
  • [9] Das A, Sarkar AD. On fault tolerance of resources in computational grids. International Journal of Grid Computing and Applications 2012; 3: 1-10.
  • [10] Aupy G, Benoit A, Casanova H, Robert Y. Checkpointing strategies for scheduling computational workflows. International Journal of Networking and Computing 2016; 6: 2-26.
  • [11] Fischer W, Meier-Hellstern K. The Markov-modulated Poisson process (MMPP) cookbook. Perform Evaluation 1993; 18: 149-171.
  • [12] Gu Y, Wu C, Liu X, Yu D. Distributed throughput optimization for large-scale scientific workflows under faulttolerance constraint. Grid Computing 2013; 11: 361-379.
  • [13] Girault A, Kalla H, Sighireanu M, Sorel Y. An algorithm for automatically obtaining distributed and fault-tolerant static schedules. In: 2003 International Conference on Dependable Systems and Networks; 22–25 June 2003; San Francisco, CA, USA. pp. 159-168.
  • [14] Topcuoglu H, Hariri S, Wu M. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE T Parall Distr 2002; 13: 260-274.
  • [15] Lin C, Lu S. SCPOR: An elastic workflow scheduling algorithm for services computing. In: IEEE International Conference on Service-Oriented Computing and Applications; 12–14 December 2011; Irvine, CA, USA. pp. 1-8.
  • [16] Ang TF, Ling TC, Phang KK. Adaptive QoS scheduling in a service-oriented grid environment. Turk J Electr Eng Co 2012; 20: 413-424.
  • [17] Sajid M, Raza Z. Energy-aware stochastic scheduling model with precedence constraints on DVFS-enabled processors. Turk J Electr Eng Co 2016; 24: 4117-4128.
  • [18] Bougeret M, Casanova H, Rabie M, Robert Y, Vivien F. Checkpointing strategies for parallel jobs. In: Conference on High Performance Computing Networking, Storage and Analysis; 12–18 November 2011; Seattle, WA, USA. pp. 33:1-33:11.
  • [19] Benoit A, Cavelan A, Robert Y, Sun H. Two-level checkpointing and verifications for linear task graphs. In: IEEE International Parallel and Distributed Processing Symposium Workshops; 23–27 May 2016; Chicago, IL, USA. pp. 1239-1248.
  • [20] Zhang Y, Mandal A, Koelbel C, Cooper KD. Combined fault tolerance and scheduling techniques for workflow applications on computational grids. In: IEEE/ACM International Symposium on Cluster Computing and the Grid; 18–21 May 2009; Shanghai, China. pp. 244-251.
  • [21] Rajabi A, Wong JW. MMPP characterization of web application traffic. In: IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems; 7–9 August 2012; Washington, DC, USA. pp. 107–114.
  • [22] Rajabi A, Wong JW. Provisioning of computing resources for web applications under time-varying traffic. In: IEEE International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems; 9–11 September 2014; Paris, France. pp. 152–157.
  • [23] Ryden T. An EM algorithm for estimation in Markov-modulated Poisson processes. Comp Stat Data Anal 1996; 21: 431-447.
  • [24] Dong F. Copula theory and its applications in computer networks, PhD, University of Victoria, Victoria, Canada, 2017.
  • [25] Eblen JD, Phillips CA, Rogers GL, Langston MA. The maximum clique enumeration problem: algorithms, applications and implementations. Lect Notes Bioinformat 2011; 13: 306-319.
Turkish Journal of Electrical Engineering and Computer Sciences-Cover
  • ISSN: 1300-0632
  • Yayın Aralığı: Yılda 6 Sayı
  • Yayıncı: TÜBİTAK