A data-aware cognitive engine for scheduling data intensive applications in a grid

Data-intensive applications produce huge amounts of data that need to be stored, analyzed, and interpreted. A data grid serves as a cost-effective infrastructure for running these applications. Existing scheduling strategies are well suited to compute-intensive applications but perform poorly on data-intensive ones. In this work, a novel mechanism that incorporates cognitive science into a data grid is proposed for scheduling data-intensive workflows. A model is derived in which a cognitive engine (CE) is built into the middleware of the data grid. The intelligent agents in the CE handle requests for data sets and use the LTP (learning, thinking, and perception) algorithm to schedule tasks in three phases. The CE also places data sets dynamically nearer to the execution site based on network resource considerations, thereby reducing the waiting time and data availability time for I/O-intensive jobs. The performance of the CE is validated by simulation and compared with that of existing scheduling strategies. The simulation results show that the CE optimizes the data availability time, waiting time, data transfer time, and makespan.
