A data-aware cognitive engine for scheduling data intensive applications in a grid

Data-intensive applications produce huge amounts of data that need to be stored, analyzed, and interpreted. A data grid serves as a cost-effective infrastructure for running these applications. Existing scheduling strategies are well suited to compute-intensive applications but perform poorly on data-intensive ones. In this work, a novel mechanism that incorporates cognitive science into a data grid is proposed for scheduling data-intensive workflows. A model is derived in which a cognitive engine (CE) is built into the middleware of the data grid. The intelligent agents in the CE handle requests for data sets and use the LTP (learning, thinking, and perception) algorithm to schedule tasks in three phases. The CE also places data sets dynamically nearer to the execution site based on network resource considerations, thereby reducing the waiting time and data availability time for I/O-intensive jobs. The performance of the CE is validated by simulation and compared with that of existing scheduling strategies. The simulation results show that the CE optimizes the data availability time, waiting time, data transfer time, and makespan.
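
To make the idea of network-aware data placement concrete, the following is a minimal sketch and not the CE's actual placement logic, which the abstract does not specify. It assumes a simple cost model in which the transfer time from a candidate storage site to the execution site is the link latency plus the data set size divided by the link bandwidth. The names `Link`, `estimated_transfer_time`, and `choose_placement_site` are hypothetical and introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical link from a candidate storage site to the execution site."""
    site: str
    bandwidth_mbps: float  # megabits per second
    latency_s: float       # latency in seconds


def estimated_transfer_time(size_mb: float, link: Link) -> float:
    """Assumed cost model: latency + size / bandwidth.

    size_mb is in megabytes, so it is multiplied by 8 to get megabits.
    """
    return link.latency_s + (size_mb * 8.0) / link.bandwidth_mbps


def choose_placement_site(size_mb: float, links: list[Link]) -> str:
    """Return the candidate site with the lowest estimated transfer time."""
    if not links:
        raise ValueError("no candidate sites")
    best = min(links, key=lambda link: estimated_transfer_time(size_mb, link))
    return best.site


# Illustrative values only; they are not taken from the paper's simulation.
candidates = [
    Link("site_a", bandwidth_mbps=100.0, latency_s=0.05),
    Link("site_b", bandwidth_mbps=1000.0, latency_s=0.2),
    Link("site_c", bandwidth_mbps=10.0, latency_s=0.01),
]
large_choice = choose_placement_site(500.0, candidates)
small_choice = choose_placement_site(0.001, candidates)
```

Under this assumed model, a large data set favors the high-bandwidth site even when its latency is higher, whereas a very small data set favors the low-latency site. A scheduler that takes network resources into account would have to weigh the same trade-off.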