Today, data-intensive applications rely on geographically distributed systems for data collection, storage, and processing. Data locality is a prominent technique for improving application performance and reducing the impact of network latency: jobs are scheduled directly on the nodes that host the data to be processed. MapReduce and Dryad are examples of frameworks that exploit locality by splitting jobs into multiple tasks, which are dispatched to process portions of the data locally. However, as the big data analysis ecosystem has shifted from single clusters to geo-distributed data centers, some data must inevitably still be transferred over the network in order to reduce the schedule length. Existing scheduling techniques lack a mechanism that efficiently combines data locality with inter-data-center transfer requirements for data-intensive processing across dispersed data centers. The objective of this work is therefore to formulate and solve the makespan optimization problem for data-intensive job scheduling on geo-distributed data centers. To this end, we first formulate task placement and data access as a linear program (LP) and solve it with the GLPK solver. We then present a low-complexity heuristic scheduling algorithm called GeoDis, which balances data locality against data transfer requirements to achieve a shorter makespan. Experiments with various realistic traces and synthetically generated workloads show that GeoDis can reduce the makespan of processing jobs by 44% compared to state-of-the-art algorithms while remaining within 91% of the optimal solution found by the LP solver.
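The trade-off between running a task where its data lives and shipping the data to a less loaded data center can be illustrated with a small sketch. The code below is not GeoDis and not the paper's LP formulation; it is a generic greedy earliest-finish-time heuristic written for illustration. The `Task` fields, the `greedy_schedule` function, the bandwidth table, and the assumption that a transfer occupies the receiving site before processing starts are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    data_site: str    # data center that stores the task's input (assumed single copy)
    data_size: float  # input size, in arbitrary units
    work: float       # processing time, assumed identical on every site


def greedy_schedule(tasks, sites, bandwidth):
    """Place each task on the site where it would finish earliest.

    bandwidth[(src, dst)] is the transfer rate between two sites.
    Returns (placement, makespan). Illustrative only, not GeoDis.
    """
    ready = {s: 0.0 for s in sites}  # time at which each site becomes free
    placement = {}
    # Longest tasks first, a common heuristic for makespan problems.
    for t in sorted(tasks, key=lambda t: t.work, reverse=True):
        best_site, best_finish = None, float("inf")
        for s in sites:
            # Local execution needs no transfer; remote execution pays for it.
            transfer = 0.0 if s == t.data_site else t.data_size / bandwidth[(t.data_site, s)]
            finish = ready[s] + transfer + t.work
            if finish < best_finish:
                best_site, best_finish = s, finish
        ready[best_site] = best_finish
        placement[t.name] = best_site
    return placement, max(ready.values())


# Three tasks whose input all resides at site "A".
tasks = [Task("t1", "A", 2.0, 10.0), Task("t2", "A", 2.0, 10.0), Task("t3", "A", 2.0, 10.0)]
bandwidth = {("A", "B"): 1.0, ("B", "A"): 1.0}
placement, makespan = greedy_schedule(tasks, ["A", "B"], bandwidth)
print(placement, makespan)  # makespan 20.0, versus 30.0 if all three tasks ran at "A"
```

In this toy instance, strict locality would run all three tasks at site A for a makespan of 30, while moving one task's input to site B shortens it to 20. This is the kind of locality-versus-transfer decision the abstract describes.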
Computing – Springer Journals
Published: Jul 20, 2017