Purpose – eScience workflows use orchestration for integrating and coordinating distributed and heterogeneous scientific resources, which are increasingly exposed as web services. The rate of growth of scientific data makes eScience workflows data‐intensive, challenging existing workflow solutions. Efficient methods of handling large data in scientific workflows based on web services are needed. The purpse of this paper is to address this issue. Design/methodology/approach – In a previous paper the authors proposed Data‐Flow Delegation (DFD) as a means to optimize orchestrated workflow performance, focusing on SOAP web services. To improve the performance further, they propose pipelined data‐flow delegation (PDFD) for web service‐based eScience workflows in this paper, by leveraging from the domain of parallel programming. Briefly, PDFD allows partitioning of large datasets into independent subsets that can be communicated in a pipelined manner. Findings – The results show that the PDFD improves the execution time of the workflow considerably and is capable of handling much larger data than the non‐pipelined approach. Practical implications – Execution of a web service‐based workflow hampered by the size of data can be facilitated or improved by using services supporting Pipelined Data‐Flow Delegation. Originality/value – Contributions of this work include the proposed concept of combining pipelining and Data‐Flow Delegation, an XML Schema supporting the PDFD communication between services, and the practical evaluation of the PDFD approach.
International Journal of Web Information Systems – Emerald Publishing
Published: Aug 23, 2013
Keywords: eScience; Scientific workflow; SOAP web services; Orchestration; Pipelining; Data management; Work flow