RADICAL-Pilot and Parsl: Executing Heterogeneous Workflows on HPC Platforms
Executing scientific workflows with heterogeneous tasks on HPC platforms poses several challenges that upcoming exascale platforms will exacerbate. At that scale, bespoke solutions will not enable effective and efficient workflow execution. In preparation, we need to manage engineering effort and capability duplication across software systems by integrating independently developed, production-grade software solutions. In this paper, we integrate RADICAL-Pilot (RP) and Parsl and develop an MPI executor to enable the execution of workflows with heterogeneous MPI and non-MPI Python functions at scale. We characterize the strong and weak scaling of the integrated RP-Parsl system when executing two use cases from polar science, and of the function executor on both SDSC Comet and TACC Frontera. We gain engineering insight into how to analyze and integrate workflow and runtime systems while minimizing changes to their code bases and overall development effort. Our experiments show that the overheads of the integrated system are invariant to resource and workflow scale, and measure the impact of diverse MPI overheads. Together, these results define a blueprint for an ecosystem of specialized, efficient, effective, and independently maintained software systems that can meet the upcoming scaling challenges.