We take an integrated approach to the prediction and diagnosis of extreme-scale scientific workflows that focuses on the problems of complexity and contention. Our approach will enable both advance exploration and run-time optimization of workflows and their components. Underlying this is a multi-scale view. At the fine scale, simulation-based tools allow in-depth analysis of individual workflow components. At the coarse scale, suitable abstractions allow end-to-end workflow analysis and prediction using analytical models, which extend workflow coverage and permit rapid evaluation. Fine-grained analyses enable individual workflow components to be explored under a multitude of resource-use scenarios and provide the insights necessary to guide their run-time optimization. Coarse-grained analyses enable an entire workflow to be explored and support its initial optimization. Machine learning, based on adaptive sampling, will focus analysis on the regions of the parameter space that have the greatest influence on performance. Workflow-specific benchmarking will provide input to our modeling and simulation techniques. Provenance information, collected through instrumentation and monitoring of actual workflow execution, will provide empirical bounds on the expected performance.
The following five key challenges will be addressed in this research:
- A high-fidelity multiscale approach is a core concept of IPPD. It abstracts details at varying degrees of resolution and accuracy in space and time, thereby delivering high-fidelity prediction at the workflow component level while still enabling the rapid solution that is critical for analyzing complex configurations at the system level.
- Analytical modeling of workflow performance will allow designers, researchers, and run-time optimization tools to make informed predictions about how the current state of a distributed system affects workflow performance. New analytical modeling techniques will be developed to accurately predict the performance of distributed workflows.
- Co-operative simulation of workflow components will allow storage, system buses, and the communication network to be simulated in tandem. This will overcome the current limitation of typically having a dedicated simulator for each component, and will lead naturally into our development of an extensible platform for modeling and simulation.
- Verification and validation of models will use a combination of provenance information and predictive analysis on several experimental testbeds. Validation is itself a challenge, owing to a lack of realistic data and the inability of current network simulators to scale beyond moderate network sizes.
- Usable predictive tools will be developed to optimize workflow execution and to assist in diagnosing performance issues. In addition, we intend to create an open, modular, and flexible platform for modeling, analysis, and simulation.
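
To make the analytical modeling challenge concrete, the following is a minimal illustrative sketch of a coarse-grained workflow model, not the project's actual method. It assumes a workflow can be described as a directed acyclic graph of tasks with known isolated run times, and that contention can be approximated by a single slowdown factor. The function `predict_makespan`, the `contention` parameter, and the task names and durations are all hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def predict_makespan(durations, deps, contention=1.0):
    """Estimate end-to-end workflow time with a critical-path model.

    durations:  task name -> run time in isolation (hypothetical values)
    deps:       task name -> list of tasks that must finish first
    contention: assumed uniform slowdown factor applied to every task
    """
    # Include every task, even those with no dependencies.
    graph = {task: deps.get(task, ()) for task in durations}
    finish = {}
    # static_order() yields each task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[task]), default=0.0)
        finish[task] = start + durations[task] * contention
    return max(finish.values()), finish


# Hypothetical four-stage workflow used purely as an example.
durations = {"simulate": 10.0, "analyze": 4.0, "visualize": 3.0, "archive": 2.0}
deps = {
    "analyze": ["simulate"],
    "visualize": ["simulate"],
    "archive": ["analyze", "visualize"],
}

baseline, _ = predict_makespan(durations, deps)
contended, _ = predict_makespan(durations, deps, contention=1.5)
print(f"baseline makespan:  {baseline}")
print(f"contended makespan: {contended}")
```

A model of this kind evaluates in microseconds, which is what makes rapid exploration of many configurations feasible. Its accuracy depends entirely on the quality of the per-task estimates and of the contention approximation, which is where fine-grained simulation and provenance data would be expected to contribute.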