To support real-time adaptation of distributed LEAD workflows, we are developing a new monitoring and orchestration infrastructure that collects workflow status, application performance, web service and resource monitoring data and applies that data to adaptive control. The monitoring infrastructure uses local sensors that collect data and monitor hardware and software resources for significant changes that should trigger adaptation. The resource sensors collect data on resource queues and status through Globus services and the Network Weather Service (NWS).
Sensors embedded in LEAD application services collect real-time load data, and the workflow monitor tracks workflow progress. This data is analyzed locally to detect anomalies, and decisions are broadcast to other components where adaptation is required. Actuators at critical infrastructure points (e.g., the workflow engine, service factory and resource broker) implement adaptations based on the policy rules. The components communicate through the LEAD event broker: any relevant component can subscribe to monitoring data and react to events according to the stated policy. For example, if a resource fails, the resource monitoring component might broadcast a message that triggers the workflow engine and associated services to take appropriate action locally.
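The sensor, event broker and actuator interaction described above can be illustrated with a minimal sketch. It is not the LEAD implementation: the in-process `EventBroker`, the `resource.status` topic, the class names and the `{"failed": "reschedule"}` policy rule are all hypothetical and are assumed here only to show the publish/subscribe pattern.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    topic: str                      # hypothetical topic name, e.g. "resource.status"
    source: str                     # component that published the event
    payload: dict = field(default_factory=dict)


class EventBroker:
    """In-process stand-in for a publish/subscribe event broker (assumption)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the event to every handler subscribed to its topic.
        for handler in list(self._subscribers.get(event.topic, [])):
            handler(event)


class ResourceSensor:
    """Publishes only when a resource's status changes (a 'significant change')."""

    def __init__(self, broker: EventBroker) -> None:
        self._broker = broker
        self._last_status: Dict[str, str] = {}

    def observe(self, resource: str, status: str) -> None:
        if self._last_status.get(resource) == status:
            return  # no change, nothing to broadcast
        self._last_status[resource] = status
        self._broker.publish(
            Event("resource.status", "resource-monitor",
                  {"resource": resource, "status": status})
        )


class WorkflowEngineActuator:
    """Reacts locally to subscribed events according to a policy table."""

    def __init__(self, broker: EventBroker, policy: Dict[str, str]) -> None:
        self.policy = policy                       # status -> action (hypothetical rules)
        self.actions: List[Tuple[str, str]] = []   # record of adaptations taken
        broker.subscribe("resource.status", self.on_event)

    def on_event(self, event: Event) -> None:
        action = self.policy.get(event.payload["status"])
        if action is not None:
            self.actions.append((event.payload["resource"], action))


def demo() -> List[Tuple[str, str]]:
    broker = EventBroker()
    engine = WorkflowEngineActuator(broker, {"failed": "reschedule"})
    sensor = ResourceSensor(broker)
    sensor.observe("cluster-a", "up")      # published, but no policy rule matches
    sensor.observe("cluster-a", "up")      # unchanged status, not published
    sensor.observe("cluster-a", "failed")  # published, policy triggers an action
    return engine.actions
```

In this sketch the sensor filters out unchanged readings before publishing, and the policy lives with the subscriber, so each component decides locally how to react to a broadcast event.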
We are also constructing a performance model of the LEAD workflow to estimate resource requirements and understand workflow behavior on a diverse resource set, including the resources in the NSF TeraGrid. Using the monitoring data, the performance model and failure models, the adaptive LEAD infrastructure can assess and detect system performance anomalies, ensure recovery and guarantee continued operation of weather forecasting. Monitoring also enables the LEAD system to respond to additional resource requests based on varying weather conditions.
Acknowledgements: LEAD is funded by the National Science Foundation under the following Cooperative Agreements: ATM-0331594 (University of Oklahoma), ATM-0331591 (Colorado State University), ATM-0331574 (Millersville University), ATM-0331480 (Indiana University), ATM-0331579 (University of Alabama in Huntsville), ATM-0331586 (Howard University), ATM-0331587 (University Corporation for Atmospheric Research), and ATM-0331578 (University of Illinois at Urbana-Champaign).