1.3
Modernizing the Operational Workflow and Automation of the NCEP Hurricane Weather Research and Forecast (HWRF) Modeling System using Python and Rocoto
Through incremental enhancements and the accumulation of legacy code and scripts, the HWRF workflow had become overcomplicated, had insufficient fault tolerance, was a major drain on manpower, and was sorely in need of replacement. HWRF scripts existed in three different forms: an operational system run by NCO for real-time forecasts; a similar system run by EMC for development, retrospective testing and evaluation; and a less capable but more portable system in the community modeling framework maintained and supported by the NOAA Developmental Testbed Center (DTC) in Boulder. The system used by NCO and EMC had occasional failures both in development parallels and in operations. Furthermore, the system was growing, with more than 38,000 lines for the base workflow and tens of thousands more for graphics and automation, making debugging and troubleshooting increasingly difficult. To solve these problems, the EMC and DTC HWRF groups have rewritten the ksh-based HWRF systems as a Python-based system that is simpler, more fault tolerant, less error-prone, and easier to reconfigure, debug and develop.
This major overhaul of the end-to-end HWRF modeling system has thus far been a successful transition: after much effort, it has produced a faster, simpler and more configurable system. This talk will examine the driving forces behind the transition, along with the complications that arose from cultural and technical problems. We will also discuss the flaws in the design of Python and how we worked around them. The custom-made automation tools developed at EMC for conducting real-time and large-scale retrospective forecasts are being transitioned to Rocoto, formerly known as the NOAA Workflow Manager. Among Rocoto's advantages are its ability to automatically resubmit jobs and to track complex dependencies. The Python rewrite will make it feasible to transition the automation of HWRF parallels to Rocoto.
We finish with a look at what we suggest should be the path forward for automation systems in operational forecasting and for facilitating more flexible and efficient Operations to Research (O2R) and Research to Operations (R2O) processes. Experience from the Python-based HWRF system may also provide guidance on the design requirements for the NWP Information Technology Environment (NITE) project, undertaken by DTC to make it easier to build and expand capabilities for preparing and running research experiments with NCEP's operational models.