4.2 Considerations for Running the NCEP Production Suite on a Heterogeneous Supercomputing Platform

Thursday, 26 January 2017: 1:45 PM
Conference Center: Chelan 2 (Washington State Convention Center)
Rebecca L. Cosgrove, NCEP, College Park, MD; and S. Earle and M. Kane

The National Weather Service's National Centers for Environmental Prediction Central Operations (NCO) runs the set of weather and environmental models that comprise the NCEP Production Suite (NPS) on a High-Performance Computing system, the Weather and Climate Operational Supercomputing System (WCOSS). Typically NCO enters into a 10-year contract with a vendor to provide systems that meet the specified requirements, with provisions for upgrades in compute capacity roughly every 2 years. In years past, we operated under a "forklift" paradigm: new systems were brought in, the NPS as a whole was ported to them without disruption to the operational NPS, and then the old system was removed. The paradigm in the current 10-year contract has been that each increase in compute capacity is added onto the existing system while it is running the operational NPS. There is no wholesale conversion of the NPS, and the system has evolved into a collection of heterogeneous systems integrated into one functional supercomputer.

This presentation will discuss the current configuration of the operational WCOSS, a system composed of both IBM and Cray compute components. We will outline the benefits and challenges associated with the two upgrade methodologies, and the process of upgrading a system while it continues to deliver real-time operational guidance 24x7x365. We will present the approach to data handling and resource management on a heterogeneous system, and the methodology used to transition the NPS over a period of time.
