4.1 HPC Design for Estimating High Resolution Weather Conditions

Tuesday, 19 January 2010: 8:30 AM
B217 (GWCC)
Dorren R. Schmitt, The Weather Channel, Atlanta, GA; and J. Matthews III


The purpose of this paper is to describe the operational infrastructure used to estimate current weather conditions, known as High Resolution Assimilation of Data (HiRAD), and how the hardware and software specifications were measured. Previous papers have described the meteorological, verification, and data-denial aspects of HiRAD. This paper describes the methods of integrating commercially available software in a non-traditional manner to provide the basic distributed infrastructure that allows HiRAD to run in near real time.

HiRAD is a parallel high-performance compute cluster designed to distribute the work of computing the current-condition variables across 124 CPUs. This is accomplished by employing job-scheduling software (PBSPro and Moab), maximizing use of the Network File System (NFS), and scaling the HiRAD software. The system is designed to make full use of the server hardware for deriving the observation data, then to submit additional jobs that synthesize the data for graphical systems and distribute it to the systems that display it. The end result is 1.9 million observations across the conterminous United States (CONUS) in less than 6 minutes. The HiRAD system synthesizes the raw data into 61 fields and distributes all but 18 of them for each of the 1.9 million points.
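The paper does not publish its job scripts, but the derive-then-synthesize workflow it describes maps naturally onto PBSPro job dependencies. The following is a minimal sketch of such a cycle; the script names (derive_obs.pbs, synthesize.pbs) are hypothetical, while qsub and its -W depend=afterok directive are standard PBSPro features.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a HiRAD-style job cycle on a PBSPro cluster.

Script names and paths are illustrative assumptions, not the
operational configuration described in the paper.
"""
import subprocess


def qsub(script: str, *extra_args: str) -> str:
    """Submit a PBS script with qsub and return the job ID it prints."""
    result = subprocess.run(
        ["qsub", *extra_args, script],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


# Step 1: derive current-condition variables for ~1.9 million CONUS
# points; the scheduler spreads the work across the cluster's 124 CPUs.
derive_id = qsub("derive_obs.pbs")

# Step 2: once derivation succeeds, synthesize the raw output into
# fields for the graphical and distribution systems. The PBS dependency
# directive chains the jobs so the cycle runs unattended.
synth_id = qsub("synthesize.pbs", "-W", f"depend=afterok:{derive_id}")

print(f"cycle submitted: derive={derive_id} synthesize={synth_id}")
```

Chaining jobs through the scheduler rather than a single monolithic process is what lets the synthesis and distribution steps land on whatever servers are free once derivation finishes.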

The design of the HPC comprised several key aspects to build a robust system that can handle the demands of producing results three times an hour, 24 hours a day, 365 days a year. The first design requirement was to produce results as quickly and reliably as possible. Additionally, the HPC needed to allow for expansion of the domain, variables, and data. Finally, the HPC needed to ensure that, if a hardware failure occurred, sufficient computing power remained to recover automatically from a realistic set of failures. The HPC was designed with the latest technology available in 2005; with the exception of the faster CPUs and more cores per chip available today, this design remains an efficient utilization of computer hardware.
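One way to meet the automatic-recovery requirement is a watchdog that resubmits a cycle when it misses its deadline, relying on the spare capacity to absorb the rerun. The sketch below assumes a completion-marker file and reuses the hypothetical derive_obs.pbs script from above; neither the marker path nor the recovery policy is taken from the paper, only the under-6-minute deadline and the three-runs-an-hour cadence.

```python
#!/usr/bin/env python3
"""Hypothetical watchdog sketch for the HPC's automatic-recovery goal.

Assumes the cycle writes a marker file on success; the path and the
resubmission policy are illustrative, not the operational ones.
"""
import pathlib
import subprocess
import time

CYCLE_DEADLINE_S = 6 * 60  # results are due in under 6 minutes
DONE_MARKER = pathlib.Path("/data/hirad/current_cycle_done")  # assumed


def run_cycle() -> bool:
    """Submit one cycle and report whether it finished before the deadline."""
    subprocess.run(["qsub", "derive_obs.pbs"], check=True)
    start = time.monotonic()
    while time.monotonic() - start < CYCLE_DEADLINE_S:
        if DONE_MARKER.exists():
            return True
        time.sleep(5)
    return False


# Three runs an hour, around the clock: if a cycle misses its deadline
# (e.g. a node failed mid-run), resubmit immediately so the surviving
# CPUs absorb the work before the next cycle is due.
if not run_cycle():
    subprocess.run(["qsub", "derive_obs.pbs"], check=True)
```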