4.1
HPC Design for Estimating High Resolution Weather Conditions
HiRAD is a parallel high-performance computing (HPC) cluster designed to distribute the work of computing current-condition variables across 124 CPUs. It does this by using job-scheduling software (PBSPro and Moab), making maximum use of the Network File System (NFS), and scaling the HiRAD software. The system is designed to make full use of the server hardware to derive the observation data, and then to submit additional jobs that synthesize the data for graphics systems and distribute it to other systems for display. The end result is 1.9 million observations across the Conterminous United States (CONUS) in less than 6 minutes. The HiRAD system synthesizes the raw data into 61 fields and distributes all but 18 of them (43 fields) for each of the 1.9 million points.
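To give a sense of the scale involved, the following minimal sketch splits the grid points evenly across the CPUs and counts the values distributed per run. It is illustrative only: the abstract does not say how HiRAD actually partitions its work, so the contiguous-block scheme and the `partition` helper are assumptions, and 1,900,000 is a rounded stand-in for the "1.9 million" points quoted above.

```python
# Illustrative only: the partitioning scheme is an assumption, not HiRAD's method.
N_POINTS = 1_900_000      # rounded from the "1.9 million" observations in the text
N_CPUS = 124              # CPUs in the cluster, from the text
FIELDS_TOTAL = 61         # fields synthesized per point, from the text
FIELDS_WITHHELD = 18      # fields not distributed, from the text


def partition(n_points, n_workers):
    """Split range(n_points) into n_workers contiguous (start, stop) blocks.

    Block sizes differ by at most one point, so no worker is overloaded.
    """
    base, extra = divmod(n_points, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers each take one additional point.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


chunks = partition(N_POINTS, N_CPUS)
largest_chunk = max(stop - start for start, stop in chunks)

fields_distributed = FIELDS_TOTAL - FIELDS_WITHHELD
values_distributed = N_POINTS * fields_distributed

if __name__ == "__main__":
    print(f"points per CPU (largest block): {largest_chunk}")
    print(f"fields distributed per point:   {fields_distributed}")
    print(f"values distributed per run:     {values_distributed}")
```

Under these assumptions each CPU handles roughly 15,000 points, and each run distributes on the order of 80 million field values.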
The HPC design addressed several key requirements for a robust system that must produce results three times an hour, 24 hours a day, 365 days a year. The first requirement was to produce results as quickly and reliably as possible. The second was to allow for expansion of the domain, the variables, and the data. Finally, the HPC needed enough spare computing power to recover automatically from a realistic set of hardware failures. The HPC was designed with the latest technology available in 2005. With the exception of the faster CPUs and higher core counts per chip available today, this design is the most efficient utilization of computer hardware.
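The run cadence and the failure-recovery requirement can be put in rough numbers. The sketch below is a back-of-the-envelope calculation, not a description of the actual system: it assumes run time scales inversely with the number of available CPUs (ideal linear scaling, which real workloads rarely achieve), treats the 6-minute figure as the full-cluster run time, and uses a hypothetical helper `run_minutes_with`.

```python
import math

RUNS_PER_HOUR = 3     # from the text: results three times an hour
RUN_MINUTES = 6.0     # from the text: upper bound on a run with all CPUs
N_CPUS = 124          # from the text

interval_minutes = 60 / RUNS_PER_HOUR   # time budget between successive runs
runs_per_day = RUNS_PER_HOUR * 24
runs_per_year = runs_per_day * 365


def run_minutes_with(cpus_available):
    """Estimated run time on a degraded cluster.

    ASSUMPTION: ideal linear scaling, so halving the CPUs doubles the run time.
    """
    return RUN_MINUTES * N_CPUS / cpus_available


# Fewest CPUs that still finish a run before the next one is due,
# under the idealized scaling assumption above.
min_cpus = math.ceil(RUN_MINUTES * N_CPUS / interval_minutes)
tolerable_cpu_losses = N_CPUS - min_cpus

if __name__ == "__main__":
    print(f"interval between runs: {interval_minutes:.0f} min")
    print(f"runs per day / year:   {runs_per_day} / {runs_per_year}")
    print(f"minimum CPUs to keep cadence (idealized): {min_cpus}")
    print(f"CPU losses tolerated (idealized):         {tolerable_cpu_losses}")
```

The gap between a sub-6-minute run and a 20-minute interval is what gives the cluster room to absorb hardware failures. The real margin depends on scheduler behavior, I/O over NFS, and the downstream synthesis and distribution jobs, none of which this sketch models.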