Optimizing Storage Performance of Existing Reproducible Workflows in a Shared HPC Cluster

Thursday, 8 January 2015: 1:45 PM
128AB (Phoenix Convention Center - West and North Buildings)
Patrick Calhoun, University of Oklahoma, Norman, OK; and B. S. Herzog, K. H. Knopfmeier, and H. Neeman

As Ken Batcher, emeritus professor of Computer Science at Kent State University, famously said, "A supercomputer is a device for turning compute-bound problems into I/O-bound problems." Performance of legacy code on shared high-performance computing (HPC) clusters grows increasingly I/O bound with each subsequent hardware generation. Challenges to optimizing such workloads include the infeasibility of restructuring I/O patterns in inherited source code, reproducibility constraints that forbid modifying existing I/O patterns, and I/O patterns defined by a software layer outside the scope of a researcher's project and/or expertise. This study explores methods of optimizing overall research productivity by overcoming I/O bottlenecks in real-world workflows via load balancing and by taking advantage of opportunistically available higher-throughput storage devices. Two testbed workflows are used. The first employs the Warning Decision Support System -- Integrated Information (WDSS-II) software, centering on "embarrassingly parallel" compute loads with heavy intermediate storage I/O but no message-passing requirements. The second incorporates the community Weather Research and Forecasting (WRF) model. In both cases, we identify areas where I/O performance improvements can be achieved.
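One way to exploit opportunistically available higher-throughput devices without touching a workflow's internal I/O patterns is to stage intermediate files onto the fastest writable device present on a compute node, then copy final products back to shared storage. The sketch below is a minimal illustration of that idea, assuming hypothetical paths such as `/local/scratch`; it is not taken from the workflows described above.

```shell
#!/bin/sh
# Hypothetical sketch: redirect heavy intermediate I/O to the fastest
# opportunistically available device, then stage results back to
# shared storage. All paths and the workflow command are illustrative.

pick_scratch() {
    # Prefer a node-local SSD, then RAM-backed tmpfs, then fall back
    # to the shared parallel filesystem. (Assumed example paths.)
    for d in /local/scratch /dev/shm /tmp; do
        if [ -d "$d" ] && [ -w "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

SCRATCH=$(pick_scratch) || exit 1
WORK="$SCRATCH/job.$$"
mkdir -p "$WORK"

# Run the workflow with intermediate files on the fast device, e.g.:
#   some_workflow --tmpdir "$WORK"

# Stage final products back to shared storage, then clean up, e.g.:
#   cp "$WORK"/output.* /shared/results/
rm -rf "$WORK"
```

Because the redirection happens entirely outside the application, the workflow's own I/O pattern, and hence its reproducibility, is left unmodified.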