Monday, 11 January 2016
Hall E (New Orleans Ernest N. Morial Convention Center)
The Climate Data Management System (CDMS2) is an object-oriented data management system specialized for organizing the multidimensional, gridded data used in climate analyses of observations and simulations. The basic unit of computation in CDMS2 is the variable, which consists of a multidimensional array that represents climate information in four dimensions: time, pressure level, latitude, and longitude. As models become more precise, the volume of data they generate grows and becomes difficult to handle with limited computational resources. Today's models can produce output at hourly, three-hourly, or six-hourly frequency, on spatial footprints close to those of the satellite data used to run them. The time scientists need to analyze these data and extract useful information is becoming increasingly unmanageable. Parallelizing libraries such as CDMS2 would ease the burden of working with such large datasets. Several parallelization approaches are possible. The most obvious is embarrassingly (or pleasingly) parallel processing, in which each compute node processes one file at a time. A more challenging approach is to send a piece of the data to each node for computation, with each node saving its results as a slab of data at the correct location in a shared file. This is possible with Hierarchical Data Format 5 (HDF5) using the Message Passing Interface (MPI). A final approach is the Open Multi-Processing (OpenMP) API, in which a master thread forks into multiple threads that execute different sections of the main code. Each method has its advantages and disadvantages. This poster highlights the benefits of each method and seeks an optimal solution for computing climate data analyses efficiently, using one of these parallelization methods or a combination of them.
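The Python below is a rough sketch of the slab-decomposition idea, offered only as an illustration and not as the poster's implementation. It splits a (time, lev, lat, lon) variable into contiguous time slabs. Each worker receives only its own slab, computes a zonal (longitude) mean on it, and the result is written back at that slab's offset in the output. Several stand-ins are assumed. A `ProcessPoolExecutor` plays the role of MPI ranks, and a flat in-memory list plays the role of the HDF5 file. The helper names `slab_bounds`, `zonal_mean_slab`, and `parallel_zonal_mean` are hypothetical. With parallel HDF5, each rank would write its own hyperslab directly. In this sketch the parent process does the writing, because a Python list is not shared between processes.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def slab_bounds(n_time, n_workers):
    """Split range(n_time) into at most n_workers contiguous, nearly equal slabs."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    base, extra = divmod(n_time, n_workers)
    bounds, start = [], 0
    for rank in range(n_workers):
        stop = start + base + (1 if rank < extra else 0)
        if stop > start:  # skip empty slabs when n_workers > n_time
            bounds.append((start, stop))
        start = stop
    return bounds


def zonal_mean_slab(task):
    """Worker: average over longitude for one time slab.

    The slab is a flat, row-major (time, lev, lat, lon) chunk, so every
    consecutive run of `nlon` values is one (t, lev, lat) row.
    Returns the slab's first time index and its flat (t, lev, lat) means.
    """
    t0, slab, nlon = task
    means = [sum(slab[i:i + nlon]) / nlon for i in range(0, len(slab), nlon)]
    return t0, means


def parallel_zonal_mean(data, shape, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Hypothetical driver: scatter time slabs, then place each result at its offset."""
    nt, nlev, nlat, nlon = shape
    in_per_step = nlev * nlat * nlon
    out_per_step = nlev * nlat
    result = [0.0] * (nt * out_per_step)  # stand-in for the output file
    # Send each worker only its piece of the data, not the whole array.
    tasks = [(t0, data[t0 * in_per_step:t1 * in_per_step], nlon)
             for t0, t1 in slab_bounds(nt, n_workers)]
    with executor_cls(max_workers=n_workers) as pool:
        for t0, means in pool.map(zonal_mean_slab, tasks):
            offset = t0 * out_per_step  # the slab's "right place" in the output
            result[offset:offset + len(means)] = means
    return result


if __name__ == "__main__":
    shape = (6, 2, 3, 4)  # tiny synthetic (time, lev, lat, lon) example
    data = [float(i) for i in range(6 * 2 * 3 * 4)]
    print(parallel_zonal_mean(data, shape, n_workers=3)[:3])
```

You can pass `executor_cls=ThreadPoolExecutor` instead to get shared-memory threads, which is loosely analogous to the OpenMP approach. CPython's global interpreter lock, however, limits the speedup for pure-Python arithmetic like this. The embarrassingly parallel approach amounts to the same pattern with one whole file per task and no offset bookkeeping.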