Making "biggish" dense data easy to use

Monday, 5 January 2015: 4:00 PM
129B (Phoenix Convention Center - West and North Buildings)
Bill Little, Met Office, Exeter, United Kingdom; and R. Hattersley

Meteorological and oceanographic scientists regularly work with multi-dimensional, dense, numeric arrays of data presented in a variety of data formats. The efficient analysis of these multi-GB datasets on standard multi-core hardware is already a challenge, and rapidly increasing HPC resources producing ever more model data are widening the gap. Ad hoc solutions tend to involve explicit exposure to data access patterns and/or complex concurrency techniques, distracting from the logical task at hand.

This talk will describe how the Met Office is addressing these issues by developing the Biggus package, which provides support for virtual arrays of unlimited size and lazy operations, including indexing, arithmetic and statistical functions. These features are presented with a simple, conventional syntax that mimics the style of the NumPy package.
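To illustrate the style described above, the following sketch shows how a virtual array can record indexing and aggregation requests without touching any data until explicitly realised. The class and method names here are hypothetical stand-ins for illustration, not the actual Biggus API.

```python
import numpy as np

class VirtualArray:
    """Illustrative stand-in for a lazy, NumPy-style virtual array."""
    def __init__(self, source, index=None):
        self._source = source      # could be an on-disk dataset
        self._index = index        # deferred slice, applied only on realisation

    def __getitem__(self, index):
        # Record the index; no data is read or copied yet.
        return VirtualArray(self._source, index)

    def mean(self, axis=0):
        # Return a deferred statistical operation, not a result.
        return LazyAggregation(self, axis)

    def ndarray(self):
        data = self._source if self._index is None else self._source[self._index]
        return np.asarray(data)

class LazyAggregation:
    """A deferred aggregation in the NumPy-mimicking style."""
    def __init__(self, virtual, axis):
        self._virtual, self._axis = virtual, axis

    def ndarray(self):
        # Only here does any I/O or arithmetic actually happen.
        return self._virtual.ndarray().mean(axis=self._axis)

data = np.arange(24.0).reshape(4, 6)
lazy = VirtualArray(data)
subset_mean = lazy[:2].mean(axis=0)   # nothing computed yet
print(subset_mean.ndarray())          # realises: [3. 4. 5. 6. 7. 8.]
```

The key design point is that indexing and aggregation both return further lazy objects, so expressions compose in familiar NumPy syntax while evaluation is postponed to a single explicit step.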

By providing a logical model that matches users' existing experience, we allow an easy transition from in-memory usage with minimal learning. By adopting lazy evaluation, we widen the scope available for optimisation, both for large-scale I/O access patterns and for in-memory computation. As a result, Biggus is able to match the performance characteristics of existing best-of-breed climate analysis tools for simple cases, and demonstrates significant improvements for more complex cases.
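One concrete optimisation that lazy evaluation enables is streaming: a statistic over an arbitrarily large dataset can be accumulated one chunk at a time, so peak memory is bounded by a single chunk. This is a minimal sketch of that idea, assuming a generator of chunks standing in for lazily read file slices; it is not the Biggus implementation.

```python
import numpy as np

def streaming_mean(chunks):
    """Accumulate a mean over axis 0 chunk by chunk, so only one
    chunk is ever resident in memory at a time."""
    total = None
    count = 0
    for chunk in chunks:               # each chunk is a slice along axis 0
        s = chunk.sum(axis=0)
        total = s if total is None else total + s
        count += chunk.shape[0]
    return total / count

def fake_chunks(n_chunks=10, rows=100, cols=5):
    """Simulated out-of-core dataset (an assumption standing in for,
    e.g., lazily read netCDF slices)."""
    rng = np.random.default_rng(0)
    for _ in range(n_chunks):
        yield rng.normal(size=(rows, cols))

result = streaming_mean(fake_chunks())
print(result.shape)  # (5,)
```

Because evaluation is deferred, the library, rather than the user, gets to choose the chunk size and access order, which is where the large-scale I/O optimisation comes from.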

We have found that the expressiveness of Python allows us to implement complicated concurrent evaluation techniques, which can be allied with existing optimised array tools, such as numexpr and numba, to provide an easy-to-use, high-performance platform for out-of-core processing.
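The combination of concurrency with optimised per-chunk kernels can be sketched as follows. Here the kernel is plain NumPy for the sake of a self-contained example; in practice it could be a numexpr expression or a numba-compiled function, and the helper names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def evaluate_chunks(kernel, chunks, workers=4):
    """Apply an optimised per-chunk kernel to many chunks concurrently.
    Results come back in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, chunks))

# An element-wise expression of the kind numexpr accelerates,
# written here as a plain NumPy callable.
kernel = lambda a: 2.0 * a + 1.0

chunks = [np.full((3, 3), float(i)) for i in range(4)]
partials = evaluate_chunks(kernel, chunks)
print(partials[2][0, 0])  # 5.0
```

Threads are a reasonable fit here because NumPy (and numexpr/numba) kernels release the GIL during heavy array work, so chunk evaluation can genuinely overlap with I/O on other chunks.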