At the usability end of the spectrum is the numerical Python community and the NumPy array. The NumPy array is an extremely useful data structure for high level array-based computations, but it does not support distributed data-parallel operations that exist in other high-level languages (e.g. Chapel, Titanium, and, to a limited extent, Matlab and Julia). Projects exist that combine distributed data parallelism with NumPy array syntax to varying degrees of success; none have been widely adopted. These solutions either do not cover enough use-cases to be generally useful, they suffer from significant performance degradation, or they do not integrate well with other distributed array paradigms. We present Optimized Distributed NumPy (ODIN), a project that aims to address all of these limitations, thereby bringing the ease of use of NumPy to data-parallel high-performance computing. Importantly, ODIN arrays will be compatible with the Trilinos project's distributed data structures, allowing an end user access to the proven performance of Trilinos' library of HPC solvers.
ODIN arrays, simply, allow one to create, manipulate and compute with NumPy arrays in a distributed fashion. All of NumPy's core features will be supported: slicing, ufuncs and broadcasting, simple and heterogeneous datatypes. To ease algorithm development and to minimize effort for users who are not parallel programming specialists, the syntax and semantics of serial NumPy operations are preserved to the extent possible. When node-level control of computations is required, ODIN allows an end user to write a procedure that will run at the local level on the local segment of an array. These user-provided, local-level functions are available for use by the ODIN infrastructure at the global level, as if it were a built-in operation. This feature provides the capability to integrate existing serial algorithms or parallel code with ODIN-based computations in a natural way. When using ODIN arrays, the user can optionally turn on parallel performance feedback to help understand which NumPy operations result in poor scaling with respect to communication patterns, which will aid porting a serial computation to use ODIN arrays.
For several usecases, precise control of the domain decomposition is required: unstructured meshes, finite differencing with halo vectors, adaptive mesh refinement, etc. It is also necessary to have control over the locality of computation, especially when load balancing is a required feature of a computation. Therefore, allowing the user to control how an array is distributed and how many nodes are used for a particular array computation will be useful in a number of situations, giving the end-user who needs full control the ability to specify these aspects of an ODIN calculation.
In terms of optimizations, ODIN will support array expression analysis to optimize the communication required to compute with distributed arrays. Also, performance-oriented decorators and context managers will provide runtime profiling, restrict the allowed operations for performance-critical applications, and will allow control of the locality of an operation.
ODIN aims to be a comprehensive HPC framework for distributed computing with Python. It will bring the flexibility and expressiveness of Python to HPC simulation and modeling as well as the analysis of large datasets.
This work will be carried out in partnership with Sandia National Laboratories as part of a DOE-funded SBIR grant, currently in Phase I.