Scaling Scientific Python (Core Science Keynote) (Invited Presentation)

Rocklin, Matthew

Infrastructural libraries like NumPy and Pandas enabled an ecosystem of scientific software in Python that is intuitive, performant, and interoperable. This ecosystem established a tradition of providing efficient computation to a broad base of scientists who have only modest training in computer programming. However, as computer architecture evolves towards multi-core, many-core, distributed, and cloud systems, these libraries start to show their age, calling into question the longevity of the scientific Python stack. Fortunately, compilers, data formats, and distributed task schedulers within the Python ecosystem have recently advanced to meet these challenges. This talk describes recent work to scale the scientific Python stack to parallel architectures while trying to maintain the dual traditions of efficiency and accessibility.

After a broad overview, we will focus on Dask, a library for parallel computing that targets multi-core and distributed systems, and on how it has been scaling workloads in array and geospatial computing. We will discuss both the wins and losses of this system as it has been applied to a variety of geophysical applications.

337859 Scaling Scientific Python (Core Science Keynote) (Invited Presentation)