6.2 Python as a Self-Teaching Tool: Insights into Gaussian Process Modeling Using Python Packages

Wednesday, 9 January 2019: 9:00 AM
North 129B (Phoenix Convention Center - West and North Buildings)
Daniel Gilford, Rutgers, The State Univ. of New Jersey, New Brunswick, NJ

Handout (2.3 MB)

It can be daunting to teach oneself a new statistical or computational method, from its fundamentals through to a specific application. Suitable hands-on tools and examples are critical to the learning process. This presentation details how a reluctant Python adopter developed key insights into Gaussian process (GP) modeling using tools from several open source Python packages. GP modeling is a non-parametric, supervised machine-learning technique that maps inputs (e.g., a forcing or model parameters) to target outputs (e.g., sea-level contributions from an Antarctic ice sheet); a key advantage of GP modeling is that it inherently estimates uncertainty. I highlight the benefits of using Python as a self-teaching tool, namely its simplicity, transparency, and flexibility. The learning process is illustrated with three examples from my sea-level research, which increase in complexity and each present specific challenges, pitfalls, and solutions in Python. First, I explore prediction uncertainties and covariance structures by fitting a tide-gauge sea-level time series with a GP model built from multiple simple covariance functions. Next, I perform GP regression between geodetic datums (coordinate-system measures for comparing relative sea level between tide-gauge sites) and discuss how applying GP modeling on a sphere helped build my intuition for length scales in the GP framework. Finally, I use GP modeling to construct a statistical emulator of 21st-century Antarctic contributions to sea-level rise, trained on 196 ensemble members of an ice-sheet model. Developing the emulator taught me to carefully consider dimensionality, scale, and computing time in GP modeling. The ice-sheet model emulator is sampled to produce probability distributions that fill the gaps between discrete ice-sheet model outcomes, permitting inversion of sea-level contributions in 2100 to explore 21st-century evolution pathways.
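The first example (a GP whose covariance is a sum of simple covariance functions) can be illustrated with a minimal sketch. It assumes NumPy and uses purely synthetic monthly data, not the tide-gauge record from this work. The covariance is a squared-exponential kernel for slow, long-term variability plus a periodic kernel for the seasonal cycle, with a white-noise term added on the diagonal. The helper names `sq_exp`, `periodic`, `composite_kernel`, and `gp_predict` and all hyperparameter values are hypothetical, chosen for illustration rather than fitted. This is not necessarily how the packages used in the presentation implement GP regression.

```python
import numpy as np

def sq_exp(x1, x2, amp, length):
    """Squared-exponential kernel: smooth, long-term variability."""
    d = x1[:, None] - x2[None, :]
    return amp**2 * np.exp(-0.5 * (d / length) ** 2)

def periodic(x1, x2, amp, length, period):
    """Exp-sine-squared kernel: a repeating (here annual) cycle."""
    d = x1[:, None] - x2[None, :]
    return amp**2 * np.exp(-2.0 * np.sin(np.pi * np.abs(d) / period) ** 2 / length**2)

def composite_kernel(x1, x2):
    # Hypothetical, hand-picked hyperparameters (mm and years), NOT fitted values.
    return (sq_exp(x1, x2, amp=50.0, length=20.0)
            + periodic(x1, x2, amp=30.0, length=1.0, period=1.0))

def gp_predict(x_train, y_train, x_test, noise_std):
    """Posterior mean and standard deviation of a zero-mean GP."""
    # Training covariance plus white (observational) noise on the diagonal.
    K = composite_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    K_s = composite_kernel(x_train, x_test)
    K_ss = composite_kernel(x_test, x_test)
    # Cholesky factorization for a numerically stable solve of K^{-1} y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    # Posterior variance: prior variance minus what the data explain.
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic "sea level" (mm): linear trend + annual cycle + noise. Illustrative only.
rng = np.random.default_rng(0)
t = np.arange(0.0, 40.0, 1.0 / 12.0)  # 40 years of monthly samples
y = 3.0 * t + 20.0 * np.sin(2.0 * np.pi * t) + rng.normal(0.0, 5.0, t.size)
y = y - y.mean()  # remove the mean so a zero-mean GP prior is appropriate

# Predict inside the record (year 20) and 20 years beyond its end (year 60).
mean, std = gp_predict(t, y, np.array([20.0, 60.0]), noise_std=5.0)
print(mean, std)
```

The predictive standard deviation grows as the prediction moves beyond the end of the record, which is the inherent uncertainty estimate described above. In practice, a package such as scikit-learn (`GaussianProcessRegressor` with `RBF`, `ExpSineSquared`, and `WhiteKernel` kernels) can optimize the hyperparameters by maximizing the marginal likelihood instead of fixing them by hand.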
I close with a reflection on the roles of Python in the learning process and the value of open source scientific tools for the rapid adoption of new methods.

Supplementary URL: https://github.com/dgilford/ams2019
