Monday, 23 January 2012: 11:30 AM
A System for Storing and Analyzing a Massive Climatological Database of Modeled Air Mass Trajectories
Room 346/347 (New Orleans Convention Center)
Poster PDF (415.3 kB)
To characterize the frequency and patterns of transport from the United States to the Arctic Circle, we used Python to develop a system for querying pre-generated air mass trajectories from the National Oceanic and Atmospheric Administration's Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model. We generated 600 million forward trajectories (140 billion data points) covering the conterminous United States at multiple starting heights over a 30-year period. To allow efficient access to and processing of the data set, we developed a new packed file format for trajectories (PFFT). The PFFT format was designed to preserve the details of HYSPLIT's textual output in a space-efficient binary format while also enabling rapid indexing and single-pass analysis. To query, aggregate, regrid, and display results across a wide range of analyses, we developed a flexible command-line tool (make_grid.py) that uses a map/reduce technique. As the PFFT files are processed in a single pass, the data points are spatially projected onto the desired output grid and inductively reduced to a single value at each grid point. The make_grid.py tool accepts query predicates, projections, and reductions in the form of Python lambda expressions, which let us use the full Python expression syntax in our queries. The tool allows the analyst to pose queries such as, “In March, what is the average travel time for trajectories from the United States to reach the Arctic Circle?” and receive mapped results. Python was a natural fit for this project because of its readability and its combination of functional programming (FP) and object-oriented (OO) features. These features made it possible for us to structure the file format code in an OO way while using FP features for querying.
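The single-pass map/reduce approach described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Point` record layout, the `make_grid` function signature, the 1-degree grid, and the sample data are hypothetical and do not reflect the actual PFFT fields or the make_grid.py interface. The sketch also simplifies the example query by averaging the age of every matching point north of the Arctic Circle, rather than locating each trajectory's first crossing.

```python
import math
from collections import namedtuple

# Hypothetical record layout; the real PFFT fields are not given in the abstract.
Point = namedtuple("Point", "traj_id month age_hours lat lon")


def make_grid(points, predicate, project, reduce_fn, initial):
    """Filter, project, and reduce trajectory points in one pass.

    predicate: keeps or drops a point (the query filter)
    project:   maps a point to an output grid cell
    reduce_fn: folds a point into that cell's running accumulator
    """
    grid = {}
    for p in points:
        if not predicate(p):
            continue
        cell = project(p)
        grid[cell] = reduce_fn(grid.get(cell, initial), p)
    return grid


# Made-up sample points, for illustration only.
points = [
    Point(1, 3, 120.0, 66.9, -150.2),
    Point(2, 3, 200.0, 66.6, -150.7),
    Point(3, 7, 90.0, 70.1, -140.0),   # dropped: not March
    Point(4, 3, 48.0, 45.0, -100.0),   # dropped: south of the Arctic Circle
]

ARCTIC_CIRCLE_LAT = 66.5  # approximate latitude, in degrees north

# Predicate, projection, and reduction are all passed as lambda expressions.
result = make_grid(
    points,
    predicate=lambda p: p.month == 3 and p.lat >= ARCTIC_CIRCLE_LAT,
    project=lambda p: (math.floor(p.lat), math.floor(p.lon)),  # 1-degree cells
    reduce_fn=lambda acc, p: (acc[0] + p.age_hours, acc[1] + 1),
    initial=(0.0, 0),
)

# Finalize each cell's (sum, count) accumulator into a mean travel time.
mean_hours = {cell: total / n for cell, (total, n) in result.items()}
```

Because each cell keeps only a small running accumulator (here a sum and a count), memory use depends on the size of the output grid rather than on the number of data points, which is what makes a single pass over a very large data set practical.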