Abstract: A random-forest turbulence prediction algorithm (87th AMS Annual Meeting)

Monday, 15 January 2007: 11:15 AM

A random-forest turbulence prediction algorithm

210B (Henry B. Gonzalez Convention Center)

Andrew Cotter, NCAR, Boulder, CO; and J. K. Williams, R. K. Goodrich, and J. A. Craig


Unlike traditional pilot reports, in-situ eddy dissipation rate (EDR) reports of atmospheric turbulence from commercial aircraft contain both positive and negative instances, are reported regularly, and have accurate positions and timestamps. These data therefore make it feasible to perform more sophisticated analyses of the causes of atmospheric turbulence than were formerly possible. Several real-time gridded products that represent storm location and intensity, derived from satellite, radar, and numerical weather model data, currently exist. These include quantities such as echo tops, vertically integrated liquid (VIL), and wind direction and velocity. In this paper, the authors present a methodology for developing a machine-learning algorithm that predicts in-situ EDR from the values of VIL and echo tops in a spatial neighborhood extending approximately 300 kilometers around the measurement point. To summarize the values of the gridded products associated with each in-situ EDR measurement, a set of quantities was computed, including distances to grid points with data over certain thresholds, maximum data values within each subregion, and the proportion of grid points over various thresholds within each subregion. A set of the most useful features for turbulence prediction was then determined using a large-scale automated feature selection algorithm. First, an estimate of the "value" of each candidate feature was calculated by training a large number of decision trees on small random subsets of candidate features and comparing their consensus performance on a testing set both with and without the feature in question. Then, a linear programming problem was formulated in which the "best" subset of features was chosen under the constraint that no two selected features for a given data source could overlap. The selected features and a large training set were then used to train a random forest as a predictive algorithm. Finally, the performance of the random forest on an independent testing set was compared to that of other turbulence-prediction products.
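
The neighborhood summary quantities described above (distance to the nearest grid point over a threshold, maximum value, and proportion of grid points over a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `neighborhood_features`, its parameters, the uniform grid spacing, and the use of a single circular region in place of the subregions mentioned in the abstract are all assumptions made for illustration.

```python
import math


def neighborhood_features(grid, row, col, spacing_km, radius_km, threshold):
    """Summarize gridded values (e.g. VIL) around one measurement location.

    Hypothetical helper: `grid` is a list of rows of numbers, (row, col) is
    the grid index of the measurement, and `spacing_km` is an assumed uniform
    grid spacing. One circular region stands in for the paper's subregions.
    """
    nearest_km = math.inf   # distance to the closest point over the threshold
    max_value = -math.inf   # largest value found inside the region
    n_inside = 0            # grid points inside the region
    n_over = 0              # grid points inside the region over the threshold
    for r, values in enumerate(grid):
        for c, value in enumerate(values):
            dist_km = spacing_km * math.hypot(r - row, c - col)
            if dist_km > radius_km:
                continue  # outside the neighborhood
            n_inside += 1
            max_value = max(max_value, value)
            if value > threshold:
                n_over += 1
                nearest_km = min(nearest_km, dist_km)
    return {
        "nearest_over_km": nearest_km,
        "max_value": max_value,
        "fraction_over": n_over / n_inside if n_inside else 0.0,
    }


# Toy 5x5 grid with 10 km spacing; the measurement sits at the center.
grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 5, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [9, 0, 0, 0, 0],
]
features = neighborhood_features(
    grid, row=2, col=2, spacing_km=10, radius_km=20, threshold=3
)
```

In this toy case the value 9 lies outside the 20 km radius and is ignored, so only the single cell holding 5 contributes to the threshold-based quantities. In a setting like the one described, such quantities would presumably be computed for several thresholds, regions, and data sources to form the pool of candidate features.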
