Gene-expression programming—a new tool for creating NWP ensemble averages and probability forecasts

Stull, Roland B.

Finding averages and probabilities from an ensemble of NWP outputs is an exercise in function fitting. Most function-fitting approaches assume that the user chooses a functional form (such as a polynomial or neural network) a priori. The corresponding regression or error-minimization algorithms are then used only to find the parameters or weights in that function. But how do you know that you picked the best function to start with? Gene expression programming (GEP) assumes that neither the functional form nor the weights/parameters are known. Instead, it uses a computational version of natural selection to test a wide variety of functional forms and weights until it finds a relatively good one. To do this, it first creates a somewhat random population of different candidate functions and weights, and then it evaluates each candidate against the "training set" of data to find which ones give the best verification scores. Many of the less skillful formulations are deleted, the best ones are retained, and some new formulations and weights are created as mutations of the old ones. This new generation of candidate functions is again tested against the training data set, and the natural selection process is invoked again. After many generations, the surviving functions and weights are ones that fit the training set quite well. The winner of this evolutionary competition is then verified against an independent "testing set" of data. This machine-learning approach is called symbolic regression. We show via a multi-year case study for precipitation in complex mountainous terrain that GEP can be used to find a bias-corrected ensemble average from multi-model NWP outputs. We also show how GEP-bias-corrected ensemble members can be combined into a probability forecast.
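The evolutionary loop described above can be illustrated with a minimal Python sketch. It is not the configuration used in the case study. It assumes a single fixed-length gene with a head and a tail decoded breadth-first (Karva order), point mutation as the only genetic operator, mean squared error as the verification score, and a synthetic data set in which three biased "ensemble members" approximate a known truth. All names, constants, and settings (`HEAD`, `CONSTS`, `make_data`, the population size, and so on) are hypothetical choices made for illustration.

```python
import operator
import random

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # all arity 2
N_INPUTS = 3              # assumed number of ensemble members
CONSTS = [0.5, 1.0, 2.0]  # assumed constant terminals
HEAD = 6                  # head may hold functions or terminals
TAIL = HEAD + 1           # tail = head * (max_arity - 1) + 1, terminals only

def random_symbol(rng, in_head):
    """Return a function name (str), an input index (int), or a constant (float)."""
    if in_head and rng.random() < 0.5:
        return rng.choice(list(FUNCS))
    if rng.random() < 0.7:
        return rng.randrange(N_INPUTS)
    return rng.choice(CONSTS)

def random_gene(rng):
    return [random_symbol(rng, i < HEAD) for i in range(HEAD + TAIL)]

def evaluate(gene, x):
    """Decode the gene breadth-first and evaluate it for the input list x."""
    children, next_free, i = {}, 1, 0
    while i < next_free:                  # assign child slots level by level
        if isinstance(gene[i], str):
            children[i] = (next_free, next_free + 1)
            next_free += 2
        i += 1
    values = [0.0] * next_free
    for i in range(next_free - 1, -1, -1):  # children are evaluated before parents
        s = gene[i]
        if isinstance(s, str):
            left, right = children[i]
            values[i] = FUNCS[s](values[left], values[right])
        elif isinstance(s, int):
            values[i] = x[s]
        else:
            values[i] = s
    return values[0]

def mse(gene, data):
    return sum((evaluate(gene, x) - y) ** 2 for x, y in data) / len(data)

def mutate(gene, rng, rate=0.1):
    """Point mutation that keeps the tail free of functions."""
    return [random_symbol(rng, i < HEAD) if rng.random() < rate else s
            for i, s in enumerate(gene)]

def evolve(train, rng, pop_size=50, generations=30):
    pop = [random_gene(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda g: mse(g, train))
        history.append(mse(pop[0], train))
        survivors = pop[: pop_size // 4]  # delete the less skillful candidates
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    best = min(pop, key=lambda g: mse(g, train))
    history.append(mse(best, train))
    return best, history

def make_data(rng, n):
    """Synthetic stand-in for forecasts and observations (not real NWP data)."""
    data = []
    for _ in range(n):
        truth = rng.uniform(0.0, 10.0)
        members = [truth + 2.0 + rng.gauss(0.0, 0.5),
                   0.8 * truth + rng.gauss(0.0, 0.5),
                   truth - 1.0 + rng.gauss(0.0, 0.5)]
        data.append((members, truth))
    return data

rng = random.Random(0)
train, test = make_data(rng, 200), make_data(rng, 100)
best, history = evolve(train, rng)
print("best gene:", best)
print("training MSE:", history[-1], "testing MSE:", mse(best, test))
```

Because the best candidates survive each generation unchanged, the training error of the leading candidate cannot increase from one generation to the next. The separate testing set is what indicates whether the evolved function generalizes beyond the data it was fitted to. A fuller GEP implementation would typically add further genetic operators such as transposition and recombination, which this sketch omits.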

1B.4 Gene-expression programming—a new tool for creating NWP ensemble averages and probability forecasts