“Consider mechanically integrating judgmental and statistical forecasts instead of making judgmental adjustments to statistical forecasts …Judgmental adjustment (by humans) of (automatically generated statistical forecasts) is actually the least effective way to combine statistical and judgmental forecasts … (because) judgmental adjustment can introduce bias (Mathews and Diamantopoulos, 1990) (Stern (1996) documents forecaster over-compensation for previous temperature errors)

…The most effective way to use (human) judgment is as an input to the statistical process … Clemen (1989) reviewed over 200 empirical studies on combining and found that mechanical combining helps eliminate biases and enables full disclosure of the forecasting process. The resulting record keeping, feedback, and enhanced learning can improve forecast quality …” (Sanders and Ritzman, 2001)

Some 30 years ago, Snellman (1977) lamented that, whereas the initial impact of guidance material had been to increase the accuracy of predictions on account of a healthy human/machine 'mix', operational meteorologists were losing interest, and that the gains would eventually be eroded by what he termed the 'meteorological cancer'. Snellman suggested that producing automated guidance and feeding it to the forecaster, who 'modifies it or passes it on', encourages forecasters 'to follow guidance blindly', and he concluded by predicting an erosion of recent gains. Hindsight, informed by forecast verification statistics, tells us that this erosion did not take place.

In fact, the accuracy of forecasts continued to increase. Nevertheless, evidence is emerging that the increasing skill displayed by the guidance material is rendering it increasingly difficult for human forecasters to improve upon that guidance (Baars and Mass, 2005; Ryan, 2005).

Stern (1980) outlines a possible approach to the determination of an optimal human-machine mix. He refers to Sanders (1973), who investigated the skill displayed by daily temperature and precipitation forecasts made in the Department of Meteorology at the Massachusetts Institute of Technology. Sanders found that “few if any individuals who made a substantial number of forecasts outperformed consensus on the average.”

Stern also refers to Thompson (1977), who, noting Sanders' work, suggested that “an objective and quantitative method” be used to reach a consensus, bearing in mind “the incontrovertible fact that two or more inaccurate but independent predictions of the same future events may be combined in a very specific way to yield predictions that are, on the average, more accurate than either of them taken individually.” Danard (1977) comments on Thompson's method, whilst Danard et al. (1968) discuss the subject of optimally combining independent estimates from a numerical analysis perspective.

Both Thompson (1977) and Danard (1977) discuss how one may optimally combine two forecasts under the assumption that they are independent, which implies that the correlation coefficient (σ) between the errors produced by the two forecasting methods is equal to zero. Methods of forecasting a particular weather element are usually based on similar physical principles, and therefore the sets of errors produced by the methods tend to be quite highly correlated. Indeed, Danard (1977) acknowledges “that σ=0 is likely to be more valid for two measurements than for two predictions.” The validity of the approaches of Thompson (1977) and Danard (1977) is therefore limited by their assumption that σ=0.
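The result cited above can be illustrated with a short sketch. Assuming two unbiased, *independent* forecasts (precisely the assumption the text goes on to question), the minimum-variance combination weights each forecast inversely to its error variance; the function name and the numerical values below are illustrative, not from the paper.

```python
def combine_independent(f1, v1, f2, v2):
    """Minimum-variance combination of two independent, unbiased forecasts.

    f1, f2: the two forecast values; v1, v2: their error variances.
    """
    w1 = v2 / (v1 + v2)                   # weight on forecast 1
    w2 = v1 / (v1 + v2)                   # weight on forecast 2
    combined = w1 * f1 + w2 * f2
    combined_var = (v1 * v2) / (v1 + v2)  # never exceeds min(v1, v2)
    return combined, combined_var

# Illustrative temperature forecasts: 24 °C (variance 4) and 26 °C (variance 1).
f, v = combine_independent(24.0, 4.0, 26.0, 1.0)
print(f, v)  # 25.6 0.8 — lower error variance than either input forecast
```

The combined variance v₁v₂/(v₁+v₂) is always below the smaller of the two input variances, which is the sense in which combining two inaccurate but independent predictions yields a prediction that is, on average, more accurate than either taken individually.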

With this in mind, Stern (1980) suggests that the approach may be applied without assuming that σ is equal to zero, by applying multiple linear regression to forecast verification data in order to minimise forecast errors. Stern (1980) and Stern and Dahni (1981 & 1982) (refer also to Dahni et al. (1984)) subsequently demonstrated that forecasts would be improved were one simply to average predictions from different sources. In the context of the foregoing, this assumes that the predictions from the different sources are equally skilful. This is not an unreasonable assumption: to justify unequal weights there needs to be 'strong evidence to support unequal weighting' (Armstrong, 2001).

Indeed, a common method for combining individual forecasts is to calculate 'an equal weighted average of individual forecasts' (Stewart, 2001). Combining forecasts by mathematically aggregating a number of individual forecasts increases the reliability of forecasts (Kelley, 1925; Stroop, 1932) and averages out unsystematic errors (but not systematic biases) in cue utilization. Nevertheless, Krishnamurti et al. (1999) found that weather forecasts combined using weights derived from regression were more accurate than forecasts combined with equal weights.
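The contrast between the two weighting schemes discussed above can be sketched as follows. The data are synthetic and purely illustrative, and the ordinary least-squares fit stands in for the regression-based weighting of the kind Krishnamurti et al. found advantageous; none of the numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.normal(20.0, 5.0, size=200)            # verifying observations
model_a = truth + rng.normal(0.0, 1.0, size=200)   # skilful guidance
model_b = truth + rng.normal(0.0, 3.0, size=200)   # noisier guidance

X = np.column_stack([model_a, model_b])

# Equal weights: a simple average of the two forecasts.
equal = X.mean(axis=1)

# Regression weights: least-squares fit of the observations on the
# two forecasts, using past verification data to set the weights.
w, *_ = np.linalg.lstsq(X, truth, rcond=None)
weighted = X @ w

rmse = lambda f: float(np.sqrt(np.mean((f - truth) ** 2)))
print(rmse(equal), rmse(weighted))
```

With unequal skill between the sources, the fitted weights favour the more skilful model and the regression-weighted combination verifies better; when the sources are equally skilful, the two schemes converge, which is why 'strong evidence' is needed to justify unequal weighting.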

In recent years, there has been considerable effort directed towards how to optimally combine forecasts from different sources (for example, refer to Aksu and Gunter (1992), Vislocky and Fritsch (1995), Brown and Murphy (1996), Ebert (2001), Etherton (2004), Ryan (2005), Woodcock and Engel (2005), and Stern (2006)).

Sanders and Ritzman (2001) highlight the difficulty associated with utilising (human) judgment as an input to the statistical process 'when the (human) forecaster gets information at the last minute'.

The paper describes a system that mechanically combines judgmental (human) and computer generated forecasts.

FIGURE. Combining human (official) and computer-generated (statistical) forecasts of Probability of Precipitation:

· Firstly, the estimate from a statistical model (of 62%) is averaged with the implied estimate from the NOAA Global Forecast System (GFS) (of 100%) to yield 81% (note that the “implied estimate” from the NOAA GFS is taken to be 100% where a rainfall amount of at least 0.2 mm is indicated by the NOAA system, 50% where a rainfall amount of 0.1 mm is indicated, and 0% where no rainfall is indicated);

· Secondly, this 81% outcome is then averaged with the previous estimate (generated ‘yesterday') by the knowledge based system (of 65%) to yield 73% (the benefit of this step lies in its preserving some “memory” of the previous forecast, and hence it results in a more consistent series of forecasts between Day-7 and Day-1; note that, for Day-7, the “previous forecast” is taken to be the climatological normal); and,

· Finally, this 73% is then averaged with the implied estimate from the human (official) forecast (of 47%) to yield 60%.
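The three averaging steps illustrated in the figure can be sketched in a few lines. The function names are hypothetical; the mapping from GFS rainfall amount to an implied probability, and the worked figures (62%, 100%, 65% and 47% combining to 60%), are those given in the text.

```python
def implied_gfs_pop(rain_mm):
    """Implied PoP (%) from the NOAA GFS rainfall amount, per the rule above."""
    if rain_mm >= 0.2:
        return 100.0
    if rain_mm == 0.1:
        return 50.0
    return 0.0

def combine_pop(statistical, gfs_rain_mm, previous, official):
    # Step 1: average the statistical estimate with the GFS implied PoP.
    step1 = (statistical + implied_gfs_pop(gfs_rain_mm)) / 2.0
    # Step 2: average with yesterday's knowledge-based-system estimate,
    # preserving some "memory" of the previous forecast.
    step2 = (step1 + previous) / 2.0
    # Step 3: average with the implied estimate from the official forecast.
    return (step2 + official) / 2.0

# Worked example from the figure: 62% and 100% -> 81%; with 65% -> 73%;
# with 47% -> 60%. (A GFS amount of 0.5 mm implies a PoP of 100%.)
print(combine_pop(62.0, 0.5, 65.0, 47.0))  # 60.0
```

Each stage is an equal-weighted average, consistent with the discussion earlier in the paper; only the inputs to each stage differ.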

Supplementary URL: http://www.weather-climate.com