15.5 Performance Diagrams Based on Skill Scores

Thursday, 1 February 2024: 2:45 PM
302/303 (The Baltimore Convention Center)
Michael E. Baldwin, Cooperative Institute for Severe and High-Impact Weather Research and Operations, Univ. of Oklahoma, Norman, OK; and H. E. Brooks and M. L. Flora

Given the high degree of complexity and dimensionality of forecast verification information, visual diagrams that display multiple measures of quality, such as ROC curves (Mason 1982) and performance diagrams (Roebber 2009; also known as precision-recall diagrams), can be beneficial and are frequently used. However, the behavior of various measures of forecast performance can vary as the event frequency (base rate) changes. In ROC curves, for example, the false alarm rate (or probability of false detection) remains low for rare events across a wide range of decision thresholds, compressing the useful information to one side of the diagram. For performance diagrams, the 'no skill' portion of the chart is often omitted. Since the expected values of probability of detection (recall) and success ratio (precision) for random forecasts are proportional to the base rate, forecasts generated from random guesses will appear to perform well for events with relatively high base rates. To address these issues, performance diagrams can be created using skill scores (with random forecasts as the reference forecast) as the coordinate system. Other useful measures of accuracy, association, bias, and value can be derived as functions of these skill scores, allowing the geometric relationships between these aspects of performance to be displayed on a single diagram. At the conference, these issues will be discussed and examples from operational numerical weather prediction output and machine-learning models will be presented.
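The abstract does not give the exact skill-score construction, but the idea can be illustrated with a minimal sketch. Assuming the standard skill-score form (score − reference) / (perfect − reference), with a random forecast of the same forecast frequency as the reference (so the expected success ratio of a random forecast equals the base rate, and the expected probability of detection equals the forecast frequency), the coordinates of a skill-based performance diagram might be computed from a 2x2 contingency table as follows. The function name and decomposition are illustrative, not taken from the paper:

```python
def skill_vs_random(hits, false_alarms, misses, correct_negs):
    """Illustrative sketch: POD and SR expressed as skill scores
    relative to a random forecast with the same forecast frequency.
    (Assumed construction; the paper's exact formulation may differ.)"""
    n = hits + false_alarms + misses + correct_negs
    base_rate = (hits + misses) / n        # event frequency s
    fcst_rate = (hits + false_alarms) / n  # forecast frequency f

    pod = hits / (hits + misses)           # probability of detection (recall)
    sr = hits / (hits + false_alarms)      # success ratio (precision)

    # Expected values for a random forecast issued with frequency f:
    pod_random = fcst_rate                 # E[POD_random] = f
    sr_random = base_rate                  # E[SR_random]  = s (the base rate)

    # Generic skill score: (score - reference) / (perfect - reference)
    pod_skill = (pod - pod_random) / (1.0 - pod_random)
    sr_skill = (sr - sr_random) / (1.0 - sr_random)
    return pod_skill, sr_skill
```

For a purely random forecast (e.g., a contingency table of 25/25/25/25), both skill coordinates are zero regardless of the base rate, whereas the raw POD and SR would both sit at 0.5 and could look deceptively skillful when the base rate is high.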