6.3 Bayesian Verification of Warnings (2010

Consider a system that issues warnings of intermittent hazards (e.g., severe thunderstorm warnings, tornado warnings), or deterministic forecasts of intermittent events (e.g., heavy precipitation, strong wind, blizzard). The performance of such a warning system is usually evaluated in terms of a probability of detection (POD) and a false alarm rate (FAR). Oftentimes, the two measures are aggregated into a critical success index (CSI). These verification measures are examined from the viewpoint of a rational decider who, following the precepts of Bayesian theory, uses the warning to choose the optimal action by minimizing the expected disutility of the outcome. In the long run, the performance of this warning-decision system is evaluated in terms of the integrated minimum expected disutility (D). Then, given two warning systems, A and B, system A is said to be more informative than system B if D_A ≤ D_B for every disutility function of outcome (equivalently, for every rational decider).
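As a point of reference, the three conventional measures can be computed from the counts of a 2x2 contingency table. The sketch below assumes the usual warning-verification definitions, with FAR taken as the fraction of issued warnings that were not followed by an event; the function name and the counts are hypothetical and serve only as an illustration.

```python
def verification_measures(hits, false_alarms, misses):
    """Return (POD, FAR, CSI) from the counts of a 2x2 contingency table.

    Assumed definitions (standard in warning verification, not taken
    from the abstract itself):
      POD = hits / (hits + misses)
      FAR = false_alarms / (hits + false_alarms)
      CSI = hits / (hits + misses + false_alarms)
    """
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi


# Hypothetical counts: 18 warned events, 30 warnings without an event,
# 2 events that occurred without a warning.
pod, far, csi = verification_measures(hits=18, false_alarms=30, misses=2)
print(f"POD={pod:.3f}  FAR={far:.3f}  CSI={csi:.3f}")
# prints: POD=0.900  FAR=0.625  CSI=0.360
```

Note that correct negatives (no warning, no event) enter none of the three measures, although they do affect the expected disutility faced by a decider.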

A proof is presented that, while POD and FAR are sufficient verification measures (in the sense that POD_A ≥ POD_B and FAR_A ≤ FAR_B imply that A is more informative than B), the CSI is not. In fact, the CSI is a misleading measure because there exist cases (theoretically an infinite number of them) with CSI_A > CSI_B (suggesting that A is better than B), but D_A > D_B (implying that A results in a greater integrated disutility than B does and is, therefore, less preferred than B from the viewpoint of a rational decider).
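One such case can be illustrated numerically. The sketch below is not the construction used in the proof: it assumes a simple two-action cost-loss decider (protect at a fixed cost, or do nothing and suffer a loss if the event occurs), and all counts, costs, and names are hypothetical. For each warning state, the decider picks the action with the smaller expected disutility, and D is the sum of these minima weighted by the frequency of each warning state.

```python
def integrated_min_expected_disutility(table, disutility):
    """Integrated minimum expected disutility D for a 2x2 contingency table.

    table: counts with keys hits, false_alarms, misses, correct_negatives.
    disutility[action][event_state]: disutility of taking `action` when
    the event state turns out to be "event" or "no_event".
    """
    n = sum(table.values())
    # Joint relative frequencies of (warning state, event state).
    joint = {
        "warning": {"event": table["hits"] / n,
                    "no_event": table["false_alarms"] / n},
        "no_warning": {"event": table["misses"] / n,
                       "no_event": table["correct_negatives"] / n},
    }
    total = 0.0
    for state in joint.values():
        # Minimizing the joint-weighted sum equals P(warning state) times
        # the minimum conditional expected disutility.
        total += min(
            sum(state[e] * disutility[action][e] for e in state)
            for action in disutility
        )
    return total


def csi(table):
    """Critical success index from the same table of counts."""
    return table["hits"] / (
        table["hits"] + table["misses"] + table["false_alarms"]
    )


# Hypothetical decider: protecting costs 1; an unprotected event costs 20.
COST, LOSS = 1.0, 20.0
disutility = {
    "protect": {"event": COST, "no_event": COST},
    "do_nothing": {"event": LOSS, "no_event": 0.0},
}

# Hypothetical systems verified on the same 100 cases (20 events each).
system_a = {"hits": 10, "false_alarms": 2, "misses": 10, "correct_negatives": 78}
system_b = {"hits": 18, "false_alarms": 30, "misses": 2, "correct_negatives": 50}

for name, table in (("A", system_a), ("B", system_b)):
    d = integrated_min_expected_disutility(table, disutility)
    print(f"System {name}: CSI={csi(table):.3f}  D={d:.3f}")
# prints:
# System A: CSI=0.455  D=1.000
# System B: CSI=0.360  D=0.880
```

Under these assumptions, system A has the higher CSI, yet this decider incurs the greater integrated disutility with A, because A's missed events are costly enough that the decider protects regardless of the warning. A single disutility function only shows that the CSI ordering can disagree with a decider's preference; the informativeness relation itself requires D_A ≤ D_B for every disutility function.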

A new aggregate measure, called the characteristic utility score (CUS), is derived. With respect to the informativeness relation, CUS is a necessary measure (A being more informative than B implies CUS_A ≥ CUS_B), and a conditionally sufficient measure (CUS_A ≥ CUS_B implies that A is more informative than B for a class of deciders with a particular ratio of disutility differences). In conclusion, if management wants to provide an incentive for improving the warning system in a way that increases its socio-economic benefits to users, then the correct verification measures are POD, FAR, and CUS.