The initial results show that probabilistic forecasts for these special events have quite limited skill. The average precision score is only about 5-6%, whereas for generic up-ramp and down-ramp events it reaches moderate values roughly ten times larger (50-60%). At the optimal decision-making threshold, equitable threat scores are low (near 3%): although probabilities of detection exceed 70%, the false alarm ratios are even higher (above 90%). The forecast can be used to economic advantage only within a narrow range of cost-loss ratios (the 0.5-7.5% interval), where value scores peak at 40-45% and drop off quickly on either side. For cost-loss ratios below 0.5%, a decision maker should therefore always choose to protect against the event, despite its rarity. For cost-loss ratios above 7.5%, a decision maker should never choose to protect, because the potential loss is not large enough relative to the cost of protection. Because the example event type is relatively infrequent and few events were observed during the four-month reforecast period available in this study, these results are likely too pessimistic and could be improved by using the full year-long historical WFIP2 reforecast data set that is now available.
Figure: Value score curves as a function of the cost-loss ratio for the WFIP2 control and experimental HRRRNEST (∆x = 750 m) classifier model forecasts of the Bonneville Power Administration (BPA) fleet aggregated power up-ramp events associated with cold-pool mix-outs, zoomed in to the cost-loss ratio range between 0 and 0.1.
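To make the relationship between the contingency-table scores and the value score concrete, the following is a minimal sketch rather than the code used in this study. It assumes the value score follows the standard relative economic value formulation of the static cost-loss model, evaluated at a single probability threshold. The function names and the contingency counts are hypothetical, chosen only to mimic a rare event with a high probability of detection and a high false alarm ratio; they are not the WFIP2 results.

```python
def contingency_scores(hits, misses, false_alarms, correct_negatives):
    """Return standard 2x2 contingency-table scores as a dict."""
    n = hits + misses + false_alarms + correct_negatives
    # Number of hits expected by chance, used by the equitable threat score.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    return {
        "pod": hits / (hits + misses),                        # probability of detection
        "far": false_alarms / (hits + false_alarms),          # false alarm ratio
        "pofd": false_alarms / (false_alarms + correct_negatives),  # false alarm rate
        "base_rate": (hits + misses) / n,                     # observed event frequency
        "ets": (hits - hits_random)
               / (hits + misses + false_alarms - hits_random),
    }


def value_score(pod, pofd, base_rate, cost_loss):
    """Relative economic value for a cost-loss ratio with 0 < cost_loss < 1.

    Expenses are normalized by the loss L, so protecting costs `cost_loss`
    and an unprotected event costs 1.
    """
    # Climatology: always protect or never protect, whichever is cheaper.
    expense_climate = min(cost_loss, base_rate)
    # Perfect forecast: protect only when the event occurs.
    expense_perfect = base_rate * cost_loss
    # Actual forecast: pay the cost on hits and false alarms, the loss on misses.
    expense_forecast = (
        cost_loss * (pod * base_rate + pofd * (1.0 - base_rate))
        + (1.0 - pod) * base_rate
    )
    return (expense_climate - expense_forecast) / (expense_climate - expense_perfect)


# Hypothetical counts for illustration only (not data from this study).
scores = contingency_scores(hits=8, misses=2, false_alarms=92, correct_negatives=898)

# Evaluate the value score at a few cost-loss ratios, including the base rate.
for ratio in (0.0005, scores["base_rate"], 0.05, 0.5):
    v = value_score(scores["pod"], scores["pofd"], scores["base_rate"], ratio)
    print(f"cost-loss ratio {ratio:.4f}: value score {v:+.3f}")
```

Under these assumptions the value score is largest when the cost-loss ratio equals the event base rate, where it reduces to the probability of detection minus the false alarm rate, and it turns negative for ratios far enough on either side. A negative value means the decision maker does better by always protecting (small ratios) or never protecting (large ratios). A value curve for a probabilistic forecast is commonly taken as the upper envelope of such single-threshold curves over all probability thresholds.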