The non-systematic nature of PIREPs creates the greatest difficulty for verification: because observations are not consistently available at the same times and locations, the reports do not constitute a representative sample of the forecast grid. Most recent studies of the quality of icing and turbulence forecasts have taken this characteristic into account. In particular, most studies have limited the statistics computed and have not computed the False Alarm Ratio (FAR) or other statistics that require stratifying the data by forecast type. Nevertheless, questions are frequently raised about why it is inappropriate to compute these statistics using PIREPs. The purpose of this paper is to demonstrate clearly why FAR and related statistics should not be computed when PIREPs are used as verifying observations for forecasts across a grid or large domain. In addition, some examples of the impacts of computing these statistics are presented.
The primary analytical result of this study is a simple demonstration that FAR is strongly related to the relative frequencies of Yes and No PIREPs: when the number of either Yes or No PIREPs changes, the FAR also changes. In contrast, statistics based on stratifying by observation type, such as PODy and PODn, change very little when the numbers of PIREPs change. This effect is demonstrated for turbulence by supplementing the No observations with AVAR observations, and for icing by supplementing the no-icing PIREPs with PIREPs reporting clear above. In both cases, the FAR estimate increases dramatically.
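The sensitivity of FAR to the relative frequencies of Yes and No reports can be sketched with a standard 2x2 contingency table. The counts below are illustrative only (not taken from the paper); the point is that inflating the No column by any factor leaves PODy and PODn unchanged while FAR rises sharply.

```python
def verification_stats(a, b, c, d):
    """Standard 2x2 verification statistics.

    a = Yes forecast / Yes observation (hits)
    b = Yes forecast / No observation  (false alarms)
    c = No forecast  / Yes observation (misses)
    d = No forecast  / No observation  (correct nulls)
    """
    return {
        "PODy": a / (a + c),   # stratified by observation: Yes observations only
        "PODn": d / (b + d),   # stratified by observation: No observations only
        "FAR":  b / (a + b),   # stratified by forecast: Yes forecasts only
    }

# Baseline table (hypothetical counts for illustration).
base = verification_stats(a=80, b=40, c=20, d=60)

# Supplementing the No observations (e.g., with AVAR or clear-above reports)
# multiplies b and d by a factor k; the Yes counts are untouched.
k = 5
supplemented = verification_stats(a=80, b=40 * k, c=20, d=60 * k)

print(base)          # FAR = 40/120  ~ 0.33
print(supplemented)  # FAR = 200/280 ~ 0.71, while PODy and PODn are unchanged
```

Because PODy and PODn each condition on a single observation category, scaling the No sample cancels out of both; FAR mixes the two categories in its denominator, so it inherits their relative frequencies.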
These results are supplemented by a simple simulation study, in which PIREPs are randomly eliminated from a verification analysis. Results of these simulations suggest that when the PIREPs to be removed are randomly selected from both the Yes and No PIREP subsets, there is little effect on any of the verification statistics. However, when the eliminated PIREPs are selected from only the Yes or only the No subset, the FAR and other statistics are affected. Other statistics that are impacted include the Bias and Critical Success Index, and the Heidke and Gilbert skill scores.
The results of this study firmly demonstrate the limitations of verification statistics computed using PIREPs. In addition, the study emphasizes the value that could be attained through systematic collection of pilot reports.