Several statistical modeling techniques are tried initially, but many prove inappropriate. The best results are obtained by dividing flash counts into four quartile groups based on climatology and then using binary logistic regression to develop three prediction equations for each domain: one giving the conditional probability of a quartile one (Q1) lightning event, a second giving the probability of a quartile three (Q3) or greater event, and a third giving the probability of a quartile four (Q4) event. Principal component analysis is used to select a subset of non-redundant predictors that have the greatest physical relevance to convection and lightning in South Florida. The final candidate sounding predictors are the vector mean 1000-700 hPa cross-shore wind component and speed, the K-index, the modified Lifted Index, and the temperature at 900 hPa. Non-linear effects are considered by including second-, third-, and fourth-order terms as additional candidate predictors. A combination of stepwise screening and cross-validation is used to select the variables that generalize best to independent data. To determine the most likely quartile of lightning activity, a decision tree scheme is constructed using probability thresholds for the three equations. Finally, the resulting prediction schemes are tested independently using k-fold cross-validation.
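The way the three logistic equations and the decision tree fit together can be sketched as follows. This is a minimal illustration only: the function names, the 0.5 thresholds, and the order in which the three probabilities are tested are assumptions, since the actual coefficients, thresholds, and tree layout are not given here.

```python
import math


def poly_terms(x, max_order=4):
    """Return [x, x**2, ..., x**max_order] as candidate non-linear predictors."""
    return [x ** k for k in range(1, max_order + 1)]


def event_probability(intercept, coefficients, predictors):
    """Binary logistic regression: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))


def forecast_quartile(p_q1, p_q3_plus, p_q4,
                      t_q1=0.5, t_q3_plus=0.5, t_q4=0.5):
    """Map the three equation outputs to a single most-likely quartile.

    Thresholds and test order are hypothetical placeholders.
    """
    if p_q4 >= t_q4:            # strongest category checked first (assumed)
        return 4
    if p_q3_plus >= t_q3_plus:  # Q3 or greater, but Q4 already ruled out
        return 3
    if p_q1 >= t_q1:            # weakest category
        return 1
    return 2                    # nothing exceeded its threshold
```

In such a layout, each domain would carry its own three sets of coefficients, and the thresholds would be tuned on the dependent data rather than fixed at 0.5.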
The dominant effect in each equation is the wind component perpendicular to the coastline, which is found to have a significant non-linear relationship with lightning activity. Other important variables are the K-index and the modified Lifted Index. Day number, persistence, and same-day morning activity are also selected as important indicators of afternoon lightning in the two domains.
When each year is treated independently, the Miami-Dade scheme forecasts the correct quartile ~37% of the time and is within one quartile of the observed ~79% of the time. The scheme for eastern Broward County forecasts the correct quartile ~36% of the time and is within one quartile ~77% of the time. The prediction schemes are generally superior to persistence and climatology, both for the dependent data and during k-fold cross-validation; thus, they possess real forecast skill. For example, in forecasting the correct quartile, the schemes show a ~4-6 percentage point improvement over persistence and a ~11-12 percentage point improvement over climatology. In predicting to within one quartile of the observed, the two schemes improve on persistence by ~6-8 percentage points and on climatology by ~14-17 percentage points. Further analysis shows that the two schemes rarely forecast the upper two quartiles when no activity is observed. Additionally, correct predictions of Q4 events are shown to increase with flash count within the Q4 category. Overall, the cross-validation results show only a 1-2% reduction in skill from that obtained for the fourteen years of dependent data, demonstrating that the two schemes are statistically robust and can be expected to achieve similar results when implemented operationally.
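The two verification measures quoted above (correct quartile, and correct to within one quartile) and the percentage-point comparison against a reference forecast can be computed as in the sketch below. The function names are hypothetical, and any quartile lists passed in are the user's own data; no values from this study are reproduced.

```python
def hit_rates(forecast, observed):
    """Return (fraction exactly correct, fraction within one quartile)."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("forecast and observed must be non-empty and equal length")
    n = len(observed)
    exact = sum(f == o for f, o in zip(forecast, observed)) / n
    within_one = sum(abs(f - o) <= 1 for f, o in zip(forecast, observed)) / n
    return exact, within_one


def percentage_point_gain(scheme_rate, reference_rate):
    """Improvement of the scheme over a reference (e.g. persistence), in points."""
    return 100.0 * (scheme_rate - reference_rate)
```

Applying `hit_rates` once to the scheme and once to a persistence or climatology forecast for the same days, then differencing with `percentage_point_gain`, gives comparisons of the kind reported here.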
This paper is a companion to one being submitted by Winarchick and Fuelberg and optimally should be placed in the same session.
Supplementary URL: http://bertha.met.fsu.edu