A Classification Model for Daily Lightning Intensities in Alaska

Hostler, Joshua

In 2015, Alaska experienced one of the most extreme early fire seasons on record, with 99% of acres burned attributable to lightning. Subsequently, the 2022 fire season became the earliest on record to reach 1 million acres burned, again largely driven by lightning activity. Fire management would benefit from skillful outlooks of lightning likelihood for use in planning. We use Self-Organizing Maps (SOMs) as an input layer for a random forest classifier to predict daily lightning likelihoods.

A Self-Organizing Map is an unsupervised clustering algorithm that, when applied to gridded atmospheric variables, determines common daily weather patterns (the attached figure shows a trained SOM for 500 hPa geopotential height anomalies). We train SOMs on ECMWF Reanalysis v5 (ERA5) daily anomalies for the months of June and July from 1959 to 2022 for the following variables: 500 hPa geopotential height, sea level pressure, 2-meter temperature, 850 hPa temperature, convective available potential energy, and total column water vapor.
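As a rough illustration of how a SOM groups daily fields into patterns, the following is a minimal sketch in pure Python, not the implementation used in this study. The function names (`train_som`, `best_matching_unit`), the grid size, the decay schedules, and the toy data are all assumptions made for illustration; each "day" here is a short flattened vector standing in for a gridded anomaly field, and real gridded data would call for an array library.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d2 = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(samples, rows, cols, n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; weights[r][c] is one node's pattern."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Assumption: initialize every node from a randomly chosen sample.
    weights = [[list(rng.choice(samples)) for _ in range(cols)]
               for _ in range(rows)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighborhood radius
        x = rng.choice(samples)
        br, bc = best_matching_unit(weights, x)
        for r in range(rows):
            for c in range(cols):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                # Gaussian neighborhood: nodes near the winner move the most.
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                w = weights[r][c]
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Toy "daily anomaly fields": four days, three grid points each (hypothetical).
days = [[0.0, 0.1, -0.2], [0.2, 0.0, 0.1], [5.0, 4.8, 5.1], [4.9, 5.2, 5.0]]
som = train_som(days, rows=2, cols=3)
# Each day is summarized by the grid position of its best-matching node.
positions = [best_matching_unit(som, day) for day in days]
```

After training, each day can be represented by its best-matching node rather than by the full field, which is the sense in which a SOM reduces dimensionality.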

In this study, we classify days as low, medium, or high by tercile of daily lightning counts computed from the Alaska Lightning Detection Network historical lightning dataset. Each daily record is associated with a weight in the 2D SOM network. This reduces the dimensionality of the raw gridded data from about 200,000 (34,001 pixels for each of 6 variables) to just 12. The reduced features are then used to train a random forest classifier, with an 80-20 train-test split and 5-fold cross-validation for hyperparameter tuning. Table 1 is the confusion matrix for the test dataset.

		Predicted
		Low	Medium	High
Actual	Low	73	28	24
	Medium	34	37	47
	High	12	19	81
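The tercile labeling of daily lightning counts can be sketched as follows. This is an illustrative assumption rather than the study's procedure: the helper name `tercile_labels`, the use of `statistics.quantiles` with its default interpolation, the treatment of counts that fall exactly on a cut point, and the sample counts are all hypothetical.

```python
import statistics


def tercile_labels(counts):
    """Label each daily count as low, medium, or high by tercile."""
    # Two cut points split the distribution into three equal-probability bins.
    lower, upper = statistics.quantiles(counts, n=3)
    labels = []
    for count in counts:
        if count <= lower:        # assumption: boundary values go to the lower bin
            labels.append("low")
        elif count <= upper:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Hypothetical daily strike counts, not values from the lightning dataset.
daily_counts = [0, 3, 10, 25, 40, 120, 300, 900, 2500]
labels = tercile_labels(daily_counts)
```

With heavily tied data (for example many days with zero strikes), the three bins will not be equally populated, so the tie-handling rule would need to be chosen deliberately.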

Our model shows skill in classifying low and high tercile lightning days. Mean AUROC and F1 scores across the three classes are 0.70 and 0.53, respectively (climatology yields scores of 0.5 and 0.33, respectively). Classification of middle-tercile days only marginally outperforms the baseline scores. True upper-tercile days are correctly predicted at the highest rate (recall of 0.723). Table 2 summarizes the classification metrics for each class.

Class	Precision	Recall	F1-Score	AUROC
Low	0.613	0.584	0.598	0.767
Medium	0.440	0.314	0.366	0.559
High	0.533	0.723	0.614	0.785
Mean	0.529	0.540	0.526	0.704
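The precision, recall, and F1 columns of Table 2 follow directly from the Table 1 counts, and the check below recomputes them. It is a minimal sketch: the function name `per_class_metrics` is hypothetical, the "Mean" row is assumed to be an unweighted average over the three classes, and AUROC is omitted because it requires predicted probabilities that are not given here.

```python
LABELS = ["Low", "Medium", "High"]

# Rows are actual classes, columns are predicted classes (counts from Table 1).
CONFUSION = [
    [73, 28, 24],
    [34, 37, 47],
    [12, 19, 81],
]


def per_class_metrics(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(matrix)
    results = []
    for k in range(n):
        true_pos = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total
        precision = true_pos / predicted_k
        recall = true_pos / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        results.append((precision, recall, f1))
    return results


metrics = per_class_metrics(CONFUSION)
# Unweighted (macro) mean of each metric across the three classes.
macro = tuple(sum(m[j] for m in metrics) / len(metrics) for j in range(3))

for label, (p, r, f1) in zip(LABELS, metrics):
    print(f"{label:<8}{p:.3f}  {r:.3f}  {f1:.3f}")
print(f"{'Mean':<8}{macro[0]:.3f}  {macro[1]:.3f}  {macro[2]:.3f}")
```

For example, recall for the high class is 81 / (12 + 19 + 81) = 0.723, which matches Table 2.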

Test scores improved over validation scores, suggesting the model will perform similarly on new data. Future work will include an in-depth model evaluation, including examination of feature importance and identification of sources of model error. We also plan to apply this methodology to distinguish lightning days from days without lightning. Finally, we plan to use this model with seasonal dynamical forecasts to construct a multi-model seasonal outlook.

11.4 A Classification Model for Daily Lightning Intensities in Alaska