Toward Prediction of Pyrocumulonimbus with Machine Learning

Nguyen, Chuyen

Under favorable atmospheric conditions, intense heat from a large, hot wildfire can generate deep, smoke-infused storms that resemble conventional thunderstorms; these storms are known as pyrocumulonimbus (pyroCb). PyroCbs can release large quantities of smoke particles into the lower stratosphere, often several kilometers above the tropopause and above jet aircraft cruising altitudes. PyroCbs can be accompanied by strong and erratic inflow, potentially dangerous downbursts, and lightning strikes. These extreme events can increase fire spread rates and intensity, cause sudden changes in fire spread direction, and ignite additional fires, which makes them especially dangerous for firefighters and others involved in disaster response. In this research, we have developed a machine learning system for understanding and detecting the atmospheric potential of wildfires to produce pyroCbs (wildfire-driven thunderstorms). This is challenging because pyroCbs are extreme events: only about 500 incidents were identified during the 2013–2021 period. Machine learning typically struggles to learn from relatively few examples and highly imbalanced datasets. Our pipeline fuses input data sources to assemble a training dataset, applies feature selection and data balancing techniques, and trains several machine learning algorithms. Finally, we perform initial eXplainable AI (XAI) experiments to analyze the strategies the models have learned.
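One common way to counter this kind of imbalance is to weight each class inversely to its frequency during training. The sketch below computes such weights with the `n_samples / (n_classes * count)` heuristic, the same formula scikit-learn uses for `class_weight="balanced"`. It is an illustration only: the abstract does not say which balancing technique was used, and the class counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes (e.g. pyroCb) get large weights and common classes small
    ones, so every class contributes the same total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical counts: 500 pyroCb fires (label 1) vs 49,500 non-pyroCb fires (label 0).
labels = [1] * 500 + [0] * 49_500
weights = balanced_class_weights(labels)
print(weights)
```

With these counts each pyroCb example carries a weight of 50.0 and each non-pyroCb example about 0.505, so both classes sum to the same total weight.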
The first stage of the pipeline is to create a labelled training dataset in which each row corresponds to a fire, labelled as pyroCb or non-pyroCb. This is done by aligning expert-identified pyroCb coordinates with fire locations detected automatically by the MODIS instrument onboard the Terra and Aqua satellites. Each pyroCb is matched to the nearest fire within an acceptable time window and distance. Unaligned fires near a pyroCb are considered ambiguous cases, possibly related to the event, and are excluded from training. The remaining satellite observations make up the non-pyroCb cases. We choose the time and distance thresholds based on experiments and feedback from domain experts. We also analyze the entire set of unaligned pyroCbs to identify all sources of error, including fires obscured by cloud cover from the pyroCb itself.
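The alignment step can be sketched as follows. This is a minimal illustration, assuming each pyroCb report and each MODIS fire detection is reduced to a latitude, longitude, and observation time. The `Event` type, the `label_fires` helper, and the 50 km / 12 h / 100 km thresholds are hypothetical placeholders, since the abstract does not report the values actually chosen. Distances use the haversine great-circle formula.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Event:
    """A point observation: a pyroCb report or a fire detection (hypothetical type)."""
    lat: float
    lon: float
    hours: float  # observation time in hours since an arbitrary reference


def haversine_km(a: Event, b: Event) -> float:
    """Great-circle distance between two events, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def label_fires(pyrocbs, fires, max_km=50.0, max_hours=12.0, ambiguous_km=100.0):
    """Split fires into pyroCb, non-pyroCb, and rejected (ambiguous) indices.

    Thresholds are hypothetical placeholders, not the values used in the study.
    Returns (positives, negatives, rejected, unaligned_pyrocbs) as index lists.
    """
    positives, unaligned = set(), []
    for p_idx, p in enumerate(pyrocbs):
        # Candidate fires inside both the time window and the distance limit.
        candidates = [
            (haversine_km(p, f), f_idx)
            for f_idx, f in enumerate(fires)
            if abs(f.hours - p.hours) <= max_hours and haversine_km(p, f) <= max_km
        ]
        if candidates:
            positives.add(min(candidates)[1])  # nearest fire becomes the pyroCb case
        else:
            unaligned.append(p_idx)  # kept aside for error analysis (e.g. cloud cover)

    rejected = set()
    for f_idx, f in enumerate(fires):
        if f_idx in positives:
            continue
        # Unaligned fires close to any pyroCb in space and time are ambiguous.
        if any(
            abs(f.hours - p.hours) <= max_hours and haversine_km(p, f) <= ambiguous_km
            for p in pyrocbs
        ):
            rejected.add(f_idx)

    negatives = [i for i in range(len(fires)) if i not in positives and i not in rejected]
    return sorted(positives), negatives, sorted(rejected), unaligned


pyrocbs = [Event(40.0, -120.0, 0.0), Event(30.0, -100.0, 0.0)]
fires = [
    Event(40.1, -120.0, 2.0),   # ~11 km away: aligned with the first pyroCb
    Event(40.5, -120.0, 3.0),   # ~56 km away: too far to align, close enough to be ambiguous
    Event(45.0, -110.0, 1.0),   # far from both pyroCbs: non-pyroCb
    Event(40.0, -120.0, 48.0),  # same place, outside the time window: non-pyroCb
]
result = label_fires(pyrocbs, fires)
print(result)
```

In this toy example the second pyroCb has no fire nearby and is returned as unaligned, which is the set that would then be inspected for causes such as cloud obscuration.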
We then strategically prune the non-pyroCb cases based on the distribution of pyroCb fire data, both to reduce the class imbalance and to remove potentially problematic cases. The training data contains many weak fires, whereas the pyroCb cases are exclusively strong fires. PyroCb development is related to both the atmospheric conditions and the strength of the fire. However, because of cloud cover, many strong fires appear in the database with low fire radiative power and few detected fire pixels, which introduces confusing cases into the training dataset. Since our main goal is to learn the atmospheric conditions that favor pyroCb development, we use the pyroCb data to threshold the non-pyroCb cases, keeping only those with fire strength comparable to pyroCbs. Because the fire variables are known to be unreliable, we train with only the atmospheric features. Atmospheric conditions were estimated using nearest-point 0–12 hour forecasts from the Navy Global Environmental Model (NAVGEM) and the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS). Atmospheric model errors further complicate the training problem, since the simulated atmosphere may not fully represent the pyroCb environment. Comparing models trained separately with GFS and NAVGEM inputs helps estimate the impact of these errors.
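The pruning and feature-selection steps might look like the sketch below. It assumes each fire record carries its fire radiative power, its number of detected fire pixels, and a dictionary of atmospheric predictors. The threshold rule used here (at least as strong as the weakest pyroCb on each measure) is one hypothetical reading of "comparable to pyroCbs"; a percentile cutoff would be an equally plausible choice. The field names and the feature names `lapse_rate` and `rh_mid` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class FireRecord:
    """One labelled fire; field names are hypothetical."""
    frp_mw: float        # fire radiative power (MW)
    n_fire_pixels: int   # number of detected fire pixels
    is_pyrocb: bool
    atmos: dict = field(default_factory=dict)  # atmospheric predictors (GFS or NAVGEM)


def prune_weak_negatives(records):
    """Keep every pyroCb case, plus only those non-pyroCb fires at least as
    strong as the weakest pyroCb fire on both strength measures."""
    positives = [r for r in records if r.is_pyrocb]
    min_frp = min(r.frp_mw for r in positives)
    min_pixels = min(r.n_fire_pixels for r in positives)
    negatives = [
        r for r in records
        if not r.is_pyrocb and r.frp_mw >= min_frp and r.n_fire_pixels >= min_pixels
    ]
    return positives + negatives


def atmospheric_matrix(records, feature_names):
    """Build (X, y) from atmospheric features only; fire variables are left out."""
    X = [[r.atmos[name] for name in feature_names] for r in records]
    y = [int(r.is_pyrocb) for r in records]
    return X, y


records = [
    FireRecord(1500.0, 40, True, {"lapse_rate": 7.5, "rh_mid": 20.0}),
    FireRecord(900.0, 25, True, {"lapse_rate": 7.0, "rh_mid": 25.0}),
    FireRecord(50.0, 2, False, {"lapse_rate": 6.0, "rh_mid": 60.0}),     # weak: pruned
    FireRecord(1200.0, 30, False, {"lapse_rate": 6.5, "rh_mid": 40.0}),  # comparable: kept
    FireRecord(1000.0, 10, False, {"lapse_rate": 6.8, "rh_mid": 35.0}),  # few pixels: pruned
]
kept = prune_weak_negatives(records)
X, y = atmospheric_matrix(kept, ["lapse_rate", "rh_mid"])
print(X, y)
```

The fire variables decide only which rows are kept; they never enter the feature matrix, which mirrors the choice to train on atmospheric features alone. Running the same code once with GFS-derived and once with NAVGEM-derived `atmos` dictionaries would give the two separately trained models being compared.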
