Causal Feature Selection for Tropical Cyclone Intensity Forecasting

Beucler, Tom

When developing empirical schemes for statistical forecasting, domain knowledge alone is often insufficient to choose all relevant drivers and the best way to reduce their dimensionality (time lags, vertical levels, and area averaging). Traditional feature selection algorithms, which select drivers based on goodness of fit, ignore confounders and learn mappings that may generalize poorly to unseen weather conditions. In contrast, causal discovery, which learns cause-and-effect relationships from data, can select drivers that are causally related to the forecasted variables and thereby improve the empirical scheme’s robustness.

In this two-part presentation, we demonstrate the added value of causal feature selection for the statistical forecasting of tropical cyclone intensity. The first part is published in Environmental Data Science [1] while the second part is ongoing research.

First, we establish the superiority of multidata causal feature selection over standard feature selection using 260 tropical cyclone cases from the Western Pacific Ocean basin. Multidata causal discovery draws statistical strength from an ensemble of time series datasets to robustly estimate a single optimal set of causal drivers. Using the M-PC1 algorithm [1], which infers parts of the causal graph through iterative conditional independence tests, we filter out causally spurious links before passing the remaining causal features as inputs to ML models (multiple linear regression and random forest) that predict the targets. Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, improving the prediction scheme’s generalizability. Machine learning algorithms using M-PC1 causal feature selection outperform noncausal baselines, including Long Short-Term Memory algorithms, and other feature selection methods (lagged correlation, lasso-based, XAI-based, and random). The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of tropical cyclone intensification.
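The selection step described above can be illustrated with a minimal Python sketch. It is not the M-PC1 implementation of [1]: it assumes a partial-correlation conditional independence test, assumes that "multidata" pooling can be approximated by stacking the lagged samples of every time series into one sample set, and simplifies how candidate drivers are ranked. All function names (`partial_corr_test`, `pool_lagged`, `select_drivers`, `make_toy_storms`) and the synthetic data are hypothetical.

```python
import math
import numpy as np


def partial_corr_test(x, y, Z):
    """Return |partial correlation| of x and y given the columns of Z, and a Fisher-z p-value."""
    n = len(x)
    if Z.shape[1] > 0:
        # Regress the conditioning set (plus an intercept) out of both variables.
        A = np.column_stack([np.ones(n), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = float(np.clip(np.corrcoef(x, y)[0, 1], -0.999999, 0.999999))
    z = math.atanh(r) * math.sqrt(n - Z.shape[1] - 3)
    return abs(r), math.erfc(abs(z) / math.sqrt(2.0))


def pool_lagged(datasets, target, max_lag):
    """Stack lagged predictors from every dataset (assumed stand-in for multidata pooling).

    Each dataset has shape (time, n_vars). Column j of the returned X holds
    variable j % n_vars at lag j // n_vars + 1.
    """
    X_parts, y_parts = [], []
    for data in datasets:
        T = data.shape[0]
        lagged = [data[max_lag - lag:T - lag] for lag in range(1, max_lag + 1)]
        X_parts.append(np.hstack(lagged))
        y_parts.append(data[max_lag:, target])
    return np.vstack(X_parts), np.concatenate(y_parts)


def select_drivers(X, y, alpha=0.01):
    """PC1-style iteration: condition on the p strongest other candidates, p = 0, 1, 2, ..."""
    drivers = list(range(X.shape[1]))
    strength = dict.fromkeys(drivers, 0.0)
    p = 0
    while p < len(drivers):
        rejected = []
        for j in drivers:
            others = sorted((k for k in drivers if k != j), key=lambda k: -strength[k])
            cond = np.array(others[:p], dtype=int)
            r, pval = partial_corr_test(X[:, j], y, X[:, cond])
            strength[j] = r
            if pval > alpha:  # conditionally independent of the target: drop after this sweep
                rejected.append(j)
        drivers = [j for j in drivers if j not in rejected]
        p += 1
    return drivers


def make_toy_storms(n_storms=20, length=100, seed=0):
    """Synthetic 'storms' with columns [y, c, z]: c drives y at lag 1, z is only a noisy proxy of c."""
    rng = np.random.default_rng(seed)
    storms = []
    for _ in range(n_storms):
        c = rng.normal(size=length)
        z = c + 0.5 * rng.normal(size=length)
        y = np.zeros(length)
        y[1:] = 0.8 * c[:-1] + rng.normal(size=length - 1)
        storms.append(np.column_stack([y, c, z]))
    return storms


X, y = pool_lagged(make_toy_storms(), target=0, max_lag=1)
selected = select_drivers(X, y, alpha=0.001)  # a stringent threshold, as discussed above
print("Selected lagged columns:", selected)
```

In this toy setup the proxy `z` is strongly correlated with the target, so a selection based on lagged correlation alone would retain it, whereas conditioning on `c` should expose the link as spurious. The columns that survive would then serve as inputs to the downstream regression model.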

These new potential drivers are explored in this presentation’s second part, which revisits the features of the proven Statistical Hurricane Intensity Prediction Scheme (SHIPS) using causal discovery. Here, we recreate the operational SHIPS drivers from ERA5 and search for the optimal drivers to add and the best way to reduce their dimensionality. Preliminary results using ~50 cases from the North Atlantic Ocean basin (1) suggest that inner-core surface vorticity is the most causally relevant driver that can be added to existing SHIPS predictors; (2) confirm the causal relevance of some existing SHIPS predictors, such as shear variables and the number of cold GOES infrared pixels at 50-200 km radii; and (3) point to potential improvements in SHIPS’ empirical potential intensity, which uses an empirical formula based on sea surface temperature. Ongoing research aims to consolidate these preliminary results through SHIPS’ standard validation procedure, which for the period 2005-2020 represents over 10,000 hindcasts in the North Atlantic Ocean basin.

In summary, causal feature selection can inform and enrich empirical schemes by uncovering new drivers, and it can improve their generalization by eliminating confounders. These results highlight its potential for a broad range of statistical forecasting problems.

Reference: [1] Ganesh et al. (2023): Selecting robust features for machine-learning applications using multidata causal discovery. Environmental Data Science, 2, E27. doi:10.1017/eds.2023.21

J1B.2 Causal Feature Selection for Tropical Cyclone Intensity Forecasting