The Utility of Domain Knowledge When Developing Deep Learning Models to Predict Coastal Fog

Collins, Waylon

Coastal fog has an adverse impact on marine transportation along the Middle Texas coast. For example, closure of the Port of Corpus Christi (the fourth-largest port in the United States by tonnage) due to fog results in daily losses on the order of millions of US dollars. Accurate and skillful predictions of coastal fog can help mitigate these losses. A deep learning model was developed to predict fog occurrence at specific locations along the Texas coast. Output from the High-Resolution Rapid Refresh (HRRR) modeling system serves as the input features, and visibility observations from Automated Surface Observing System (ASOS) and Automated Weather Observing System (AWOS) sensors serve as the target. The variational autoencoder (VAE) deep learning technique was used to generate a lower-dimensional representation of the input features, which includes a non-linear combination of the most salient features. This new representation served as input to a logistic regression (LR) model that was trained as a fog classifier. The model is thus referred to as VAE-LR. An earlier version of the VAE-LR, which used output from the North American Mesoscale modeling system as input features to predict fog at AWOS station KRAS (Mustang Island Airport in Port Aransas), outperformed the High-Resolution Ensemble Forecast (HREF) system.
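
For readers unfamiliar with the two-stage design, the VAE-LR can be sketched with the standard formulation below. The abstract does not give the loss function or say which latent quantity is passed to the classifier, so everything here is an assumption: the textbook VAE objective, a standard normal prior, and the encoder mean as the low-dimensional representation. The symbols (x for the HRRR feature vector, z for the latent variable, w and b for the LR parameters) are introduced only for illustration.

```latex
% Stage 1 (assumed): train the VAE by maximizing the evidence lower bound
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
  \qquad p(z) = \mathcal{N}(0, I)

% Stage 2 (assumed): logistic regression on the encoder mean \mu_\phi(x)
P(\text{fog} \mid x) = \sigma\!\left(w^{\top} \mu_\phi(x) + b\right),
  \qquad \sigma(a) = \frac{1}{1 + e^{-a}}
```

Under these assumptions, the encoder compresses the input features into a low-dimensional vector, and only the weights w and the bias b are fitted against the ASOS/AWOS visibility target.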

This study investigates and compares two fundamentally different feature selection strategies for determining which subset of the available HRRR variables serves as input features for the VAE. The first strategy relied solely on domain knowledge, drawn from the expert knowledge of meteorologists and from peer-reviewed research, to determine the subset of HRRR variables. The second strategy used a two-step process to determine the reduced feature set. First, all available variables from the HRRR were used as input features to train the VAE-LR (the “kitchen sink” approach). Next, Explainable AI (XAI) techniques were used to progressively remove features that were not considered important to model performance. The reduced feature set was then used to train a new VAE-LR. Comparisons between the “domain knowledge” and “kitchen sink-XAI” VAE-LR models, using performance metrics such as the Peirce Skill Score, allowed for an assessment of the utility of a human expert when determining the features to serve as input to a deep learning model.
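
The evaluation metric and the progressive feature removal described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation. The Peirce Skill Score follows its standard definition (hit rate minus false alarm rate). The pruning loop is hypothetical throughout: the names `prune_features`, `importance_fn`, `score_fn`, and `tolerance`, and the stopping rule, are assumptions, since the abstract does not say which XAI technique or removal criterion was used.

```python
from typing import Callable, Dict, List, Sequence


def peirce_skill_score(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Peirce Skill Score = hit rate - false alarm rate (1 = fog, 0 = no fog)."""
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o == 1 and p == 1)
    misses = sum(1 for o, p in pairs if o == 1 and p == 0)
    false_alarms = sum(1 for o, p in pairs if o == 0 and p == 1)
    correct_negatives = sum(1 for o, p in pairs if o == 0 and p == 0)
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        raise ValueError("need at least one fog and one no-fog observation")
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    return hit_rate - false_alarm_rate


def prune_features(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    tolerance: float = 0.0,
) -> List[str]:
    """Hypothetical pruning loop: repeatedly drop the least important feature.

    importance_fn stands in for an XAI attribution method, and score_fn for
    retraining the model on a feature list and returning a skill score.
    Stops when removing a feature lowers the score by more than `tolerance`.
    """
    kept = list(features)
    best = score_fn(kept)
    while len(kept) > 1:
        importances = importance_fn(kept)
        weakest = min(kept, key=lambda name: importances[name])
        candidate = [name for name in kept if name != weakest]
        score = score_fn(candidate)
        if score < best - tolerance:
            break  # removing this feature hurts skill; keep the current set
        kept, best = candidate, max(best, score)
    return kept


# Toy usage with made-up labels and made-up feature names.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
pss = peirce_skill_score(observed, predicted)

toy_importance = lambda kept: {"a": 0.9, "b": 0.5, "c": 0.01}
toy_score = lambda kept: 0.6 if "a" in kept and "b" in kept else 0.2
reduced = prune_features(["a", "b", "c"], toy_importance, toy_score)
```

In the toy run, the classifier has 2 hits, 1 miss, 1 false alarm, and 4 correct negatives, and the pruning loop drops only the feature whose removal leaves the toy score unchanged. In practice, `score_fn` would retrain and validate the model on each candidate feature set, which is far more expensive than this sketch suggests.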

This study is a collaborative effort between the National Weather Service Weather Forecast Office in Corpus Christi, Texas, the Conrad Blucher Institute for Surveying and Science at Texas A&M University–Corpus Christi, and The Weather Company (an IBM Business).

11B.4 The Utility of Domain Knowledge When Developing Deep Learning Models to Predict Coastal Fog