Experiments on Machine Learning Post-processing Models Applied to Probabilistic Wave Forecasts

Martins Campos, Ricardo

Probabilistic wave forecasts are a crucial component of oceanic hazard outlooks, particularly for forecast ranges beyond 7 days. A statistical model that generates week-2 probability maps is currently under development and evaluation, based on version 12 of NCEP's Global Ensemble Forecast System (GEFSv12). The GEFSv12 ensemble comprises 30 perturbed members plus a control member, and it provides wave forecasts on a 0.25° × 0.25° grid with a forecast range of up to 16 days. A 20-year GEFSv12-waves reforecast covering 2000 to 2019 has also been generated. It runs one cycle per day with 5 members; once a week, it is extended to 11 members and a 35-day forecast range.

Although the wave component of GEFSv12 provides reliable forecasts, particularly within the first week, biases and significant scatter errors remain at forecast ranges beyond 10 days. These errors can be reduced through post-processing with data-driven algorithms. This study explores the potential for improving wave forecast accuracy with machine learning post-processing models, through a series of experiments that are assessed, compared, and discussed. The target variable is the significant wave height (Hs). The statistical post-processing is divided into two parts. The first part uses a simple univariate linear regression approach based on the Quantile Mapping Method, applied to each ensemble member individually. The second part uses multivariate non-linear models that take the entire ensemble forecast as input. For this step, several models are implemented and compared: Support Vector Regression (SVR), Random Forest, Gradient Boosting (XGBoost), multilayer perceptron (MLP) neural networks, and long short-term memory (LSTM) networks.
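The per-member quantile mapping step could look roughly like the following minimal sketch. It assumes NumPy arrays of Hs already collocated with buoy observations, and it interprets the "univariate linear regression" as a least-squares line fitted between matching quantiles of forecast and observed Hs. The function names and the 1st to 99th percentile grid are hypothetical choices, not taken from the study.

```python
import numpy as np

# Hypothetical quantile grid (1st to 99th percentile); the study's choice is not stated.
DEFAULT_PROBS = np.linspace(0.01, 0.99, 99)


def fit_quantile_mapping(model_hs, obs_hs, probs=DEFAULT_PROBS):
    """Fit a straight line between matching quantiles of model and observed Hs.

    Returns (slope, intercept) such that corrected = slope * model + intercept.
    """
    q_model = np.quantile(model_hs, probs)
    q_obs = np.quantile(obs_hs, probs)
    # np.polyfit returns coefficients from the highest power down: [slope, intercept]
    slope, intercept = np.polyfit(q_model, q_obs, 1)
    return slope, intercept


def correct_ensemble(train_members, train_obs, new_members):
    """Fit and apply a separate quantile-mapping line to each ensemble member.

    train_members: (n_members, n_train) reforecast Hs for each member
    train_obs:     (n_train,) collocated buoy Hs
    new_members:   (n_members, n_new) forecasts to correct
    """
    corrected = np.empty_like(new_members, dtype=float)
    for m in range(train_members.shape[0]):
        slope, intercept = fit_quantile_mapping(train_members[m], train_obs)
        # Hs cannot be negative, so clip the corrected values at zero
        corrected[m] = np.clip(slope * new_members[m] + intercept, 0.0, None)
    return corrected
```

Fitting a line through the quantile pairs, rather than interpolating an empirical lookup table, lets the correction extrapolate beyond the training range, which matters for rare high-Hs events; the trade-off is that a single line cannot represent non-linear distortions of the forecast distribution, which is the gap the multivariate non-linear models are meant to address.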

The methodology uses supervised learning: the models are trained and validated against observations from NDBC buoys located in deep water. The primary objectives are to provide accurate estimates of Hs and well-calibrated probabilities of Hs exceeding predefined thresholds of 4, 6, 9, and 14 m. The results are assessed using ensemble-specific metrics, including the Continuous Ranked Probability Score (CRPS), rank histograms, reliability diagrams, and the Brier score.
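The exceedance probabilities and several of the ensemble metrics named above can be sketched as follows. This is a minimal illustration assuming an ensemble array of shape (n_members, n_times) collocated with a buoy series; the function names are hypothetical, the probability is taken as the simple fraction of members above each threshold, and the CRPS uses the standard empirical-ensemble form E|X - y| - 0.5 E|X - X'| (not the "fair" variant). Reliability diagrams are omitted because they are plots built from binned probabilities.

```python
import numpy as np

# Hs exceedance thresholds (m) listed in the study
THRESHOLDS_M = (4.0, 6.0, 9.0, 14.0)


def exceedance_probability(ens, threshold):
    """Fraction of members above the threshold; ens has shape (n_members, n_times)."""
    return np.mean(ens > threshold, axis=0)


def brier_score(prob, obs, threshold):
    """Mean squared difference between forecast probability and the binary outcome."""
    outcome = (obs > threshold).astype(float)
    return float(np.mean((prob - outcome) ** 2))


def crps_ensemble(ens, obs):
    """Mean CRPS of an ensemble: E|X - y| - 0.5 * E|X - X'| averaged over times."""
    spread_to_obs = np.mean(np.abs(ens - obs[None, :]), axis=0)
    # Pairwise member differences, shape (n_members, n_members, n_times)
    member_spread = 0.5 * np.mean(np.abs(ens[:, None, :] - ens[None, :, :]), axis=(0, 1))
    return float(np.mean(spread_to_obs - member_spread))


def rank_histogram(ens, obs):
    """Counts of the observation's rank among the members (0..n_members).

    Ties are ignored in this sketch; they are rare for a continuous variable like Hs.
    """
    ranks = np.sum(ens < obs[None, :], axis=0)
    return np.bincount(ranks, minlength=ens.shape[0] + 1)
```

For example, `{t: brier_score(exceedance_probability(ens, t), obs, t) for t in THRESHOLDS_M}` gives one Brier score per threshold, while a flat rank histogram indicates an ensemble whose spread matches the observed variability.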
