Although the wave component of GEFSv12 provides reliable wave forecasts, particularly within the first week, biases and significant scatter errors persist at forecast ranges beyond 10 days. These errors can be reduced through post-processing with data-driven algorithms. This study explores the potential of machine learning post-processing models to improve the accuracy of wave forecasts. The methodology consists of a series of experiments that are assessed, compared, and discussed. The target variable is the significant wave height (Hs). The statistical post-processing is divided into two parts. The first part applies a simple univariate linear regression approach based on the Quantile Mapping Method to each ensemble member individually. The second part uses multivariate non-linear models that consider the entire ensemble forecast; the models implemented and compared are Support Vector Regression, Random Forest, Gradient Boosting (XGBoost), multilayer perceptron neural networks, and long short-term memory networks.
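As an illustration of the first part, the following is a minimal sketch of an empirical quantile mapping applied to one ensemble member. It is not necessarily the formulation used in this study: the function names `fit_quantile_map` and `apply_quantile_map`, the number of quantiles, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_quantile_map(forecast_train, obs_train, n_quantiles=99):
    """Estimate matching quantiles of forecast and observed Hs (hypothetical helper)."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    forecast_q = np.quantile(forecast_train, probs)
    obs_q = np.quantile(obs_train, probs)
    return forecast_q, obs_q

def apply_quantile_map(forecast, forecast_q, obs_q):
    """Map forecast values onto the observed distribution by linear interpolation.

    Values outside the fitted quantile range are clamped to the end points,
    which is a limitation of this sketch for extreme wave heights.
    """
    return np.interp(forecast, forecast_q, obs_q)

# Synthetic demonstration: a forecast member with an artificial bias.
rng = np.random.default_rng(0)
obs_hs = rng.gamma(shape=2.0, scale=1.0, size=5000)                # "observed" Hs in meters
member_hs = 1.2 * obs_hs + 0.5 + rng.normal(0.0, 0.2, size=5000)  # biased forecast member

forecast_q, obs_q = fit_quantile_map(member_hs, obs_hs)
corrected_hs = apply_quantile_map(member_hs, forecast_q, obs_q)

print("mean bias before:", member_hs.mean() - obs_hs.mean())
print("mean bias after: ", corrected_hs.mean() - obs_hs.mean())
```

In an ensemble setting, the same mapping would be fitted and applied to each member separately, presumably per forecast lead time, since the bias grows with forecast range.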
The methodology employs supervised learning: models are trained and validated against observations from NDBC buoys located in deep water. The primary objective is to provide accurate estimates of Hs and optimal probabilities of Hs exceeding predefined thresholds of 4, 6, 9, and 14 meters. The results are assessed using ensemble-specific metrics, including the Continuous Ranked Probability Score (CRPS), rank histograms, reliability diagrams, and the Brier score.
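To make two of these metrics concrete, the sketch below computes threshold exceedance probabilities from an ensemble, the Brier score for one threshold, and the standard ensemble estimator of the CRPS, CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|. The function names, the array layout (times by members), and the toy numbers are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def exceedance_probability(ens, threshold):
    """Fraction of ensemble members above the threshold, per forecast time.

    ens has shape (n_times, n_members).
    """
    return (ens > threshold).mean(axis=1)

def brier_score(prob, obs, threshold):
    """Mean squared difference between forecast probability and binary outcome."""
    outcome = (obs > threshold).astype(float)
    return np.mean((prob - outcome) ** 2)

def crps_ensemble(ens, obs):
    """Ensemble CRPS estimator, one value per forecast time."""
    # Mean absolute error of the members against the observation.
    term_obs = np.abs(ens - obs[:, None]).mean(axis=1)
    # Mean absolute difference over all member pairs (ensemble spread).
    term_spread = np.abs(ens[:, :, None] - ens[:, None, :]).mean(axis=(1, 2))
    return term_obs - 0.5 * term_spread

# Toy example: two forecast times, two members each (Hs in meters).
ens_hs = np.array([[3.0, 5.0],
                   [7.0, 7.0]])
obs_hs = np.array([4.0, 7.0])

for level in (4.0, 6.0, 9.0, 14.0):
    prob = exceedance_probability(ens_hs, level)
    print(level, prob, brier_score(prob, obs_hs, level))

print("CRPS per time:", crps_ensemble(ens_hs, obs_hs))
```

Lower values are better for both scores. A CRPS of zero is reached only when every member equals the observation, as in the second forecast time of the toy example.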

