This study investigates the effectiveness of various machine learning (ML) methods in improving ERA5 winds during the warm season (May-September). To quantify ERA5 errors, observed hourly winds were obtained at 20 m intervals from 20 to 200 m above the ocean surface from two NYSERDA floating lidars in the New York Bight region (~142 km southeast of NYC) for May through September of 2019 to 2022. ERA5 data at several levels between 850 and 1000 hPa at the lidars and at an eastern Long Island location (KOKX NWS site) were used as inputs for the ML models. In addition, surface atmospheric data (2 m temperature, boundary layer height, mean sea level pressure, and total cloud cover) and horizontal spatial data over the Northeastern United States at 950 hPa were used to relate the large-scale flow patterns to ERA5 errors in the ML models. Four ML models were employed to improve the ERA5 winds: a Support Vector Machine (SVM), a Random Forest Regressor (RFR), a Feed-forward Neural Network (FNN), and a Convolutional Neural Network (CNN).
The RFR performed best across the wind speed error metrics, followed by the CNN, suggesting that spatial patterns might not exert a substantial influence on biases in ERA5 data. The RFR indicated that the 1000 hPa u and v wind components at the lidar site, the 1000 hPa u wind component at the KOKX site, and the boundary layer height at the lidar site contributed most to explaining the variance in the ERA5 wind bias. A similar methodology was followed to improve NOAA’s High-Resolution Rapid Refresh (HRRR) model, an operational forecast model, for wind power forecasting applications.
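
As an illustration of the wind speed error metrics referred to above, the following minimal sketch derives wind speed from u and v components and compares model values against observations. It is not the study's code: the function names (`wind_speed`, `error_metrics`), the choice of bias, MAE, and RMSE as metrics, and the sample values are all assumptions made for illustration.

```python
import math


def wind_speed(u, v):
    """Return the horizontal wind speed (m/s) from u and v components."""
    return math.hypot(u, v)


def error_metrics(model_speeds, observed_speeds):
    """Return mean bias, MAE, and RMSE of model minus observed wind speed."""
    if len(model_speeds) != len(observed_speeds) or not model_speeds:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [m - o for m, o in zip(model_speeds, observed_speeds)]
    n = len(errors)
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"bias": bias, "mae": mae, "rmse": rmse}


# Hypothetical values, not taken from the study:
# reanalysis (u, v) pairs and matching lidar wind speeds in m/s.
era5_uv = [(3.0, 4.0), (6.0, 8.0)]
lidar_speeds = [6.0, 9.0]

era5_speeds = [wind_speed(u, v) for u, v in era5_uv]
metrics = error_metrics(era5_speeds, lidar_speeds)
print(metrics)
```

The same metrics could be computed before and after an ML correction is applied to the model winds, so that the improvement is expressed as a change in bias, MAE, or RMSE.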

