Creating Bias-Corrected Global Radiation Datasets from Climate Reanalysis Products Using Supervised Learning

Chakraborty, Tirthankar

The components of the Earth's radiative budget strongly modulate the planet's weather and climate. Estimating their magnitudes correctly is difficult, owing to the complexity of the radiative transfer equations, the strong dependence on latitude, and the highly variable composition of gases, aerosols, and clouds in the atmosphere. Diffuse radiation, an important albeit understudied component of the budget that affects ecosystem response and solar energy generation capacity, is even more variable because it is strongly modulated by aerosol and cloud properties. Since not all components of the radiative budget are measured with uniform global coverage, climate models are frequently used to calculate them. Unfortunately, climate models represent clouds and aerosols with very coarse parameterizations, which lead to uncertainties in the computed radiation values both within and between models. Reanalysis datasets reduce these uncertainties by constraining some of these variables with observations. However, clouds are still parameterized, and observed aerosol optical properties are assimilated in only one such reanalysis (MERRA-2).

In this study, we evaluate the components of the surface radiative budget in the three reanalysis datasets that currently include diffuse radiation, namely NCEP/NCAR, MERRA-2, and the recently released ERA5, against measurements from the Global Energy Balance Archive (GEBA). At the monthly scale, incoming shortwave radiation is overestimated in both the MERRA-2 (mean bias error (MBE) = 22.7 W m⁻²) and NCEP/NCAR (MBE = 42.2 W m⁻²) datasets, while incoming longwave radiation is underestimated (MBE = -18.6 W m⁻² for MERRA-2 and -22.1 W m⁻² for NCEP/NCAR). In contrast, ERA5 underestimates shortwave (MBE = -5.1 W m⁻²) and overestimates longwave (MBE = 3.6 W m⁻²) radiation. Diffuse radiation is slightly overestimated in NCEP/NCAR (MBE = 5.7 W m⁻²) and slightly underestimated in ERA5 (MBE = -4.8 W m⁻²). MERRA-2-simulated diffuse radiation is much lower than the GEBA-observed values (MBE = -21.6 W m⁻²); combined with the overestimation of shortwave radiation in this dataset, this leads to an underestimation of the diffuse fraction of radiation by almost 33%. While ERA5 shows the smallest mean bias errors among the three reanalysis products considered, it has the lowest coefficients of determination (r² = 0.72, 0.66, and 0.81 for incoming shortwave, incoming longwave, and incoming diffuse radiation, respectively), whereas MERRA-2 performs best on that front (r² = 0.93, 0.89, and 0.86, respectively). Given the higher correlation of the MERRA-2-simulated surface radiation components with observed values, and the inclusion of an observation-constrained aerosol field in this dataset, we test several statistical techniques, from multiple linear regression to random forests, to correct the biases in the MERRA-2 dataset, using a subset of the GEBA observations as training data.
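The evaluation above rests on two agreement metrics and one derived quantity. The sketch below shows one common way to compute them, assuming that MBE is the mean of (reanalysis minus observation), that r² is the squared Pearson correlation, and that the diffuse fraction is incoming diffuse divided by total incoming shortwave radiation. The function names and the single-site values are hypothetical and only illustrate why a positive shortwave bias combined with a negative diffuse bias pulls the diffuse fraction down.

```python
import numpy as np

def mean_bias_error(predicted, observed):
    """Mean of (predicted - observed); positive values indicate overestimation."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(predicted - observed))

def r_squared(predicted, observed):
    """Squared Pearson correlation coefficient (assumed definition of r²)."""
    r = np.corrcoef(predicted, observed)[0, 1]
    return float(r ** 2)

def diffuse_fraction(diffuse, shortwave):
    """Ratio of incoming diffuse to total incoming shortwave radiation."""
    return np.asarray(diffuse, dtype=float) / np.asarray(shortwave, dtype=float)

# Hypothetical monthly means at one site (W m^-2), for illustration only.
obs_sw, obs_diffuse = 200.0, 80.0
# A reanalysis that overestimates shortwave and underestimates diffuse,
# offset by the MERRA-2 mean biases reported in the abstract.
model_sw, model_diffuse = obs_sw + 22.7, obs_diffuse - 21.6

kd_obs = diffuse_fraction(obs_diffuse, obs_sw)        # 0.40
kd_model = diffuse_fraction(model_diffuse, model_sw)  # about 0.26
print(f"observed diffuse fraction:   {kd_obs:.2f}")
print(f"reanalysis diffuse fraction: {kd_model:.2f}")
```

With these made-up site values the fraction drops by roughly a third; the study's figure of almost 33% comes from the actual GEBA comparison, not from this arithmetic.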
Random forest regression yields the greatest improvement for all radiation components (after bias correction, MBE = 0.5 W m⁻² and r² = 0.93 for diffuse radiation; MBE = 0.7 W m⁻² and r² = 0.98 for shortwave radiation). We extend this methodology using hourly observations from the Baseline Surface Radiation Network (BSRN) to create a bias-corrected, global, gridded radiation database, which can be used, among other things, to improve weather forecasting and hydrological modeling, calculate solar energy generation capacity, and predict agricultural productivity. Once trained, this correction algorithm can be applied to any gridded product without relying on observed data, making it more feasible for global-scale studies than previous similar corrections for shortwave radiation. Finally, we discuss a possible implementation of this methodology to improve the partitioning between diffuse and direct-beam radiation in offline land-surface models, which continue to perform poorly in model intercomparison studies.
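As a rough illustration of the correction step, the sketch below trains a random forest (and, for comparison, a multiple linear regression) to map reanalysis-simulated predictors to observed shortwave radiation, then applies the trained model to gridded predictors that have no matching observations. It assumes scikit-learn's `RandomForestRegressor` and `LinearRegression`; the predictor set (reanalysis shortwave, aerosol optical depth, cloud fraction), the synthetic data, and the bias model are all hypothetical stand-ins, not the study's actual features or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical predictors at station-months (synthetic, illustration only):
# reanalysis shortwave (W m^-2), aerosol optical depth, cloud fraction.
n = 2000
sw_reanalysis = rng.uniform(50.0, 350.0, n)
aod = rng.uniform(0.05, 1.0, n)
cloud = rng.uniform(0.0, 1.0, n)
X = np.column_stack([sw_reanalysis, aod, cloud])

# Synthetic "observed" shortwave: the reanalysis value minus a nonlinear,
# cloud- and aerosol-dependent positive bias, plus measurement noise.
y = sw_reanalysis * (1.0 - 0.25 * cloud**2) - 20.0 * aod + rng.normal(0.0, 5.0, n)

# Observations are the regression target; reanalysis fields are the inputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def mbe(pred, obs):
    # Mean bias error: mean of (prediction - observation).
    return float(np.mean(pred - obs))

def r2(pred, obs):
    # Squared Pearson correlation between prediction and observation.
    return float(np.corrcoef(pred, obs)[0, 1] ** 2)

models = {
    "raw reanalysis": None,
    "multiple linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
scores = {}
for name, model in models.items():
    if model is None:
        pred = X_test[:, 0]  # uncorrected reanalysis shortwave
    else:
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
    scores[name] = (mbe(pred, y_test), r2(pred, y_test))
    print(f"{name:<28} MBE={scores[name][0]:7.2f} W m^-2  r2={scores[name][1]:.3f}")

# Once trained, the model needs only gridded reanalysis predictors, no observations.
grid_predictors = np.column_stack([
    rng.uniform(50.0, 350.0, 100),
    rng.uniform(0.05, 1.0, 100),
    rng.uniform(0.0, 1.0, 100),
])
corrected_grid = models["random forest"].predict(grid_predictors)
```

The random train/test split is used here only for brevity; grouping samples by station (for example with scikit-learn's `GroupShuffleSplit`) would give a stricter test of how well the correction transfers to unobserved locations, which is what a global gridded product relies on.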

1A.3 Creating Bias-Corrected Global Radiation Datasets from Climate Reanalysis Products Using Supervised Learning