Classification and retrieval of cloudy and clear scenes from CERES TOA radiances using the Random Forest method

Vengasseril Thampi, Bijoy

Clouds and the Earth's Radiant Energy System (CERES) provides top-of-atmosphere (TOA) radiative flux estimates on a global scale. This is achieved by combining broadband footprint radiance measurements from the CERES instrument with radiances from a high-resolution, multispectral imager on the same spacecraft. To estimate TOA radiative fluxes, CERES radiances are converted to fluxes using empirical angular distribution models (ADMs). ADMs are collections of anisotropic correction factors, each defined for a specific scene type, and they depend on variables such as viewing geometry, surface type and atmospheric conditions. However, when imager coverage over the CERES footprints is absent or imager data are unavailable (e.g., due to an instrument malfunction), accurate scene identification and the subsequent estimation of TOA fluxes are difficult. About 5.6% of all CERES Terra/Aqua footprints have missing imager information or insufficient MODIS imager data for a reliable scene identification, and this fraction can reach 50% for a specific scene type (Minnis et al., 2003). The unavailability of imager data leads to a situation similar to that encountered in Earth Radiation Budget Experiment (ERBE) TOA flux measurements. The empirical ADMs developed for the estimation of ERBE TOA fluxes depend on the radiometer geometry and broadband radiance measurements (Barkstrom, 1984) and have relatively large errors compared to CERES ADMs (Loeb et al., 2009). In this study, our objective is to develop a statistical methodology, based on an accurate modern technique, to solve this CERES "ERBE-like" problem. The motivation is to develop an ensemble learning method for improved estimation of CERES scene types using TOA radiance measurements from CERES without the support of any imager data.
The methodology will be useful for estimating TOA fluxes when imager coverage is insufficient or the imager fails completely, and it can be used to classify scene types using CERES radiances and available ancillary data.
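The dependence of the flux estimate on scene identification can be made concrete with a minimal sketch. It assumes the usual definition of the anisotropic factor R, under which a radiance I is converted to a flux as F = pi * I / R. The lookup table, its scene-type names and its numerical values are hypothetical placeholders for illustration only; they are not CERES ADM values.

```python
import math

def radiance_to_flux(radiance, anisotropic_factor):
    """Convert a TOA radiance (W m^-2 sr^-1) to a flux (W m^-2).

    Uses F = pi * I / R, where R is the anisotropic factor selected for
    the footprint's scene type and viewing geometry.
    """
    if anisotropic_factor <= 0:
        raise ValueError("anisotropic factor must be positive")
    return math.pi * radiance / anisotropic_factor

# Hypothetical table: (scene type, viewing-zenith bin) -> anisotropic factor.
# The numbers are made up for illustration and are NOT CERES ADM values.
ADM_TABLE = {
    ("clear_ocean", 0): 0.85,
    ("clear_ocean", 1): 1.10,
    ("overcast", 0): 1.02,
    ("overcast", 1): 0.97,
}

def flux_for_footprint(radiance, scene_type, vza_bin):
    """Look up the factor for the identified scene, then convert to flux."""
    return radiance_to_flux(radiance, ADM_TABLE[(scene_type, vza_bin)])

# The same radiance maps to different fluxes under different scene types,
# which is why a missing or wrong scene ID degrades the flux estimate.
flux_clear = flux_for_footprint(50.0, "clear_ocean", 0)
flux_cloudy = flux_for_footprint(50.0, "overcast", 0)
print(flux_clear, flux_cloudy)
```

In this sketch the scene type is simply an argument; in the setting described above it would have to come from a classifier when no imager data are available.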

Random forests (RF) are an ensemble learning method for classification and regression (Breiman, 2001). They use decision tree classifiers as the base learner, each of which can be represented as a flow-chart-like tree structure. Random forests operate by constructing a multitude of decision trees at training time and outputting the class that receives the most votes from the forest. Each decision tree in the forest is formed a) by selecting at random, at each tree node, a small group of input variables to split on and b) by estimating the best possible split based on these variables. The main advantages of the RF method are i) fast run times and ii) the ability to deal with unbalanced and missing data. The objective of this study is to use the RF method to classify CERES scenes as clear or cloudy from CERES radiances (TOA LW and SW) and ancillary variables, without using any imager information. For this purpose, we used the CERES radiances and ancillary variables available from the CERES Single Scanner Footprint TOA/Surface Fluxes and Clouds (SSF) dataset for a single month (July) of 2003 and 2004. The input variables can be split into two groups: CERES variables and ancillary variables. The CERES variables used in the analysis are solar zenith angle (SZA), viewing zenith angle (VZA), relative azimuth angle (RAZ), CERES LW and SW broadband radiances and IGBP surface type. The ancillary variables used in the analysis are LW surface emissivity, broadband surface albedo, surface skin temperature, column-averaged relative humidity, surface wind speed and precipitable water. Our primary goal was to test the efficiency of the RF method in classifying the CERES radiances into clear and cloudy classes. For this purpose, training and test datasets are developed using the multi-year SSF data. A typical monthly CERES SSF dataset contains millions of CERES footprints spread all over the globe, which makes it very difficult to process all at once.
To create more compact training and test datasets, the SSF dataset is stratified by the variables of interest (SZA, VZA, RAZ, cloud fraction, etc.) and the corresponding mean values are used in the analysis. The training dataset is labeled with clear and cloudy classes, while the test dataset is unlabeled. Using the random forest algorithm and the training dataset, a trained forest of decision trees is built and saved. Using this trained forest, the test dataset is classified into clear and cloudy classes for the various surface types. RF classification of CERES scene types into clear and cloudy classes shows very good results, with an average classification error below 5% for most surface types. The classification error increases considerably (> 10%) for most surface types when the ancillary variables are removed and only the CERES variables are used (ERBE-like approach). This study shows that, using the random forest method, it is possible to successfully classify CERES scenes as clear or cloudy without using any imager information.
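The two ingredients of a random forest described above (a random subset of input variables at each node, and a majority vote over trees) can be illustrated with a small, standard-library-only sketch. It is not the implementation or the data used in this study: the function names are hypothetical, the trees are grown on bootstrap samples as in Breiman (2001), and the "footprints" are synthetic values invented purely so that the example runs.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (feature, threshold) among the candidate features, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, n_sub, rng, depth=0, max_depth=8):
    """Grow one tree; a leaf is a class label, a node is a 4-tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # a) pick a small random group of input variables for this node
    features = rng.sample(range(len(rows[0])), n_sub)
    # b) find the best split using only those variables
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    grow = lambda idx: build_tree([rows[i] for i in idx],
                                  [labels[i] for i in idx],
                                  n_sub, rng, depth + 1, max_depth)
    return (f, t, grow(lo), grow(hi))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, n_sub=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, row):
    """Return the class that gets the most votes from the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

def synthetic_footprint(rng):
    """Made-up [SW radiance, LW radiance, SZA] and label; not CERES data."""
    cloudy = rng.random() < 0.5
    sw = rng.gauss(220.0 if cloudy else 60.0, 25.0)
    lw = rng.gauss(60.0 if cloudy else 90.0, 8.0)
    sza = rng.uniform(0.0, 80.0)  # carries no class information here
    return [sw, lw, sza], ("cloudy" if cloudy else "clear")

rng = random.Random(42)
train = [synthetic_footprint(rng) for _ in range(300)]
test = [synthetic_footprint(rng) for _ in range(200)]
forest = train_forest([x for x, _ in train], [y for _, y in train])
errors = sum(predict_forest(forest, x) != y for x, y in test)
error_rate = errors / len(test)
print(f"classification error on synthetic test set: {error_rate:.1%}")
```

Because the synthetic classes are well separated by construction, the error printed here says nothing about the error rates reported for the CERES data; the sketch only shows the mechanics of training a forest on labeled data and classifying held-out data by vote.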
