In this study, we explore the potential of a neural classification method, the self-organizing map (SOM; Kohonen, 1982, 2001), for filling cloud-induced gaps in order to improve daily maps of chlorophyll-a (CHL) estimates in an ocean basin. The SOM is an algorithm for extracting and classifying features, such as trends, in (and between) input variables. Its learning process is unsupervised, that is, it needs no a priori empirical or theoretical description of the input-output relationships. This enables us to identify relationships among the state variables (input variables) of the phenomena under analysis where our understanding of them is insufficient for a full description using mathematical equations.
The main idea of the present paper is simple. Chlorophyll-a at the ocean surface varies strongly and in complex ways in time and space. However, it is mainly coupled to ocean dynamics at time scales of a few weeks (Abraham et al. 2000; Lévy et al. 2001, 2010; Martin et al. 2002, 2003; Léhan et al. 2007). Since remote sensing has contributed immensely to our ability to obtain physical (dynamics) and biological (CHL) information at comparable temporal and spatial scales, the missing data in ocean color images (biological information) can be modeled by making the best combined use of the knowledge given by the ocean dynamics state variables and that brought by the corresponding CHL values. Indeed, if we assume that the ocean state can be locally defined by typical situations involving dynamical and biological information, the principle is to use the SOM algorithm to learn a data set covering a large variety of possible ocean state situations and then to cluster it into a large number of significant classes that best represent the learning data set. Each class corresponds to a specific spatio-temporal environment. A missing CHL value belonging to a given ocean state situation is therefore assigned to one class and takes that class's CHL value. The ocean dynamics state variables selected for the reconstruction process are sea surface temperature (SST) and sea surface height (SSH). These variables are readily observed by satellite remote sensing and provide information on both horizontal and vertical water mass advection.
The triplet (CHL, SST, SSH) thus defines one input vector of the network to be trained, and the reconstruction is then performed in two phases: 1. A learning process using the SOM algorithm, which leads to a topologically ordered mapping of the input vectors (learning data set L). Similar patterns in the learning data set are mapped onto neighbouring regions of the map, while dissimilar patterns are mapped further apart. The coding of the input vectors is therefore crucial for the learning process and hence for the reconstruction performance. 2. A decoding process performed on test data sets T (situations of (CHL, SST, SSH) not seen during learning), which, knowing the class to which a given input vector of T belongs, provides an estimate of the related CHL value.
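The two phases can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the map size, learning-rate and neighbourhood schedules, the synthetic data, and the function names `masked_bmu`, `train_som` and `decode_chl` are all hypothetical. It also assumes that distances are computed on the valid (non-missing) components only and that the CHL estimate of a class is the CHL component of its prototype vector.

```python
import numpy as np

def masked_bmu(weights, x):
    """Index of the best-matching unit (class) for x, comparing
    only the components of x that are not NaN (assumed rule)."""
    valid = ~np.isnan(x)
    diff = weights[:, valid] - x[valid]
    return int(np.argmin(np.sum(diff ** 2, axis=1)))

def train_som(data, rows=4, cols=4, n_epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Phase 1: online SOM learning on (CHL, SST, SSH) vectors that may contain NaN."""
    rng = np.random.default_rng(seed)
    n_units, dim = rows * cols, data.shape[1]
    # Positions of the units on the 2-D map, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Start all prototypes near the data mean, with a small random spread.
    weights = np.nanmean(data, axis=0) + 0.1 * np.nanstd(data, axis=0) * rng.standard_normal((n_units, dim))
    n_steps, step = n_epochs * len(data), 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(data)):
            x = data[i]
            valid = ~np.isnan(x)
            frac = step / n_steps
            step += 1
            if not valid.any():
                continue  # nothing to learn from an entirely missing vector
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
            bmu = masked_bmu(weights, x)
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Update only the components that are actually observed in x.
            weights[:, valid] += lr * h[:, None] * (x[valid] - weights[:, valid])
    return weights

def decode_chl(weights, sst, ssh):
    """Phase 2: classify a vector whose CHL is missing and return the class CHL value."""
    x = np.array([np.nan, sst, ssh])
    return weights[masked_bmu(weights, x), 0]

# Synthetic learning set (hypothetical): two ocean "situations", 30% of CHL hidden.
rng = np.random.default_rng(1)
regime = np.where(rng.random(200) < 0.5, 0.0, 10.0)
learn = regime[:, None] + 0.1 * rng.standard_normal((200, 3))
learn[rng.random(200) < 0.3, 0] = np.nan

som = train_som(learn)
chl_low = decode_chl(som, sst=0.0, ssh=0.0)
chl_high = decode_chl(som, sst=10.0, ssh=10.0)
```

In this sketch the missing CHL component is simply left out of the distance computation, so a cloudy pixel is classified from its SST and SSH alone.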
However, depending on the cloud coverage of the CHL and SST images, the SOM input vectors may have a reduced number of valid components (components that contain no missing values), leading to an inaccurate classification during the learning phase and subsequently to a bias in the determination of the missing CHL value during the decoding phase. To improve on the “standard” SOM learning, we therefore propose an iterative SOM-based algorithm that progressively fills in the missing values of the learning data set. The objective is to obtain a map that is more representative of the phenomena under analysis (situations of (CHL, SST, SSH)) and better adapted to the quality of real data (missing data, noise). The successive iterations gradually transform the learning data set. We used the percentage of missing CHL data as the termination criterion. Initially, this percentage was 60%. After the third iteration, it had fallen to 15%. We then stopped the iterative process, considering that we had enough valid data (components not affected by clouds) for the learning process and that the classification would be more accurate.
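The iterative scheme, with the percentage of missing CHL as stopping rule, might be sketched as follows. The abstract does not state which missing values are filled at each pass, so the rule used here (fill the fraction `fill_fraction` of missing CHL values whose vectors lie closest to their class prototype, then retrain) is purely an assumption, as are the function names, the SOM settings and the synthetic data. Only the 60% starting level and the 15% stopping level are taken from the text.

```python
import numpy as np

def bmu_and_dist(weights, x):
    """Best-matching unit and squared distance, using valid components only."""
    valid = ~np.isnan(x)
    if not valid.any():
        return -1, np.inf  # cannot classify a fully missing vector
    d2 = np.sum((weights[:, valid] - x[valid]) ** 2, axis=1)
    k = int(np.argmin(d2))
    return k, float(d2[k])

def train_som(data, rows=4, cols=4, n_epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Compact online SOM that skips NaN components (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = np.nanmean(data, axis=0) + 0.1 * np.nanstd(data, axis=0) * rng.standard_normal((rows * cols, data.shape[1]))
    n_steps, step = n_epochs * len(data), 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(data)):
            x, frac = data[i], step / n_steps
            step += 1
            valid = ~np.isnan(x)
            bmu, dist = bmu_and_dist(weights, x)
            if not np.isfinite(dist):
                continue
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
            weights[:, valid] += lr0 * (1.0 - frac) * h[:, None] * (x[valid] - weights[:, valid])
    return weights

def iterative_fill(data, target_missing=0.15, fill_fraction=0.5, max_iter=10):
    """Retrain and fill until the share of missing CHL (column 0) reaches the target."""
    data = data.copy()
    history = [float(np.isnan(data[:, 0]).mean())]
    for _ in range(max_iter):
        if history[-1] <= target_missing:
            break  # termination criterion: enough valid CHL data
        weights = train_som(data)
        missing = np.flatnonzero(np.isnan(data[:, 0]))
        hits = [bmu_and_dist(weights, data[i]) for i in missing]
        dists = np.array([d for _, d in hits])
        n_fill = max(1, int(np.ceil(fill_fraction * len(missing))))
        # Assumed rule: fill first the vectors that match their class best.
        chosen = [j for j in np.argsort(dists)[:n_fill] if np.isfinite(dists[j])]
        if not chosen:
            break
        for j in chosen:
            data[missing[j], 0] = weights[hits[j][0], 0]
        history.append(float(np.isnan(data[:, 0]).mean()))
    return data, history

# Synthetic learning set (hypothetical relation between CHL, SST and SSH).
rng = np.random.default_rng(2)
sst, ssh = rng.standard_normal(200), rng.standard_normal(200)
chl = -sst + 0.5 * ssh + 0.1 * rng.standard_normal(200)
demo = np.column_stack([chl, sst, ssh])
demo[rng.choice(200, size=120, replace=False), 0] = np.nan  # 60% of CHL missing

filled, history = iterative_fill(demo)
```

Here `history` records the fraction of missing CHL before the first pass and after each pass, which is the quantity the termination criterion monitors.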
The present work was carried out on the Mauritano-Senegalo upwelling region (24°-33°W, 19°-24°N), where the chlorophyll concentration is high and the seasonal variations are relatively small. It focused on December, January and February 2003, a period during which the cloud coverage is fairly low.
Several experiments simulating different cloud coverage situations have shown the robustness of the proposed reconstruction method, especially in resolving large, compact clouds that persist over time (cloud coverage of 450 km × 450 km lasting up to 5 days).