Classical methods for computing empirical orthogonal functions (EOFs) impose heavy computational burdens on large datasets, which has historically limited the scale of applications. For this reason, EOFs are usually calculated on only a small subset of the data (e.g., a 2-D slice), limiting their physical interpretability to the variability at one depth or elevation. Here, we implement a parallelized approach to factorizing an entire 3-D dataset that scales to allow the analysis of climate modes in three dimensions, making physical interpretability more transparent.
We apply this method to the full ocean temperature data from CFSR, a global high-resolution reanalysis product spanning 32 years. All 41 levels of ocean subsurface and surface temperatures at each observation interval are flattened into a single vector, omitting missing readings. These vectors are then stacked, one for each six-hour interval from January 1, 1979 to December 31, 2010, resulting in a 1-terabyte matrix with 6.3 million columns (observation sites) and roughly 46 thousand rows (observation intervals). We extract the first 20 EOFs corresponding to the columns by parallelizing the eigenvalue decomposition of an implicit representation of the covariance matrix of the columns, then project the observations onto this low-dimensional space to find the EOFs of the rows.
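The factorization described above can be illustrated at small scale. The following is a minimal single-machine sketch, not the parallel implementation used in this work: the function name `eofs_implicit`, the removal of the time mean, the synthetic test matrix, and the use of block power iteration as the eigensolver are all assumptions made for illustration. The point it shows is that the covariance matrix of the columns is never formed; it is only applied to vectors through two products with the data matrix.

```python
import numpy as np

def eofs_implicit(A, k, n_iter=50, seed=0):
    """Top-k EOFs of A (rows = observation intervals, columns = sites).

    Hypothetical helper. The covariance C = A_c^T A_c / (m - 1) is never
    built explicitly; it is applied to blocks of vectors via A_c and A_c^T.
    """
    m, n = A.shape
    # Assumption: anomalies are taken about the time mean at each site.
    A_c = A - A.mean(axis=0)

    def apply_cov(V):
        # C @ V computed as two tall-skinny products (no n-by-n matrix).
        return A_c.T @ (A_c @ V) / (m - 1)

    # Block power (orthogonal) iteration, a stand-in for a Lanczos-type solver.
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(apply_cov(Q))

    # Rayleigh-Ritz step: solve the small k-by-k eigenproblem in the subspace.
    T = Q.T @ apply_cov(Q)
    evals, W = np.linalg.eigh(T)
    order = np.argsort(evals)[::-1]      # sort by decreasing variance
    evals = evals[order]
    eofs = Q @ W[:, order]               # spatial patterns, shape (n, k)

    # Project the observations onto the EOFs to get the row-space expansion.
    pcs = A_c @ eofs                     # shape (m, k)
    return evals, eofs, pcs

# Synthetic stand-in for the flattened data: three modes plus weak noise.
rng = np.random.default_rng(1)
m, n, k = 60, 200, 3
time_series = rng.standard_normal((m, k)) * np.array([10.0, 5.0, 2.0])
patterns = rng.standard_normal((k, n))
A = time_series @ patterns + 0.01 * rng.standard_normal((m, n))

evals, eofs, pcs = eofs_implicit(A, k)
print("explained variance per mode:", evals)
```

In a distributed setting, the two products inside `apply_cov` are the steps that would be spread across workers, since each only needs a block of rows of the data matrix plus a small dense block of vectors.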
We compare the EOFs computed on the full 3-D field to those computed on the surface temperature field alone to quantify the costs, both computational and in terms of physical interpretability, of computing the EOFs on only a 2-D slice.