Monday, 12 January 2009: 5:00 PM
Cluster analysis of kernel-based principal components
Room 125A (Phoenix Convention Center)
Michael B. Richman, Univ. of Oklahoma, Norman, OK; and I. Adrianto
Cluster analysis (CA) has been applied in meteorological and climatological research for over three decades. In classical CA, clustering is performed by partitioning data in the input space into groups of similar patterns or structures. Principal component analysis (PCA) has been used to extract the structure of the data efficiently before CA is performed. However, traditional PCA assumes that the inputs are linearly related; if the data are nonlinearly related, this assumption is violated. Recently, kernel methods have been used extensively to solve nonlinear problems by mapping the data into a high-dimensional feature space. In this work, we apply kernel methods by computing principal components in the feature space, an approach known as kernel-based PCA (KPCA). Doing so relaxes the linearity assumption. Once a suitable set of kernelized PC scores is identified, CA is applied to examine the cohesion and separation of the clusters.
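A minimal sketch of the KPCA step, assuming a Gaussian (RBF) kernel. The abstract does not name the kernel or its width, so `gamma` and the helper names `rbf_kernel` and `kernel_pca_scores` are hypothetical. The sketch centers the kernel matrix in feature space, eigendecomposes it, and returns the kernelized PC scores that a clustering step would then use.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian kernel K[i, j] = exp(-gamma * ||x_i - x_j||^2); rows of X are samples."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

def kernel_pca_scores(K, n_components):
    """Leading kernel PC scores of the training samples, given a kernel matrix K."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Center the implicit feature vectors: Kc = K - 1K - K1 + 1K1
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest first
    lam = np.maximum(eigvals[order], 0.0)
    # Projection of sample i onto component k equals sqrt(lambda_k) * v_ik
    return eigvecs[:, order] * np.sqrt(lam)

# Example: 30 synthetic "days" of 225-gridpoint fields (random stand-in data)
rng = np.random.default_rng(0)
fields = rng.normal(size=(30, 225))
scores = kernel_pca_scores(rbf_kernel(fields, gamma=1.0 / 225), n_components=4)
print(scores.shape)  # (30, 4)
```

Passing a linear kernel `K = X @ X.T` instead reduces this to ordinary PCA scores (up to sign), which gives a simple check that the nonlinear version generalizes the classical one.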
To test the validity of the approach, we examine NCEP/NCAR reanalysis data consisting of daily sea-level pressure at 225 gridpoints over North America for the Januaries and Julys of 1948 to 1994. Both S-mode and T-mode PCA are used to identify PC scores that represent either (1) time series with spatial coherence (S-mode) or (2) similar snapshots of the atmospheric flow (T-mode). Hierarchical (average-linkage and Ward's methods) and nonhierarchical (k-means) clustering are used to examine cohesion and separation. Results indicate that cluster analysis based on kernel PCA yields improved cohesion and similar separation compared with traditional PCA. Both S- and T-mode patterns will be presented to determine how such nonlinear analyses should be interpreted.
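To make the S-mode/T-mode arrangement and the cohesion/separation comparison concrete, here is a NumPy sketch. It assumes the data matrix is days × gridpoints, with S-mode treating gridpoints as variables and T-mode transposing so that daily maps become the variables. It also assumes cohesion and separation are measured as within- and between-cluster sums of squares, since the abstract does not define them. The helper names (`arrange_mode`, `kmeans`, `cohesion_separation`) are hypothetical, and the k-means is a basic Lloyd iteration, not the authors' implementation.

```python
import numpy as np

def arrange_mode(data, mode):
    """data is days x gridpoints. S-mode: gridpoints are the variables (columns).
    T-mode: transpose, so each day's map becomes a variable."""
    if mode == "S":
        return data
    if mode == "T":
        return data.T
    raise ValueError("mode must be 'S' or 'T'")

def _assign(X, centers):
    # Squared Euclidean distance from every row of X to every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with initial centers drawn at random from the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return _assign(X, centers), centers

def cohesion_separation(X, labels):
    """Cohesion = within-cluster sum of squares (smaller means tighter clusters);
    separation = between-cluster sum of squares (larger means better separated)."""
    grand = X.mean(axis=0)
    ssw = ssb = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        c = members.mean(axis=0)
        ssw += ((members - c) ** 2).sum()
        ssb += len(members) * ((c - grand) ** 2).sum()
    return ssw, ssb

# Example with random stand-in PC scores (not reanalysis data)
rng = np.random.default_rng(1)
pc_scores = rng.normal(size=(60, 4))
labels, _ = kmeans(pc_scores, k=3)
print(cohesion_separation(pc_scores, labels))
```

The two sums add up to the total sum of squares about the grand mean, so for a fixed number of clusters, improved cohesion implies a matching gain in separation under these definitions. For the hierarchical variants, SciPy's `scipy.cluster.hierarchy.linkage(scores, method="ward")` or `method="average"` followed by `fcluster(Z, t=k, criterion="maxclust")` produces labels that can be scored with the same function.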