Tuesday, 8 January 2013
Exhibit Hall 3 (Austin Convention Center)
The state of California, nearly 800 miles long north to south and 250 miles wide east to west, is divided into seven NOAA NCDC Climate Divisions. Single-valued, month-by-month precipitation statistics, based on areal-averaging techniques, have been compiled by division since 1895. Given the great distance between the northern and southern borders and the great topographical variation across the state, it seems likely that the character of rain year (July-June) relative precipitation anomalies is not consistent, division to division, from one year to the next. The degree and nature of these relative anomaly contrasts, their frequencies, and their possible relationships to phenomena such as El Niño and La Niña should make for interesting study. To this end, the existence and relative frequencies of California Climate Division rain year variability modes are investigated using K-means clustering analysis integrated with the V-fold Cross-Validation Algorithm. The period of record is 1895-96 through 2011-12, some 117 seasons.
One sticking point with traditional K-means is that the researcher must specify the number of clusters in advance, so the final choice requires trial-and-error iterations combined with subjective judgment. Recent methodological advances in the data-mining field, however, have led to the adaptation of the V-fold Cross-Validation Algorithm, a training-sample type procedure which, when incorporated into K-means, allows a more objective determination of the optimal number of clusters.
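The idea of choosing the number of clusters by V-fold cross-validation can be illustrated with a minimal sketch. It is not the software or exact procedure used in this study. The function names (`kmeans`, `cv_cost`, `choose_k`), the 5% stopping threshold `min_gain`, and the single random initialization per fit are all assumptions made for illustration. Each fold is held out in turn, K-means is fitted to the remaining seasons, and the held-out seasons are scored by their squared Euclidean distance to the nearest centroid.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means with one random start; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        new = []
        for j, g in enumerate(groups):
            if g:
                new.append([sum(col) / len(g) for col in zip(*g)])
            else:
                new.append(centroids[j])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return centroids

def cv_cost(points, k, v=10, seed=0):
    """Mean held-out squared distance to the nearest centroid over v folds."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    total = 0.0
    for f in range(v):
        test_idx = set(order[f::v])  # every v-th shuffled index forms one fold
        train = [points[i] for i in order if i not in test_idx]
        centroids = kmeans(train, k, seed=seed + f)
        total += sum(min(sq_dist(points[i], c) for c in centroids)
                     for i in test_idx)
    return total / len(points)

def choose_k(points, k_max=10, v=10, min_gain=0.05, seed=0):
    """Smallest k beyond which one more cluster cuts CV cost by < min_gain.

    The min_gain threshold is an assumed, illustrative stopping rule.
    """
    prev = cv_cost(points, 1, v, seed)
    for k in range(2, k_max + 1):
        cost = cv_cost(points, k, v, seed)
        if prev <= 0 or (prev - cost) / prev < min_gain:
            return k - 1
        prev = cost
    return k_max
```

In this setting each point would be one rain year, represented as a vector of seven normalized divisional precipitation values. A more careful version would run several random restarts per fit, since a single start can leave K-means in a poor local minimum.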
Using normalized precipitation values and the squared Euclidean distance option, the analysis resolved six clusters.
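The abstract does not say how the precipitation values were normalized. As a hedged illustration only, the sketch below assumes a z-score per division (subtract the division's mean and divide by its standard deviation over the record) and then computes the squared Euclidean distance between two rain years. The function names and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column (division) to zero mean and unit variance.

    rows: one list per rain year, one value per climate division.
    Z-scoring is an assumed normalization, not one stated in the abstract.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, mus, sds)]
            for row in rows]

def sq_euclidean(a, b):
    """Squared Euclidean distance between two rain-year vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up totals for three rain years in two divisions (illustrative only).
totals = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]]
z = zscore_columns(totals)
d = sq_euclidean(z[0], z[2])  # distance between the first and third years
```

Normalizing by division keeps wet northern divisions from dominating the distance simply because their raw totals are larger.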