76
Cluster Analysis of Preferred Month-to-Month Precipitation Anomaly Patterns for Los Angeles/San Diego and San Francisco with Bayesian Analyses of Their Occurrence Probabilities Relative to El Nino, Neutral, or La Nina Episodes
Monday, 5 January 2015
Long-term monthly averages are a standard means of characterizing climatological precipitation variability over the course of a rain year at a given meteorological station. Frequently based on a 30-year period of record, they serve as "normals" that form the basis for anomaly calculations. Such "normals," of course, are only statistical idealizations, and the month-to-month patterns of individual years invariably depart from "normal" configurations in a variety of ways, not necessarily randomly. Inherent tendencies may exist, for example, for occasional clustering of wet or dry anomalies over multi-month sequences or, alternatively, for progressions to oppositely signed deviations, reflecting trough (ridge) to ridge (trough) propagations. There may also be distinctive ENSO (El Nino, Neutral, or La Nina) influences on the nature and frequency of the anomaly patterns. To explore these possibilities, this study investigates the existence and relative frequencies of month-to-month precipitation anomaly modes for three California stations with lengthy periods of record: the Downtown stations of San Francisco, Los Angeles, and San Diego. K-Means cluster analysis integrated with the V-Fold Cross-Validation Algorithm is used to resolve the modes. As applied to K-Means, the V-Fold Algorithm is an iterative, training-sample-type procedure that tends to optimize the number of clusters created, depending on the choice of statistical distance metric and percent improvement cutoff threshold. In this analysis the Squared Euclidean metric is used along with a 5 percent improvement cutoff threshold. The period of record examined for all three stations spans the 1877-78 through 2013-14 (July-June) rain seasons. Given the winter rainfall maximum/summer drought character of California coastal stations, the monthly selection comprises October-November, December, January, February, March, and April-May.
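The cluster-count selection described above can be illustrated with a minimal sketch. It is not the software used in the study: the function names (`kmeans`, `cv_cost`, `choose_k`), the choice of ten folds, the single random initialization, and the cap of ten clusters are all assumptions made for illustration. Only the Squared Euclidean metric and the 5 percent improvement cutoff come from the abstract. Each row of `points` would hold one rain year's anomalies for the selected month groups.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with one random start (an assumption)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids

def cv_cost(points, k, v=10, seed=0):
    """Mean squared distance of held-out points to their nearest centroid,
    with centroids fitted on the remaining v-1 folds."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    folds = [idx[i::v] for i in range(v)]
    total = 0.0
    for fold in folds:
        held = set(fold)
        train = [points[i] for i in idx if i not in held]
        cents = kmeans(train, k, seed=seed)
        total += sum(min(sq_dist(points[i], c) for c in cents) for i in fold)
    return total / len(points)

def choose_k(points, max_k=10, cutoff=0.05, v=10, seed=0):
    """Add clusters until the relative drop in cross-validated cost
    falls below the cutoff (5 percent, as in the abstract)."""
    best_k = 1
    prev = cv_cost(points, 1, v, seed)
    for k in range(2, max_k + 1):
        cost = cv_cost(points, k, v, seed)
        if prev <= 0 or (prev - cost) / prev < cutoff:
            break
        best_k, prev = k, cost
    return best_k
```

Because the held-out cost is compared across successive values of K, the stopping point depends on both the distance metric and the cutoff, which is consistent with the dependence noted in the abstract. A production analysis would normally use several random restarts per K.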
Also, given the close proximity of Los Angeles and San Diego (120 miles apart) and their very similar rainfall climatologies, data for the two stations are merged into a single database. Thus, the San Francisco cluster analysis is six-dimensional and the combined Los Angeles/San Diego analysis is 12-dimensional. Following application of the K-Means clustering procedure and identification of the "optimal" number of clusters K for San Francisco and for Los Angeles/San Diego, a Bayesian statistical analysis is performed for each that addresses the following question: given an impending El Nino, Neutral, or La Nina episode, what are the conditional probabilities that each of the K anomaly patterns will be expressed during the July-June rain season? Assignment of the ENSO episode type (El Nino, Neutral, or La Nina) to each rain year was based on two lists, available from the NOAA Climate Prediction Center website, that cover all rain years back to 1877-78. The number of clusters actually created was four for Los Angeles/San Diego and six for San Francisco.
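The conditional probabilities in question can be sketched with Bayes' theorem, P(cluster | ENSO) = P(ENSO | cluster) P(cluster) / P(ENSO), estimated from counts of rain years. The sketch below is an assumed, simplified reading of the Bayesian analysis rather than the study's actual procedure. The function name `cluster_given_enso` and the record layout, one `(enso_state, cluster_label)` pair per rain year, are hypothetical.

```python
from collections import Counter

def cluster_given_enso(records):
    """Return {enso_state: {cluster: P(cluster | enso_state)}}.

    records: list of (enso_state, cluster_label) tuples, one per rain year
    (a hypothetical layout chosen for this illustration).
    """
    n = len(records)
    cluster_counts = Counter(c for _, c in records)
    joint_counts = Counter(records)
    enso_states = sorted({e for e, _ in records})
    posterior = {}
    for e in enso_states:
        # Numerator of Bayes' theorem for each cluster: P(e | c) * P(c).
        weighted = {
            c: (joint_counts[(e, c)] / cluster_counts[c]) * (cluster_counts[c] / n)
            for c in cluster_counts
        }
        evidence = sum(weighted.values())  # P(e), by total probability
        posterior[e] = {c: w / evidence for c, w in weighted.items()}
    return posterior
```

With simple frequency estimates like these, the result reduces to the share of each ENSO category's rain years that fall in each cluster. The explicit prior and likelihood terms are kept so that a different prior could be substituted.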