Abstract: Identification of Diurnal Wind Pattern Modes, Optimally Numbered, for Four Coastal Southern California Stations Utilizing the V-Fold Cross-Validation Algorithm Applied to k-Means Clustering Analysis (92nd American Meteorological Society Annual Meeting (January 22-26, 2012))

Monday, 23 January 2012

Hall E (New Orleans Convention Center)

Charles J. Fisk, Naval Base Ventura County, Point Mugu, CA

Manuscript (913.7 kB)

Climatological wind variability is an important meteorological element in planning and decision-making, and it is also an interesting descriptive climatological topic apart from any specific operational concerns. Wind rose diagrams, for example, can provide insights into the wind character for an individual hour of interest by depicting the most favored compass directions and associated speeds. Resultant wind calculations can be valuable in distilling many individual observations into single-value statistics.

Also useful and interesting from a descriptive climatological point of view is information on the most prominent contiguous hour-to-hour wind patterns that occur climatologically at specified times of the year. Such information would be a useful complement to statistics that focus on individual hours. Just as there are favored individual hourly directions and related speeds, there are undoubtedly preferred hour-to-hour patterns or “modes”. Resolving modes of this kind can be treated as a clustering problem, and k-Means Clustering Analysis is a frequently used technique for this purpose. One sticking point or “nuisance factor” in traditional k-Means Analysis is that the researcher has to specify the number of clusters in advance, and the ultimate choice of how many there “are” usually requires trial-and-error iterations and subjective judgment. Recent developments in the Data Mining field, however, have resulted in an adaptation of the V-fold Cross-Validation Algorithm which, when incorporated into k-Means Analysis, allows an objective determination of the optimal number of clusters or “modes”.
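The idea of choosing the number of clusters by V-fold cross-validation can be sketched as follows. This is a minimal illustration in plain Python and not necessarily the formulation used in the study: the function names (`kmeans`, `cv_cost`, `choose_k`), the held-out cost (mean squared distance to the nearest training centroid), and the stopping threshold `min_gain` are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Empty clusters keep their previous centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids

def cv_cost(points, k, v=5, seed=0, restarts=5):
    """Mean squared distance of held-out points to the nearest training centroid."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    total = 0.0
    for fold in range(v):
        held = set(order[fold::v])  # every point is held out exactly once
        train = [points[i] for i in order if i not in held]
        test = [points[i] for i in held]
        best = None
        for _ in range(restarts):  # keep the best of several random starts
            cents = kmeans(train, k, rng)
            train_cost = sum(min(sq_dist(p, c) for c in cents) for p in train)
            if best is None or train_cost < best[0]:
                best = (train_cost, cents)
        total += sum(min(sq_dist(p, c) for c in best[1]) for p in test)
    return total / len(points)

def choose_k(points, k_max=6, v=5, min_gain=0.05):
    """Smallest k after which one more cluster cuts the CV cost by less than min_gain."""
    costs = [cv_cost(points, k, v) for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        if costs[k - 1] - costs[k] < min_gain * costs[k - 1]:
            return k
    return k_max
```

Because the held-out cost generally keeps shrinking as clusters are added, some stopping rule is needed. The relative-improvement threshold used here (hypothetical default of 5%) is one simple choice, and the number of modes it reports depends on that setting.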

Using these tools, the optimal number of diurnal wind modes is identified for four Southern California coastal stations: Santa Barbara, Los Angeles Int'l Airport (LAX), Long Beach, and North Island (near San Diego), for the months June-August over the period 1949 to 2010, inclusive. Input data consist of the u and v components of each day's 24 hourly wind observations, plus the corresponding hourly surface relative humidities, creating a clustering problem in 72-dimensional space (24 hours × 3 variables). Relative humidity data potentially separate out diurnal wind regimes associated with fair or cloudy conditions. After generation of the clusters, the respective cluster (“modal”) mean u's and v's are reconstructed into hourly resultant wind statistics (direction, speed, constancy), along with their associated relative humidities, facilitating intra- and inter-station descriptions and comparisons. In data mining terminology, this is an “unsupervised” application, as the number of clusters is not known in advance.
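The conversion between hourly observations and the u/v representation, and the reconstruction of resultant statistics, can be illustrated with a short sketch. It assumes the usual meteorological convention (direction is where the wind blows from, in degrees clockwise from north) and takes constancy as the ratio of resultant speed to mean scalar speed. The abstract does not define constancy or the feature ordering, so both are assumptions, as are the helper names `to_uv`, `resultant`, and `daily_vector`.

```python
import math

def to_uv(direction_deg, speed):
    """Convert a wind (direction it blows FROM, speed) to (u, v) components."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)

def resultant(winds):
    """Resultant direction (deg), resultant speed and constancy (0 to 1)
    for a list of (direction_deg, speed) observations."""
    uv = [to_uv(d, s) for d, s in winds]
    u_mean = sum(u for u, _ in uv) / len(uv)
    v_mean = sum(v for _, v in uv) / len(uv)
    res_speed = math.hypot(u_mean, v_mean)
    res_dir = math.degrees(math.atan2(-u_mean, -v_mean)) % 360.0
    # Assumed definition: constancy = resultant speed / mean scalar speed.
    mean_speed = sum(s for _, s in winds) / len(winds)
    constancy = res_speed / mean_speed if mean_speed > 0 else 0.0
    return res_dir, res_speed, constancy

def daily_vector(hourly_winds, hourly_rh):
    """Flatten 24 hourly (direction, speed) pairs and 24 relative humidities
    into one 72-element feature vector: 24 u's, 24 v's, 24 RH values."""
    uv = [to_uv(d, s) for d, s in hourly_winds]
    return [u for u, _ in uv] + [v for _, v in uv] + list(hourly_rh)
```

For example, `resultant([(270, 10), (270, 10)])` describes a steady westerly wind, while two equal winds from opposite directions cancel to a near-zero resultant speed and constancy. Note that constancy needs the scalar speeds of a cluster's member observations, not only the cluster's mean u and v. Whether the u, v, and relative humidity columns should be rescaled to comparable ranges before clustering is not stated in the abstract.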
