Also useful and interesting, from a descriptive climatological point of view, is information on the most prominent contiguous hour-to-hour wind patterns that occur climatologically at specified times of the year. Such information would usefully complement the more familiar statistics computed for individual hours. Just as there are favored hourly wind directions and associated speeds, there are undoubtedly preferred hour-to-hour patterns, or “modes”. Resolving such modes can be treated as a clustering problem, and k-Means Cluster Analysis is a frequently used technique for this purpose. One sticking point, or “nuisance factor”, with traditional k-Means Analysis is that the researcher must guess the number of clusters in advance, so the final choice of how many there “are” usually requires trial-and-error iterations and subjective judgment. Recent developments in the Data Mining field, however, have led to the adaptation of the V-fold Cross-Validation Algorithm, which, when incorporated into k-Means Analysis, allows the optimal number of clusters, or “modes”, to be determined objectively.
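As an illustration only, the idea of V-fold cross-validated k-Means can be sketched in Python with NumPy. For each candidate number of clusters k, the days are split into V folds, k-Means is fitted on V-1 folds, and the average squared distance of the held-out days to their nearest cluster centroid is recorded. The sketch then picks the smallest k beyond which one more cluster improves that held-out cost by less than a threshold. The function names, the k-means++ initialization, and the 5% stopping threshold (`min_gain`) are assumptions of this sketch, not details taken from the study.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding (an assumed choice): spread initial centroids apart."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest existing centroid
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def holdout_cost(X, centers):
    """Mean squared distance from held-out points to their nearest centroid."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def vfold_cost(X, k, v=10, seed=0):
    """Held-out k-means cost averaged over V folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), v)
    costs = []
    for i in range(v):
        train = np.concatenate([folds[j] for j in range(v) if j != i])
        centers = kmeans(X[train], k, seed=seed + i)
        costs.append(holdout_cost(X[folds[i]], centers))
    return float(np.mean(costs))

def choose_k(X, k_max=10, v=10, min_gain=0.05):
    """Smallest k after which adding a cluster cuts the CV cost by < min_gain."""
    costs = {k: vfold_cost(X, k, v) for k in range(1, k_max + 1)}
    for k in range(1, k_max):
        if (costs[k] - costs[k + 1]) / costs[k] < min_gain:
            return k, costs
    return k_max, costs
```

Because the cost is evaluated on days withheld from fitting, adding clusters beyond the number of genuine modes yields only marginal improvement, which is what makes the stopping point less subjective than inspecting in-sample fit.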
Using these tools, the optimal number of diurnal wind modes is identified for four Southern California coastal stations: Santa Barbara, Los Angeles International Airport (LAX), Long Beach, and North Island (near San Diego), for the months June through August over the period 1949 to 2010, inclusive. The input data consist of the u and v wind components for each of the day's 24 hourly observations, plus the 24 hourly surface relative humidities, creating a clustering problem in 72-dimensional space (24 u, 24 v, and 24 relative humidity values). The relative humidity data can potentially separate diurnal wind regimes associated with fair conditions from those associated with cloudy conditions. After the clusters are generated, each cluster's (“mode's”) mean hourly u and v components are reconstructed into hourly resultant wind statistics (direction, speed, constancy) and presented along with the associated mean relative humidities, facilitating intra- and inter-station descriptions and comparisons. In data mining terminology, this is an “unsupervised” application, since no predefined classes are supplied and the number of clusters is not known in advance.
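A minimal NumPy sketch of the data layout and the reconstruction step follows. It assumes the meteorological convention (direction is where the wind blows from, in degrees clockwise from north). It also assumes that constancy is the resultant speed expressed as a percentage of the mean scalar speed, and that the 72 variables are clustered unscaled. The source does not state whether the variables were standardized, or how calms were handled. All function names are hypothetical.

```python
import numpy as np

def to_uv(direction_deg, speed):
    """Assumed meteorological convention: direction is where the wind blows
    FROM, degrees clockwise from north. Returns eastward u, northward v."""
    theta = np.deg2rad(direction_deg)
    return -speed * np.sin(theta), -speed * np.cos(theta)

def day_vector(direction_deg, speed, rh):
    """Stack one day's 24 hourly u's, 24 v's and 24 RH values (72 elements)."""
    u, v = to_uv(np.asarray(direction_deg, float), np.asarray(speed, float))
    return np.concatenate([u, v, np.asarray(rh, float)])

def resultant_stats(days):
    """Hourly resultant direction, speed, constancy (%) and mean RH for one
    cluster; `days` is an (n_days, 72) array of that cluster's day vectors."""
    u, v, rh = days[:, :24], days[:, 24:48], days[:, 48:]
    u_bar, v_bar = u.mean(axis=0), v.mean(axis=0)
    res_speed = np.hypot(u_bar, v_bar)
    # Convert the mean vector back to a "blowing from" direction in [0, 360)
    res_dir = np.rad2deg(np.arctan2(-u_bar, -v_bar)) % 360.0
    scalar_speed = np.hypot(u, v).mean(axis=0)
    # Assumed definition: resultant speed as a percentage of mean scalar speed
    with np.errstate(invalid="ignore", divide="ignore"):
        constancy = np.where(scalar_speed > 0,
                             100.0 * res_speed / scalar_speed, 0.0)
    return res_dir, res_speed, constancy, rh.mean(axis=0)
```

A cluster whose days all share a steady westerly gives a constancy near 100%. A cluster mixing opposing winds at the same hour gives a small resultant speed and a low constancy, even though its scalar speeds may be substantial.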