87th AMS Annual Meeting

Tuesday, 16 January 2007: 9:00 AM
Anticipating the formation of tornadoes through data mining
210B (Henry B. Gonzalez Convention Center)
Amy McGovern, Univ. of Oklahoma, Norman, OK; and D. H. Rosendahl, A. Kruger, M. G. Beaton, R. A. Brown, and K. K. Droegemeier
Poster PDF (1.9 MB)
We propose to enhance our understanding of the formation of severe weather events, specifically tornadoes, through data mining and knowledge discovery techniques. Knowledge discovery is the process of making sense of data. Generally, the data are too complex for humans to understand quickly or to search for the important patterns by eye. Knowledge discovery techniques can instead be used to highlight salient patterns. We are developing new data mining techniques for use on mesoscale weather data.

Severe weather phenomena such as tornadoes, hail, and floods annually cause significant loss of life, property damage, and disruption of transportation systems. Their annual economic impact is estimated to be greater than $13B (Pielke and Carbone, 2002). The current techniques for detecting and predicting these severe weather events are tied to algorithms designed for a particular radar system. Each new radar system requires the development of its own detection and prediction algorithms.

Current research on data assimilation (Tong and Xue, 2005; Xue et al., 2006) will lead to a real-time four-dimensional gridded data set containing information from all available sensing systems (e.g., radar, surface, sounding, and aircraft observations). This gridded data set will give a more comprehensive and representative analysis of real-time atmospheric conditions. It will include all fundamental and derived meteorological fields at each grid point across the entire atmospheric domain. A single detection/prediction algorithm can therefore be applied to the gridded data set for each type of hazardous weather, with no alterations needed when additional sensing systems are introduced. Our research focuses on developing new data mining techniques that will assist in creating these new detection/prediction algorithms.

Because the technology to create four-dimensional assimilated data from actual observations is currently under development, we are using simulated storm data produced by the Advanced Regional Prediction System (ARPS), which is one of the top weather forecasting systems for mesoscale data (Xue et al., 2000, 2001, 2003). With the initial focus on detection and prediction of tornadoes, we use data mining techniques to identify the precursors of strong low-level rotation within hundreds of simulated supercell storms.

Mesoscale weather data pose several challenges to current data mining techniques. The sheer size of the data available is more than many current techniques can handle. A reasonable simulation can generate data every 30 seconds for a 100 km by 100 km by 20 km domain with 500 m horizontal grid spacing, that is, 200 by 200 horizontal grid points at every vertical level and every time step. This quickly produces a very large data set. In addition to being dynamic, the data are continuous and multi-dimensional. Even with a propositional representation, identifying patterns in continuous data is difficult. We are currently using the SAX algorithm (Lin et al., 2003) to create discrete data from continuous data. The multi-dimensional aspect of the problem makes it more challenging still. There is recent work addressing this issue, but how best to mine multi-dimensional time series is still an open problem.
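
As an illustration of the discretization step, the following is a minimal sketch of SAX as described by Lin et al. (2003): the series is z-normalized, reduced by piecewise aggregate approximation (PAA), and each segment mean is mapped to a symbol using breakpoints that divide the standard normal distribution into equiprobable regions. The function name sax, its parameters, and the four-letter alphabet are assumptions made for this example and are not taken from the authors' implementation.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric time series into a SAX word (illustrative sketch)."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")

    # Step 1: z-normalize so the values are roughly standard normal.
    mean, std = fmean(series), pstdev(series)
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]

    # Step 2: PAA, i.e. replace each equal-width segment by its mean.
    width = len(z) // n_segments
    paa = [fmean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # Step 3: breakpoints that split N(0, 1) into equiprobable regions,
    # one region per symbol in the alphabet.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Step 4: map each segment mean to the symbol of its region.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)
```

For example, sax(list(range(8)), 4) maps a steadily increasing series of eight values to the word "abcd". In practice each meteorological quantity would be discretized separately, which is one reason the multi-dimensional case remains difficult.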

Given a set of fundamental and derived meteorological quantities that can be measured every 30 seconds, we generate a more feasible time-series data set. We then identify rules of the form “if the time series data for feature A fits the characteristic shape X and the time series data for feature B fits the characteristic shape Y within 5 minutes of the match on feature A, then the probability of a tornado occurring within Z minutes is P.” Our approach to identifying these rules draws from Oates & Cohen (1996) and McGovern & Jensen (under review).
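
To make the rule form concrete, the sketch below estimates P for one candidate rule as an empirical frequency over a set of simulated storms. It is only an illustration under stated assumptions: the Storm record, the function names, and the frequency estimate are hypothetical and are not the authors' method, and "within 5 minutes" is read here as an absolute time difference between the two matches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Storm:
    """Hypothetical per-storm summary; all times are in minutes."""
    a_matches: List[float]          # times at which feature A fit shape X
    b_matches: List[float]          # times at which feature B fit shape Y
    tornado_time: Optional[float]   # None if the storm produced no tornado


def rule_fire_time(storm, window=5.0):
    """Earliest time by which A and B have both matched within `window`."""
    times = [max(a, b)
             for a in storm.a_matches
             for b in storm.b_matches
             if abs(a - b) <= window]
    return min(times) if times else None


def rule_probability(storms, lead_z, window=5.0):
    """Fraction of rule firings followed by a tornado within lead_z minutes."""
    fired = hits = 0
    for storm in storms:
        t = rule_fire_time(storm, window)
        if t is None:
            continue  # the rule's precondition never held in this storm
        fired += 1
        if storm.tornado_time is not None and 0 <= storm.tornado_time - t <= lead_z:
            hits += 1
    return hits / fired if fired else None
```

With Z given as lead_z, rule_probability returns None when the rule never fires, so that an unsupported rule is not confused with one whose estimated probability is zero.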

This research is part of the Collaborative Adaptive Sensing of the Atmosphere (CASA) Engineering Research Center. The center is developing new low-power X-band radars that will sense the lowest 3 km of the atmosphere, which the current NEXRAD radars miss (McLaughlin et al., 2005). These radars will dynamically adjust their scanning strategies to the current weather situation. Because the radars will scan the lower regions of the atmosphere and can refocus the beam every 30 seconds, they will produce valuable data that capture previously undetected storm structure. These new data necessitate the development of new detection/prediction techniques.


Lin, J., Keogh, E., Lonardi, S., and Chiu, B. (2003). A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

McLaughlin, D. J., Chandrasekar, V., Droegemeier, K., Frasier, S., Kurose, J., Junyent, F., Philips, B., Cruz-Pol, S., and Colom, J. (2005). Distributed Collaborative Adaptive Sensing (DCAS) for improved detection, understanding, and prediction of atmospheric hazards. In 9th Symp. Integrated Obs. Assim. Systems - Atmos. Oceans, Land Surface (IOAS-AOLS), Amer. Meteor. Soc., San Diego, CA.

Oates, T. and Cohen, P. R. (1996). Searching for structure in multiple streams of data. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 346–354. Morgan Kaufmann.

Pielke, R. and Carbone, R. (2002). Weather impacts, forecasts, and policy. Bulletin of the American Meteorological Society, 83:393–403.

Tong, M. and Xue, M. (2005). Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments. Monthly Weather Review, 133:1789–1807.

Xue, M., Droegemeier, K. K., and Wong, V. (2000). The Advanced Regional Prediction System (ARPS) - a multiscale nonhydrostatic atmospheric simulation and prediction model. Part I: Model dynamics and verification. Meteorology and Atmospheric Physics, 75:161–193.

Xue, M., Droegemeier, K. K., Wong, V., Shapiro, A., Brewster, K., Carr, F., Weber, D., Liu, Y., and Wang, D. (2001). The Advanced Regional Prediction System (ARPS) - a multiscale nonhydrostatic atmospheric simulation and prediction tool. Part II: Model physics and applications. Meteorology and Atmospheric Physics, 76:134–165.

Xue, M., Wang, D., Gao, J., Brewster, K., and Droegemeier, K. K. (2003). The Advanced Regional Prediction System (ARPS), storm-scale numerical weather prediction and data assimilation. Meteorology and Atmospheric Physics, 82:139–170.

Xue, M., Tong, M., and Droegemeier, K. K. (2006). An OSSE framework based on the ensemble square-root Kalman filter for evaluating impact of data from radar networks on thunderstorm analysis and forecast. Journal of Atmospheric and Oceanic Technology, 23:46–66.
