Numerous studies have linked past observations of solar events and interplanetary magnetic field (IMF)-induced storms to ionosphere anomalies. However, these anomalies often appeared different from one event to the next. In fact, an ionosphere anomaly was usually identified by comparison with the condition of the ionosphere a few days before and after the solar or IMF event. A natural question for forecasters of ionosphere anomalies is: what kind of anomalies do we intend to forecast? A probably too simplistic answer might be: any condition that is “unusual” compared with conditions that exist most of the time. Even given a precise definition of an ionosphere anomaly, earlier analyses of solar and IMF data have shown that some highly significant disturbances in these data often do not lead to obvious anomalies in the ionosphere. This indicates that there may be multiple connections between solar and IMF events and ionosphere anomalies, and that the relationships among them may be highly complex.
Recent advances in data mining techniques, motivated by the development of artificial intelligence, have had considerable success in feature selection in large quantities of data and in extracting previously unknown connections from massive amounts of data. In the past three years, our team has developed techniques to systematically identify anomalous conditions in the Earth’s ionosphere from historical archives of several years of observational data. We have also performed regression analyses to characterize the relationships between available solar, IMF and other space weather observations and the ionosphere anomalies. Our analyses show that a forecasting system for ionosphere anomalies based on data mining techniques can be constructed using currently available data sets.
Our primary data set for representing the condition of the ionosphere is made up of the data-driven Global Ionosphere Maps (GIMs) that have been produced daily over the past 20 years by the Jet Propulsion Laboratory. A GIM provides a map of Vertical Total Electron Content (VTEC) over the globe at one-degree latitude and one-degree longitude resolution every 15 minutes. The raw data used in the GIM construction consist of slant TEC (STEC) observations from a network of ground GPS receivers. We are aware of many issues with using the GIMs as a representation of the ionosphere conditions. On the other hand, they constitute an uninterrupted, quality-controlled stream of data appropriate for systematic analyses. For observations of solar and IMF conditions, we use reprocessed solar wind velocity, IMF vectors, sunspot data, the F10.7 index and a large number of indices representing geomagnetic conditions, such as Dst, Ap and Kp. In addition to the raw solar, IMF and geomagnetic data, we have also relied on community lists of major solar events, such as CMEs, to validate our work.
Our analyses begin with an unsupervised anomaly identification. The approach consists of singling out data that are substantially isolated from clusters of “normal data”. We rely on a collection of measures, or metrics in mathematical terms, to represent the degree of difference between two GIMs. An example of such a metric would be the maximum absolute difference between the VTEC in the two GIMs. Alternatively, we could compute the maximum absolute difference between the latitudinal gradients of the two GIMs. For some of the metrics, the differences are evaluated only over a specific region of the globe. In our study, we considered a total of 16 different metrics. For each of these metrics, the difference between any two GIMs for the same UT in a data set was evaluated. The maximum distance between a given GIM and its n closest GIMs is defined as the n-cluster radius of the given GIM. If the ionosphere condition on a given day is very similar to that on many other days, the n-cluster radius will be relatively small. On the other hand, if the ionosphere condition is highly unusual for a given day, we expect the n-cluster radius to be large relative to most other days. A comparison of the n-cluster radii for all 16 metrics with a list of major solar events indicates that a statistically significant correlation exists between days with large n-cluster radii and certain known solar disturbances.
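As a rough illustration of the n-cluster radius described above, the following Python sketch computes it for a small stack of maps. It is a minimal sketch under stated assumptions, not the implementation used in the study: the function names, the array layout (day, latitude, longitude), the reading of the radius as the distance to the n-th nearest map, and the tiny synthetic maps are all hypothetical choices made only so that the example runs.

```python
import numpy as np

def max_abs_vtec_difference(gim_a, gim_b):
    """Maximum absolute VTEC difference between two maps (first metric named in the text)."""
    return float(np.max(np.abs(gim_a - gim_b)))

def max_abs_lat_gradient_difference(gim_a, gim_b):
    """Maximum absolute difference between latitudinal gradients (axis 0 is assumed to be latitude)."""
    grad_a = np.gradient(gim_a, axis=0)
    grad_b = np.gradient(gim_b, axis=0)
    return float(np.max(np.abs(grad_a - grad_b)))

def n_cluster_radii(gims, metric, n):
    """Distance from each map to its n-th nearest neighbour under the given metric.

    gims is assumed to hold one map per day, all for the same UT,
    with shape (days, latitudes, longitudes).
    """
    count = len(gims)
    if not 1 <= n < count:
        raise ValueError("n must be between 1 and the number of maps minus one")
    # Pairwise distance matrix; the metric is symmetric, so fill both halves at once.
    dist = np.zeros((count, count))
    for i in range(count):
        for j in range(i + 1, count):
            dist[i, j] = dist[j, i] = metric(gims[i], gims[j])
    radii = np.empty(count)
    for i in range(count):
        others = np.delete(dist[i], i)      # drop the zero self-distance
        radii[i] = np.sort(others)[n - 1]   # largest distance among the n closest maps
    return radii

# Synthetic stand-in data: five nearly identical "days" plus one day with a local bump.
base = np.full((4, 5), 10.0)
outlier = base.copy()
outlier[2, 3] += 8.0
gims = np.stack([base + 0.1 * k for k in range(5)] + [outlier])

radii = n_cluster_radii(gims, max_abs_vtec_difference, n=2)
print(radii)
```

In this toy data set the day with the local bump receives by far the largest radius, which is the behavior the text relies on to flag unusual days. Any other metric with the same signature, such as `max_abs_lat_gradient_difference`, can be passed in its place.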
In addition to serving as an indicator of outliers, the n-cluster radius also serves as an extracted feature of the ionosphere condition. Feature extraction plays a crucial role in data mining, in part because it allows a large volume of data to be reduced to a few information-rich features. However, unlike the most common form of feature extraction, which consists of a linear projection of the data onto a one-dimensional subspace, the n-cluster radius is derived from the comparison of a data point with a large collection of other data points. Regression analyses between these features and space weather data have the potential to be much more effective than regressions between the GIMs and the space weather data sets. In our case, we have performed regression analyses between solar, IMF and geomagnetic data and the n-cluster radius. The data showed a high degree of statistical correlation between the space weather observations and the n-cluster radii associated with some of the metrics. In fact, using the correlation coefficients derived from one year of data, we can produce predictions of ionosphere n-cluster radii for another year of data with a high rate of success.
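The train-on-one-year, predict-another-year procedure can be sketched as follows. The abstract does not state the form of the regression, so ordinary least squares on a linear model is an assumption here, as are the function names and the three generic driver columns. The data are synthetic, generated from an invented linear relationship purely so that the example runs; they say nothing about the actual results of the study.

```python
import numpy as np

def fit_linear_regression(drivers, radii):
    """Least-squares fit of radii against driver columns plus an intercept term."""
    design = np.column_stack([np.ones(len(drivers)), drivers])
    coeffs, *_ = np.linalg.lstsq(design, radii, rcond=None)
    return coeffs

def predict_radii(coeffs, drivers):
    """Apply previously fitted coefficients to a new set of driver observations."""
    design = np.column_stack([np.ones(len(drivers)), drivers])
    return design @ coeffs

rng = np.random.default_rng(0)

def synthetic_year(days=365):
    """Hypothetical stand-in for one year of daily space weather drivers and cluster radii."""
    drivers = rng.normal(size=(days, 3))  # three generic driver columns (assumed)
    # Invented linear relationship plus noise, used only to exercise the code.
    radii = 2.0 + drivers @ np.array([1.5, -0.8, 0.3]) + 0.1 * rng.normal(size=days)
    return drivers, radii

train_drivers, train_radii = synthetic_year()   # "year one": used to fit
test_drivers, test_radii = synthetic_year()     # "year two": held out

coeffs = fit_linear_regression(train_drivers, train_radii)
predicted = predict_radii(coeffs, test_drivers)

# Correlation between predicted and held-out radii as a simple skill score.
correlation = float(np.corrcoef(predicted, test_radii)[0, 1])
print(correlation)
```

Keeping the fitting year and the evaluation year separate is the point of the design: the skill score is computed only on data the coefficients have never seen.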
Our results demonstrate that advanced data mining techniques have the potential to produce a statistically reliable forecast of ionosphere anomalies based on solar and space weather observations. We are now developing a statistical forecast system test-bed, to be delivered to the Community Coordinated Modeling Center (CCMC), that will enable users to evaluate the effectiveness of forecasting ionosphere anomalies from solar and space weather observations via data mining techniques.