NOAA Observing System Integrated Analysis - II: The Characterization of Interconnectivity
In addition to making the best use of NOAA's observing capability acquisition budget to meet its strategic goals, NOAA also seeks to economize by maximizing symbiotic interdependencies in services and products through the Civil Observing Capability Network. TPIO has collected the NOSIA-II data sets to support this goal.
These data have a large number of features, some of which are highly correlated. Such correlations increase the complexity of any treatment that has to be applied to them in order to arrive at observing capability investment decision recommendations; however, they often reveal interesting patterns and useful information hidden in the data. Although correlations within NOSIA-II may appear high dimensional, they may be governed by a few simple attributes.
NOAA's observing capabilities are organized by connections between its mission service areas, Line Office products and services, and various external stakeholder requirements and congressional mandates. To understand this interconnectedness, clustering analysis may help us characterize how many of NOAA's interdependent products and services are not only connected but also self-organizing and mutually constructed.
This paper will qualitatively illustrate how interdependence is significant in building an economical, coherent and sustainable network of observing capabilities. The interdependence of NOAA's services and products constitutes plural pathways, in which observation information is transmitted through several possible connecting points rather than through the more vertical and horizontal forms found in hierarchies. Within the resulting nodal complexity, NOAA's services and products become interdependent through complementary observing capabilities that obligate one stakeholder to another. The interdependence that emerges from this organization indicates complex, flexible and nonlinear modes of self-organization.
Although the NOSIA-II data sets exhibit logarithmic scaling across observing capability impacts, these impacts are acutely interdependent and not random. Interdependencies that behave this way qualitatively imply a stable fractal dimension. This interdependence arises naturally from services and products that rely not only on observing capabilities directly but also on other products that directly or indirectly rely on those same observing capabilities.
The dependence of NOAA's services and products on a particular observing capability varies dramatically over a wide range of scales. The driving forces of stakeholder requirements for NOAA's key products and services interact with NOAA's resources to dominate this variability. The additional feature of self-organization indicates an inherent stability residing within NOAA's observing capability network.
The objective of this project is to examine the role of interdependence in establishing, maintaining, and enhancing an observing capability network. We wish to highlight the importance of the entire networked system as a unit of analysis. To do this, we seek to characterize the patterns of interconnection and movement of observing data throughout NOAA. We do this by characterizing the “fractal” dimension of the NOSIA-II data set, which provides a means for measuring how well NOAA's observing capabilities are self-organizing, and we show how this can be used to aid in several data mining tasks. Through the lens of fractal characterization, the NOSIA-II data will reveal the transforming power of observing network organization and interdependence.
Fractal dimension can be used as a tool to address problems of sample homogeneity, which is a measure of how well the relationships scale. A low fractal dimension implies high efficiency and high interconnectedness, so a decrease in the fractal dimension may be interpreted as higher interconnectedness. The performance of NOAA's key products and services may also be linked to the data's fractal dimension. By selecting a method of fractal dimension characterization that is scalable to larger data sets, we will help to reduce the need to survey ever more products and services.
Clustering is a widely used knowledge discovery technique. A fractal clustering technique can deal with large, noisy data sets of high dimensionality and can recognize clusters of arbitrary shape. We can use clustering techniques to identify outliers and to determine the representativeness of the data at higher scales. The method chosen must therefore be resistant to noise, capable of finding clusters of arbitrary shape, and capable of handling points of high dimensionality.
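The building block of any fractal clustering method is an estimate of the fractal dimension of a point set. The sketch below uses box counting, which is one common estimator and is an assumption here, since the text does not name a specific algorithm. It assumes that each surveyed product can be represented as a numeric point (a hypothetical encoding) and that the function names are illustrative only.

```python
import numpy as np

def box_count(points, box_size):
    """Count the grid cells of side `box_size` that contain at least one point."""
    pts = np.asarray(points, dtype=float)
    # Map every point to the integer index of the grid cell it falls in.
    cells = np.floor(pts / box_size).astype(np.int64)
    # The number of distinct cell indices is the number of occupied boxes.
    return len(np.unique(cells, axis=0))

def box_counting_dimension(points, box_sizes):
    """Estimate the fractal dimension as the slope of log N(s) against log(1/s)."""
    sizes = np.asarray(box_sizes, dtype=float)
    counts = np.array([box_count(points, s) for s in sizes])
    slope, _intercept = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope
```

Points scattered along a line should give an estimate near 1 and points filling a square an estimate near 2, whatever the dimensionality of the space they are embedded in. The estimate is only meaningful over box sizes for which the sample is dense enough to occupy the boxes that the underlying set actually covers.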
A few simple techniques for fractal clustering of the NOSIA-II data are proposed. These techniques assess the characteristic fractal dimension of the NOSIA-II data by evaluating a surveyed product's number of dependencies and their impacts from each unique observing capability. The overall objective of clustering is to distinguish random connectedness (very high or infinite fractal dimension) from deterministic connectedness (very low fractal dimension). We can use the fractal dimension of the NOSIA-II data to measure the probability that two products chosen at random will exhibit a certain level of interconnectedness. Fractal dimension can also be used to detect anomalies; in other words, it can detect when self-similarity breaks down.
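The probability that two randomly chosen products lie within a given distance of each other is what the correlation integral C(r) measures, and the slope of log C(r) against log r gives the correlation dimension. The following is a minimal sketch of that estimator (the Grassberger-Procaccia approach, which is an assumption here rather than a method named in the text). It assumes each product is encoded as a numeric feature vector and that Euclidean distance is a reasonable measure of dissimilarity; both choices are hypothetical.

```python
import numpy as np

def correlation_integral(points, radii):
    """For each radius r, return the fraction of distinct point pairs closer than r."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Full pairwise Euclidean distance matrix (fine for a few thousand points).
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once by taking the strict upper triangle.
    pair_dists = dists[np.triu_indices(n, k=1)]
    return np.array([(pair_dists < r).mean() for r in radii])

def correlation_dimension(points, radii):
    """Estimate the correlation dimension as the slope of log C(r) against log r."""
    radii = np.asarray(radii, dtype=float)
    c = correlation_integral(points, radii)
    usable = c > 0  # radii with no pairs cannot be log-transformed
    slope, _intercept = np.polyfit(np.log(radii[usable]), np.log(c[usable]), 1)
    return slope
```

Under these assumptions, C(r) read at a single radius is the pair probability itself, while the fitted slope summarizes how that probability scales. A radius range over which the log-log relationship stops being linear is one place where self-similarity may be breaking down.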
The fractal dimension of the NOSIA-II observing capability impact data will be estimated from the slope of log-transformed variograms from clustering algorithms. A stable fractal dimension for the NOSIA-II impact data will be invariant across scales. This makes the fractal dimension meaningful as an experimental observable in decisions to adjust the observing capability network. When fewer, more, or improved observing capabilities are evaluated, any disruption to the pattern of connections will alter the fractal dimension of the system. The assessment of fractal dimension at various scales can separate normal from suspicious or outlier connectivity. Drastic changes in the fractal dimension should point out anomalous trends or uncover anomalous patterns in the dataset.
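As an illustration of the slope-based estimate, the sketch below computes an empirical variogram and converts its log-log slope into a fractal dimension. It assumes the impact values can be arranged as a one-dimensional ordered sequence (a hypothetical arrangement, since the text does not say how the variogram is built from the clustering output). It also uses the standard relation for a self-affine profile, D = 2 - slope/2, which would differ for data treated as a surface.

```python
import numpy as np

def empirical_variogram(values, lags):
    """Semivariance at each integer lag h: half the mean squared increment."""
    v = np.asarray(values, dtype=float)
    return np.array([0.5 * np.mean((v[h:] - v[:-h]) ** 2) for h in lags])

def variogram_fractal_dimension(values, lags):
    """Fit log(gamma) against log(h) and convert the slope to a dimension.

    Assumes a 1-D self-affine profile, for which gamma(h) ~ h**(2H)
    and D = 2 - H = 2 - slope / 2.
    """
    lags = np.asarray(lags, dtype=int)
    gamma = empirical_variogram(values, lags)
    slope, _intercept = np.polyfit(np.log(lags), np.log(gamma), 1)
    return 2.0 - slope / 2.0
```

Recomputing the estimate over different lag ranges, or before and after a hypothetical change to the set of observing capabilities, is one way to check whether the dimension is stable. A large shift between such estimates would be the kind of drastic change that flags anomalous connectivity.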