Probably the most significant challenge is that this data has fundamental differences from the observational data sets that have provided the foundation of the Enterprise since its inception. The Enterprise cut its teeth and achieved admirable improvements in analysis and forecasting using a relatively small (when compared to the geographic area being covered) number of in situ observations, mostly collected by a set of fixed observing stations characterized by standardized equipment, siting, and installation.
The sensors used to generate these new data sets are almost the complete opposite: hundreds of manufacturers are fielding thousands of different sensors, each with its own specifications. Many of these sensors are simply embedded in smartphones, while other standalone sensors are being deployed wherever they are needed, in the best location available to the fielder. Few, if any, of these sensor locations conform to the WMO standards of earlier generations, and the users of these sensors are often on their own, with no standardized procedures or guidance on how and where to install and operate them.
Nonetheless, with the number of new observation sources being at least three orders of magnitude greater than the baseline governmental observational networks (hundreds of millions versus thousands), the sheer volume of data potentially available demands that the Enterprise work hard to make productive use of it. Even acknowledging the increased variability inherent in such large data sets (due to variations in the factors noted above), at the very least the data sets can be used to provide statistically valuable estimates of selected data values and to understand overall trends in the data.
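As one illustration of how a statistically useful estimate might be drawn from many readings of uneven quality, the sketch below takes the median of a group of nearby readings and discards values that lie far from it, using the median absolute deviation as the measure of spread. The function name, the cutoff of 3.0, and the sample pressure values are hypothetical; this is a minimal sketch of one common robust-statistics approach, not a method prescribed by the Enterprise.

```python
import statistics


def robust_estimate(readings, cutoff=3.0):
    """Return (estimate, spread, kept_count) for a group of nearby readings.

    The estimate is the median of the readings that lie within
    `cutoff` scaled median absolute deviations (MADs) of the overall median.
    The cutoff value is an illustrative assumption.
    """
    center = statistics.median(readings)
    # Scale the MAD by 1.4826 so that it approximates the standard
    # deviation when the errors are normally distributed.
    spread = 1.4826 * statistics.median(abs(r - center) for r in readings)
    if spread == 0:
        # All readings (or at least half of them) agree exactly.
        return center, 0.0, len(readings)
    kept = [r for r in readings if abs(r - center) <= cutoff * spread]
    return statistics.median(kept), spread, len(kept)


# Hypothetical surface pressure readings (hPa) from seven nearby devices,
# two of which are clearly faulty.
sample = [1012.1, 1012.4, 1011.9, 1012.0, 1012.3, 980.0, 1045.0]
estimate, spread, kept_count = robust_estimate(sample)
```

With enough sources in a given area, an estimate of this kind can remain stable even when individual sensors are poorly sited or malfunctioning, which is the sense in which volume can partly offset variability.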
The Enterprise is in its infancy in learning how to take advantage of this data, but two conclusions are clear. The first is that the role of metadata will increase dramatically. When observing stations are part of the same network, using the same equipment and installed to the same specification, metadata is important to track, but because of the homogeneity of the network, that metadata rarely captures operationally significant differences. In the evolving "network" of mobile and connected devices, however, the huge variation in sensor characteristics, installation, and operation makes metadata absolutely critical - without strong metadata, the data is severely limited in its application.
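To make the point about metadata concrete, the sketch below shows what a per-sensor metadata record might look like and how a consumer of the data could check which descriptive fields are absent before deciding how to use an observation. The field names are hypothetical and are not drawn from any existing standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SensorMetadata:
    """Hypothetical metadata record for a single connected sensor."""
    sensor_id: str
    manufacturer: Optional[str] = None
    model: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    height_above_ground_m: Optional[float] = None
    platform: Optional[str] = None  # e.g. "smartphone" or "standalone"


def missing_fields(meta: SensorMetadata) -> List[str]:
    """List the metadata fields that were not supplied for this sensor."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


# A record that carries a location but nothing about the hardware model,
# mounting height, or platform.
record = SensorMetadata("abc123", manufacturer="ExampleCo",
                        latitude=40.0, longitude=-105.0)
gaps = missing_fields(record)
```

An observation whose record has many gaps need not be discarded, but it could reasonably be given less weight than one whose equipment and siting are fully described.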
The second is that new algorithms and techniques are needed to normalize and assimilate the data from these wildly disparate sources and to account for the significantly increased variability inherent in these new data sets. Given that so much of this data comes from the private sector, it is reasonable to expect that private sector organizations will play a leading role in these efforts.
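One simple form such normalization could take is a per-sensor bias correction: each sensor's typical offset from a trusted reference value (for example, a nearby standardized station or a gridded analysis) is estimated and then subtracted from its later readings. The sketch below assumes that such a reference is available for each observation; the function names and sample values are hypothetical, and operational schemes would be considerably more elaborate.

```python
import statistics
from collections import defaultdict


def estimate_biases(observations):
    """Estimate each sensor's bias as its median offset from the reference.

    `observations` is an iterable of (sensor_id, reading, reference) tuples,
    where `reference` is an assumed trusted value for the same time and place.
    """
    offsets = defaultdict(list)
    for sensor_id, reading, reference in observations:
        offsets[sensor_id].append(reading - reference)
    return {sid: statistics.median(vals) for sid, vals in offsets.items()}


def normalize(sensor_id, reading, biases):
    """Subtract the sensor's estimated bias; unknown sensors pass through."""
    return reading - biases.get(sensor_id, 0.0)


# Hypothetical temperature readings (degrees C) paired with reference values.
history = [
    ("a", 21.0, 20.0), ("a", 22.0, 21.0), ("a", 23.5, 22.0),
    ("b", 19.5, 20.0), ("b", 20.5, 21.0),
]
biases = estimate_biases(history)
corrected = normalize("a", 25.0, biases)
```

Using the median rather than the mean keeps a single anomalous comparison from distorting a sensor's estimated bias.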
In fact, such private sector efforts, which harness Big Data techniques from other sectors, are appearing with increasing regularity and committing significant resources to putting these new data sources to work. As these efforts increase in number, the development of industry standards, protocols, and best practices should be encouraged so as to facilitate maximum collaboration and transparency wherever possible. Professional organizations such as AMS can play a crucial role in such coordination, which will serve to spread critical knowledge and accelerate advances across the sector.