1.2 Data Prospecting – a new approach to address “big data” exploitation challenges in Earth Science?

Monday, 7 January 2013: 1:45 PM
Room 18A (Austin Convention Center)
Rahul Ramachandran, Univ. of Alabama, Huntsville, AL; and J. Rushing, A. Lin, and K. S. Kuo

Data analysis typically falls into two categories: data exploration and data mining. Data exploration applies manual methods, such as standard statistical analysis and visualization, and usually requires small datasets. Data mining, on the other hand, uses automated algorithms to extract useful information; humans guide these algorithms and specify their parameters (training samples, clustering size, etc.). Large datasets typically require data mining.

A new approach for exploiting “big data” is now possible with the availability of high performance computing and the advent of new techniques for efficient distributed file access. This new approach, termed “data prospecting”, combines methods from both data exploration and data mining. Just as prospecting focuses on locating the right site within a vast area of land and determining the type of deposit located there, data prospecting focuses on finding the right subset of data among all the data files and determining the value of the information contained within that subset. An initial prototype was developed to explore the viability of interactively exploring large Earth science datasets. The Special Sensor Microwave Imager (SSM/I) gridded products available publicly from Remote Sensing Systems (http://www.ssmi.com) were chosen for this study. This paper will describe the “data prospecting” prototype, its data exploration capabilities, and performance results from this initial work.
