Abstract: HDF5 for NPP Sensor and Environmental Data Records (92nd American Meteorological Society Annual Meeting (January 22-26, 2012))

Wednesday, 25 January 2012

HDF5 for NPP Sensor and Environmental Data Records

Hall E (New Orleans Convention Center)

Richard E. Ullman, NASA, Lanham, MD; and M. J. Denning

The Joint Polar Satellite System (JPSS) is the next generation of low-Earth-orbiting environmental satellites. The JPSS and the National Polar-orbiting Operational Environmental Satellite System (NPOESS) Preparatory Project (NPP) satellites are sun-synchronous polar orbiters with an orbital period of approximately 100 minutes. Together with the Interface Data Processing Segment (IDPS), the system will provide global monitoring of environmental conditions by collecting, disseminating, and processing data about the Earth's weather, atmosphere, oceans, land, and near-space environment with precision and detail never before achieved by operational weather satellites. This volume of data will allow scientists and forecasters to monitor and predict weather patterns with greater speed and accuracy. NPP data products are delivered as Hierarchical Data Format 5 (HDF5) files. HDF5 is a general-purpose file format and library designed and developed by the National Center for Supercomputing Applications (NCSA) to provide flexible, portable, and efficient storage and retrieval of scientific datasets. NPP uses the HDF5 structure to implement a specific data model for the NPP data products without requiring any extensions to the native library. Advantages of HDF5 include efficient storage and I/O, including parallel I/O, and the fact that it is free, open source software available on multiple platforms. By using mature technology standards, namely HDF5 and Extensible Markup Language (XML), NPP data products provide platform portability and accessibility to a diverse set of users. To support general application and framework development by this varied community of users, NPP has striven to give the HDF5 organization of its data products a common, consistent structure.
Additionally, the structure and the individual NPP data products are fully described in publicly available documentation as well as in machine-readable XML files, referred to as Product Profiles. The NPP products contain metadata (including real-time quality information), structured dataset arrays, aggregations of granules, and geolocation information. The granule is the atomic unit, or smallest subset of data, for the NPP data products. Each granule holds the data produced from the sensor output over a span of time. Because HDF5 is hierarchical, NPP data product granules can be aggregated in a single HDF5 file without modifying the base structure of the HDF5 implementation designed for NPP data products. Granules are aggregated in the “along-track” dimension by simply extending the datasets along that dimension. A pointer structure provides access to the data in an aggregation either by individual granule or as a whole dataset. Metadata is conveyed through detailed documentation, ranging from Algorithm Theoretical Basis Documents and Data Format Control Books, which describe the more static, consistent attributes of the data, to the attributes provided in the HDF5 files, which describe the dynamic product instance. In addition to the product documentation and dataset attributes (field metadata), NPP data products include quality flags (element metadata) in bit-fields co-aligned with the datasets. NPP Product Profiles are encoded in XML and are distributed with the NPP documentation set, along with a style sheet for rendering the profile in a web browser. Each instance of a product type has a separate profile that is linked to the data granule through a metadata reference within the HDF5 product file. The XML files provide detailed information such as units of measure, dimension names, and legend entries that do not change from instance to instance.
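
The along-track aggregation and per-granule access described above can be illustrated with a minimal sketch. It uses plain Python lists rather than HDF5, and the names GranuleRef, aggregate, and granule_rows are hypothetical; they stand in for the pointer structure, whose actual HDF5 layout is not specified in this abstract.

```python
from dataclasses import dataclass


@dataclass
class GranuleRef:
    """Hypothetical stand-in for a per-granule pointer into an aggregation."""
    granule_id: str
    start: int  # first along-track row of the granule (inclusive)
    stop: int   # one past the last along-track row (exclusive)


def aggregate(granules):
    """Concatenate granules along the along-track (row) dimension.

    `granules` is a list of (granule_id, rows) pairs, where `rows` is a
    list of scan lines. Returns the aggregated rows plus one GranuleRef
    per input granule.
    """
    aggregated, refs = [], []
    for granule_id, rows in granules:
        start = len(aggregated)
        aggregated.extend(rows)  # extend only the along-track dimension
        refs.append(GranuleRef(granule_id, start, len(aggregated)))
    return aggregated, refs


def granule_rows(aggregated, ref):
    """Recover a single granule's rows from the aggregated array."""
    return aggregated[ref.start:ref.stop]


# Two made-up granules, each 2 scan lines by 3 cross-track samples.
g1 = ("granule_A", [[1, 2, 3], [4, 5, 6]])
g2 = ("granule_B", [[7, 8, 9], [10, 11, 12]])

aggregated, refs = aggregate([g1, g2])
```

Reading `aggregated` directly gives the whole dataset, while `granule_rows(aggregated, refs[1])` returns only the second granule. The cross-track dimension is unchanged by aggregation.
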
Environmental Data Record (EDR) and Sensor Data Record (SDR) profiles follow an XML schema, which makes it straightforward to parse them by machine and extract the desired elements. Each NPP data product is processed and packaged in HDF5 as structured arrays. Fields are stored as HDF5 datasets using HDF5 native data types and explicit array dimensions. Datasets within a single product that share common dimensions are related by congruency. The dimensions and other static attributes of the arrays are provided in the XML Product Profiles. Geolocation data for NPP data products are constructed using the same paradigm and conventions as the science/sensor data with which they are associated. These data, like the quality flags and other datasets in an NPP data product, are congruent with the datasets to which they apply and share their dimensions. Under the HDF5 design employed for NPP data products, geolocation data can be stored either in a separate HDF5 group within the same file as the science/sensor data or in a separate file.
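
As a rough illustration of machine parsing of a Product Profile, the sketch below reads units and dimension names from a small XML fragment using the Python standard library. The fragment, its element and attribute names (ProductProfile, Field, Dimension, units), and the field names are invented for this example and do not reproduce the actual NPP schema. The congruency test is likewise a simplifying assumption: two fields are treated as congruent when they list the same dimension names in the same order.

```python
import xml.etree.ElementTree as ET

# Made-up profile fragment; element and attribute names are illustrative only.
PROFILE_XML = """
<ProductProfile name="ExampleEDR">
  <Field name="Temperature" units="kelvin">
    <Dimension name="AlongTrack"/>
    <Dimension name="CrossTrack"/>
  </Field>
  <Field name="QualityFlags" units="unitless">
    <Dimension name="AlongTrack"/>
    <Dimension name="CrossTrack"/>
  </Field>
  <Field name="ScanStartTime" units="microseconds">
    <Dimension name="AlongTrack"/>
  </Field>
</ProductProfile>
"""


def load_fields(xml_text):
    """Map each field name to its units and ordered dimension names."""
    root = ET.fromstring(xml_text)
    fields = {}
    for field in root.findall("Field"):
        dims = tuple(d.get("name") for d in field.findall("Dimension"))
        fields[field.get("name")] = {"units": field.get("units"), "dims": dims}
    return fields


def congruent(fields, a, b):
    """Assumed rule: same dimension names in the same order."""
    return fields[a]["dims"] == fields[b]["dims"]


fields = load_fields(PROFILE_XML)
```

With this fragment, the hypothetical QualityFlags field is congruent with Temperature, so each flag element lines up with one data element, while ScanStartTime shares only the along-track dimension.
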
