George Wilkinson, Rich Baker, Tom Kowalski, Mike Brogan, Ed Richards, Steve Walsh, Ron Niemann, and Dan Beall
Solers, Inc.
The Product Distribution and Access (PDA) application within the Environmental Satellite Product Distribution System (ESPDS) is a data-driven decision engine that provides the National Oceanic and Atmospheric Administration’s (NOAA’s) National Environmental Satellite, Data, and Information Service (NESDIS) with a flexible mechanism for distributing a wide range of satellite data products from a variety of sources to a broad spectrum of consumers based on an operator-configurable rule set. PDA allows for the definition of a set of data products, with a collection of associated file and metadata properties, which are then used to drive system behaviors for data acquisition, data characterization, persistence to short-term storage, matching to user-defined subscription and search criteria, and routing to consumers.
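In concrete terms, a product definition of this kind can be thought of as a small data structure pairing a file-recognition pattern with the metadata and storage properties that drive the behaviors listed above. The Java sketch below is purely illustrative; the class and field names are assumptions rather than PDA's actual schema.

import java.util.List;
import java.util.Map;

// Hypothetical sketch of a product definition; the class and field names are
// illustrative and do not reflect PDA's actual schema.
public final class ProductDefinition {

    private final String productId;                     // e.g. a product short name
    private final String fileNamePattern;               // regex used to recognize incoming files
    private final List<String> metadataKeys;            // metadata extracted for subscription matching
    private final Map<String, String> storagePolicy;    // e.g. short-term retention settings

    public ProductDefinition(String productId, String fileNamePattern,
                             List<String> metadataKeys, Map<String, String> storagePolicy) {
        this.productId = productId;
        this.fileNamePattern = fileNamePattern;
        this.metadataKeys = metadataKeys;
        this.storagePolicy = storagePolicy;
    }

    // An incoming file is characterized as this product if its name matches the pattern.
    public boolean matches(String fileName) {
        return fileName.matches(fileNamePattern);
    }

    public String getProductId()                  { return productId; }
    public List<String> getMetadataKeys()         { return metadataKeys; }
    public Map<String, String> getStoragePolicy() { return storagePolicy; }
}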
Through a web-based portal, PDA consumers are able to specify a set of subscriptions based on metadata characteristics associated with the individually defined products. A subscription indicates which products are to be received, when they are to be received, the mechanism through which they are to be delivered, and a set of options to manipulate the data contents according to a pre-defined set of allowable tailoring mechanisms such as geographic subsetting, projection remapping, bit depth and scale modification, conversion to other imagery formats, and various forms of compression.
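The sketch below illustrates what such a subscription might look like as a data structure, with the tailoring options modeled as an enumeration. The names and fields are hypothetical and are not drawn from PDA's actual interfaces.

import java.util.EnumSet;
import java.util.Map;

// Hypothetical sketch of a consumer subscription; names are illustrative only.
public final class Subscription {

    // Tailoring operations named in the text, modeled here as an enum.
    public enum Tailoring {
        GEOGRAPHIC_SUBSET, PROJECTION_REMAP, BIT_DEPTH_AND_SCALE,
        FORMAT_CONVERSION, COMPRESSION
    }

    public enum DeliveryMethod { FTPS_PUSH, SFTP_PUSH, HTTPS_PULL, NFS_MOUNT }

    private final String consumerId;
    private final String productId;
    private final Map<String, String> metadataCriteria;  // e.g. coverage region, channel
    private final DeliveryMethod deliveryMethod;
    private final EnumSet<Tailoring> tailoringOptions;

    public Subscription(String consumerId, String productId,
                        Map<String, String> metadataCriteria,
                        DeliveryMethod deliveryMethod,
                        EnumSet<Tailoring> tailoringOptions) {
        this.consumerId = consumerId;
        this.productId = productId;
        this.metadataCriteria = metadataCriteria;
        this.deliveryMethod = deliveryMethod;
        this.tailoringOptions = tailoringOptions;
    }

    // A file matches the subscription if it belongs to the subscribed product and
    // satisfies every metadata criterion specified by the consumer.
    public boolean matches(String fileProductId, Map<String, String> fileMetadata) {
        if (!productId.equals(fileProductId)) {
            return false;
        }
        return metadataCriteria.entrySet().stream()
                .allMatch(e -> e.getValue().equals(fileMetadata.get(e.getKey())));
    }

    public DeliveryMethod getDeliveryMethod()       { return deliveryMethod; }
    public EnumSet<Tailoring> getTailoringOptions() { return tailoringOptions; }
}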
PDA is a Java-based expert system application deployed as a collection of individual, autonomous, stateless services, each geared toward a specific aspect of the product distribution process. These services perform data flow and control flow operations using the choreography model of service-oriented architecture, communicating over an enterprise service bus with a well-defined protocol. PDA services are implemented using an API/Listener/Manager model, with Java libraries abstracting the communication details so that inter-service communication remains consistent. PDA achieves scalability and redundancy by deploying multiple instances of each service type, each of which listens on a shared message queue.
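A minimal sketch of the Listener half of this pattern is shown below, assuming a JMS-style message bus; the class names, queue name, and the use of JMS itself are assumptions made for illustration. Because every instance of a service type consumes from the same shared queue, the broker distributes work across instances, which is the basis of the scalability and redundancy described above.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

// Minimal sketch of the Listener half of an API/Listener/Manager service,
// assuming a JMS-based message bus. Class and queue names are illustrative,
// not PDA's actual interfaces.
public class IngestServiceListener implements MessageListener {

    public static void start(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Every instance of this service type listens on the same shared queue,
        // so the broker load-balances work items across instances.
        Queue sharedQueue = session.createQueue("pda.ingest.requests");
        session.createConsumer(sharedQueue).setMessageListener(new IngestServiceListener());

        connection.start();
    }

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                // Hand the request payload to the Manager, which performs the
                // service-specific work for this message.
                String payload = ((TextMessage) message).getText();
                new IngestServiceManager().process(payload);
            }
        } catch (JMSException e) {
            // A real service would route this to its error-handling/retry logic.
            e.printStackTrace();
        }
    }
}

// Placeholder Manager that would carry out the service-specific work.
class IngestServiceManager {
    void process(String payload) {
        // ... service-specific processing ...
    }
}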
PDA was developed using the choreography model of service-oriented architecture, primarily because of system latency concerns, but also because the data distribution process lends itself to a mostly serial processing order. In the choreography model, each service is aware of its role and position in the data distribution sequence, and it uses characteristics of the data being processed, along with user-defined parameters, to route files to the next appropriate service in the processing chain. Latency concerns also drove a decision to reduce or eliminate database interactions where possible. This increased the burden on message passing, because data is passed between services by value rather than by reference to the database. Choreography, in contrast with a centrally orchestrated design, removes the extra processing burden of a coordinator marshalling and un-marshalling this sometimes-large inter-service message traffic on the message bus.
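As an illustration of choreography-style routing, the sketch below shows how a service, having completed its own step, might select the next service's queue from characteristics of the file and the consumer's parameters. The queue names and attributes are hypothetical.

import java.util.Map;

// Illustrative sketch of choreography-style routing: after finishing its own step,
// a service inspects the data and the consumer's parameters and forwards the work
// item to the next service's queue itself, with no central orchestrator.
// Queue names and attribute keys are hypothetical.
public class ChoreographyRouter {

    // Decide which queue receives the work item next.
    public String nextQueue(Map<String, String> fileAttributes, boolean tailoringRequested) {
        if (!"true".equals(fileAttributes.get("checksumVerified"))) {
            return "pda.integrity.requests";     // verify data integrity first
        }
        if (tailoringRequested) {
            return "pda.tailoring.requests";     // subset/remap/compress before delivery
        }
        return "pda.distribution.requests";      // otherwise deliver the file as-is
    }
}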
PDA is able to receive data via the (S)FTP(S) protocols, via HTTP(S) pull, and via direct delivery over an NFS mount to the PDA file system. The direct delivery mechanism includes an associated API for providing the metadata characteristics of delivered files. Data providers can optionally supply a checksum value with each file so that PDA can perform data integrity checks. Similarly, PDA can deliver data to consumers via (S)FTP(S), via HTTP(S) pull, and, for select consumers, via direct NFS mount to the PDA file system. Consumers can request an optional checksum file for each delivered file and may choose to be notified of file availability by e-mail.
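The sketch below shows the kind of integrity check that a provider-supplied checksum makes possible. The choice of MD5 and of a sidecar checksum file are assumptions for illustration and do not reflect PDA's actual conventions.

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the kind of integrity check a provider-supplied checksum enables.
// The algorithm (MD5) and the sidecar checksum-file convention are assumptions,
// not a statement of PDA's actual interface.
public final class ChecksumVerifier {

    public static boolean verify(Path dataFile, Path checksumFile)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");

        // Stream the delivered file through the digest to avoid loading it into memory.
        try (InputStream in = Files.newInputStream(dataFile)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }

        // Render the digest as hex and compare against the provided checksum value.
        StringBuilder computed = new StringBuilder();
        for (byte b : digest.digest()) {
            computed.append(String.format("%02x", b));
        }
        String expected = Files.readString(checksumFile, StandardCharsets.UTF_8).trim();
        return computed.toString().equalsIgnoreCase(expected);
    }
}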
The PDA system’s design allows it to be scaled up or down to accommodate a range of file count and volume requirements, and it provides the flexibility to handle a broad range of data types as well as data receipt and distribution protocols. Built-in fault tolerance mechanisms allow the system to recover from network and other system outages and to obtain and distribute data that was missed during outage periods.