Sophisticated and complex data accounting systems have been developed to monitor and report on technical performance metrics (TPMs) such as data availability and latency. These systems are based on the basic principles of data warehousing: collect the data, transform it, load it into a well-defined database, and produce various canned reports. This approach has many problems, namely:
- Only a small fraction of all the system performance data is actually collected and analyzed.
- The use of structured data leads to inflexible systems that are not able to respond to business change requests.
- Data structures must be defined very early in the system lifecycle, resulting in a data accounting system that is inflexible and not easily extensible.
- The operational processes/states of the data accounting system become coupled to the ability of the data processing system (data ingest, product generation and dissemination) to generate complete and well formatted data.
- Malfunctions of the data accounting system put into question the integrity of the transactional data it contains.
These data accounting systems can only store a limited amount of data due to performance and capacity limitations. In addition, correlations between system performance and data availability and/or data latency metrics are not offered. These systems are designed to answer pre-defined questions with no ability to dynamically incorporate and use new sources of data.
This paper will present a proposed approach for using big data principles and technologies to collect, store, and report on life-of-mission data. Proven big data technologies, such as the Hadoop Distributed File System (HDFS) and MapReduce, together with other big data tools and techniques, can bring significant benefits to the data accounting systems that support the new generation of satellite ground systems.
The main benefit of this new approach is that we do not need to define the questions early in the system lifecycle; we just need to identify the sources of data and collect them in their raw form.
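To illustrate the idea of collecting data in raw form and defining the questions later, the following is a minimal sketch, not the system proposed in this paper. The log format, the event names (`INGEST`, `DISSEMINATE`), and the function names are all hypothetical; the point is only that the raw lines are stored untouched and a structure is imposed at query time.

```python
import re
from datetime import datetime

# Hypothetical raw log lines, stored exactly as collected (no schema at ingest).
RAW_LOGS = [
    "2015-03-01T00:00:05 INGEST product=A1",
    "2015-03-01T00:00:45 DISSEMINATE product=A1",
    "2015-03-01T00:01:10 INGEST product=B7",
    "malformed line that the collector still keeps",
]

# Assumed line layout: <ISO timestamp> <EVENT> product=<id>
LINE = re.compile(r"^(\S+) (\w+) product=(\w+)")

def parse(line):
    """Apply a schema at read time; return None for lines that do not fit it."""
    m = LINE.match(line)
    if m is None:
        return None
    ts, event, product = m.groups()
    return {"time": datetime.fromisoformat(ts), "event": event, "product": product}

def latency_seconds(raw_lines):
    """A question defined after collection: ingest-to-dissemination latency."""
    ingest, result = {}, {}
    for rec in filter(None, map(parse, raw_lines)):
        if rec["event"] == "INGEST":
            ingest[rec["product"]] = rec["time"]
        elif rec["event"] == "DISSEMINATE" and rec["product"] in ingest:
            delta = rec["time"] - ingest[rec["product"]]
            result[rec["product"]] = delta.total_seconds()
    return result

print(latency_seconds(RAW_LOGS))
```

Because the malformed line is retained rather than rejected at ingest, a later parser with a different schema could still make use of it; nothing is lost by the first question that happened to be asked.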
Additionally, the use of big data principles, tools, and approaches provides:
- Flexibility - the system can collect and report on a large variety of data sources. It is even possible to have a single data accounting system simultaneously support multiple missions.
- Extensibility - big data technologies are designed and implemented to address the data Velocity, Variety and Volume (V3) problem. Incorporating big data technologies into a data accounting system allows the addition of new sources of data which have very different data formats, large volumes and long retention times.
- Completeness - many system activity logs could be collected and stored on a single data accounting system. As an example, the data production logs and the system logs can be combined to produce reports to support anomaly resolution, performance tuning, and capacity planning, as well as the basic TPM reports.
- Cost-effective solution - big data solutions are less expensive because they can be built using commodity hardware and open source tools. By consolidating multiple data accounting systems into a single enterprise system, we can realize significant cost savings.
- Alignment with Federal, NASA and NOAA direction - there are various efforts within the federal government, and within NASA/NOAA in particular, to adopt cost-cutting solutions, embrace big data technologies and provide enterprise solutions for enterprise problems.
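As a rough illustration of the completeness point above, the sketch below combines a production log with a system log using an in-memory map, shuffle, and reduce pattern. It is a toy stand-in for a MapReduce job, not an implementation on Hadoop; the record fields (`hour`, `latency_s`, `cpu_pct`) and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical records from two independent sources.
production_log = [
    {"hour": "00", "product": "A1", "latency_s": 40},
    {"hour": "00", "product": "B7", "latency_s": 60},
    {"hour": "01", "product": "C3", "latency_s": 200},
]
system_log = [
    {"hour": "00", "cpu_pct": 35},
    {"hour": "01", "cpu_pct": 95},
]

def map_phase(records, source):
    """Emit (key, value) pairs keyed by hour, tagging each record with its source."""
    for rec in records:
        yield rec["hour"], (source, rec)

def shuffle(pairs):
    """Group all values that share a key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(hour, values):
    """Summarize both sources for one hour so they can be read side by side."""
    latencies = [r["latency_s"] for s, r in values if s == "production"]
    cpu = [r["cpu_pct"] for s, r in values if s == "system"]
    return {
        "hour": hour,
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else None,
        "max_cpu_pct": max(cpu) if cpu else None,
    }

def correlate(production, system):
    pairs = list(map_phase(production, "production"))
    pairs += list(map_phase(system, "system"))
    return [reduce_phase(h, v) for h, v in sorted(shuffle(pairs).items())]

for row in correlate(production_log, system_log):
    print(row)
```

Keying both sources on a shared attribute (here the hour) is what allows latency to be viewed next to system load; on a real cluster the same map and reduce functions would run in parallel over far larger inputs.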
We will explore the characteristics and basic architecture of such a system. We will present an approach for its implementation. We will identify and analyze the predominant issues and suggest future activities to further this research.