NABU: A Distributed, Parallel, Data Processing Platform

Lorenzo, Antonio T.

The Renewable Power Forecasting group at the University of Arizona provides operational renewable energy forecasts to three electric utilities in the Southwest. These forecasts rely on Python for data collection, analysis, visualization, and dissemination. nabu was created to facilitate communication between the different processing steps in an easy-to-use, decoupled way.

In nabu, each processing step is a dedicated process that takes defined input data and broadcasts its result. For example, one step might take the output of a numerical weather model as input, pull out the data for select weather station locations, and then broadcast the point forecasts to other steps, which may include plotting or analysis versus actual data. The processing is highly parallel because each step is a dedicated process. It can be parallelized further by using Celery, so that each step can have any number of dedicated worker processes running at the same time. Steps communicate through RabbitMQ, and only the keys to the data, which is stored in a Redis database, are transmitted. This enables intermediate results to be computed once and stored for use in other steps. Furthermore, the publish-subscribe nature of the communication means that a step needs no information about other steps except for the nature of the data to be processed.
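The pattern described above can be illustrated with a minimal, self-contained sketch. It is not nabu's actual API: `InMemoryStore` and `InMemoryBroker` are hypothetical in-process stand-ins for Redis and RabbitMQ, `make_step` is an invented helper, and the topic names, station names, and forecast values are made up for illustration. The point is only that a step subscribes to a kind of data, stores its result once, and publishes nothing but the key.

```python
import uuid
from collections import defaultdict


class InMemoryStore:
    """Hypothetical stand-in for the Redis database: maps keys to results."""

    def __init__(self):
        self._data = {}

    def put(self, value):
        key = uuid.uuid4().hex
        self._data[key] = value
        return key

    def get(self, key):
        return self._data[key]


class InMemoryBroker:
    """Hypothetical stand-in for RabbitMQ: fans each key out to all subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, key):
        for callback in list(self._subscribers[topic]):
            callback(key)


def make_step(store, broker, in_topic, out_topic, func):
    """Register a step: fetch the input by key, apply func, store and publish the result key."""

    def on_message(key):
        result = func(store.get(key))
        broker.publish(out_topic, store.put(result))

    broker.subscribe(in_topic, on_message)


store = InMemoryStore()
broker = InMemoryBroker()

# Step 1: pull point forecasts for selected stations out of the model output.
stations = ("tucson", "phoenix")  # illustrative station names
make_step(store, broker, "model_output", "point_forecasts",
          lambda grid: {name: grid[name] for name in stations})

# Step 2: a downstream consumer (e.g. plotting) that knows only the topic name.
plotted = []
broker.subscribe("point_forecasts", lambda key: plotted.append(store.get(key)))

# Publishing model output triggers the chain; only keys cross the broker.
model_grid = {"tucson": 412.0, "phoenix": 455.5, "flagstaff": 301.2}  # made-up values
broker.publish("model_output", store.put(model_grid))
```

After the last line runs, `plotted` holds the point forecasts for the two selected stations. The plotting subscriber never references the extraction step, which is the decoupling the publish-subscribe design is meant to provide; in a real deployment each step would run in its own process rather than as a callback.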

In addition to decoupling data processing for operational use, nabu can also be used for reproducible research processing. nabu keeps track of the input data and the functions applied to that data at each step. Thus, for example, one can determine that a given graph was produced by applying functions x, y, and z to inputs i, j, and k.
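One way such bookkeeping could work is sketched below. This is an assumption for illustration, not nabu's implementation: the `Tracked` class and the `source` and `apply_tracked` helpers are hypothetical names, and the functions `x`, `y`, `z` and inputs `i`, `j`, `k` simply mirror the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value plus the names of the inputs and functions that produced it."""
    value: object
    inputs: tuple
    functions: tuple


def source(name, value):
    """Wrap a raw input so that its name is carried along."""
    return Tracked(value, (name,), ())


def apply_tracked(func, *args):
    """Apply func to tracked arguments and merge their provenance, in order."""
    inputs, functions = [], []
    for arg in args:
        inputs.extend(n for n in arg.inputs if n not in inputs)
        functions.extend(f for f in arg.functions if f not in functions)
    functions.append(func.__name__)
    result = func(*(arg.value for arg in args))
    return Tracked(result, tuple(inputs), tuple(functions))


# Placeholder processing functions named after the example in the text.
def x(a, b):
    return a + b


def y(a, c):
    return a * c


def z(a):
    return a - 1


i, j, k = source("i", 2), source("j", 3), source("k", 4)
graph = apply_tracked(z, apply_tracked(y, apply_tracked(x, i, j), k))
```

Here `graph.functions` is `("x", "y", "z")` and `graph.inputs` is `("i", "j", "k")`, so the final product can be traced back to the functions and inputs that produced it.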
