In nabu, each processing step is a dedicated process that consumes a defined set of input data and broadcasts its result. For example, one step might take the output of a numerical weather model, extract the data for selected weather station locations, and broadcast the resulting point forecasts to other steps, such as plotting or comparison against observed data. Because each step is its own process, the processing is highly parallel. Parallelism can be increased further with Celery, which lets each step run any number of worker processes at the same time. Steps communicate through RabbitMQ, and only the keys to the data, which are stored in a Redis database, are transmitted. Intermediate results can therefore be computed once and stored for use in other steps. Furthermore, the publish-subscribe nature of the communication means that a step needs no information about other steps beyond the nature of the data to be processed.
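The key-passing pattern described above can be illustrated with a minimal sketch. This is not nabu's actual code or API: the classes InMemoryStore and InMemoryBroker are hypothetical in-process stand-ins for Redis and RabbitMQ, and the topic names, station indices, and grid values are made up for illustration. The sketch shows only the idea that a step publishes a key rather than the data, and that subscribers fetch the data by key.

```python
import uuid
from collections import defaultdict


class InMemoryStore:
    """Hypothetical stand-in for the Redis database: maps keys to results."""

    def __init__(self):
        self._data = {}

    def put(self, value):
        key = uuid.uuid4().hex  # opaque key; this is all that gets transmitted
        self._data[key] = value
        return key

    def get(self, key):
        return self._data[key]


class InMemoryBroker:
    """Hypothetical stand-in for RabbitMQ: delivers keys to topic subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, key):
        for callback in list(self._subscribers[topic]):
            callback(key)


store = InMemoryStore()
broker = InMemoryBroker()
received = []  # what the downstream step ends up seeing


def extract_points(key):
    """Step 1: pull values at selected grid points from the model output."""
    grid = store.get(key)
    stations = {"station_a": (0, 1), "station_b": (1, 0)}  # illustrative indices
    points = {name: grid[i][j] for name, (i, j) in stations.items()}
    # Store the result once, then broadcast only its key.
    broker.publish("point_forecast", store.put(points))


def record_points(key):
    """Step 2: a downstream consumer (e.g. plotting) that knows only the topic."""
    received.append(store.get(key))


broker.subscribe("model_output", extract_points)
broker.subscribe("point_forecast", record_points)

model_grid = [[280.0, 281.5], [279.2, 282.1]]  # made-up temperatures
broker.publish("model_output", store.put(model_grid))
```

Neither step refers to the other directly; each knows only the topic it subscribes to and the shape of the data behind the key. A second consumer of "point_forecast" could be added without touching extract_points.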
In addition to decoupling data processing for operational use, nabu can be used for reproducible research. It keeps track of the input data and the functions applied to that data at each step. Thus, for example, one can determine that a given graph was produced by applying functions x, y, and z to inputs i, j, and k.
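One way such tracking could work is sketched below. This is an assumption-laden illustration, not nabu's implementation: the records dictionary and the helpers add_input, apply_step, and lineage are hypothetical names. Each stored result carries the name of the function that produced it and the keys of its inputs, so the history of any result can be recovered by walking those keys back to the raw inputs.

```python
import hashlib
import json

records = {}  # key -> {"value", "function", "inputs", "name"}


def _make_key(payload):
    """Derive a short deterministic key from a JSON-serializable description."""
    text = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def add_input(name, value):
    """Register a raw input (hypothetical helper)."""
    key = _make_key({"input": name, "value": value})
    records[key] = {"value": value, "function": None, "inputs": [], "name": name}
    return key


def apply_step(func, *input_keys):
    """Apply func to stored values and record what was applied to what."""
    value = func(*(records[k]["value"] for k in input_keys))
    key = _make_key({"function": func.__name__, "inputs": list(input_keys)})
    records[key] = {
        "value": value,
        "function": func.__name__,
        "inputs": list(input_keys),
        "name": None,
    }
    return key


def lineage(key):
    """Return (function names in application order, raw input names)."""
    record = records[key]
    if record["function"] is None:
        return [], [record["name"]]
    functions, inputs = [], []
    for input_key in record["inputs"]:
        upstream_functions, upstream_inputs = lineage(input_key)
        functions.extend(upstream_functions)
        inputs.extend(upstream_inputs)
    functions.append(record["function"])
    return functions, inputs


# Illustrative functions and inputs named after the example in the text.
def x(a, b):
    return a + b


def y(a):
    return a * 2


def z(a, b):
    return a - b


i = add_input("i", 1.0)
j = add_input("j", 2.0)
k = add_input("k", 3.0)
graph = apply_step(z, apply_step(y, apply_step(x, i, j)), k)
functions, inputs = lineage(graph)
```

Here lineage(graph) reports that the final result came from functions x, y, and z applied to inputs i, j, and k. The sketch identifies functions by name only; a real system would likely also need to record function versions and parameters.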
Supplementary URL: https://forecasting.energy.arizona.edu