A Procedure for Neuro-Fuzzy Dynamic-Statistical Data Modeling With Predictor Selection

Burrows, William R.

In most meteorological and climatological forecasting activities there is a demand for dynamic-statistical models (DSM) to analyze and predict environmental elements when deterministic models do not predict the predictand, or when the accuracy of an existing prediction can be increased. Prediction models are generated from past observations of a predictand and associated predictors, then run in forecast mode using predictors derived from basic elements forecast by a deterministic model. Examples are the "perfect prog" and "model output statistics" forecasts of basic weather elements produced routinely as guidance for operational weather forecasters, and the "downscaling" methods used for modeling regional climates from large-scale data. Compared with running a purely dynamical high-resolution model, a DSM is inexpensive in computer time, both in development and in use, yet accurate, which explains the popularity of the method. Many environmental elements have complicated non-linear relationships with the predictors in a DSM, so multiple linear regression (MLR) is not suitable. A primary requirement is a data modeling method that suits relatively large data sets and potentially many predictors and imposes minimal computational burden in both development and use. A procedure that meets this requirement is presented. Because of its relative efficiency, it is particularly suitable when many models are required. Classification and Regression Trees (CART) is used to select predictors from a large set of potential predictors and, if need be, to stratify the data. CART is an efficient non-parametric data analysis algorithm that develops a decision-tree data partitioning structure which minimizes the residual variance of the predictand in subsets of its total distribution, essentially by clustering the data into a set of "terminal nodes".
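The tree-based predictor selection described above can be illustrated with a minimal sketch. It is not the CART software used in this work: it is a small variance-reduction regression tree written only for illustration, it omits surrogate splits and pruning, and the function names (`sse`, `best_split`, `grow`, `rank_predictors`), the synthetic data, and the depth and leaf-size settings are all assumptions. Each split's reduction in the residual sum of squares is credited to the predictor it uses, and predictors with non-zero credit are retained.

```python
import random

def sse(ys):
    """Sum of squared deviations of ys from their mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y, min_leaf):
    """Return (gain, predictor index, threshold) of the split that most reduces SSE."""
    parent = sse(y)
    best = (0.0, None, None)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = 0.5 * (lo + hi)
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, j, t)
    return best

def grow(X, y, importance, depth, max_depth, min_leaf):
    """Recursively partition the data, crediting each split's gain to its predictor."""
    if depth >= max_depth:
        return
    gain, j, t = best_split(X, y, min_leaf)
    if j is None:
        return  # no admissible split: this is a terminal node
    importance[j] += gain
    for keep_left in (True, False):
        rows = [(r, yi) for r, yi in zip(X, y) if (r[j] <= t) == keep_left]
        grow([r for r, _ in rows], [yi for _, yi in rows],
             importance, depth + 1, max_depth, min_leaf)

def rank_predictors(X, y, max_depth=3, min_leaf=5):
    """Return importances normalized to sum to one (hypothetical settings)."""
    importance = [0.0] * len(X[0])
    grow(X, y, importance, 0, max_depth, min_leaf)
    total = sum(importance) or 1.0
    return [v / total for v in importance]

# Synthetic example: five candidate predictors, only columns 0 and 2 matter.
random.seed(0)
X = [[random.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [(4.0 if r[0] > 0 else 0.0) + 2.0 * r[2] + random.gauss(0.0, 0.1) for r in X]

importance = rank_predictors(X, y)
selected = [j for j, v in enumerate(importance) if v > 0.0]
print("importance:", [round(v, 3) for v in importance])
print("selected predictors:", selected)
```

In this toy setting the two informative columns should carry most of the importance. A shallow tree can still assign small non-zero importance to noise columns, so in practice a threshold or pruning step would likely be needed.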
CART identifies the relevant predictors in a much larger pool of potential predictors by assigning an "importance rank" to each potential predictor. Relevant predictors are those that appear in decision nodes, plus any others to which CART assigns an importance greater than zero because they were selected as surrogate predictors. CART output is piecewise continuous and can be used alone for problems where the predictand is categorical, such as precipitation type. However, a model giving continuous output is needed for predictands that are continuous in time and space. This is obtained with a "neuro-fuzzy inference system" (NFIS) that uses the relevant predictors found by CART. Here multivariate cluster centers are found in the training data by "subtractive clustering" and used as the basis for a fuzzy rule-based inference system. NFIS develops a highly optimized data model in one pass that usually needs no further tuning, so its computational requirements are modest. The method is equivalent to a radial basis-function (RBF) neural network in which the clusters are the neurons. If desired, a small improvement can be obtained with the adaptive neuro-fuzzy inference system (ANFIS) algorithm. The modeling procedure has been used successfully to model ocean surface winds measured by buoys offshore of British Columbia, fog-water deposition from high-elevation cloud, and ground-level ozone; work on other predictands is underway.
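The clustering and inference step can be sketched in the same spirit. The code below is an illustrative, simplified version of subtractive clustering followed by a zero-order fuzzy model, not the NFIS implementation used in this work. It assumes the data are scaled to the unit hypercube, clusters in the joint predictor-predictand space, and takes each rule's output to be the predictand coordinate of its cluster center. The radius `ra`, the `squash` factor, the single `stop_ratio` stopping rule, and all function names are assumptions chosen for the example. The prediction is a normalized sum of Gaussian activations, which is the RBF-network form mentioned above.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def subtractive_clustering(points, ra=0.3, squash=1.5, stop_ratio=0.15):
    """Choose cluster centers from the data points, densest regions first."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    # Potential of a point: how many neighbours lie within roughly ra of it.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if peak < stop_ratio * first_peak:  # simplified stopping rule
            break
        centers.append(points[k])
        # Subtract the new center's influence so nearby points are not re-chosen.
        potential = [pi - peak * math.exp(-beta * sqdist(p, points[k]))
                     for p, pi in zip(points, potential)]
    return centers

def predict(x, centers, ra=0.3):
    """Zero-order fuzzy inference: activation-weighted mean of center outputs."""
    alpha = 4.0 / ra ** 2
    d = len(x)
    weights = [math.exp(-alpha * sqdist(x, c[:d])) for c in centers]
    return sum(w * c[d] for w, c in zip(weights, centers)) / sum(weights)

# Synthetic example: one predictor, predictand is a scaled sine in [0.1, 0.9].
n = 101
points = [(i / (n - 1), 0.5 + 0.4 * math.sin(2.0 * math.pi * i / (n - 1)))
          for i in range(n)]

centers = subtractive_clustering(points)
print("number of rules:", len(centers))
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, round(predict([x], centers), 3))
```

Because every center is an actual data point and every prediction is a convex combination of center outputs, the model is built in a single pass with no iterative training. Fitting the rule outputs by least squares, or tuning them with ANFIS, would be the natural refinements.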

1.4 A Procedure for Neuro-Fuzzy Dynamic-Statistical Data Modeling With Predictor Selection