Probabilistic Cloud Forecasting using Logistic Regression

Kemp, Eric M.

We discuss the development of objective probabilistic cloud forecasts over areas viewable by a single ground observer (about 2700 km²). Our approach uses logistic regression, a statistical technique that relates predictor variables to a binary outcome. Unlike the more commonly used linear regression, logistic regression produces output bounded between 0 and 1, so it is directly interpretable as a probability. In addition, logistic regression models errors with a binomial distribution, which is more appropriate for a binary outcome than the normal distribution assumed in linear regression.
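To make the regression step concrete, here is a minimal plain-Python sketch, not the implementation used in this work. It models the probability as p = 1 / (1 + exp(-(b0 + b1 x))) for a single predictor x and fits b0 and b1 by maximizing the binomial log-likelihood with Newton-Raphson iterations, which is equivalent to iteratively reweighted least squares. Treating each training case as a count of cloudy pixels out of the total pixels examined is an assumption about how the truth data enter the fit, and all function and variable names are hypothetical.

```python
import math

def logistic(z):
    """Logistic function; maps any real z into the open interval (0, 1).
    Written in two branches so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_binomial(x, cloudy, total, iterations=25):
    """Fit p = logistic(b0 + b1*x) by maximizing the binomial log-likelihood
    with Newton-Raphson steps (same update as iteratively reweighted least squares).

    x      -- predictor value for each case (hypothetical: column max RH, percent)
    cloudy -- number of cloudy pixels in each case (binomial successes)
    total  -- number of pixels examined in each case (binomial trials)
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g0, g1) and negated Hessian (h00, h01, h11) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, cloudy, total):
            p = logistic(b0 + b1 * xi)
            resid = yi - ni * p        # observed minus expected cloudy count
            w = ni * p * (1.0 - p)     # binomial variance of the count
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system [[h00, h01], [h01, h11]] * delta = g
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < 1e-10 and abs(d1) < 1e-10:
            break  # converged
    return b0, b1
```

Because the likelihood is binomial, a case with 30 cloudy pixels out of 50 contributes exactly as 50 separate yes/no outcomes would, and the fitted probabilities can never leave the interval (0, 1), which is the practical advantage over a linear fit noted above.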

In this application, predictors are supplied by the Weather Research and Forecasting (WRF) model, run on a 36-km resolution domain covering the continental United States (CONUS). WRF is initialized directly from the operational North American Mesoscale (NAM) analyses produced by the National Weather Service at 0000 and 1200 UTC. Column maximum relative humidity is currently the only predictor; other variables, such as wind, divergence, and warm air advection, are being tested.
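A 36-km grid cell covers about 1300 km² (36 × 36 = 1296 km²), so a 2700 km² viewing area spans only about two model grid columns. The abstract does not say how column values are mapped onto the viewing area. The sketch below is therefore a hypothetical illustration: it takes the maximum relative humidity in each WRF column, averages those maxima over the columns inside the area, and evaluates a one-predictor logistic equation. The aggregation rule, the function names, and the coefficients are all assumptions, not fitted values from this study.

```python
import math

def column_max_rh(rh_profile):
    """Largest relative humidity (percent) found at any model level in one column."""
    if not rh_profile:
        raise ValueError("empty RH profile")
    return max(rh_profile)

def area_predictor(columns):
    """Hypothetical aggregation: average the column-maximum RH of every
    grid column that falls inside the observer's viewing area."""
    if not columns:
        raise ValueError("no grid columns in the area")
    maxima = [column_max_rh(c) for c in columns]
    return sum(maxima) / len(maxima)

def cloud_probability(predictor, b0, b1):
    """Evaluate a one-predictor logistic equation; b0 and b1 are placeholders
    for coefficients fitted to the training data."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * predictor)))

# Two hypothetical WRF columns (RH in percent, ordered from surface to model top)
columns = [[55.0, 72.0, 91.0, 60.0], [50.0, 68.0, 85.0, 40.0]]
rh = area_predictor(columns)  # mean of 91 and 85 -> 88.0
print(rh, cloud_probability(rh, -7.0, 0.1))
```

Using the column maximum rather than a layer mean means a single saturated level anywhere in the column drives the predictor. That is one plausible reason it relates well to cloud presence seen from the ground or from a satellite.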

Cloud information is derived from a 4-km resolution CONUS cloud mask created by a Cloud Mask Generator (CMG). The CMG ingests multi-spectral imagery from the Geostationary Operational Environmental Satellites (GOES) and applies a series of single- and multi-spectral tests to detect clouds. CMG pixels within the region of interest and within a 4-hour time window are treated as truth data and used to train and validate the regression equations. Because this truth is pooled over both space and time, the regression forecasts can also be interpreted as a spatially and temporally averaged cloud fraction.
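At a nominal 4-km resolution, each mask pixel covers about 16 km², so a 2700 km² area contains roughly 170 pixels per image, and pooling several images in the 4-hour window multiplies that count. The sketch below shows one way such pooling could produce both a cloud fraction and the cloudy/total counts used as binomial data. The mask layout (1 cloud, 0 clear, None missing), the frame list, and the assumption that the window is centred on the forecast valid time are all hypothetical; the abstract specifies only a 4-hour window.

```python
from datetime import datetime, timedelta

def pooled_cloud_truth(frames, region_pixels, valid_time, window_hours=4.0):
    """Pool binary cloud-mask pixels over a region and a time window.

    frames        -- list of (timestamp, mask) pairs; mask[row][col] is 1 (cloud),
                     0 (clear) or None (missing)  [hypothetical layout]
    region_pixels -- iterable of (row, col) indices inside the viewing area
    valid_time    -- forecast valid time; the window is assumed centred on it
    Returns (cloudy_count, total_count, cloud_fraction).
    """
    half = timedelta(hours=window_hours / 2.0)
    pixels = list(region_pixels)
    cloudy = total = 0
    for stamp, mask in frames:
        if not (valid_time - half <= stamp <= valid_time + half):
            continue  # image falls outside the time window
        for r, c in pixels:
            value = mask[r][c]
            if value is None:
                continue  # ignore missing pixels
            cloudy += value
            total += 1
    fraction = cloudy / total if total else float("nan")
    return cloudy, total, fraction
```

Returning the counts as well as the fraction keeps the sample size attached to each case. A binomial fit needs that information, and a plain average would discard it.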

A one-year dataset has been collected to generate hourly forecasts at lead times from 0 to 24 hours. Validation statistics, including Brier scores, Relative Operating Characteristic (ROC) curves, and reliability diagrams, show that the forecasts have skill over simpler methods (20-day persistence, random forecasts, always cloudy or always clear, etc.). However, the results also show significant day-to-day variability in forecast performance. Some of the errors have been attributed to the NAM analysis, whose moisture profiles often differ significantly from rawinsonde observations. Other error sources are under investigation; corrections and results will be reported at the conference.
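The verification measures named above follow standard definitions, sketched here in plain Python. This is an illustration under stated assumptions, not the validation code used in this study. The Brier score is the mean squared difference between forecast probability and outcome. The Brier skill score compares it with a reference forecast such as persistence or an always-clear forecast. A reliability table bins forecasts by probability, and ROC points pair false-alarm and hit rates at chosen thresholds. Whether the outcomes are 0/1 events or pooled cloud fractions is an assumption: the Brier functions accept either, but the reliability and ROC functions expect 0/1 events, and the ROC function needs at least one event and one non-event.

```python
def brier_score(probs, obs):
    """Mean squared difference between forecast probabilities and outcomes
    (obs may be 0/1 events or observed cloud fractions in [0, 1])."""
    if len(probs) != len(obs) or not probs:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(probs)

def brier_skill_score(probs, ref_probs, obs):
    """BSS = 1 - BS_forecast / BS_reference; positive means skill over the reference."""
    return 1.0 - brier_score(probs, obs) / brier_score(ref_probs, obs)

def reliability_table(probs, events, n_bins=10):
    """Bin forecasts by probability; for each non-empty bin return
    (mean forecast probability, observed event frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, e in zip(probs, events):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[k].append((p, e))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(e for _, e in b) / len(b)
            table.append((mean_p, freq, len(b)))
    return table

def roc_points(probs, events, thresholds):
    """(false alarm rate, hit rate) for each threshold; a forecast counts as
    'cloudy' when its probability is >= the threshold. Events must be 0/1."""
    pos = sum(events)
    neg = len(events) - pos
    points = []
    for t in thresholds:
        hits = sum(1 for p, e in zip(probs, events) if p >= t and e == 1)
        false_alarms = sum(1 for p, e in zip(probs, events) if p >= t and e == 0)
        points.append((false_alarms / neg, hits / pos))
    return points
```

A reference forecast such as "always clear" is passed to `brier_skill_score` as a list of zeros. A positive score then means the regression beats that baseline. Day-to-day variability can be examined by computing these scores separately for each forecast day.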

Session 8A.8