## 6.5 Statistical forecasting of rainfall from radar reflectivity in Singapore


\begin{equation*} Z = a R^b \end{equation*}

where $Z$ is the radar reflectivity factor, $R$ is the rain rate, and $a$ and $b$ are constants to be estimated. The initially derived values, $a=220$ and $b=1.6$, are still used as defaults today. A common estimation method is to calibrate this relationship against the rainfall measured at rain gauges. In many instances, experts compare observed hourly rainfall with reflectivity scans at a constant altitude, trying different $(a,b)$ values until a chosen error metric is minimised. The purpose of this analysis is to make optimal predictions of rainfall from reflectivity fields in a nowcasting scenario in Singapore, i.e. predictions of rainfall aggregated over periods of up to 60 minutes.
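The calibration loop described above can be sketched as an exhaustive search over candidate $(a,b)$ pairs. This is a minimal illustration only: the text does not say which search procedure or error metric is used in practice, so the grid search, the RMSE metric, the function names, and the synthetic data are all assumptions.

```python
import math

def mp_rain_rate(dbz, a, b):
    """Rain rate R (mm/h) from reflectivity in dBZ, inverting Z = a * R**b."""
    z = 10.0 ** (dbz / 10.0)
    return (z / a) ** (1.0 / b)

def rmse(pred, obs):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def calibrate(dbz_values, gauge_rates, a_grid, b_grid):
    """Try every (a, b) pair on the grid and keep the one with the lowest RMSE."""
    best = None
    for a in a_grid:
        for b in b_grid:
            pred = [mp_rain_rate(d, a, b) for d in dbz_values]
            err = rmse(pred, gauge_rates)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best

# Synthetic illustration (not real data): gauge rates generated with a=300, b=1.4.
dbz_values = [20.0, 30.0, 40.0, 50.0]
gauge_rates = [mp_rain_rate(d, 300.0, 1.4) for d in dbz_values]
a_grid = [200.0, 220.0, 300.0, 400.0]
b_grid = [1.2, 1.4, 1.6]
err, a_hat, b_hat = calibrate(dbz_values, gauge_rates, a_grid, b_grid)
```

On this synthetic input the search recovers the generating pair, since it lies on the grid. With real gauge data the minimum would depend on the grid spacing and on the chosen metric.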

---- Data ----

As the focus of this analysis is heavy rain events, the following criterion was used to identify them: \begin{quote}Any single rain gauge in Singapore reporting a 2-hourly rainfall above 50 mm.\end{quote} This criterion was applied to hourly rain gauge data from April 2010 to August 2012, and identified 309 storm periods. Pseudo-CAPPI scans of reflectivity at heights of 1, 2, 3, 4, and 5 kilometres were extracted for each of these storm periods from the single Doppler radar situated at Changi meteorological station. The range of these scans was fixed at 120 km.
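As a rough sketch of how such a criterion could be applied to hourly gauge series, the snippet below flags every hour at which the rainfall summed over two consecutive hourly readings exceeds 50 mm at any gauge. Interpreting "2-hourly rainfall" as a rolling two-hour sum is an assumption, as are the gauge names and values; how flagged hours were grouped into storm periods is not stated in the text and is not attempted here.

```python
def storm_hours(hourly_mm, threshold_mm=50.0, window=2):
    """Indices t where rainfall summed over hours t-window+1..t exceeds the threshold."""
    hits = []
    for t in range(window - 1, len(hourly_mm)):
        if sum(hourly_mm[t - window + 1 : t + 1]) > threshold_mm:
            hits.append(t)
    return hits

def any_gauge_storm_hours(gauges, threshold_mm=50.0, window=2):
    """Union of flagged hours over all gauges (any single gauge suffices)."""
    flagged = set()
    for series in gauges.values():
        flagged.update(storm_hours(series, threshold_mm, window))
    return sorted(flagged)

# Hypothetical hourly readings (mm) for two made-up gauges.
gauges = {
    "S1": [0.0, 10.0, 45.0, 2.0, 0.0],
    "S2": [0.0, 0.0, 20.0, 31.0, 0.0],
}
flagged = any_gauge_storm_hours(gauges)
```

Here gauge `S1` exceeds the threshold at hour index 2 (10 + 45 mm) and gauge `S2` at hour index 3 (20 + 31 mm), so both hours are flagged.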

Rain gauge data from up to 78 monitoring stations for the period April 2010 to August 2012 were used to train the models. These data have a temporal resolution of five minutes and a rainfall measurement resolution of 0.2 mm.

The radar scans are arranged on a Cartesian grid of 480 by 480 pixels, with the top left corner at longitude-latitude (102.892, 2.42799) and the lower right corner at (105.052, 0.269748). In order to smooth out noise in the radar echoes and in the point-wise rain gauge measurements, both temporal and spatial aggregations of the scans were investigated.
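The grid geometry and the spatial aggregation can be illustrated with a short sketch. It assumes that the quoted corners are the outer edges of the grid, that pixels are equally spaced in longitude and latitude, and that aggregation means averaging non-overlapping square blocks; none of these details is stated in the text, and the function names are illustrative.

```python
import math

N_PIXELS = 480
LON_WEST, LAT_NORTH = 102.892, 2.42799   # top left corner (longitude, latitude)
LON_EAST, LAT_SOUTH = 105.052, 0.269748  # lower right corner (longitude, latitude)

def lonlat_to_pixel(lon, lat):
    """Map a gauge location to (row, col), with row 0 at the northern edge."""
    col = math.floor((lon - LON_WEST) / (LON_EAST - LON_WEST) * N_PIXELS)
    row = math.floor((LAT_NORTH - lat) / (LAT_NORTH - LAT_SOUTH) * N_PIXELS)
    if not (0 <= row < N_PIXELS and 0 <= col < N_PIXELS):
        raise ValueError("point lies outside the radar grid")
    return row, col

def block_mean(grid, k):
    """Average non-overlapping k-by-k blocks; trailing rows/columns that do not fill a block are dropped."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(0, n_rows - n_rows % k, k):
        out_row = []
        for c in range(0, n_cols - n_cols % k, k):
            vals = [grid[i][j] for i in range(r, r + k) for j in range(c, c + k)]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out

# Small made-up field to show the aggregation.
field = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse = block_mean(field, 2)
```

If each pixel is roughly 0.5 km wide, as a 240 km wide scan divided into 480 pixels would suggest, then `k=4` and `k=8` would correspond to 2 km and 4 km grid squares. The function averages whatever field it is given; whether reflectivity should be averaged in dBZ or in linear units is not specified in the text.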

------ Models ------

For each of the spatio-temporal resolutions, the goal is to use the radar scans to estimate the total rainfall on the ground at each grid square over the time interval dictated by the resolution.

The following models were applied.
\begin{itemize}
\item Marshall-Palmer relationship. This model uses the default Marshall-Palmer relationship ($a=220, b=1.6$) to estimate the rainfall amount.
\begin{equation*} Z = a R^b \Rightarrow R_1 = \frac{1}{12} \left( \frac{1}{a} 10^{dBZ/10} \right)^{1/b} \end{equation*}
where $dBZ = 10 \log_{10} Z$ is the reflectivity in decibels and $R_1$ is the estimated amount of rain that has fallen in 5 minutes within a grid square of interest; the factor $1/12$ converts an hourly rain rate into a 5-minute accumulation. This value is then compared to the average of the rain gauge readings within that grid square for those 5 minutes.

\item Log-transformed linear model fit. This model fits the following equation by least squares; the equation arises from taking logarithms in the Marshall-Palmer equation. \begin{equation*} \log_{10} (R_2) = \beta_0 + \beta_1 dBZ + \epsilon \end{equation*} This model can be viewed as a calibrated Marshall-Palmer model.

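\item Smoothing splines. Smoothing splines use basis functions, but avoid the problem in regression splines of specifying the number and location of knots. Instead, they impose a smoothness criterion on the overall fit by minimising a penalised residual sum of squares, which in its standard form is \begin{equation*} \min_{f} \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2 + \lambda \int f''(t)^2 \, dt \end{equation*} where $(x_i, y_i)$ are the predictor-response pairs and $\lambda \ge 0$ controls the trade-off between fidelity to the data and smoothness.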

\item Local linear regression. Locally weighted regression solves a separate weighted least squares problem at each target point. Typically a Gaussian kernel is used to weight the neighbours.

\item Random forest algorithm. A decision tree is a machine learning technique that recursively partitions the predictor space into rectangles. Each subsequent partition is obtained by assessing which predictor provides the best reduction in mean squared error. By resampling the original data, a sequence of trees can be grown, resulting in a random forest. \end{itemize}
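The first two models in the list can be sketched in a few lines. The code below converts reflectivity to a 5-minute rainfall amount with the default coefficients, fits the log-transformed linear model by ordinary least squares, and maps the fitted coefficients back to an implied $(a,b)$ pair. It assumes that $R_2$ is a 5-minute accumulation in the same units as $R_1$ and that observations with zero rainfall are excluded (their logarithm is undefined); the function names and the synthetic data are illustrative.

```python
import math

def mp_five_minute_rain(dbz, a=220.0, b=1.6):
    """R_1: rainfall (mm) in 5 minutes, i.e. the hourly rate from Z = a * R**b divided by 12."""
    rate_mm_per_h = (10.0 ** (dbz / 10.0) / a) ** (1.0 / b)
    return rate_mm_per_h / 12.0

def fit_log_linear(dbz_values, rain_mm):
    """Ordinary least squares for log10(R) = beta0 + beta1 * dBZ (positive rainfall only)."""
    y = [math.log10(r) for r in rain_mm]
    n = len(y)
    x_bar = sum(dbz_values) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in dbz_values)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(dbz_values, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def implied_a_b(beta0, beta1):
    """Invert log10(R_1) = -log10(12) - log10(a)/b + dBZ/(10*b) for (a, b)."""
    b = 1.0 / (10.0 * beta1)
    a = 10.0 ** (-b * (beta0 + math.log10(12.0)))
    return a, b

# Synthetic check (not real data): rainfall generated by the default relationship.
dbz_values = [15.0, 25.0, 35.0, 45.0, 55.0]
rain_mm = [mp_five_minute_rain(d) for d in dbz_values]
beta0, beta1 = fit_log_linear(dbz_values, rain_mm)
a_hat, b_hat = implied_a_b(beta0, beta1)
```

Because the synthetic rainfall follows the default relationship exactly, the fit returns coefficients equivalent to $a=220$ and $b=1.6$. On gauge data the fitted values would generally differ, which is the sense in which the second model is a calibrated version of the first.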

The Marshall-Palmer model was used as the reference model. The second model is still based on that equation, but calibrates the coefficients to the data. The remaining three allow the data to dictate the form of the relationship. The smoothing spline model is a global model, with constraints on its smoothness. The local regression method is more local: it can be viewed as an improved k-nearest neighbour technique, and it does not assume a constant variance across the support of the predictor. The final method, the random forest, is widely used in predictive analytics and does not assume linearity in the predictors.
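To make the local regression idea concrete, here is a minimal sketch of a local linear fit at a single target point with Gaussian kernel weights. The `bandwidth` parameter, the function name, and the toy data are assumptions for illustration; the text does not state how the kernel width was chosen.

```python
import math

def local_linear_predict(x0, xs, ys, bandwidth):
    """Fit a weighted least squares line centred on x0 and evaluate it at x0."""
    # Gaussian kernel weights: observations near x0 count more.
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    if sxx == 0.0:
        # All weight sits on a single x value: fall back to the weighted mean.
        return y_bar
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return y_bar + slope * (x0 - x_bar)

# Toy data lying exactly on the line y = 2 + 3x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 3.0 * x for x in xs]
pred = local_linear_predict(2.5, xs, ys, bandwidth=1.0)
```

A separate weighted problem of this kind is solved for every target point. Unlike a plain kernel-weighted or k-nearest neighbour average, the local line reproduces a linear trend exactly, including near the edges of the data.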

------- Summary -------

The goal of this analysis was to investigate whether, and to what extent, the default Marshall-Palmer relationship for predicting rainfall from reflectivity in Singapore can be improved. The following table shows the improvement of the smoothing splines model over the Marshall-Palmer relationship in terms of RMSE (mm of rainfall). We note that spatial aggregation reduces the error, while temporal aggregation increases it.

\begin{table}[ht]
\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
 & 5-min & 15-min & 30-min & 60-min \\ \hline
\multicolumn{5}{|c|}{Default Marshall-Palmer model} \\ \hline
2-km & 1.75 & 3.79 & 5.73 & 7.80 \\ \hline
4-km & 1.53 & 3.42 & 5.21 & 7.22 \\ \hline
\multicolumn{5}{|c|}{Smoothing splines model} \\ \hline
2-km & 1.56 & 3.44 & 5.14 & 7.11 \\ \hline
4-km & 1.36 & 3.09 & 4.65 & 6.34 \\ \hline
\end{tabular}
\end{center}
\end{table}
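The size of the improvement is easier to read as a relative reduction in RMSE. The short calculation below uses only the values in the table; the percentage formulation and the variable names are additions for illustration.

```python
windows = ["5-min", "15-min", "30-min", "60-min"]
marshall_palmer = {"2-km": [1.75, 3.79, 5.73, 7.80], "4-km": [1.53, 3.42, 5.21, 7.22]}
smoothing_spline = {"2-km": [1.56, 3.44, 5.14, 7.11], "4-km": [1.36, 3.09, 4.65, 6.34]}

def relative_reduction(reference, candidate):
    """Percentage by which the candidate RMSE is lower than the reference RMSE."""
    return 100.0 * (reference - candidate) / reference

reductions = {
    res: [relative_reduction(m, s) for m, s in zip(marshall_palmer[res], smoothing_spline[res])]
    for res in marshall_palmer
}

for res, values in reductions.items():
    print(res, ", ".join(f"{w}: {v:.1f}%" for w, v in zip(windows, values)))
```

On these figures the smoothing splines model lowers the RMSE by roughly 9 to 12 percent in every cell of the table.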

Each of the models makes different assumptions, and hence performs differently over different reflectivity ranges. We thus also investigate which combination of the above models is optimal for prediction, and what the final uncertainty around the predictions will be.