Prediction of skew surge by a fuzzy decision tree

Wednesday, 20 January 2010: 9:15 AM
B204 (GWCC)
Samantha J. Royston, University of Bristol / Proudman Oceanographic Laboratory, Liverpool, Merseyside, United Kingdom; and K. Horsburgh and J. Lawry

Storm surge resulting from mid-latitude weather systems can lead to considerable damage and fatalities as a result of coastal flooding. The real-time, accurate prediction of storm surge is of primary importance in flood forecasting and warning systems. Skew surge is the most useful measure of storm surge and is defined as the difference in elevation between the maximum observed water level in a tidal cycle and that predicted by tide tables.
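The definition of skew surge above can be illustrated with a minimal sketch. The function name and the sample water levels are hypothetical and chosen only for illustration; they are not data from the study.

```python
def skew_surge(observed_levels, predicted_levels):
    """Skew surge for one tidal cycle, in the units of the inputs.

    Difference between the maximum observed water level and the maximum
    predicted (tide-table) level, irrespective of when each peak occurs.
    """
    return max(observed_levels) - max(predicted_levels)


# Hypothetical water levels (m) sampled over a single tidal cycle.
observed = [1.2, 2.6, 3.4, 2.9, 1.5]
predicted = [1.0, 2.4, 3.0, 3.1, 1.4]

print(round(skew_surge(observed, predicted), 2))  # 0.3
```

Because the two maxima are compared regardless of timing, the measure is unaffected by a phase shift between the observed and predicted peaks.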

The skew surge at Sheerness, on the east coast of the UK, is predicted using a probabilistic, data-driven technique: specifically, a fuzzy, linguistic, entropy-based decision tree. The application site is of particular importance since predicted extreme sea levels here determine the closure of the Thames Barrier, which provides flood protection for central London. Along the east coast of the UK, coastally-trapped gravity waves (tides and surges) propagate from north to south. Model inputs consist of the predicted tidal elevation, the observed skew surge elevation, and the time delay or advance in peak water level at each of five tide gauges located along the UK east coast, with wave travel times to Sheerness in the approximate range of 8 to 14 hours.

Fuzzy discretization of the data creates a probability matrix of mass assignments for each variable. An entropy-based decision tree algorithm (referred to as LID3), which incorporates fuzzy labels and is based on the ID3 algorithm, is developed on the initial 80% of the fuzzy set data, obtained for the years 1980 to 2008. The resulting tree structure provides a predicted probability distribution over the fuzzy sets of the skew surge at Sheerness, given the unseen latter 20% of the input data. The predicted probability distributions can subsequently be 'defuzzified' using the mean or modal values of these fuzzy sets to give real-valued predictions.
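The discretization, entropy and defuzzification steps can be sketched as follows. This is a simplified illustration, not the LID3 implementation: it assumes triangular fuzzy sets with 50% overlap and assigns mass to single labels only, and the set centres and probabilities are hypothetical.

```python
import math


def triangular_memberships(x, centres):
    """Membership of x in triangular fuzzy sets peaking at each centre.

    Adjacent sets overlap so that the memberships sum to 1 for any x
    (values outside the covered range are clamped to its ends).
    """
    x = min(max(x, centres[0]), centres[-1])
    memberships = [0.0] * len(centres)
    for i in range(len(centres) - 1):
        lo, hi = centres[i], centres[i + 1]
        if lo <= x <= hi:
            weight = (x - lo) / (hi - lo)
            memberships[i] = 1.0 - weight
            memberships[i + 1] = weight
            break
    return memberships


def entropy(probabilities):
    """Shannon entropy in bits; an ID3-style tree splits to reduce this."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def defuzzify_mean(probabilities, centres):
    """Real-valued prediction as the probability-weighted mean of the centres."""
    return sum(p * c for p, c in zip(probabilities, centres))


def defuzzify_mode(probabilities, centres):
    """Real-valued prediction as the centre of the most probable fuzzy set."""
    best = max(range(len(centres)), key=lambda i: probabilities[i])
    return centres[best]


# Hypothetical fuzzy set centres for skew surge (m).
centres = [-0.5, 0.0, 0.5, 1.0]
print(triangular_memberships(0.25, centres))  # [0.0, 0.5, 0.5, 0.0]

# Hypothetical predicted distribution at one leaf of the tree.
leaf = [0.1, 0.2, 0.6, 0.1]
print(entropy(leaf), defuzzify_mean(leaf, centres), defuzzify_mode(leaf, centres))
```

The mean gives a smoothly varying prediction, whereas the mode can only return one of the set centres.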

The real-valued predictions from the decision tree model are comparable in accuracy to predictions from a linear least squares regression model, with a root mean square error (RMSE) of 0.125 m for the tree model compared with 0.118 m for least squares regression. A comparison can also be made against a reference persistence forecast, which assumes that the skew surge observed at Sheerness in the previous tidal cycle persists. The RMSE for the persistence forecast is 0.198 m, corresponding to a mean square error skill score for the decision tree model against this persistence forecast of 0.605, representing a 60.5% reduction in error variance. Furthermore, the tree structure can be interrogated as sets of rules, allowing insight into the key physical drivers of surges at this critical location. For the input data set chosen, most information relating to skew surge at Sheerness is gained from the observed skew surges at the sea level measuring stations to the north.
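The error measures quoted above can be sketched as follows, assuming the usual definition of the mean square error skill score, 1 - MSE_model / MSE_reference. The function names are hypothetical; only the two RMSE values come from the abstract.

```python
import math


def rmse(predictions, observations):
    """Root mean square error between paired predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predictions, observations)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_pairs(observations):
    """Persistence forecast: each cycle is predicted by the previous cycle.

    Returns (predictions, targets) as two aligned lists.
    """
    return observations[:-1], observations[1:]


def mse_skill_score(rmse_model, rmse_reference):
    """Fractional reduction in mean square error relative to a reference."""
    return 1.0 - (rmse_model / rmse_reference) ** 2


# RMSE values quoted in the abstract (m).
print(round(mse_skill_score(0.125, 0.198), 3))  # 0.601
```

With the rounded RMSE values this gives about 0.601 rather than the reported 0.605; the small difference is presumably due to rounding of the quoted RMSEs.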

The rule leading to the highest probability of a large positive skew surge at Sheerness can be interpreted as an observed skew surge amplifying as it progresses from north to south, accompanied by an advance in the time of expected peak water level. This advance is itself due to the increase in wave celerity with the increased depth caused by the surge: the propagation speed (c) of all shallow water waves is given by $c = \sqrt{gh}$, where g is the acceleration due to gravity and h is water depth. The tree predictions and interpretation are encouraging for further development and application of this technique to this problem, and to other tidal and sea level applications.
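The link between surge height and earlier arrival of the peak can be illustrated with the standard shallow-water wave speed, c = sqrt(g h). The depth, surge height and travel distance below are hypothetical round numbers, not values from the study.

```python
import math

G = 9.81  # acceleration due to gravity, m/s^2


def shallow_water_speed(depth_m):
    """Propagation speed (m/s) of a shallow-water wave in water of given depth."""
    return math.sqrt(G * depth_m)


def travel_time_hours(distance_km, depth_m):
    """Time (hours) for the wave to cover a distance at uniform depth."""
    return distance_km * 1000.0 / shallow_water_speed(depth_m) / 3600.0


# Hypothetical values: 30 m mean depth, 2 m surge, 500 km path.
without_surge = travel_time_hours(500.0, 30.0)
with_surge = travel_time_hours(500.0, 32.0)

# The surge deepens the water, so the wave travels faster and arrives earlier.
print(f"advance in arrival: {(without_surge - with_surge) * 60.0:.0f} minutes")
```

A real coastline has varying depth along the path, so this uniform-depth calculation only indicates the sign and rough size of the effect.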

The accuracy of the method is comparable to alternative prediction methods whilst offering the major benefit of transparency. Work in progress involves further interrogation of the tree structure, investigation of fuzzy discretization methods and the inclusion of atmospheric pressure and wind field data in the model inputs. The probabilistic approach avoids optimization procedures, and it also allows null values in some or all of the input variables, where measured data may sometimes be missing for operational reasons. The method is fast to implement on standard PCs and could therefore be utilized in a real-time application.