Addressing wind direction uncertainty in source estimation through dynamic time warping

Cervone, Guido

This paper addresses the challenges of estimating the source of atmospheric pollution. The goal of source estimation is to identify the contributing source of pollution and its characteristics, taking into account the effects of atmospheric circulation such as wind direction and velocity. Potential pollution sources include fossil-fuel-burning power plants, urban areas, industrial complexes, forest fires, and dust storms. Source estimation is a crucial problem of legal and political relevance, especially in cases of transnational pollution.

In iterative methods, the source characteristics are derived by running numerical simulations for multiple candidate solutions and then comparing the concentrations calculated by the models with the observed measurements. Previous work employed evolutionary algorithms to drive the search process, and showed that source characteristics were correctly identified for synthetic cases and for a controlled field experiment.
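The iterative loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `forward_model` is a toy bell-shaped profile standing in for a real dispersion model, the source is described only by a position and a strength, and the search is a simple mutate-and-keep-if-better scheme rather than the evolutionary algorithm used in the previous work.

```python
import math
import random

def forward_model(source_x, strength, sensor_xs):
    """Toy stand-in for a dispersion model: bell-shaped concentration profile."""
    return [strength * math.exp(-0.5 * (x - source_x) ** 2) for x in sensor_xs]

def error(simulated, observed):
    """Euclidean distance between simulated and observed concentrations."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)))

def estimate_source(observed, sensor_xs, generations=200, offspring=10, seed=0):
    """Mutate the current best candidate; keep a mutant only if its error is lower."""
    rng = random.Random(seed)
    best = (rng.uniform(0, 10), rng.uniform(0, 5))  # (position, strength)
    best_err = error(forward_model(*best, sensor_xs), observed)
    for _ in range(generations):
        for _ in range(offspring):
            cand = (best[0] + rng.gauss(0, 0.5), best[1] + rng.gauss(0, 0.5))
            err = error(forward_model(*cand, sensor_xs), observed)
            if err < best_err:
                best, best_err = cand, err
    return best, best_err

# Synthetic case: observations generated by a known source at x = 4, strength 2.
sensor_xs = [i * 0.5 for i in range(21)]
observed = forward_model(4.0, 2.0, sensor_xs)
(best_x, best_q), best_err = estimate_source(observed, sensor_xs)
print(best_x, best_q, best_err)
```

The only information the search receives about a candidate is the value returned by `error`, which is why the choice of error function matters so much in this setting.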

Different measures of the error between the simulated and observed values were investigated to quantify the performance of the new candidate solutions. Examples of error functions considered by researchers include Euclidean distance and Normalized Mean Square Error (NMSE). The choice of error function is central to the problem, since it is the only feedback that the algorithm receives on the quality of the newly generated solutions. This measure is usually referred to as the error or fitness function, and its value is also called the skill score.
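As a rough illustration of the two error functions named above, the sketch below computes both for a small set of values. The NMSE form used here (mean squared error divided by the product of the two means) is a definition commonly used in dispersion model evaluation and is assumed, since the text does not spell it out; the function names and sample values are illustrative.

```python
import math

def euclidean_distance(simulated, observed):
    """Point-to-point Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)))

def nmse(simulated, observed):
    """NMSE, assumed here to be MSE / (mean(simulated) * mean(observed))."""
    n = len(observed)
    mse = sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n
    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    return mse / (mean_s * mean_o)

observed = [1.0, 2.0, 3.0]
simulated = [1.0, 2.0, 5.0]
print(euclidean_distance(simulated, observed))  # 2.0
print(nmse(simulated, observed))                # approximately 0.25
```

Both functions compare values sensor by sensor, so a simulated plume that has the right shape but is displaced is penalized at every sensor it misses.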

The correct wind direction is paramount to source estimation problems. It was observed that errors in wind direction of only a few degrees lead to poor results. Even when the wind direction is carefully measured at the time of the release, as for example in a field experiment, the wind variability over the duration of the release can be very large, leading to large uncertainty and noise in the data.

To address this problem, previous research investigated two different approaches. The first approach consisted of choosing an error function that compares the distributions of the simulated and observed values, without taking into account their spatial distribution. In general, this approach performed poorly because the spatial location of the concentration plays a crucial role in correctly identifying the characteristics of the source. The second approach consisted of making the wind direction an unknown in the source estimation problem. This approach generated good results, at the cost of significantly increasing the complexity of the search process.
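The weakness of the first approach can be seen in a small sketch. Comparing sorted values is used here as one hypothetical way to compare distributions while ignoring location; the text does not say which function the previous research actually used.

```python
def distribution_error(simulated, observed):
    """Compare sorted values: sensitive to the value distribution, blind to location."""
    pairs = zip(sorted(simulated), sorted(observed))
    return sum((s - o) ** 2 for s, o in pairs) ** 0.5

def pointwise_error(simulated, observed):
    """Euclidean distance, comparing values sensor by sensor."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) ** 0.5

observed = [0.0, 0.0, 5.0, 1.0]   # peak at sensor index 2
simulated = [5.0, 1.0, 0.0, 0.0]  # same values, peak at sensor index 0

print(distribution_error(simulated, observed))  # 0.0
print(pointwise_error(simulated, observed))     # greater than 0
```

The distribution-based error is zero even though the simulated peak sits at the wrong sensor, so the search gets no signal about where the plume actually is.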

In this work, we propose to use Dynamic Time Warping (DTW), a technique well known in the signal processing and time series data mining communities, to measure the error between simulated values and observed measurements. Unlike Euclidean distance (or more generally, Lp-norm distances), for which sequences are aligned in a point-to-point fashion, DTW uses dynamic programming to determine the alignment that minimizes the distance (error) between two sequences. Its non-linear mapping of one sequence to another allows meaningful matching of similar but locally shifted sequences. A parameter, the warping window, determines how much warping is allowed when searching for the best alignment. A large warping window makes the search prohibitively expensive and may allow meaningless matches between points that are far apart. On the other hand, a small window might prevent us from finding the best solution. Euclidean distance can be seen as a special case of DTW in which no warping is allowed.
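The dynamic programming recurrence behind DTW can be sketched as follows. This is a minimal version written under stated assumptions: the warping window is interpreted as a band constraint |i - j| <= window (often called a Sakoe-Chiba band), the point cost is the squared difference, and the square root of the accumulated cost is returned so that a window of zero reproduces Euclidean distance. The function names and the sample sequences are illustrative.

```python
import math

def dtw_distance(a, b, window):
    """DTW with squared point cost, restricted to the band |i - j| <= window."""
    n, m = len(a), len(b)
    window = max(window, abs(n - m))  # band must be wide enough to reach the corner
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def euclidean_distance(a, b):
    """Point-to-point distance, for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [0, 0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 1, 3, 1, 0, 0]  # same shape as a, shifted by one step

print(euclidean_distance(a, b))  # sqrt(10), about 3.162
print(dtw_distance(a, b, 0))     # same as Euclidean: no warping allowed
print(dtw_distance(a, b, 1))     # 0.0: the shifted peak is realigned
```

The band also limits the work to roughly n * (2 * window + 1) cells instead of the full n * m table, which is the cost trade-off the window controls.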

The figure below demonstrates the difference between the two distance measures. The top and bottom sequences appear to have similar shapes. In fact, the bottom sequence is a shifted version of the top one. However, with Euclidean distance, the slight shifts along the time axis result in a large error between the two sequences. More specifically, with Euclidean distance (left), the dips and peaks in the sequences are misaligned and therefore not matched, whereas with DTW (right), the dips and peaks are aligned with their corresponding points in the other sequence.

Since errors in wind direction cause similar shifting effects in simulated data, DTW is a suitable choice to compensate for such errors. In this paper, we will demonstrate that adopting DTW as the error or fitness function in iterative methods produces better results than existing approaches.