Machine Learning for the Source Detection of Atmospheric Emissions
There is currently no established methodology that satisfactorily solves the problem of detecting the sources of atmospheric releases, and there is considerable uncertainty about the effectiveness and applicability of existing techniques. Several methodologies have been proposed that use evolutionary algorithms to find the characteristics of unknown sources. These methods perform multiple forward simulations from tentative source locations and use the comparison of simulated concentrations with sensor measurements to drive an iterative process that converges to the real source. The strength of the approach lies in the domain independence of evolutionary algorithms, which can be used effectively with different error functions without major modifications to the underlying methodology. The error functions quantify the difference between simulated and observed values.
A machine learning based methodology is presented to identify the characteristics of an unknown atmospheric emission using limited ground sensor measurements and numerical atmospheric transport and dispersion models. Forward numerical simulations are performed for each candidate solution, and the resulting concentrations are compared with the observed ground measurements. The goal is to minimize the error between simulated and observed values.
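The comparison between simulated and observed concentrations can be illustrated with a minimal sketch. The forward model below (`toy_forward_model`) is a hypothetical stand-in, not a real transport and dispersion model, and the root-mean-square error is only one possible choice of error function; the sensor layout and source values are invented for illustration.

```python
import math

def toy_forward_model(source, sensors):
    """Hypothetical stand-in for a dispersion model.

    Concentration at each sensor decays with squared distance from the source.
    """
    sx, sy, strength = source
    return [strength / (1.0 + (x - sx) ** 2 + (y - sy) ** 2) for x, y in sensors]

def rmse(simulated, observed):
    """Root-mean-square error between simulated and observed concentrations."""
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative sensor layout and "true" source (x, y, strength); not real data.
sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
true_source = (1.0, 1.0, 5.0)
observed = toy_forward_model(true_source, sensors)

# A candidate placed at the true source reproduces the observations exactly;
# a misplaced candidate yields a larger error.
error_at_truth = rmse(toy_forward_model(true_source, sensors), observed)
error_misplaced = rmse(toy_forward_model((3.0, 3.0, 5.0), sensors), observed)
```

In a search procedure, each candidate source would be scored this way, and candidates with lower error would be preferred.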
The main difference between evolutionary algorithms and the proposed methodology is the way new individuals are generated. While evolutionary algorithms use non-deterministic operators such as mutation and recombination as the main engine of the evolutionary process, the proposed methodology employs a machine learning rule induction algorithm to learn attributional rules that discriminate between the best- and worst-performing candidate solutions. Each time the machine learning program is applied, it therefore generates hypotheses indicating the areas of the search space that are likely to contain high-performing individuals.
New individuals are generated according to inductive hypotheses discovered by the machine learning program. The individuals are thus genetically engineered, in the sense that the values of the variables are not randomly or semi-randomly assigned, but set according to the rules discovered by the machine learning program.
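The idea of generating individuals from learned hypotheses can be sketched as follows. This is a deliberately simplified illustration and not the rule induction algorithm used in the paper: the "rule" here is just a set of per-variable intervals covering the best-performing group, whereas attributional rules also use the worst-performing group to discriminate. The function names, the `fraction` parameter, and the toy error (distance to a hypothetical target) are assumptions made for this example.

```python
import math
import random

def learn_interval_rule(population, errors, fraction=0.3):
    """Return per-variable (low, high) bounds covering the best-performing group."""
    ranked = [ind for _, ind in sorted(zip(errors, population), key=lambda pair: pair[0])]
    n_best = max(1, int(len(ranked) * fraction))
    best = ranked[:n_best]
    return [(min(ind[i] for ind in best), max(ind[i] for ind in best))
            for i in range(len(best[0]))]

def instantiate(rule, count, rng):
    """Create new individuals whose variables satisfy the learned rule."""
    return [tuple(rng.uniform(low, high) for low, high in rule) for _ in range(count)]

rng = random.Random(0)
target = (3.0, 7.0)  # hypothetical source location used only to score candidates
population = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)) for _ in range(50)]
errors = [math.dist(ind, target) for ind in population]

# Learn where the good candidates lie, then sample new candidates there.
rule = learn_interval_rule(population, errors)
new_individuals = instantiate(rule, count=20, rng=rng)
```

Variable values are still sampled, but only inside the region the learned hypothesis marks as promising, which is the sense in which the individuals are engineered rather than mutated.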
To understand the advantage of using machine learning to generate new individuals, compared with the traditional Darwinian operators, it is necessary to take into account both the evolution length, defined as the number of function evaluations needed to determine the target solution, and the evolution time, defined as the execution time required to achieve this solution. The reason for measuring both characteristics is that choosing between machine learning based evolution and Darwinian algorithms involves assessing tradeoffs between the complexity of the population generating operators and the evolution length. The machine learning operations of hypothesis generation and instantiation are more computationally costly than mutation or crossover operators, but the evolution length is typically much shorter than that of Darwinian evolutionary algorithms.
Therefore, the use of machine learning as the engine of evolution is advantageous only for problems with high objective function evaluation complexity. The problem of source detection of atmospheric pollutants described in this paper is an ideal example, because each function evaluation requires running complex numerical simulations.
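The trade-off between evolution length and evolution time can be made concrete with a simple cost model. The sketch below assumes that evolution time is the evolution length multiplied by the sum of the per-individual evaluation cost and operator cost; all numbers are hypothetical and chosen only to illustrate the argument, not taken from the experiments.

```python
def evolution_time(evolution_length, eval_cost, operator_cost):
    """Total time in seconds: every evaluated individual pays both costs."""
    return evolution_length * (eval_cost + operator_cost)

# Hypothetical settings: the learning-based method needs fewer evaluations
# but spends more time generating each individual.
DARWINIAN = {"evolution_length": 10_000, "operator_cost": 0.001}
LEARNING = {"evolution_length": 1_000, "operator_cost": 0.05}

def compare(eval_cost):
    """Return (Darwinian time, learning-based time) for a given evaluation cost."""
    darwinian = evolution_time(eval_cost=eval_cost, **DARWINIAN)
    learning = evolution_time(eval_cost=eval_cost, **LEARNING)
    return darwinian, learning

cheap = compare(eval_cost=0.001)     # trivial objective function
expensive = compare(eval_cost=10.0)  # e.g. a numerical simulation per candidate
```

With these assumed numbers, the Darwinian algorithm is faster when an evaluation costs a millisecond, the learning-based method is faster when an evaluation costs ten seconds, and the break-even point is about 0.0044 seconds per evaluation.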
The proposed methodology was used to identify the source characteristics of the Prairie Grass field experiment. Results show that the new methodology outperforms traditional evolutionary algorithms for large problems.