Using a Feed-Forward MLP Neural Network to Fill Gaps in N<sub>2</sub>O Emission Data

Fehr, Benjamin Matthew

Agriculture is a significant contributor to global nitrous oxide (N₂O) emissions. To meet growing worldwide demand for food, farmers must supplement soils with nitrogen. Nitrogen fertilization stimulates the production of N₂O, a potent greenhouse gas and the leading ozone-depleting substance. Measuring N₂O fluxes from agricultural soils helps researchers determine annual emission rates and the efficacy of different emission-reducing strategies. However, sites often collect incomplete flux data, whether because of limited time and personnel, logistical constraints that interfere with sampling, or equipment malfunction. These gaps are problematic because N₂O emissions are highly variable over time: flux rates depend on oxygen, nitrogen, and carbon concentrations, soil moisture, and temperature, among other factors. The most common way of filling gaps is simple linear interpolation between the measured points on either side of a gap. As gaps become longer or more frequent, this method becomes less effective because of the high variability in N₂O fluxes. A better approach is to use metadata, such as temperature, rainfall, and soil moisture, to predict the missing N₂O fluxes. For this project, we used a feedforward multi-layer perceptron (MLP) neural network, trained using backpropagation, to predict N₂O emissions from this metadata. These networks are composed of three layers: an input layer, a hidden layer, and an output layer. The number of nodes in the input and output layers corresponds to the number of input and output variables, while the number of nodes in the hidden layer is determined during calibration. This is a relatively new method for filling gaps in greenhouse gas data, and it is important to learn from other researchers about the cutting edge of artificial intelligence in environmental science to ensure that the method reaches its fullest potential.
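The three-layer structure described above can be illustrated with a short forward pass. This is a minimal sketch in plain Python, not the R code used in this project. The logistic hidden activation, the linear output node, the weight range, and the example input values are all assumptions made for illustration, and training by backpropagation is omitted.

```python
import math
import random

def init_weights(n_inputs, n_hidden, seed=0):
    """Random starting weights for one hidden layer and a single output node."""
    rng = random.Random(seed)
    # Each hidden node has one weight per input plus a bias term (last entry).
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    # The output node has one weight per hidden node plus a bias term.
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output

def logistic(z):
    """Logistic activation (an assumed choice for the hidden layer)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, hidden, output):
    """Forward pass: inputs -> hidden layer (logistic) -> linear output."""
    h = [logistic(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
         for ws in hidden]
    return sum(w * hi for w, hi in zip(output[:-1], h)) + output[-1]

# Hypothetical scaled inputs: rainfall, air temperature, time of year, summer, fall
x = [0.2, 0.6, 0.45, 1.0, 0.0]
# n_hidden=3 is arbitrary here; in practice it is chosen during calibration.
hidden, output = init_weights(n_inputs=len(x), n_hidden=3)
flux_estimate = predict(x, hidden, output)
```

With untrained random weights the estimate is meaningless. The sketch only shows how the number of input nodes follows from the number of metadata variables, while the number of hidden nodes remains a free choice.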

This project had three main objectives. The first was to assess the performance of this type of neural network on different sites from around the globe. The second was to determine how the amount of available data and the number of input variables affected performance, with the aim of identifying the minimum amount of data and the types of metadata required. Finally, we used a trained neural network to predict missing N₂O emissions and estimate annual fluxes.

Sites varied in the number of metadata variables (7 to 20) and the number of sample days (70 to 350). For the site comparison, however, we limited the inputs to those common to all sites. We examined the data used for each model in one-year time steps. In addition to the environmental inputs from each site, we added a continuous numeric time-of-year variable and binary season variables to each data set in order to capture daily and seasonal patterns. We then simultaneously selected the initial weights and the number of hidden nodes for each model through trial and error, choosing the combination with the lowest root mean square error. The models were trained using the neuralnet package in R and cross-validated using k-fold cross-validation with k=10. To measure each final model's accuracy, we plotted a known set of data against the corresponding predictions and calculated the R² value. To test how adding inputs affected model accuracy, inputs were added sequentially, beginning with the five used for the site comparison (rainfall, air temperature, time of year, summer, fall) and continuing in order of their predicted effect on N₂O emissions. To test whether the amount of available data affected model accuracy, we calibrated and trained models using progressively less data, starting with 80% of the available data and decreasing by 20 percentage points each time.
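The cross-validation and scoring steps above can be sketched as follows. This is an illustrative outline in plain Python, not the R workflow used in this project. The function names, the synthetic data, and the placeholder mean-predicting model are hypothetical, and R² is computed here as the coefficient of determination, which may differ from the value obtained from a regression of predicted on observed fluxes.

```python
import math
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def cross_validate(X, y, fit, k=10):
    """Mean held-out RMSE over k folds.

    fit(X_train, y_train) must return a function mapping one input row
    to one prediction.
    """
    scores = []
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [model(X[i]) for i in fold]
        scores.append(rmse([y[i] for i in fold], preds))
    return sum(scores) / len(scores)

def fit_mean(X_train, y_train):
    """Placeholder model (an assumption): always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean

# Synthetic example: 100 sample days with time of year as the only input.
X = [[d / 365.0] for d in range(1, 101)]
y = [math.sin(2 * math.pi * row[0]) for row in X]
cv_rmse = cross_validate(X, y, fit_mean, k=10)
```

In a calibration loop of the kind described, `fit` would be replaced by a routine that trains an MLP with a given number of hidden nodes and set of starting weights, and the configuration with the lowest cross-validated RMSE would be kept.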

Preliminary results indicate that performance varies by site. When the total percentage of available data was plotted against the R² values, there appeared to be no relationship between the two. When we began removing data from the data set, there was no significant decrease in accuracy until only 20% of the data was used to train and calibrate the models. As for inputs, we found that prediction accuracy began to decrease once more than nine input variables were used. We believe this is due to redundancy in the information provided by the inputs. Finally, we used the trained model to fill the gaps in the data and compared the result with linear interpolation. We calculated the area under the curve for each gap-filling method to obtain an annual emission rate and compared the results. Relative to the linear method, the neural-network-filled data produced both under- and overestimates.
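The comparison of gap-filling methods by area under the curve can be sketched as below. This is a minimal Python illustration assuming the sample days are sorted in ascending order and trapezoidal integration is used. The flux values, including the stand-in neural network predictions, are invented for the example and are not results from this project.

```python
def fill_linear(days, fluxes):
    """Fill None entries by linear interpolation between the nearest
    measured neighbours. Assumes days are sorted in ascending order."""
    known = [(d, f) for d, f in zip(days, fluxes) if f is not None]
    filled = []
    for d, f in zip(days, fluxes):
        if f is not None:
            filled.append(f)
            continue
        before = [(kd, kf) for kd, kf in known if kd < d]
        after = [(kd, kf) for kd, kf in known if kd > d]
        if not before or not after:
            raise ValueError("cannot interpolate outside the measured range")
        (d0, f0), (d1, f1) = before[-1], after[0]
        filled.append(f0 + (f1 - f0) * (d - d0) / (d1 - d0))
    return filled

def cumulative_emission(days, fluxes):
    """Trapezoidal area under the flux curve (flux units times days)."""
    return sum((d1 - d0) * (f0 + f1) / 2.0
               for d0, d1, f0, f1 in zip(days, days[1:], fluxes, fluxes[1:]))

days = [1, 2, 3, 4, 5]
measured = [2.0, None, None, 8.0, 6.0]   # two missing sample days
linear_filled = fill_linear(days, measured)
nn_filled = [2.0, 9.0, 3.0, 8.0, 6.0]    # hypothetical model predictions for days 2 and 3

linear_total = cumulative_emission(days, linear_filled)
nn_total = cumulative_emission(days, nn_filled)
difference = nn_total - linear_total     # positive: network estimate is higher
```

The sign of `difference` shows whether the model-filled series yields a higher or lower cumulative emission than the linear baseline for a given site-year.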

This project served as a proof of concept for using feedforward MLPs to fill gaps in N₂O emission data. In the future, researchers should focus on developing a globally convergent algorithm for structure selection and calibration. Error surfaces for environmental models are inherently bumpy, and it is difficult to achieve global convergence in a computationally efficient manner. This model is currently being tested across the suite of sites in the Global N₂O Database, both to examine the possibility of a globally convergent algorithm and to test the efficacy of neural networks for filling gaps in N₂O emission data. Other neural network architectures, particularly recurrent neural networks, should also be tested to see whether performance can be improved by giving the model information about past trends. Additionally, standards must be developed for testing these types of models. Results from this project can be used by researchers to determine the most valuable metadata to collect from a site. Finally, neural networks should be compared with other gap-filling methods.

9A.3 Using a Feed-Forward MLP Neural Network to Fill Gaps in N2O Emission Data
