Information-Theoretic Perspective on Benchmarking with Inductive Models
An important aspect of model benchmarking is to use simple models to set a prior expectation of model performance (Abramowitz 2012); often, inductive models are used for this task (e.g., Abramowitz 2005). Luo et al. (2012) interpreted this as helping us to “define a benchmark level of performance that land models can be targeted to achieve relative to the information contained in the meteorological forcing of the surface fluxes.” In essence, the idea is to understand how much information can be extracted from model inputs before applying the model(s) we wish to benchmark.
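As a minimal sketch of this benchmarking idea, the code below fits a least-squares line from one forcing variable to an observed flux. It then uses that line's root-mean-square error as the performance level a process model should beat. The single forcing variable, the linear model, the RMSE metric, the function names, and the data are all hypothetical choices for illustration. The inductive benchmarks used in the cited work may differ, and a real benchmark would be evaluated out of sample rather than on its own training data.

```python
import math
from typing import Sequence, Tuple


def fit_linear_benchmark(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical inductive benchmark: least-squares fit y ~ a + b * x.

    Returns (intercept a, slope b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical data: one forcing variable, the observed flux, and a process model's output.
forcing = [100.0, 200.0, 300.0, 400.0, 500.0]
observed = [22.0, 38.0, 65.0, 78.0, 101.0]
process_model = [30.0, 45.0, 55.0, 85.0, 90.0]

a, b = fit_linear_benchmark(forcing, observed)
benchmark = [a + b * f for f in forcing]
print("benchmark RMSE:", rmse(benchmark, observed))
print("process model RMSE:", rmse(process_model, observed))
```

With these made-up numbers the regression benchmark's RMSE (about 2.6) is well below the process model's (about 8.8), so this hypothetical process model would not reach the benchmark level set by the forcing data alone.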
All of the evaluation metrics proposed by Abramowitz (2012), and almost all model evaluation measures used by hydrologists in general, are fundamentally rooted in probability theory. The concept of information was rigorously defined in this context by Shannon (1948) as an integrated change in the negative log-probability of a distribution over the target random variable. Specifically, the amount of information contained in model inputs is the mutual information between inputs and model outputs (Cover and Thomas 1991, p. 19).
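To make the mutual-information definition concrete, the sketch below computes a plug-in (histogram) estimate of I(X; Y) in bits from paired samples, assuming the inputs and outputs have already been discretized into bins. The estimator and the function name `mutual_information_bits` are illustrative choices, not part of the original work. Because it substitutes empirical frequencies for the unknown true joint distribution, it only approximates the quantity defined above and tends to overestimate it for small samples.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information_bits(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired, already-binned samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty sample sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) reduces to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical binned samples: output fully determined by input vs. independent of it.
inputs = [0, 0, 1, 1]
print(mutual_information_bits(inputs, [0, 0, 1, 1]))  # 1.0
print(mutual_information_bits(inputs, [0, 1, 0, 1]))  # 0.0
```

In the first case the output is a deterministic function of a fair binary input, so all of its one bit of entropy is shared with the input; in the second the two are independent and share nothing.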
Extracting the information from inputs, or even computing the mutual information, is impossible without knowledge of the “true” joint distribution between inputs and outputs, i.e., the “true” model of the system. We propose an interpretation of Shannon's theory that quantifies the amount of information about a target variable provided by a model as a divergence (e.g., Kullback and Leibler 1951) from probabilistic knowledge about the target variable after applying the model to knowledge about the target variable before applying the model. Specifically, we treat the model as an approximation of Bayes' law and compute the divergence from the approximate posterior to a prior. The prior represents knowledge about the target variable given the input data alone.
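A minimal sketch of the proposed measure follows. It assumes that both the prior (from a data-driven inductive model) and the model's approximate posterior are discrete distributions over the same bins of the target variable, and it takes the divergence as D_KL(posterior ‖ prior) = Σ_k p_post(k) log2[p_post(k) / p_prior(k)], the usual Bayesian information gain. The binning, the function name `kl_divergence_bits`, and the example numbers are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence_bits(posterior: Sequence[float], prior: Sequence[float]) -> float:
    """Return D_KL(posterior || prior) in bits for two discrete distributions
    defined over the same bins of the target variable."""
    if len(posterior) != len(prior):
        raise ValueError("distributions must share the same bins")
    total = 0.0
    for p, q in zip(posterior, prior):
        if p == 0.0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if q == 0.0:
            return math.inf  # posterior puts mass where the prior has none
        total += p * math.log2(p / q)
    return total


# Hypothetical example: the data-driven prior spreads probability evenly over four
# bins of the target; the model concentrates it on the first two bins.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.5, 0.5, 0.0, 0.0]
info_from_model = kl_divergence_bits(posterior, prior)
print(info_from_model)  # 1.0
```

Here the model halves the number of plausible bins, which corresponds to one bit of information beyond what the input data alone provide. A model that simply reproduces the prior contributes zero bits.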
The amount of information that we can extract from data is determined by the amount of information present (which is fundamentally unknowable) and by our ability to build an inductive model directly from that data. For a consistent comparison between the information provided by data alone and the information provided by a model, the prior must include any uncertainty we might have about the inductive model itself. We suggest three methods that should work in diverse modeling environments: (i) neural networks with bootstrapping, (ii) Gaussian process regression, and (iii) the asymptotic variance of empirical density functions.
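The sketch below gives a simplified illustration of methods (i) and (iii). It quantifies uncertainty in an inductive model's estimated bin probabilities once by bootstrapping and once by the large-sample variance p(1 − p)/n of an empirical bin probability. To stay self-contained it substitutes a histogram for the neural network. The histogram model, function names, number of resamples, and data are hypothetical. The abstract does not specify how this uncertainty is folded into the prior, so the sketch stops at quantifying it.

```python
import math
import random
import statistics
from collections import Counter
from typing import List, Sequence, Tuple


def histogram(samples: Sequence[int], n_bins: int) -> List[float]:
    """Empirical probability of each target bin (hypothetical stand-in for an inductive model)."""
    counts = Counter(samples)
    return [counts[k] / len(samples) for k in range(n_bins)]


def bootstrap_uncertainty(samples: Sequence[int], n_bins: int,
                          n_boot: int = 2000, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Method (i), simplified: refit the histogram on bootstrap resamples and return
    the mean and standard deviation of each bin probability across resamples."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_boot):
        resample = rng.choices(samples, k=len(samples))  # draw with replacement
        fits.append(histogram(resample, n_bins))
    per_bin = list(zip(*fits))  # regroup: one tuple of n_boot values per bin
    return [statistics.fmean(v) for v in per_bin], [statistics.stdev(v) for v in per_bin]


def asymptotic_se(samples: Sequence[int], n_bins: int) -> List[float]:
    """Method (iii), simplified: sqrt(p * (1 - p) / n), the large-sample standard
    error of an empirical bin probability."""
    n = len(samples)
    return [math.sqrt(p * (1.0 - p) / n) for p in histogram(samples, n_bins)]


# Hypothetical binned target values observed when the inputs fall in one input bin.
y_samples = [0] * 18 + [1] * 12 + [2] * 6 + [3] * 4
boot_mean, boot_sd = bootstrap_uncertainty(y_samples, n_bins=4)
print("point estimate:", histogram(y_samples, 4))
print("bootstrap SD:  ", [round(s, 3) for s in boot_sd])
print("asymptotic SE: ", [round(s, 3) for s in asymptotic_se(y_samples, 4)])
```

For a histogram the two estimates agree up to Monte Carlo error, because each bootstrapped bin probability is a binomial proportion with variance p̂(1 − p̂)/n. For a neural network no such closed form is available, which is where bootstrapping becomes useful.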
Abramowitz, G. (2005). Towards a benchmark for land surface models. Geophysical Research Letters.
Abramowitz, G. (2012). Towards a public, standardized, diagnostic benchmarking system for land surface models. Geoscientific Model Development Discussions.
Cover, T.M., & Thomas, J.A. (1991). Elements of information theory. New York, NY, USA: Wiley-Interscience.
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics.
Luo, Y.Q., Randerson, J., Abramowitz, G., Bacour, C., Blyth, E., Carvalhais, N., Ciais, P., Dalmonech, D., Fisher, J., & Fisher, R. (2012). A framework of benchmarking land models. Biogeosciences Discussions.
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal.