12.2 A Fast and Objective Multidimensional Kernel Density Estimation Method for Climate Data Analysis: fastKDE

Thursday, 14 January 2016: 11:15 AM
Room 226/227 (New Orleans Ernest N. Morial Convention Center)
Travis A. O'Brien, LBNL, Berkeley, CA; and K. Kashinath, N. R. Cavanaugh, W. D. Collins, and J. P. O'Brien

Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher-fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive because they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchia and Pigolotti (2011, J. Roy. Stat. Soc. B). Although this solution applies in theory to multidimensional KDEs, it has not been clear how to apply it in practice. We introduce a method for practically extending the Bernacchia-Pigolotti KDE to multiple dimensions. We combine this multidimensional extension with a recently developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10^5 samples takes only 1 second on a modern workstation. We show that this fast and objective KDE method, which we call the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy comparable to that of state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method accurately encodes covariance information for bivariate samples, and we show that this property allows conditional PDFs to be calculated directly from the fastKDE. We demonstrate how this capability might be leveraged to detect non-trivial relationships between quantities in physical systems, such as transitional behavior.
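For readers unfamiliar with the Bernacchia-Pigolotti estimator, a minimal univariate sketch follows. As we understand their method, it works in Fourier space: with φ̂(t) the empirical characteristic function of N samples, the kernel's Fourier transform is κ̂(t) = N/(2(N−1)) · [1 + sqrt(1 − 4(N−1)/(N²|φ̂(t)|²))] wherever |φ̂(t)|² ≥ 4(N−1)/N², and zero elsewhere; the density is the inverse Fourier transform of κ̂(t)φ̂(t). The function name `self_consistent_kde_1d`, the frequency range (±10 divided by the sample standard deviation), the number of frequencies, the restriction to the contiguous band of frequencies around t = 0, and the direct-summation transforms are all assumptions of this sketch. It is not the fastKDE code, does not handle multiple dimensions, and does not reproduce the computational speedup described above.

```python
import numpy as np


def self_consistent_kde_1d(samples, grid, num_freqs=512):
    """Illustrative univariate self-consistent KDE (Bernacchia-Pigolotti style).

    Assumptions of this sketch: frequency range of +/-10/std, direct summation
    for both the characteristic function and the inverse transform, and a
    filter that keeps only the contiguous frequency band around t = 0.
    """
    x = np.asarray(samples, dtype=float)
    n = x.size  # needs n >= 2

    # Symmetric frequency grid with t = 0 at index num_freqs.
    t_max = 10.0 / x.std()
    t = np.linspace(-t_max, t_max, 2 * num_freqs + 1)
    dt = t[1] - t[0]

    # Empirical characteristic function: phi(t) = (1/N) * sum_j exp(i t x_j).
    ecf = np.exp(1j * np.outer(t, x)).mean(axis=1)

    # Frequencies where |phi|^2 < 4(N-1)/N^2 are treated as sampling noise.
    threshold = 4.0 * (n - 1) / n**2
    above = np.abs(ecf) ** 2 >= threshold

    # Walk outward from t = 0 and mirror the band (|phi| is even in t).
    center = num_freqs
    right = center
    while right + 1 < t.size and above[right + 1]:
        right += 1
    keep = np.zeros(t.size, dtype=bool)
    keep[2 * center - right : right + 1] = True

    # Self-consistent kernel transform; it equals 1 at t = 0, so the PDF integrates to 1.
    kernel_ft = np.zeros(t.size)
    ratio = threshold / np.abs(ecf[keep]) ** 2
    kernel_ft[keep] = n / (2.0 * (n - 1)) * (1.0 + np.sqrt(1.0 - ratio))

    # Inverse transform f(x) = (1/2pi) * integral of exp(-i t x) kernel_ft(t) phi(t) dt.
    g = np.asarray(grid, dtype=float)
    pdf = (np.exp(-1j * np.outer(g, t)) @ (kernel_ft * ecf)).real * dt / (2.0 * np.pi)
    return pdf


# Usage: estimate a standard normal PDF from 1000 synthetic samples.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
grid = np.linspace(-5.0, 5.0, 201)
pdf = self_consistent_kde_1d(data, grid)
```

Direct summation costs O(N × number of frequencies), which is acceptable for illustration; a method intended for 10^5 samples in multiple dimensions would need to avoid this cost.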
We apply this method to 119 samples of annual average precipitation and temperature data from California, USA (CA), together with global annual average temperature, to show directly how the joint distribution of CA temperature and precipitation depends on global mean temperature. This analysis shows that (1) the CA temperature marginal steadily shifts toward warmer temperatures as global mean temperature increases, (2) the CA precipitation marginal has a complex relationship with global mean temperature, and (3) the covariance of CA temperature and precipitation switched sign, apparently in association with a well-known shift in the state of the Pacific Ocean.
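Computing a conditional PDF from a gridded joint estimate amounts to dividing by a marginal: p(y | x) = p(x, y) / p(x), with p(x) = ∫ p(x, y) dy. Conditioning the joint PDF of CA temperature and precipitation on global mean temperature follows the same pattern with one extra dimension. The sketch below assumes a regular grid and uses an analytic bivariate normal (correlation `rho`) as a stand-in for a fastKDE output; `conditional_pdf` and its `min_marginal` guard are hypothetical names, not part of any published fastKDE interface.

```python
import numpy as np


def conditional_pdf(joint, y_grid, min_marginal=1e-12):
    """Return p(y | x) from a joint PDF tabulated as joint[i, j] = p(x_i, y_j).

    Assumes a regular y grid. Rows where the marginal p(x) is tiny are set to
    NaN instead of being divided by (nearly) zero.
    """
    dy = y_grid[1] - y_grid[0]
    marginal_x = joint.sum(axis=1) * dy  # p(x_i) = integral of p(x_i, y) dy
    cond = np.full_like(joint, np.nan)
    ok = marginal_x > min_marginal
    cond[ok] = joint[ok] / marginal_x[ok, None]  # normalize each row by its marginal
    return cond


# Stand-in joint PDF: standard bivariate normal with correlation rho.
rho = 0.6
x = np.linspace(-4.0, 4.0, 161)
y = np.linspace(-4.0, 4.0, 161)
X, Y = np.meshgrid(x, y, indexing="ij")
norm = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
joint = norm * np.exp(-(X**2 - 2.0 * rho * X * Y + Y**2) / (2.0 * (1.0 - rho**2)))

cond = conditional_pdf(joint, y)
dy = y[1] - y[0]
cond_mean = (cond * y).sum(axis=1) * dy  # E[y | x] for each x on the grid
```

For this stand-in, E[y | x] comes out close to rho·x, so flipping the sign of the covariance flips the slope of the conditional mean; with an estimated joint PDF the conditional moments can be read off the same way.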
