J3.1
Quantile Regression
Caren Marzban, University of Washington/APL, Seattle, WA
Most regression models (multiple regression, neural networks, trees, etc.) produce a point estimate of the conditional mean of a response (i.e., the quantity being predicted), given a set of predictors. However, the conditional mean measures only the "center" of the conditional distribution of the response. A more complete summary of the conditional distribution is provided by its quantiles. The 0.5 quantile (i.e., the median) can serve as a measure of the center, and the 0.9 quantile marks the value of the response below which 90% of the data reside. Similarly, the interval between the 0.05 quantile and the 0.95 quantile serves as a 90% prediction interval, thereby conveying uncertainty. Quantiles arise naturally in the environmental sciences. For example, one may wish to know the lowest level (e.g., the 0.1 quantile) of a river, given the amount of snowpack; or the highest temperature (e.g., the 0.9 quantile), given cloud cover. Recent advances in computing allow the development of regression models for predicting a given quantile of the conditional distribution, both parametrically and nonparametrically. The general approach is called quantile regression, but the methodology (of conditional quantile estimation) applies to any statistical model, be it multiple regression, support vector machines, or random forests. In this talk, the principles of quantile regression are reviewed and the methodology is illustrated through several examples. The technique and the examples display many of the features common to both machine learning and statistics.
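The abstract does not say how the quantiles are estimated. One standard route, assumed here purely for illustration, is to minimize the check (pinball) loss, which charges under-predictions at rate tau and over-predictions at rate 1 - tau. The sketch below fits a straight line for the 0.05, 0.5, and 0.95 quantiles on synthetic data. The data, the function names `pinball_loss` and `fit_line_quantile`, and the brute-force search over lines through pairs of points are all illustrative assumptions, not the method used in the talk. The search relies on the fact that a minimizing line can always be chosen to pass through two data points; it is only practical for small samples.

```python
import itertools
import numpy as np


def pinball_loss(y, pred, tau):
    """Mean check (pinball) loss of predictions `pred` at quantile level `tau`."""
    resid = y - pred
    # Positive residuals cost tau per unit, negative ones cost (1 - tau) per unit.
    return float(np.mean(np.maximum(tau * resid, (tau - 1.0) * resid)))


def fit_line_quantile(x, y, tau):
    """Return (intercept, slope) of the line minimizing the pinball loss.

    Hypothetical helper: tries every line through two data points and keeps
    the best one. Cost grows as O(n^3), so use it only on small samples.
    """
    best_loss, best_a, best_b = np.inf, 0.0, 0.0
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = pinball_loss(y, intercept + slope * x, tau)
        if loss < best_loss:
            best_loss, best_a, best_b = loss, intercept, slope
    return best_a, best_b


# Synthetic data (an assumption): noise spread grows with the predictor.
rng = np.random.default_rng(0)
n = 80
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 + 0.2 * x, size=n)

fits = {tau: fit_line_quantile(x, y, tau) for tau in (0.05, 0.5, 0.95)}
for tau, (a, b) in fits.items():
    frac_below = np.mean(y <= a + b * x)
    print(f"tau={tau:.2f}: y = {a:.3f} + {b:.3f} x, "
          f"fraction of data at or below the line = {frac_below:.3f}")
```

For each level, the fraction of points at or below the fitted line should sit close to tau, and the region between the 0.05 and 0.95 lines plays the role of the 90% prediction interval described above. Larger problems are normally solved with linear programming or a dedicated quantile regression routine rather than by enumeration.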
Joint Session 3, Bridging the Gap between Artificial Intelligence and Statistics in Applications to Environmental Science-I
Wednesday, 23 January 2008, 8:30 AM-10:00 AM, 219