J6.3 Hourly Rainfall Estimation using Gradient Boosted Decision Trees

Monday, 11 January 2016: 4:30 PM
Room 354 (New Orleans Ernest N. Morial Convention Center)
Devin Anzelmo, University of California, CA

This work describes a method of estimating the probability distribution of hourly rainfall from polarimetric radar scans. The data was split into five separate sets based on the number of radar observations to facilitate feature generation. Features for supervised learning were created by calculating descriptive statistics (mean, standard deviation, maximum, etc.) for all of the provided polarimetric radar variables. The labels (hourly rain gauge readings) were rounded down to the nearest integer, and all samples with a rain amount greater than 69 mm were discarded from the training data. Because of the low number of high-rain-rate samples, the labels were further aggregated by binning. Soft multiclass classification was performed on the binned labels to generate a probability for each bin. The full probability distribution was then created from the bin probabilities and the proportion of each class within the bins. The best-performing learning algorithm was gradient boosted decision trees using the 'multi:softprob' objective from the XGBoost machine learning library. The competition leaderboard was used to validate different models and features. A final Continuous Ranked Probability Score (CRPS) of 0.007488 resulted in a first-place finish in part one of the 2015-2016 AMS AI-contest.
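The step that turns bin probabilities back into a full distribution can be sketched as follows. This is a minimal illustration in plain Python, not the author's code: the bin edges, the toy training labels, and the function names are all hypothetical, and `bin_probs` merely stands in for one row of the per-class output that a 'multi:softprob' model produces. The CRPS function assumes the discrete form averaged over the 70 integer thresholds from 0 to 69 mm; the abstract does not state the exact formula.

```python
from collections import Counter

N_LEVELS = 70  # integer labels 0..69 mm, matching the 69 mm cutoff


def within_bin_proportions(labels, bin_edges):
    """Share of each integer label among the training labels in each bin."""
    counts = Counter(labels)
    proportions = []
    for lo, hi in bin_edges:  # both edges inclusive
        total = sum(counts[k] for k in range(lo, hi + 1))
        if total == 0:
            # Empty bin: spread mass uniformly (an assumption of this sketch).
            share = {k: 1.0 / (hi - lo + 1) for k in range(lo, hi + 1)}
        else:
            share = {k: counts[k] / total for k in range(lo, hi + 1)}
        proportions.append(share)
    return proportions


def expand_to_cdf(bin_probs, proportions):
    """Distribute each bin's probability over its labels, then accumulate."""
    pmf = [0.0] * N_LEVELS
    for p_bin, share in zip(bin_probs, proportions):
        for k, frac in share.items():
            pmf[k] += p_bin * frac
    cdf, running = [], 0.0
    for p in pmf:
        running += p
        cdf.append(min(running, 1.0))  # guard against float drift above 1
    return cdf


def crps(cdfs, actuals):
    """Mean squared gap between each predicted CDF and the observed step function."""
    total = 0.0
    for cdf, z in zip(cdfs, actuals):
        for n, p in enumerate(cdf):
            step = 1.0 if n >= z else 0.0
            total += (p - step) ** 2
    return total / (N_LEVELS * len(cdfs))


# Hypothetical bins: no rain, light rain (1-2 mm), everything heavier.
bin_edges = [(0, 0), (1, 2), (3, 69)]
train_labels = [0, 0, 0, 1, 1, 2, 3, 5]  # toy floored gauge readings
proportions = within_bin_proportions(train_labels, bin_edges)

bin_probs = [0.7, 0.2, 0.1]  # stand-in for one row of softprob output
cdf = expand_to_cdf(bin_probs, proportions)
score = crps([cdf], [0])
```

Here the 0.2 assigned to the light-rain bin is split two thirds to 1 mm and one third to 2 mm, because that is how the toy training labels fall within the bin. Binning keeps the classifier from having to learn rare high-rainfall classes individually, while the within-bin proportions restore the per-millimeter resolution that the score requires.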