J6.4 4th Place Solution for “How much Did it Rain? II” Kaggle Competition

Monday, 11 January 2016: 4:45 PM
Room 354 (New Orleans Ernest N. Morial Convention Center)
Shize Su, University of Virginia, Charlottesville; and A. Jacobs

In this competition, Shize and Alex teamed up and finished 4th on both the public leaderboard (score 23.69622) and the private leaderboard (score 24.71707). Our best solution is an ensemble of several XGBoost models and two Lasso models. Our success rests mainly on: 1) the extensive feature engineering and XGBoost parameter tuning by Alex; and 2) the ensembling work by Shize, in particular a trick we call blending the "right bad model" with negative weights.

Alex developed more than 60 engineered features from the given raw data set, which were used as input to the XGBoost models. Our single best XGBoost model trained on these engineered features scored 23.71706 on the public LB and 24.73612 on the private LB. We then deliberately built several XGBoost variants trained on slightly different sets of engineered features with different XGBoost parameters; an ensemble of these variants improved the score to 23.71083 on the public LB and 24.72996 on the private LB, which on its own would have placed 10th on the private LB.

Next, Shize developed two Lasso models and added them to the XGBoost ensemble, which improved the score significantly, from 24.72996 (private LB, 10th place) to 24.71707 (private LB, 4th place), our final best solution. We want to emphasize that, unlike the usual practice of adding well-scoring individual models to an ensemble, the two Lasso models were intentionally made "bad": they were trained on only 5 raw data features and a small fraction (<20%) of the training data, with a total training time of under one minute. Unsurprisingly, these two Lasso models scored poorly on their own (24.1539 on the public LB and 25.16316 on the private LB, considerably worse than the sample submission provided on the competition website). However, when Shize blended these two "bad" Lasso models into the XGBoost ensemble with negative weights, using the formula blend = 1.19*xgb_ensemble - 0.065*badLassoModelA - 0.125*badLassoModelB, they boosted the ensemble from 10th place to 4th place on both the public and private LB. Shize first discovered this trick of blending the "right bad model" with negative weights in a past Kaggle competition, and its effectiveness has since been confirmed in several other Kaggle competitions he has entered.
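
The abstract contains no code, so the following Python sketch is only an illustration of the two ideas described above: fitting a deliberately weak Lasso on a handful of raw columns and a small subsample, and blending it into the main ensemble with negative weights. The column names, subsample fractions, and Lasso alpha are assumptions; only the blend weights (1.19, -0.065, -0.125, which sum to 1.0) are taken from the abstract.

    import pandas as pd
    from sklearn.linear_model import Lasso

    def fit_weak_lasso(train_df, feature_cols, target_col, frac=0.15, alpha=0.1, seed=0):
        # Deliberately "bad" model: a few raw features and a small random
        # subsample of the training rows, so it trains in well under a minute.
        sample = train_df.sample(frac=frac, random_state=seed)
        model = Lasso(alpha=alpha)
        model.fit(sample[feature_cols].fillna(0), sample[target_col])
        return model

    def blend_with_negative_weights(xgb_ensemble_pred, lasso_a_pred, lasso_b_pred):
        # Blend weights quoted in the abstract; note they sum to 1.0, so the
        # blended prediction stays on the same scale as its components.
        return 1.19 * xgb_ensemble_pred - 0.065 * lasso_a_pred - 0.125 * lasso_b_pred

    # Hypothetical usage (these column names are assumptions, not from the abstract):
    # raw_features = ["Ref", "RefComposite", "RhoHV", "Zdr", "Kdp"]
    # lasso_a = fit_weak_lasso(train_df, raw_features, "Expected", frac=0.15, seed=1)
    # lasso_b = fit_weak_lasso(train_df, raw_features, "Expected", frac=0.10, seed=2)
    # final_pred = blend_with_negative_weights(xgb_pred,
    #                                          lasso_a.predict(test_X),
    #                                          lasso_b.predict(test_X))

One design note: because the negative weights on the weak Lasso models are offset by a weight greater than 1 on the strong ensemble, the blend effectively subtracts out error patterns the weak models share with the ensemble while preserving the overall prediction scale.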