Sarah Balkissoon1, Neil Fox1, Anthony Lupo1, Sue Ellen Haupt3, Stephen G.
Penny4,2, Steve J. Miller5, and Maggie Beetstra6
1Atmospheric Science Program, School of Natural Resources, University of
Missouri, USA
2Cooperative Institute for Research in Environmental Sciences, Boulder, Colorado,
USA
3Research Applications Lab, National Center for Atmospheric Research, Boulder,
Colorado, USA
4Sofar Ocean, San Francisco, California, USA
5Department of Environmental Studies, University of Colorado, Boulder, USA
6The Nurture Nature Center
Abstract
This study uses several machine learning techniques to identify which households in Missouri belong to each energy poverty class. We used a dataset of 776 060 data points consisting of categorical variables, including tenant type, the year the building was first constructed, and primary heating type. The predictand, the risk category, was determined using the four-quadrant approach. The lines of demarcation were an annual income equal to a year of earnings at the 2018 Missouri minimum wage and an annual energy expenditure of 10 % of income. Expenditure risk occurs when a household's income is greater than the annual minimum-wage income but its energy expenditure exceeds 10 % of income. Income risk occurs when annual income is less than or equal to the income line and energy expenditure is less than or equal to the 10 % line. Double risk occurs when a household spends more than 10 % of its income on energy and earns less than the annual minimum-wage income for 2018. Three datasets were considered. The first contained all variables except fuel expenditure, because including this feature would allow the methods to reach 100 % prediction accuracy. The second contained all variables except expenditure and income. The third contained only the income column. The first dataset produced the lowest errors when used as input to decision tree, random forest, extreme gradient boosting, and support vector machine models to predict the energy poverty classes of the test sets. With tuned hyperparameters, extreme gradient boosting has so far outperformed the other models across all simulations, with an accuracy score of 90.5 %. This study is intended to give policy makers tools for constructing frameworks that serve the members of the population most vulnerable to climate change impacts on energy availability.
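
The four-quadrant labelling described in the abstract can be illustrated with a minimal Python sketch. It is not the study's code. The function name `classify_household`, the parameter names, and the "no risk" label for the fourth quadrant are assumptions introduced here. The income line is passed in by the caller rather than hard-coded, since the abstract does not give the exact value. The abstract defines double risk with a strict "less than" on income; the sketch assumes that a household sitting exactly on the income line counts as low income, so that every household receives a label.

```python
import math


def classify_household(income, expenditure, income_line, share_line=0.10):
    """Assign a four-quadrant energy poverty label to one household.

    income       -- annual household income
    expenditure  -- annual energy expenditure, same currency as income
    income_line  -- income demarcation (caller-supplied; hypothetical here)
    share_line   -- expenditure share demarcation, 10 % per the abstract
    """
    # Share of income spent on energy; guard against zero or negative income.
    if income > 0:
        share = expenditure / income
    else:
        share = math.inf if expenditure > 0 else 0.0

    low_income = income <= income_line   # assumption: boundary counts as low
    high_share = share > share_line

    if low_income and high_share:
        return "double risk"
    if low_income:
        return "income risk"
    if high_share:
        return "expenditure risk"
    return "no risk"                     # label assumed for the fourth quadrant


# Usage with an illustrative (not sourced) income line of 16 000 per year.
for income, spend in [(30000, 4000), (12000, 1000), (12000, 2000), (30000, 2000)]:
    print(income, spend, classify_household(income, spend, income_line=16000))
```

Because the label is a deterministic function of income and expenditure, a model given both columns can reproduce it exactly, which is consistent with the abstract's decision to withhold fuel expenditure from the first dataset.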

