Comparison of Regression Algorithms in Multivariable Drought Modeling and Analysis

Chzhen, Maria

Drought is an extended dry period that any region can experience. It is characterized by a lack of precipitation, which can cause water shortages and damage crops in local communities. More than 11 million people have died of drought-related causes since 1900, and more than 2 billion people have been affected, making drought one of the most significant environmental problems. The purpose of this study is to model the relationship between drought and various climate factors and to compare the predictive power of different machine learning algorithms, including random forest, support vector machine, decision tree, and multiple factor regression. Despite their strong ability to model phenomena in other fields of study, regression algorithms have not previously been deployed in drought analysis. To this end, more than 25 gigabytes of geospatial data are extracted from the TerraClimate database and analyzed with multiple statistical techniques. Additionally, correlational and temporal models from 1995 to 2020, with prediction intervals, are constructed for each climate variable, including wind speed, temperature, and evapotranspiration. The study is carried out in Python with its data science libraries. Among the models compared, the random forest regression model generally had the highest predictive power. For future consideration, the report also discusses region-specific measures that can be taken to decrease the likelihood and impact of droughts. The project helps identify the climate patterns accompanying drought, examines novel analysis methods, and increases our understanding of drought processes.
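The model comparison described in the abstract could be set up along the following lines. This is a minimal sketch, not the study's actual pipeline: it assumes scikit-learn and NumPy as the "data science libraries", uses synthetic data in place of TerraClimate, invents a hypothetical drought index as the target, treats "multiple factor regression" as ordinary multiple linear regression, and scores each model by 5-fold cross-validated R^2. All variable names and coefficients are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the climate predictors (NOT TerraClimate values).
rng = np.random.default_rng(0)
n_samples = 400
wind_speed = rng.uniform(0.0, 10.0, n_samples)
temperature = rng.uniform(-5.0, 35.0, n_samples)
evapotranspiration = rng.uniform(0.0, 200.0, n_samples)
X = np.column_stack([wind_speed, temperature, evapotranspiration])

# Hypothetical drought index: an invented nonlinear function plus noise.
y = (
    0.02 * evapotranspiration
    - 0.5 * np.sin(temperature / 5.0)
    + 0.1 * wind_speed
    + rng.normal(0.0, 0.3, n_samples)
)

# The four model families named in the abstract. The SVM is wrapped in a
# scaling pipeline because SVR is sensitive to feature magnitudes.
models = {
    "multiple linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVR()),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated R^2 score for each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="r2").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} mean R^2 = {r2:.3f}")
```

Cross-validation is used here so that each model is scored on data it was not fitted to. The ranking produced on this synthetic data says nothing about the study's findings; it only illustrates how such a comparison can be organized.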

15A.3 Comparison of Regression Algorithms in Multivariable Drought Modeling and Analysis