TJ4.1 Machine Learning and Statistics: The Yin and Yang of Data Science (Core Science Keynote)

Tuesday, 9 January 2018: 10:30 AM
Room 7 (ACC) (Austin, Texas)
William W. Hsieh, Univ. of British Columbia, Vancouver, BC, Canada

Since machine learning (ML) and statistics both try to extract information from data, a common question is whether the two are actually the same. They have different origins: statistics arose from mathematics, while ML, a main branch of artificial intelligence, developed primarily from computer science. Statistics emerged first, but after the advent of the internet and the rapid rise of tech companies like Apple, Alphabet and Microsoft (currently the three largest companies in the world by market capitalization), ML has enjoyed spectacular growth. Their different origins and separate cultures have led ML and statistics to occupy the yin and yang (i.e. dark and bright) parts of data science. Environmental scientists are far more comfortable with statistics than with ML, as ML tools are often considered "black boxes". This has unfortunately led to the under-utilization of ML in the environmental sciences, where it could otherwise have made a contribution. Data problems in the environmental sciences are not entirely similar to those in mainstream ML, so developing ML methods for environmental science problems poses interesting challenges.