Don't let Machine Learning be the enemy of Data Science

Some of the most impactful outcomes of data science don't involve predictive models

Jan 27, 2021

You probably have a weather app on your phone, maybe more than one, that you look at regularly. This app has two main functions: It tells you about upcoming weather and it tells you about the current weather.

The upcoming weather comes from a complex predictive model, or more likely an ensemble of models developed by teams of researchers over decades.

The current weather is read from a sensor on someone’s roof.

But despite the vast difference in complexity, both are extremely useful. I probably look at the current temperature in my weather app as often as the forecast, if not more often - often enough that I would still want the app even if the predictions weren't there.

The rise of data science has been largely driven by the goal of applying machine learning to make predictions about a wide range of problems. But most data scientists will admit they spend more time on "data wrangling" than building models.

In practice, the biggest benefit of data science is often that it forces organizations to improve data accessibility and quality - prerequisites for machine learning, but also for more mundane applications.

The weather app knows the current temperature because we need it to predict tomorrow's temperature. But if we couldn't predict tomorrow's temperature, would we have gone to the trouble of collecting today's?

What problems in your organization could you address by just collecting a new form of data and making it accessible to users? Which of those problems are you ignoring because they don’t require machine learning?

Scaling Biotech
