Anyone who studies statistics and machine learning will notice that both subjects attempt to solve the same types of problems using very similar techniques.
So what’s the difference?
Statistics developed at a time when computers were slow or non-existent. Only small amounts of data could be effectively collected and analyzed. Experiments were expensive.
So statisticians developed theory to figure out how much data is necessary, which data will give the most mileage, and how to squeeze the most out of it.
Because they were on the expensive side of the Experiment Cost Inflection Point, they used theory to make their experiments more reliable.
But all this pre-experiment analysis also made the experiments more expensive.
Machine learning, in its contemporary form, developed in a context where fast computers can gather and process endless quantities of data.
Instead of estimating in advance how much data is needed to make a model accurate, ML measures the model’s accuracy directly with cross-validation.
If the model isn’t accurate enough, just collect more data or try a different model.
Experiments are cheap, so why not?
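To make the cross-validation idea concrete, here is a minimal sketch of k-fold cross-validation in pure Python. It uses a toy 1-nearest-neighbour classifier as the model; the function names, the toy dataset, and the classifier choice are all illustrative, not any particular library’s API.

```python
# A minimal sketch of k-fold cross-validation with a toy 1-NN classifier.
# All names and data here are illustrative.
import random

def one_nn_predict(train, x):
    # Predict the label of the nearest training point (squared Euclidean distance).
    nearest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return nearest[1]

def cross_val_accuracy(data, k=5, seed=0):
    # Shuffle once, split into k folds, and average the held-out accuracy.
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        correct = sum(one_nn_predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k

# Two well-separated clusters, so 1-NN should classify the held-out points perfectly.
data = [((x, x), 0) for x in range(10)] + [((x + 100, x), 1) for x in range(10)]
print(cross_val_accuracy(data, k=5))  # → 1.0
```

The point is that no theory about the data distribution is needed: the model’s accuracy is simply measured on data it was not trained on, and if the estimate is too low, you collect more data or swap in a different model.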
Contemporary ML involves theory, but the theory emphasizes replacing domain knowledge with learning from direct observation, making already inexpensive experiments even more so.
(Early ML, in the context of expensive experiments, was the exact opposite.)
So while both statistics and (contemporary) ML have similar goals and similar tools, they sit on different sides of the inflection point.
And that makes all the difference.