Is biotech data driven?
One of the things that I’ve struggled with since I started working in biotech, and that a lot of the folks I talk to struggle with as well, is communicating the value of the kinds of things I write about here, particularly the more mundane aspects of biotech data, to founders and leaders. There’s no question that most of them are excited about the potential revolution that’s coming to biotech, or maybe has already started, around AI and ML, fueled by data. But the harder part - both for less technical decision makers to understand and for the rest of us to coherently explain - is what kinds of investments are fundamentally important to fueling this revolution vs. which are just frivolous vanity projects.
One major goal of the System Evaluation I recently started offering (Yes, I have to sneak in a plug every week) is to give teams both a framework for explaining what projects are worth investing in, and an evaluation of specific projects. But I wanted to take a step back and think about the framing for this - how can we try to quantify the value that comes from investing in these kinds of things? So this week and the next few weeks, I plan to explore different angles on this.
I’m going to start this week with the idea of being “data driven” - what it means, the value it brings to a biotech organization, and a question: Aren’t biotechs already data driven? (Not in that order, though.) Since I know many of you reading this have experience trying to communicate this kind of value - in some cases more experience than I do - I’d love to hear in the comments what has or hasn’t helped you define the value of your work on the mundane aspects of biotech data.
What is Data Driven?
The usual definition I’ve seen of a data driven organization is that it makes decisions based entirely (or at least mostly) on data. There are two ways that an organization can fail to be data driven: You can ignore available data and make decisions based on experience/intuition. Or you can not have enough data, so you fall back on experience/intuition out of necessity.
Either way, it’s about making decisions, and the distinction is between using data and using experience/intuition. The problem with experience/intuition is that it’s personal and individual: everyone has a different experience, so they’ll draw different conclusions from it. With data, there may be different ways to analyze it, but in theory at least there should be a single logical conclusion to be drawn. In cases where the data is inconclusive, it’s usually because of gaps that could be filled with additional data. If you don’t have that data, you have to fill the gaps with experience/intuition, so you’re no longer data driven.
And the problem with everyone drawing different conclusions from their different experience is that there’s no objective way to settle those differences. You end up resorting to authority or whoever is (or acts) the most confident. And not only do you end up with worse decisions, but it takes time to sort out the politics. In a data driven organization everyone just looks at the data and the decision is obvious (at least in theory). The data makes the decision for you. It’s fast. And you’re more likely to get it right.
So is biotech data driven?
Well, strictly speaking no organization is completely data driven - it’s an ideal and there’s always going to be gaps in the data. Maybe a better question to ask is: when biotech organizations fail to be data driven, is it the first reason - ignoring available data in favor of intuition - or the second reason - not having enough available data?
It feels like it should be mostly the second reason: biologists love to sit around a nice graph and talk about the nuances of each data point. It’s one of the things that bench teams and data teams have in common. It unites us and it’s beautiful and we should celebrate that fact.
And look - biology data is expensive. Have you looked at the cost of reagents? Experiments are slow and narrowly focused by necessity. The data that biotechs generate is limited by its nature, so the gaps are always big. There’s never enough available data.
And yet…
If we look back at that first scenario - ignoring available data in favor of intuition - I can’t help noticing that the term “available data” can be interpreted pretty broadly. Does it include all the public data on the internet? Even the unstructured data that could be scraped from PubMed and bioRxiv? Does it include the data from experiments you could’ve run but didn’t?
In other words, I think there are ways in which biotech teams subtly pick intuition over data while telling themselves it’s because the data isn’t available or is too slow and expensive to collect. Not because they’re lazy. Not because they’re missing something obvious. I’ve been in those discussions and drawn the same conclusions.
But the question is: Can we change the calculus, by working on the mundane parts of the data, enough to materially expand the data that’s practically available?
Then where’s the value?
As you read this, there are teams around the world working on new instruments and new technology that will lower the cost of generating biology data. We’ve already seen order-of-magnitude reductions in cost and there’s every reason to believe that will continue. This covers the cost side of the cost/benefit analysis.
On the benefit side of the equation, we usually think of the models and algorithms that will be unlocked once we can train them on all this new data. But to make that possible, a lot of things need to happen between the lab and your EC2 instance. Each step is a potential bottleneck. A potential source of friction. When that friction adds up, it can make simple things painfully difficult. And when simple things become painfully difficult, most people stop doing them, or at least cut back.
There’s a phase shift that happens in how we use things that go from being available but difficult to ubiquitous and easy. Before cell phones, there were coin-operated pay phones, but the way we use cell phones is fundamentally different (even before you add apps). Can the mundane work we do create a fundamental shift in how biotechs use data?
Today, all biotech startups use data to make core decisions like selecting hits and designing assets. But on the periphery, there are plenty more decisions where they rely on experience and intuition. Startups with amazing platforms that went under because they picked the wrong target. Teams that lost months because they picked the wrong assay. What would biotech look like if all these decisions were data driven?
Conclusion
In the end, biotech is much more data driven than many other industries. Moreover, data is a fundamental part of most biotechs’ identity and culture and I believe most would jump at the opportunity to become more data driven. What’s missing is a clear vision of what that would look like and how to get there with a limited budget.
The hard part, and the responsibility of those of us who work on the mundane parts of biotech data, is to figure out how to communicate that vision.