Sometimes, breadcrumbs are enough
*** A quick plug: I was recently on a podcast with Vicky Li Horst about product management in biotech. We’re thinking about turning it into a recurring podcast depending on the feedback. Let me know if you think we should. ***
This week, I want to write about a situation that many biotech data teams face relatively early on, when they start to switch from explore mode to exploit mode and suddenly realize that they haven’t been keeping good track of their data. I’ve written about this previously, but I wanted to dig a little more into what this is like and what it feels like.
In the early days of most biotech startups, the goal is simply to validate a hypothesis that will justify further investment into the scientific direction and into the company. This might be as simple as demonstrating that you can measure a particular protein target. Or maybe you’re proving that a complex phenotypic readout corresponds to a disease state. Or maybe it’s about showing that you can design molecules or proteins with particular properties.
Whatever the hypothesis, the only goal is to figure out how to do something, then demonstrate that you can do it. And, of course, you need to do this as quickly and inexpensively as possible. In the course of this, you’re going to be exploring a lot of different approaches, trying different assays, doing one-off experiments, generating one-off datasets.
Now, you could carefully organize all that data as you go. You could sit down before you get started and come up with naming conventions and a folder layout that will ensure all that data is easy to find and reuse in the future. You could carefully record all the details in a centrally accessible database.
But that would be a bad idea.
Ok, so, yes, those are exactly the kinds of things I tend to advocate for in this newsletter. But go reread that third paragraph. The only goal is to move quickly, and for most of these datasets, you’re never going to look at them again. And even if you wanted to use consistent naming and organizing conventions, the set of things that you need to organize is constantly evolving. As soon as you come up with a scheme, you’ll need to change it.
So not only is this kind of organizing work hard to justify at this stage, it’s effectively impossible to do well.
The problem, of course, is what to do once you validate that hypothesis and want to switch from explore to exploit. Now that you can measure the target, you’re going to start measuring it a lot. Now that you know what phenotype to look for, you’re going to be looking for it. Now that you know you can design molecules, it’s off to the races.
At this point, most of those exploratory datasets will become irrelevant. It’s fine that you can’t find them. But depending on which assays and experiments ended up working out, a few of them are going to suddenly be very important. Those are the ones you’ll need to find.
I know teams that have spent months looking for old assay results. Usually it’s because the one person who knew where they were left the company. Often it was only a random conversation or a bit of luck that ever turned them up. Just as often, the team gave up and re-ran the experiment.
So the dilemma seems unavoidable: it’s a bad idea (if not impossible) to carefully organize everything. But if you don’t, you’re just setting yourself up to lose data that you’re going to need.
Luckily, I think there is a middle ground that can work. The key is to recognize that you don’t need the data to be carefully organized. You just need to be able to find it.
What’s the difference?
Well-organized data follows consistent and intuitive conventions that allow users to quickly find and use what they need without expending a lot of mental energy.
For that early data, you just need some way to find it. It doesn’t have to be convenient. It doesn’t have to be intuitive. It doesn’t have to be fast. It just has to be a trail of breadcrumbs.
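To make that concrete, here’s one hypothetical shape the breadcrumbs could take: a minimal sketch in Python, assuming nothing more than a log file in some shared location that everyone can append to. The file path, the function name, and the example entry are all illustrative, not a prescription.

```python
# breadcrumbs.py -- a minimal, hypothetical breadcrumb log.
# Assumes only a shared location everyone can append to
# (a network drive, a synced folder, anything durable).
# Deliberately ignores concurrency, schemas, and validation:
# the point is that dropping a crumb takes seconds.

import csv
import getpass
from datetime import datetime, timezone
from pathlib import Path

# Illustrative path; in practice, anywhere shared and durable works.
LOG_PATH = Path("/shared/breadcrumbs.csv")

def drop_breadcrumb(description: str, location: str) -> None:
    """Append one line: when, who, what it was, where it lives."""
    is_new = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "who", "description", "location"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),
            getpass.getuser(),
            description,
            location,
        ])

# Example (hypothetical dataset): one line, no naming convention required.
# drop_breadcrumb(
#     "Pilot ELISA for target X, plates 1-3, new antibody lot",
#     "s3://scratch-bucket/2023-06/elisa_pilot/",
# )
```

The format is beside the point; a free-text note in a shared doc would work just as well. The only requirement is that, a year from now, someone who wasn’t in the room can search it and follow the trail back to the data.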
So if you’re on a team that’s in the exploration phase, here’s the question: What’s the trail of breadcrumbs that you can lay down without slowing things down, but that will lead you back to the data when you eventually need it?