Don't let good be the enemy of good enough
As we work our way backwards through the different capabilities that biotech startups need their data tooling to support, we’ve so far gone through decision making and analysis. This week, I want to cover one of the last things you do right before you start the analysis: finding the data. This is closely related to an even earlier step that we’ll cover a bit later - evaluating what data you have (or can generate) to define your approach and strategy. In particular, they both rely on something that’s often surprisingly difficult: knowing what data you have and where it is.
For a while, I thought of this as a big pharma problem. They have decades of data in a sprawling mess of legacy systems and from all sorts of different sources, often generated or imported by people who are no longer at the company. There’s a whole industry devoted to addressing this problem, with its own conferences, organizations, and acronyms like FAIR.
Most biotech startups, on the other hand, only have a few core sources of data and haven’t been around long enough to have legacy systems or more than a few orphaned datasets. So it’s a completely different magnitude of problem - but still something that needs to be addressed.
In fact, I think the main reason teams get stuck on this is that they see it as simultaneously too small of a problem and too big. If you’re a small startup surrounded by existential problems, centrally tracking data isn’t one of them. Worst case, it will slow your data scientists down a bit. It’s too small to worry about today.
On the other hand, it feels like a big enough problem that you’re going to need a plan and resources, and probably someone who knows what they’re doing, before you try to tackle it. And the longer you leave it, the more this feeling grows, until it becomes one of those problems that you try not to think about because of the sense of dread and embarrassment it induces. Your technical problem has turned into a psychological one.
But in practice, it was never really a technical problem in the first place. More of a process problem. You’re just holding off on the process while you wait for a technical solution. Instead you should use whatever tools you have at hand, even if it’s just a spreadsheet. Then find a person whose job is to make people fill in the spreadsheet. (They will probably need to fill it out for the other teams a few times to show them how it’s done.)
For a few core data sources and a small team, this is all you need. And when you’re eventually ready to adopt a production-grade solution, this spreadsheet will save you months of work.
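If it helps to picture the starting point, here is a rough sketch of what a first version of that spreadsheet could contain, written as a few lines of Python that produce a CSV you could import into any shared spreadsheet. The column names and the example entry are my own assumptions for illustration, not a standard; swap in whatever your teams actually need to find a dataset.

```python
import csv
import io

# Hypothetical columns for a minimal data inventory (an assumption, not a standard).
COLUMNS = ["dataset", "description", "owner", "location", "source", "format", "last_updated"]


def make_inventory(rows):
    """Return CSV text: one header line plus one line per dataset.

    Raises ValueError if a row leaves any column blank, which mirrors the
    "someone makes people fill it in" rule from the text.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [column for column in COLUMNS if not row.get(column)]
        if missing:
            raise ValueError(f"{row.get('dataset', '<unnamed>')}: missing {missing}")
        writer.writerow(row)
    return buffer.getvalue()


# A made-up entry, purely for illustration.
EXAMPLE_ROWS = [
    {
        "dataset": "plate-reader-runs",
        "description": "Raw absorbance readings from screening assays",
        "owner": "assay team",
        "location": "shared drive: /assays/plate_reader/",
        "source": "in-house instrument",
        "format": "csv",
        "last_updated": "2024-01-15",
    },
]

inventory_csv = make_inventory(EXAMPLE_ROWS)
print(inventory_csv)
```

The only design choice worth noting is that blank cells are rejected outright. A row that says a dataset exists but not who owns it or where it lives is exactly the gap this exercise is meant to close.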
The most important aspect of this approach is that you can do it today. In fact, you can do it as soon as you finish reading this email (and forwarding it to a dozen of your closest friends). Make sure it’s a shared spreadsheet in Google Docs, SharePoint/Teams, or the like. The hard part is finding that one person who can get everyone else to fill in the information. But that’s going to be hard no matter what technology you use. At least with a spreadsheet, there’s no excuse to wait.