Last week, I started exploring the state of software categories for biotech data, particularly software for the data teams that now work alongside bench teams, an arrangement that has become much more common in the last 5-10 years. This week, to get a better idea of what's going on, I want to explore how categories tend to emerge in new fields as software starts to eat them, so we can understand where we are and where we're going.
I first witnessed this with the emergence of the Data Lake and Data Catalog categories in the last 5-10 years. Folks who have been in biotech longer than I have probably saw the same thing with ELNs and LIMS. And I'm sure there are examples going back much farther. Here's how it works:
It starts with a shift in an industry that creates needs or opportunities that aren't met by existing commercial software. For Data Lakes and Data Catalogs, the shift was the emergence of "Big Data" that was too large and complex for existing databases. For ELNs and LIMS, it was the emergence of the potential to replace paper with digital records. (One might argue this shift took longer because it was the emergence of a capability rather than a need. But that's a different post.) For the change we're seeing today, it's the cultural shift from bench scientists managing experiments and analysis end-to-end to having a separate data team in the loop.
When a shift like this begins, the first people who notice are the folks working within the teams/companies that need the software. Since they can’t find commercial tools to meet their needs, they build their own. This is Phase 1: You have a bunch of home-grown tools that are being written within the same companies where they’re used. Each of them is addressing an immediate and specific need, so these solutions may look very different from each other. In industries where there’s a culture of openness and sharing, patterns may start to emerge. But biotech/pharma is not such an industry.
Phase 2 starts when the people who wrote these homegrown tools start to get antsy building for just their own team and decide to found startups to build commercial versions. But there are two tricky parts about this phase: 1) Because the patterns haven’t emerged yet, it’s hard to figure out what will be useful for more than just the one company where you got the idea. 2) The new software options have less functionality than the internal tools that have been developed for the last N years, so it’s hard to convince companies to adopt them. (Remember, at this point all these companies have developed their own versions internally.)
This slowly blends into Phase 3, where you have some relatively mature commercial options and some new potential customers that haven't built their own yet. But there are still two problems: 1) There hasn't been enough adoption of the commercial options for the industry to form a consensus on the right way to solve the problems, and 2) many potential adopters still think they're in Phase 1 or 2, so they're reluctant to adopt, or even consider, the commercial options.
Toward the end of Phase 3, the commercial options begin to pivot and adapt. Or they merge or go out of business. But one way or another, as they find product-market fit, they coalesce into categories that meet broad, clearly defined needs. Phase 4 starts when the industry forms a consensus around these categories, and the teams that need them recognize that it would be crazy to build them from scratch. The industry shifts to more of an IT-based model, one that favors selecting the right software with time-tested, safe frameworks over innovation and internal software development.
I think that when it comes to software supporting data teams in biotech, we are in the early part of Phase 3. We have more recently founded software startups than even I can keep track of. But many biotech startups are still building their own homegrown versions of these tools and, of course, we don’t have a common sense of what the categories are (yet).
The bad news is that Phase 3 is the most painful phase because it’s such a major transition. It’s the middle school of emerging software markets. (Apologies to readers from educational systems that don’t follow the US model.)
The good news is that we'll eventually get through it. And sure, a more IT-like approach to selecting software may seem much less fun and exciting. But having established practices and decision-making frameworks will ultimately allow the broader biotech industry to focus on the science and avoid unforced technology errors. Plus, it'll be a lot easier for software companies to communicate what they do to the teams who need them most.
One way or another, we’re going to get there. I don’t know how long it’ll take, but I know we’ll make it. And until then (and probably even after) I’ll be here writing about it.
Thanks for reading this week’s Scaling Biotech! I really appreciate your continued support, and I read every comment and reply.
As a reminder, I offer several services to help connect biotech teams with the tools, practices and expertise that make their organizations more data-driven.
For Biotech Startups: Sign up for a free consultation call to clarify a problem you're facing and identify the best options to evaluate.
For Software Startups: Add your application to the upcoming launch of the Biotech Reference Stack so your target users can find you.
For IT/Informatics Consultants: Learn how Merelogic can help you write white papers and case studies to define and demonstrate your specialized expertise.