Build vs Buy in the Age of Vibe Coding
My post last week about Tahoe Therapeutics’ unusual business model got me thinking about moats again - what makes a product or a service hard for competitors to replicate? This week, I want to take a different angle on this question. In biotech/pharma, the biggest competitor is usually potential customers building their own version from scratch. So what does the build vs buy question look like for AI-for-drug discovery software/models/etc., particularly as AI is making it easier for pharma teams to build their own?
Every pharma company I’ve worked with has multiple custom-built internal systems that they probably could’ve replaced with off-the-shelf software. These usually aren’t for highly regulated pre-clinical and clinical activities, and custom ELNs/LIMS are becoming less common. But when you get into computational biology, machine learning, and even more traditional analytics/business intelligence, suddenly everything seems to be built (by consultants and contractors) instead of bought.
Will the same thing happen with generative models and other AI tools?
From a macro perspective, this seems very inefficient: all these companies are independently building their own versions of what often turns out to be essentially the same thing. But there are legitimate reasons to build your own, such as having direct control of design, functionality, the roadmap, etc. And you have something that potentially differentiates you from competitors.
The tradeoff, of course, is that you have to cover the entire cost of development, rather than sharing it with all the other customers of the off-the-shelf software. But code is cheap compared to the overall cost of running many of these systems. Once you account for configuring the off-the-shelf tool, migrating the data, and bringing in a team to keep it running smoothly, the cost of building may actually be less than the ongoing cost of working with a system that doesn’t quite fit your needs.
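To make that tradeoff concrete, here’s a minimal sketch of a lifetime cost comparison. Every number and parameter name here is hypothetical, purely for illustration; real figures vary enormously by system and company:

```python
# Hypothetical build-vs-buy cost comparison over a system's lifetime.
# All dollar figures are illustrative placeholders, not real benchmarks.

def total_cost(upfront: float, annual: float, years: int) -> float:
    """Upfront cost plus recurring annual cost over the system's lifetime."""
    return upfront + annual * years

years = 5

# Build: pay the full development cost, then maintenance by an internal team.
build = total_cost(upfront=500_000, annual=150_000, years=years)

# Buy: lower upfront (configuration, data migration), but license fees
# plus the ongoing "misfit" cost of a tool that doesn't quite fit your needs.
buy = total_cost(upfront=100_000, annual=120_000 + 80_000, years=years)

print(f"build: ${build:,.0f}  buy: ${buy:,.0f}")
# → build: $1,250,000  buy: $1,100,000
```

The point of the sketch isn’t the specific numbers; it’s that the comparison hinges on the recurring terms (maintenance vs. license-plus-misfit), not on the one-time cost of writing the code.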
Meanwhile, constantly improving open source libraries have been making software development even cheaper, before we even talk about AI. You probably shouldn’t be vibe coding production tools on billion-dollar drug programs, but AI-assisted code editors make coding a whole lot faster with very little downside.
So, given that AI has its thumb on the build side of the build vs buy scale, what are the things that should make pharma teams put more serious thought into buying? And what can product/service providers do to give themselves a fighting chance?
The answer that most people will jump to is data, since it’s expensive and time-consuming to create, particularly in biopharma. But I don’t think that’s quite the answer either. Data that someone else has collected is generally not that valuable on its own. Or, at least, its value isn’t significantly more than what it would cost you to go and collect it yourself.
Some kinds of data, like RWE (real-world evidence: clinical and medical datasets), are impossible to collect without the right contracts in place. But for in vitro, early-discovery data, pharma companies with large existing labs can usually generate it faster, cheaper, and at larger scale than any startup that might want to sell it to them. And when they do, they end up with unique, proprietary data of their own.
What pharma teams often don’t have is clear direction on what data to collect or the time and focus to learn how to apply that data to solve a specific problem. In other words, you don’t get a moat just by collecting data. You build it by figuring out exactly what data to collect to solve an expensive problem, collecting it, then figuring out how to use it to solve that problem.
The moat isn’t the data or the code - it’s the knowledge of what data to collect and what code to write around it.
It’s hard to build that kind of moat around software alone, because the knowledge of how to build a good bioinformatics platform, data catalog, etc. varies considerably from one company to the next, and it doesn’t feel that hard to figure out (whether or not it actually is).
The knowledge of how to wring the most actionable information out of a complex proprietary dataset is also very specific to the context of that dataset. But if it allows you to solve a widely experienced, expensive problem, that doesn’t matter.
I think that’s why projects involving foundation models have such large contract values. Even though they’re not solving specific problems, they involve specific knowledge about how to turn large datasets into solutions for lots of different (vaguely defined) problems. That’s knowledge that’s hard to replicate, particularly for most large pharmas.
So if you’re a pharma team trying to decide whether to build or buy your next AI solution, don’t think about the cost of writing the code, or of generating the data. Ask yourself what it will cost to figure out what data to generate, then learn what to do with it.
And if you’re a would-be startup founder, ask yourself how you’ll be able to do it faster than them.