Are we swarming around the wrong problems?
In the course of making the AI Solutions Library that I’ve been hawking the last two weeks, I put together a list of all the companies that I eventually want to profile. But I think this list is probably generally useful. So I decided to publish it as a site I’m calling the Biopharma AI Landscape. It only has a very brief description of each company, and some rough, preliminary categories, but I think even that shows some trends that I wanted to write about this week. (More details about each company will be in the Solutions Library.)
In particular, I want to talk about the long-tail distribution of companies across the different categories. Of the 35 categories, there are 3 that have over 30 companies each, and 9 that have 3 or fewer. Now, this isn’t the most lopsided distribution ever, but it raises some questions about where new founders should be building.
First, some background on where these numbers come from: The 321 companies are from a variety of sources, including Anna Marie Wagner’s excellent list of AI for Science Companies, companies I’ve stumbled across on LinkedIn and companies whose founders have reached out to me over the years. It is by no means complete and I plan to keep adding to it as I find more. (If your company isn’t on there, please email me at jesse@merelogic.net or use the form on the website.) I restricted the list to companies that offer services or partner with drug discovery companies, as opposed to purely pipeline-focused companies. This turns out to be a fuzzy distinction, so I’m sure I messed that up a bit too. (Again, please send corrections.)
Putting the companies in categories was also a fuzzy problem. I spent a long time trying different ways of doing it with an LLM, but those all made a lot of mistakes. So I ended up going through manually, but I’m sure I still got some wrong. (Again, corrections welcome.) Plus, because of how I created the list of companies, it’s probably biased towards the early discovery categories. I’m sure there are a bunch of companies missing in categories related to clinical and commercial.
All this is to say that the numbers in these categories are approximate at best, and each count should be treated as a lower bound on the actual number rather than an upper bound.
Still, I think the overall distribution is telling: A large portion of today’s AI for pharma/drug discovery companies are focused on just three problems: Small molecule binding prediction, protein design and omics analysis platforms. I’ve argued in past posts that to actually address the existential problems of drug discovery, AI needs to fundamentally transform the drug discovery process, but these three categories are all mostly addressing individual steps in the existing approach. I think that’s a problem.
I suspect the reason these categories are so crowded is that they address straightforward, obvious problems. Small molecule binding and protein design have well-defined technical objectives and can be measured with relatively straightforward benchmarks. Omics analysis is a well-defined family of workflows that bioinformaticians and computational biologists spend a lot of time on.
Finding more transformative approaches, and finding the wedges that will get pharma companies to actually adopt them, is not just a hard problem but a really messy one. Most startup founders don’t have the kind of pharma experience that would allow them to jump right into that kind of problem. And I have no idea what we can do to fix that.
Also, don’t get me wrong: These three over-crowded problems are still really important to solve. I don’t know what the “right” number of companies would be. Maybe part of the natural evolution of these categories is a surge of competitors who then merge, pivot or go out of business until the strongest survive. We’ve recently started seeing this in the omics analysis category.
So I guess at this stage, I just want to ask the question: Are there problems and categories that the industry is overlooking? What are they? And what would a better distribution look like?
As I get more feedback on the landscape, become more confident in the numbers and get a more detailed understanding of these different categories and solutions, I’ll report more on what I see. Hopefully it will help us all start answering some of these questions.

