The emerging scientific stack for digital drug discovery
When I entered the biotech world back in 2019, a number of startups were framing their platforms as modeling biology at multiple scales from the molecule up to the whole organism. At the time, this seemed mostly conceptual and aspirational, at least to the extent that it wasn’t purely marketing. But today, we’re starting to see more clearly defined and broadly adopted components of this potential “stack,” bringing it into the realm of the practical, or at least to the edge of it.
This week I want to outline what this stack is starting to look like. In the coming weeks, I’ll dig into the individual layers in more detail.
My thinking on this was heavily influenced by this paper that came out of a Chan Zuckerberg Initiative workshop and outlines a technical implementation of the kind of stack I’m talking about. I’ll touch on some of the technical details they suggest in future posts, but today I want to focus on the high-level picture, and I’ll actually suggest a broader scope than they cover.
In particular, I want to sketch out a framing of some of the different kinds of AI/ML tools/models that exist today, to show how they could fit together. What makes this conceptual framing more relevant today than it was back in 2019 is the emergence of transformer models as viable tools. Because they provide a common technical framework across all the different levels of scale, they have the potential to form the communication layer/connective tissue between those levels.
There’s an ongoing debate about whether transformers can completely replace the existing models for these different problems, and I’m not ready to weigh in on that. But even if they’re only complementing the existing approaches, it seems like transformers will at least play a large enough role to be the scaffolding.
Next week I’ll go into more detail about how transformers work in this context, and at these different levels of scale. But for now the conceptual framing should make sense without those details.
Basically, I want to think of each level of scale as a specific modeling problem with inputs and outputs such that the outputs at each level become, roughly, the inputs at the next level.
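To make that wiring concrete, here’s a minimal sketch in Python. Every class and function here is a hypothetical placeholder, not a real model or library; the only point is how the outputs of one level become, roughly, the inputs of the next.

```python
# Illustrative sketch only: each "model" below is a placeholder standing in
# for a real trained model. The point is the wiring between levels of scale.

from dataclasses import dataclass

@dataclass
class Interaction:
    drug: str             # e.g. a SMILES string for a small molecule
    target: str           # e.g. a protein / gene name
    binding_score: float  # predicted strength of the direct interaction

def molecular_model(drug: str, target: str) -> Interaction:
    """Molecular scale: abstract representations in, direct interaction out."""
    return Interaction(drug, target, binding_score=0.0)  # placeholder value

def cell_model(interactions: list[Interaction]) -> dict[str, float]:
    """Cell scale: direct interactions in, predicted expression changes out."""
    return {}  # placeholder: gene -> predicted log-fold change

def tissue_model(cell_responses: list[dict[str, float]]) -> dict[str, float]:
    """Tissue scale and above: cell responses in, cumulative effects out."""
    return {}  # placeholder: endpoint -> predicted effect size

# Outputs at each level become, roughly, the inputs at the next level.
interactions = [molecular_model("CCO", target) for target in ("EGFR", "KRAS")]
expression = cell_model(interactions)
effect = tissue_model([expression])
```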
Molecular Scale
At the bottom, we have the molecular scale. Here, the inputs to a model are abstract representations of the molecules and the output is a prediction about how they interact. For small molecules, that means two-dimensional molecular diagrams. For proteins, RNA and DNA, that means sequences of amino acids or nucleotides. A model should take some number of these representations and output a prediction such as binding, regulation, etc.
The most common version of this is a model that predicts how small molecules bind to proteins. These include physics-based binding models that work across proteins with known three-dimensional structures, and models trained on binders for individual proteins that can predict (or generate) additional binders. As I’ve covered in previous posts, the newest generation of protein foundation models don’t just predict protein structures - they also show promising results predicting binding between those proteins and small molecules. And they do this based on just the protein’s sequence, without necessarily knowing its ground-truth structure (though that helps).
Yes, I know that these models are trained to predict protein structures, and that’s how most people think about them. But the only reason we care about protein structures is to model and predict binding. So for the purposes of this post, they’re binding models.
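As a toy illustration of what the inputs and outputs look like at this level, here’s a hedged sketch of a single-target digital screen. The predict_binding function is a hypothetical stand-in for whatever binding model you’re using (physics-based, target-specific, or a protein foundation model), and the sequence and SMILES strings are just examples.

```python
# Hypothetical single-target screening loop. predict_binding is a stand-in
# for a real binding model; nothing here refers to an actual library or API.

# Inputs are abstract representations: an amino-acid sequence for the protein,
# SMILES strings for the small molecules.
target_sequence = "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDE"  # truncated example sequence
library = ["CCO", "CC(=O)OC1=CC=CC=C1C(=O)O", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]

def predict_binding(protein_sequence: str, smiles: str) -> float:
    """Placeholder: return a predicted binding score in [0, 1]."""
    return 0.5  # a real model would go here

# Output: a ranked list of candidate binders for this one target.
scores = {smiles: predict_binding(target_sequence, smiles) for smiles in library}
hits = sorted(scores, key=scores.get, reverse=True)[:10]
print(hits)
```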
Cell Scale
The next level of scale is the cell. Here, the inputs are direct interactions between molecules and the outputs are predicted cellular responses to changes in those molecules. Or, to put it a slightly less useful way: given the direct interactions between molecules, predict the indirect interactions.
The most common version of this is predicting how RNA expression levels will change in response to perturbations such as knocking out or knocking down a gene, or introducing a small molecule that interacts with one or more of the molecules in the signaling pathways for gene expression.
In theory, if you know how all the molecules interact with each other, you could just piece together those direct interactions to figure out the indirect interactions. And that’s what the early models in this space did, using databases of signaling pathways (before these models were called virtual cell models). But biology doesn’t care about your rules. In practice, the databases of interactions are nowhere near complete, and even if they were, they wouldn’t capture the nuances of how molecules really interact.
So the newer generation of virtual cell models are trained directly on data from perturbation experiments, with the goal of generalizing to other types of perturbations, particularly combinations of perturbations with non-linear/non-additive behavior. There are also hybrid models that incorporate information from the signaling pathways into the models trained directly on experimental data.
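Here’s a hedged sketch of the shape of that problem. predict_response is a hypothetical stand-in for a virtual cell model trained on perturbation data, and the naive additive baseline is only there to show what the non-linear/non-additive behavior is being contrasted against.

```python
# Sketch of the cell-scale problem: perturbations in, expression changes out.
# predict_response is a hypothetical stand-in for a trained virtual cell model.

Perturbation = str              # e.g. "KO:TP53" or "drug:some_compound"
Expression = dict[str, float]   # gene -> log-fold change vs. control

def predict_response(perturbations: list[Perturbation]) -> Expression:
    """Placeholder for a model trained directly on perturbation experiments."""
    return {}  # a real model would return genome-wide expression changes

def naive_additive(singles: dict[Perturbation, Expression],
                   combo: list[Perturbation]) -> Expression:
    """Baseline: sum single-perturbation effects gene by gene.
    Real combinations are often non-additive, which is exactly
    what the learned model is supposed to capture."""
    out: Expression = {}
    for p in combo:
        for gene, delta in singles.get(p, {}).items():
            out[gene] = out.get(gene, 0.0) + delta
    return out

combo = ["KO:TP53", "KO:MDM2"]
learned = predict_response(combo)     # what we want the model to give us
baseline = naive_additive({}, combo)  # what a purely additive view gives us
```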
Putting them together
Now, it would be natural to read this and think “Wait - if the cell models are learning from experimental data rather than from individual/direct interactions then why do we need the first layer to predict those individual/direct interactions?”
And yes, cell models don’t need the first layer to predict interactions between known molecules, particularly the DNA, RNA and proteins associated with the 20K or so coding genes in the human genome. But they do need that layer to predict interactions between the known molecules and the billions of novel small molecules, proteins, etc. that you want to screen for your next drug program.
The reason the virtual cell models (by that name or another) haven’t caught on as quickly as binding models is that there hasn’t been a practical way to use them on completely novel compounds, proteins, etc. To predict a cell’s reaction to a potential drug, you need to know a lot about what other molecules it interacts with. As of today, that means doing a bunch of experiments, which means that you’ve already narrowed your search to a small number of drug candidates, which means that you may as well just run the perturbation experiments directly.
In other words, binding models on a single target can be used to screen millions or billions of molecules. Virtual cell models just can’t do that. (They can do other things, and do, but those things are generally more peripheral… That’s probably a good future post.)
Now, single-target binding is a kind of lousy way to screen new drugs compared to the more holistic predictions you can get from a cell model. But something that works is better than anything that doesn’t. Plus, single-target binding screens have been the backbone of drug discovery for as long as anyone doing it today has been around, so you’re not going to get much pushback.
However… if you can predict the direct interactions/binding between millions of potential drugs and a large collection of genes/proteins with reasonable accuracy, maybe even all 20,000ish genes, you could plug that into a virtual cell model, and suddenly you have digital screening at a level of detail that is unfathomable today.
That isn’t currently possible, but it wouldn’t be a terrible bet to suggest that it could be in the next few years.
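If it were possible, the wiring might look something like this. Every function here is a hypothetical placeholder, and the small lists would in reality be roughly 20,000 proteins and millions of candidate molecules.

```python
# Hypothetical end-to-end digital screen: binding predictions against a large
# panel of proteins feed a virtual cell model, which scores each candidate
# against a desired expression signature. Every function here is a placeholder.

def predict_binding(protein: str, smiles: str) -> float:
    return 0.0  # stand-in for a molecular-scale binding model

def cell_model(interactions: dict[str, float]) -> dict[str, float]:
    return {}   # stand-in for a virtual cell model: protein -> binding in, gene -> change out

def signature_score(predicted: dict[str, float], desired: dict[str, float]) -> float:
    """Crude match score: how well the predicted changes line up with the desired ones."""
    return sum(predicted.get(gene, 0.0) * direction for gene, direction in desired.items())

proteome = ["EGFR", "KRAS", "TP53"]                # in principle, ~20,000 of these
candidates = ["CCO", "CC(=O)OC1=CC=CC=C1C(=O)O"]   # in principle, millions of these
desired = {"MYC": -1.0, "CDKN1A": 1.0}             # the cellular response you want

ranked = sorted(
    candidates,
    key=lambda smiles: signature_score(
        cell_model({p: predict_binding(p, smiles) for p in proteome}),
        desired,
    ),
    reverse=True,
)
```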
Tissue scale (and above)
Once you see the pattern between those two levels of scale, you can see how it could keep going. At the next level of scale, the inputs are the indirect impacts of perturbations on individual cells and the outputs are the cumulative effects on the collections of cells that make up tissues, organs, organisms, etc. The exact separations between the layers get fuzzy at this point, and we move into the world of Quantitative Systems Pharmacology (QSP).
Again, we should in theory be able to use our knowledge of individual cells to directly build up models at these higher levels of scale. But also again, biology doesn’t care about your rules. We need different models trained directly on experimental data, not to mention the real world and clinical data that becomes relevant at this scale.
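As a cartoon of the handoff at this level (real QSP models are mechanistic, typically systems of differential equations, not a weighted sum like this), the interface could look something like:

```python
# Cartoon of the tissue-scale handoff: per-cell-type responses in, a
# tissue-level readout out. This weighted sum is only meant to show the
# shape of the interface, not how an actual QSP model works.

CellResponse = dict[str, float]  # gene -> predicted expression change

def tissue_effect(responses_by_cell_type: dict[str, CellResponse],
                  composition: dict[str, float],
                  readout_gene: str) -> float:
    """Aggregate cell-level predictions, weighted by each cell type's share of the tissue."""
    return sum(
        composition.get(cell_type, 0.0) * response.get(readout_gene, 0.0)
        for cell_type, response in responses_by_cell_type.items()
    )

responses = {"hepatocyte": {"CYP3A4": 0.8}, "stellate": {"CYP3A4": 0.1}}
composition = {"hepatocyte": 0.7, "stellate": 0.3}
print(tissue_effect(responses, composition, "CYP3A4"))  # ~0.59
```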
Interestingly, the paper I linked above doesn’t talk about modeling above the cell scale. I’m guessing that’s because QSP is mostly used by clinical and translational teams, and the product reflects the org chart. So this seems like a great opportunity to break down some barriers, and maybe address the information bottleneck between early discovery and clinical. But I probably won’t get to this until I’m well into this deep dive.
Conclusion
It goes without saying that everything in this post is right on the cusp between speculative and achievable. To the extent that drug discovery and development teams are using these tools today, they’re working at one level of scale at a time. But even there, we’re starting to see tangible results. And I think that understanding this bigger picture, this potential scientific “stack,” both opens up possibilities for using the individual tools more effectively and can help create the foundation for the future.
In the upcoming weeks, I’ll dig into the details of how the individual layers of this stack work, how they can fit together, and how pharma and biotech teams are already starting to use them today.
Stay tuned!
I help biopharma teams integrate foundation models into their discovery and development pipelines. If you have data and a mandate, I’ll show you what’s around every corner ahead. Send me an email at jesse@merelogic.net and we can explore whether we’re a fit.

