Did a foundation model just solve virtual small molecule screening?
Last week, a team of researchers at MIT, with support from Recursion, published a new foundation model called Boltz-2 that a lot of people are very excited about. As someone who's mostly worked on small molecule discovery, the thing I'm personally most excited about is the claim about binder prediction. So this week, I'm going to explain why I'm excited about it and describe a trick the authors used to beat AlphaFold3 on this task.
In between writing these posts, I'm building a catalog of scientific AI x Bio R&D use cases, to help pharma/biotech leaders decide where to invest. I’m looking for feedback on an early prototype I just published. Check it out and send me any suggestions/corrections/additions at jesse@merelogic.net.
First, for some context, let's talk about what binder prediction is. The way most small molecule drugs work is that they interfere with a protein by wedging themselves into a functional pocket of the protein (often its active site), blocking the protein from doing its job.
Traditional drug development would start with large-scale in vitro experiments to see which compounds will bind to the protein of interest. But synthesizing the compounds and running the experiments is expensive. So if you can train a model to predict the outcomes of those experiments, you can save a bunch of time and money.
Boltz-2 claims to do exactly this, but the exciting part is how it does it.
I argued recently that there are basically three kinds of models: 1) Mechanistic models that use information about how biology "works", 2) Black box models that learn directly from narrow, for-purpose experimental data, and 3) Knowledge models that use broad information from published results.
Knowledge models for binder prediction exist, but they aren't very useful for finding *new* molecules. So let's ignore those for now.
Mechanistic models for binder prediction start with the three-dimensional structures of the protein and the small molecule, then basically try to fit them together like puzzle pieces.
The predictions aren't perfect because these geometric models are simplified. (All models are wrong, some are useful.) The real problem is that they're slow. There are a lot of possible configurations and the models have to try each one.
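To make the slowness concrete, here's a toy sketch of what a docking-style mechanistic model does. The scoring function and brute-force pose search below are invented for illustration; real docking tools use much richer physics and smarter search, but the shape of the computation is the same:

```python
import numpy as np

def score_pose(pocket_atoms: np.ndarray, ligand_atoms: np.ndarray) -> float:
    """Toy physics score (lower is better): punish clashes, reward contacts."""
    # Pairwise distances between every pocket atom and every ligand atom.
    d = np.linalg.norm(pocket_atoms[:, None, :] - ligand_atoms[None, :, :], axis=-1)
    clashes = np.sum(d < 2.0)                 # overlapping atoms: very bad
    contacts = np.sum((d > 3.0) & (d < 4.0))  # favorable close contacts
    return 10.0 * clashes - contacts

def dock(pocket_atoms: np.ndarray, ligand_atoms: np.ndarray,
         grid_step: float = 1.0, extent: float = 5.0):
    """Brute-force search over rigid translations of the ligand."""
    best_pose, best_score = None, np.inf
    shifts = np.arange(-extent, extent + grid_step, grid_step)
    for dx in shifts:
        for dy in shifts:
            for dz in shifts:  # the combinatorial explosion lives here
                pose = ligand_atoms + np.array([dx, dy, dz])
                s = score_pose(pocket_atoms, pose)
                if s < best_score:
                    best_pose, best_score = pose, s
    return best_pose, best_score
```

And this toy version only translates a rigid ligand. Real search spaces also include rotations and torsion angles, which is where the runtime goes.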
Black box models, on the other hand, are just ML models trained on the results from those in vitro assays I mentioned above. They tend to be a lot faster than mechanistic models, and can potentially pick up on some of the biological and physical complexity that the mechanistic models leave out.
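For contrast, here's a minimal sketch of the classic black box approach, using RDKit fingerprints and scikit-learn. This is a schematic (the molecules and labels are made up), not how any particular production model works:

```python
# Schematic black-box binder model: featurize each molecule, then fit a
# supervised model to in vitro assay labels. Fast at inference time, but it
# knows nothing about the protein, so it can't transfer to new targets.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def featurize(smiles: str) -> np.ndarray:
    """Morgan (circular) fingerprint: a fixed-length bit vector per molecule."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    return np.array(fp)

# Toy training data: SMILES strings plus binary hit/no-hit assay labels.
smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN"]
labels = [0, 1, 1, 0]

X = np.stack([featurize(s) for s in smiles])
model = RandomForestClassifier(n_estimators=100).fit(X, labels)

# Scoring a new molecule takes milliseconds, with no pose search required.
print(model.predict_proba(featurize("CCOC(=O)c1ccccc1")[None, :]))
```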
The problem is that to train the model, you need all that (expensive, slow to generate) in vitro data. The mechanistic model runs as well on a brand new target as on an old, long-studied target. The black box model does not... or at least, hasn't until now.
In the past, black box models didn't know anything about the structure of the protein target. So they had no way to transfer information from one target to another.
Protein foundation models, on the other hand, do know something about the structure of the protein. (That's kind of the point.) So it's natural to try to make them translate that information into a binding model.
Note: I still consider them black box models because the mechanisms aren't directly programmed into the model. They learn the mechanism purely from data.
Recall from my post a few months ago that protein models like AlphaFold work by building up an (uninterpretable) embedding vector that encodes the relative positions of the amino acids. You can then slap some prediction layers on top that translate the embedding into predicted distances between those amino acids.
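In code terms, that "slap some layers on top" step is just a small head over the pairwise embedding. Here's a minimal PyTorch sketch of a distance-prediction ("distogram") head; the dimensions and names are invented for illustration, not taken from AlphaFold's actual implementation:

```python
# The trunk of a structure model produces a pairwise embedding z[i, j] for
# every pair of residues; a small head maps each pair embedding to a
# distribution over distance bins. Sizes here are invented for illustration.
import torch
import torch.nn as nn

class DistogramHead(nn.Module):
    def __init__(self, embed_dim: int = 128, n_bins: int = 64):
        super().__init__()
        self.proj = nn.Linear(embed_dim, n_bins)

    def forward(self, pair_embedding: torch.Tensor) -> torch.Tensor:
        # pair_embedding: [n_residues, n_residues, embed_dim]
        # returns logits over distance bins for every residue pair
        return self.proj(pair_embedding)

n_res, dim = 200, 128
z = torch.randn(n_res, n_res, dim)   # stand-in for the trunk's output
dist_logits = DistogramHead(dim)(z)  # shape: [200, 200, 64]
```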
Boltz-2 slaps another set of layers on top that allow you to toss in a small molecule and calculate the relative distances of its atoms to those same amino acids (at the speed of a neural network, not a slow mechanistic model.)
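Conceptually, it's the same move again: treat the ligand's atoms as extra tokens, so the pairwise embedding now covers residue-residue, residue-atom, and atom-atom pairs, then add a head that reads a binding score off the protein-ligand block. The sketch below continues the toy example above and is not the actual Boltz-2 architecture:

```python
# Hand-wavy continuation of the sketch above: append the ligand's atoms as
# extra tokens, so the pair embedding covers all residue/atom pairs. A
# binding head pools the protein-ligand block into a single affinity score.
import torch
import torch.nn as nn

n_res, n_lig_atoms, dim = 200, 30, 128
n_tokens = n_res + n_lig_atoms
z = torch.randn(n_tokens, n_tokens, dim)  # trunk output over all tokens

class AffinityHead(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, pair_embedding: torch.Tensor, n_res: int) -> torch.Tensor:
        # Pool over the protein-ligand block of the pair representation.
        protein_ligand = pair_embedding[:n_res, n_res:, :]  # [n_res, n_lig, dim]
        return self.mlp(protein_ligand.mean(dim=(0, 1)))    # one scalar score

score = AffinityHead(dim)(z, n_res)  # one forward pass, no pose search
```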
AlphaFold3 could do this too, but apparently not nearly as well as Boltz-2. And while I'm sure there are a lot of things the Boltz team did to make this possible, there was one particular trick that stood out to me.
In general, the bottleneck to building a more accurate or more powerful model isn't the model architecture. It's the data. And while you can address that bottleneck by just collecting more data, it's even better to use the data you have more efficiently.
In particular, binding data is sort of T-shaped. You have noisy, sparse data about a huge number of small molecules that have gone through high-throughput screening. These assays are designed to be cheap and fast, and you get the data you pay for: a binary yes/no readout for each compound at a single concentration.
Then there is much higher-quality data about a small number of molecules that turned out to be interesting. This data has more reliable readings, usually at multiple concentrations.
The binary signal from the wide, cheap data is useful for finding the general shape you want for the small molecule. (The area of chemical space to start with.) But to refine that structure (aka optimization) you want to find the molecule that is effective at the lowest possible concentration. So binary data alone won't cut it.
The clever thing that the Boltz team did was to find a way to train the model on both wide binary data and the narrow concentration data. This allows Boltz-2 to make predictions that are very accurate while covering a wide area of chemical space.
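In training terms, one simple way to combine the two data types is a multi-task loss where each example only contributes the terms its labels support: a classification term for the cheap binary hit/no-hit data, and a regression term for the high-quality potency measurements. Here's a simplified sketch of that idea (the loss terms and weighting are invented; see the paper for Boltz-2's actual formulation):

```python
# Simplified sketch of multi-task training on "T-shaped" binding data:
# a compound may have a cheap binary hit label, a precise affinity value
# (e.g. a log potency), or both, and each example only contributes the
# loss terms its labels support. Details here are invented.
import torch
import torch.nn.functional as F

def binding_loss(binary_logit, affinity_pred, binary_label, affinity_label):
    """binary_label / affinity_label are None when that measurement is missing."""
    loss = torch.tensor(0.0)
    if binary_label is not None:
        # Wide, noisy HTS data: hit / no-hit at a single concentration.
        loss = loss + F.binary_cross_entropy_with_logits(binary_logit, binary_label)
    if affinity_label is not None:
        # Narrow, high-quality dose-response data: a continuous potency value.
        loss = loss + F.mse_loss(affinity_pred, affinity_label)
    return loss

# A cheap screening hit with no dose-response follow-up:
l1 = binding_loss(torch.tensor(1.2), torch.tensor(0.0), torch.tensor(1.0), None)
# A well-characterized lead with a measured log-potency:
l2 = binding_loss(torch.tensor(2.1), torch.tensor(-7.3), torch.tensor(1.0), torch.tensor(-7.5))
```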
Now, this is all based on the self-reported results from the Boltz team. The real test will be whether these predictions are accurate enough for biotech/pharma teams to actually use them for screening. The model has been made publicly available (see the FAQ for how to use it) and apparently lots of teams have already started using it. So we may have results like this fairly soon.
Thanks for reading Scaling Biotech!