Beware the long tail of orthogonal assays
*** A quick request: I’m thinking about ways I can help biotechs identify and select software. If you’ve recently picked out new software for your team or are currently going through the process and would be willing to answer a few questions about it, please send me an email (jesse@merelogic.net) or find me on Bits in Bio Slack. ***
I want to switch gears for the next few posts and get back to writing about technical and operational problems that many data-driven (or aspirationally data-driven) biotechs run into. But in my ongoing effort to make my posts less abstract, I’m going to focus on issues that come up at specific points in a company’s growth and evolution. If you can start planning in advance for these situations, or at least diagnose them more quickly because you know they’re coming, they should be easier to manage.
I’ll kick things off with a situation I’ve seen in a number of the startups in my first cohort of (what I’m now calling) stack audits: The organizational shift that happens as you transition from focusing on a single primary screening assay to a bunch of different orthogonal/secondary/functional/whatever you call them assays. In fact, I’m realizing this may be one of the best reasons to sign up for a stack audit.
Let me explain.
The funnel
For context, let’s start with the screening funnel: Many drug development organizations are built around a process in which a large number of compounds or peptides or whatever are put through a primary assay to filter them down to a smaller number of compounds/peptides/whatevers, which are then filtered through further assays. This series of progressive filters is often drawn as a downward-pointing triangle (the funnel), in which each horizontal level represents how many candidates are left after each assay.
But this shape is a bit misleading: Often, most of the filtering happens at the first step, where tens or hundreds of thousands, or even millions, of designs get filtered down to dozens or maybe single-digit hundreds. And even when you begin to iterate after the funnel, the number of compounds/peptides/whatevers tends to stay similarly small. This bottleneck comes from the cost of physically generating custom designs, so even if you wanted to widen the funnel, doing so quickly becomes cost-prohibitive.
This first assay also tends to be fairly generic - most often some kind of binding assay, but not always. It will vary depending on what “target” you’re screening against. (In quotes because the definition of a “target” depends on the type of assay.) But the data that comes out of the different variants, and the way you analyze it, tends to be similar enough that data teams can treat it all as one assay.
On the other hand, everything after that first assay flips this on its head. It’s here that targets start to morph into programs. In particular, these later assays start measuring things that are specific to the disease or indication you’re interested in. It’s no longer just about the “target” - it’s about the mechanism by which the target causes the disease. Each of these assays generates a completely new form of data that needs to be analyzed in a completely different way. So you go from one assay that you run constantly to a lot of assays, each of which you run only occasionally.
The Long Tail
That first assay faces all the friction and problems with data and metadata that I like to write about on here. But at the end of the day, it’s a single assay, with a more or less consistent process and at a scale where it makes sense to just engineer the heck out of it. Custom apps to enter experiment parameters. Scripts to design and randomize plate maps. Whatever you can think of, the engineering effort will probably pay off in time saved.
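To make that concrete, here’s a minimal sketch of the plate-map piece: a toy Python script that scatters compounds and controls across a 96-well plate. The identifiers, well naming, and single-plate assumption are made up for the example - the real versions of these scripts tend to be tangled up with each team’s registration system and plate formats.

```python
import random

def randomized_plate_map(samples, controls, plate_rows="ABCDEFGH", plate_cols=12, seed=None):
    """Assign samples and controls to random wells on a single plate.

    `samples` and `controls` are lists of identifiers; anything beyond one
    plate's worth of wells raises an error rather than silently spilling
    onto a second plate.
    """
    wells = [f"{row}{col:02d}" for row in plate_rows for col in range(1, plate_cols + 1)]
    contents = list(samples) + list(controls)
    if len(contents) > len(wells):
        raise ValueError(f"{len(contents)} items won't fit in {len(wells)} wells")
    rng = random.Random(seed)  # seeded so the layout is reproducible
    rng.shuffle(wells)
    return dict(zip(wells, contents))

# Example: 90 compounds plus a handful of controls on a 96-well plate
plate = randomized_plate_map(
    samples=[f"CPD-{i:04d}" for i in range(90)],
    controls=["DMSO"] * 4 + ["POS-CTRL"] * 2,
    seed=42,
)
for well in sorted(plate)[:5]:
    print(well, plate[well])
```

The point isn’t this particular script; it’s that at primary-screen scale, even small utilities like this get run often enough to earn their keep.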
And so this is what many data/AI/ML-focused biotechs have done, particularly in the last few years. The data from that primary assay is going straight into training the models that will speed you towards your first IND. So you do everything you can to get that data quickly and consistently. You start to see results. You start to turn this carefully designed crank. The machine starts to hum. You can almost see the virtuous cycle.
And then you hit the long tail.
Suddenly, out of nowhere, all these other assays start to pop up. There are too many of them to engineer anything around. And even if you had the bandwidth, we’re talking about running each assay a few dozen times on maybe a few hundred compounds/peptides/whatevers. The investment will never pay off. Not even close.
Not that you have time for it anyway. You need results fast, and the bench scientists are already on it. As in, they’ve run the assay multiple times before you’ve even heard of it. You may have thought you were in charge before, but there’s not even an illusion of that now.
Making Compromises
So if you can’t engineer the heck out of the long tail, you have two options: 1) you can throw in the towel and hope the bench scientists figure it out, or 2) you can find a compromise between a highly engineered machine and chaos.
As you might’ve guessed, I’m a fan of option 2. If you want to call yourself data driven, then that means all the data.
The compromise usually involves making the tools more manual than you’d probably like, while formally defining the manual parts to make them more consistent. Think shared Excel files or Google Sheets alongside detailed instructions and reviews/check-ins during meetings.
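To give one (entirely hypothetical) example of what formally defining the manual parts can look like: a small check script that someone runs against a CSV export of the shared sheet before the weekly review. The column names and rules below are placeholders I made up; every team would define its own.

```python
import csv
import sys

# Columns we expect in the shared results sheet. These names are
# illustrative placeholders, not a standard.
REQUIRED_COLUMNS = {"compound_id", "assay_name", "run_date", "readout", "operator"}

def check_results_sheet(path):
    """Return a list of human-readable problems found in a CSV export of
    the shared sheet (an empty list means it looks fine)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        for line_no, row in enumerate(reader, start=2):  # row 1 is the header
            if not (row["compound_id"] or "").strip():
                problems.append(f"row {line_no}: empty compound_id")
            try:
                float(row["readout"])
            except (TypeError, ValueError):
                problems.append(f"row {line_no}: readout {row['readout']!r} is not a number")
    return problems

if __name__ == "__main__":
    for problem in check_results_sheet(sys.argv[1]):
        print(problem)
```

It’s not glamorous, but a check like this turns “please fill in the sheet correctly” from a request into something you can actually verify before the data goes anywhere else.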
Ultimately you have to decide where on the scale from informal to automated you want each part of each assay process to live. And to make those kinds of compromises, you need to get a deep and objective understanding of all the different moving pieces. Then you can decide what needs to change, and what’s good enough as it is. This is what the Stack Audit is designed to do, and it’s where I’m seeing biotechs get the most benefit.
Of course, if you’re still stuck on over-engineering the heck out of that primary assay, or even earlier in the process, don’t worry. I’ll dig into the problems that come up at those earlier stages in upcoming posts. Stay tuned!