This week, continuing with the theme of problems that many biotechs face at specific points in their evolution, I’ve got one that feels deceptively easy and therefore often catches teams off guard. Whether we’re talking about small molecules, biologics or something else, most biotechs will eventually end up with a collection of things that they want to screen with different assays, then pick some to move to the next stage. In drug discovery, these stages are often called hits or leads or something along those lines. So I’m going to call the place where you collect and review this data a hit dashboard.
On its surface, this shouldn’t be that hard. A hit dashboard is just a table, right? A row for each molecule/sequence/etc. and a column for each assay/readout. There are high-tech and low-tech ways to create and share/display a table like this. You’ll probably want it to be sortable. Maybe some images of molecular structures. But at the end of the day, it shouldn’t be that hard.
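To make that concrete, here's a minimal sketch of the "just a table" version in code. Every name here is made up, and it assumes each assay's results are already in a tidy per-compound table:

```python
# A minimal sketch of the "just a table" version. All compound IDs,
# column names, and values here are hypothetical.
import pandas as pd

# One tidy table per assay: one row per compound, one readout column.
binding = pd.DataFrame({"compound_id": ["C-001", "C-002"], "kd_nm": [12.5, 340.0]})
potency = pd.DataFrame({"compound_id": ["C-001", "C-003"], "ic50_um": [0.8, 5.2]})

# The hit dashboard: one row per compound, one column per assay readout.
# An outer merge keeps compounds that are missing from some assays.
dashboard = binding.merge(potency, on="compound_id", how="outer")

# "Sortable" is the easy part once the table exists.
print(dashboard.sort_values("ic50_um"))
```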
And yet, it still manages to trip up many biotech data teams.
Of course, the hard part isn’t making the table. The hard part is getting the data into the table. But even there it can be deceptively difficult. Tracking down a single dataset may not be that hard, but the whole point is to look at multiple assays, and thus multiple datasets.
These different datasets were collected at different times from different kinds of experiments, and most importantly by different people. Each person was focused on collecting data to answer a single specific question. Now you want to use their data to help answer a (slightly) different question. And that’s where it gets tricky.
The first layer of the problem is where they put the data. If you’re lucky it’s in a shared drive somewhere. If you’re slightly less lucky, it’s on their laptop and they remember which folder it’s in. If you’re even less lucky, you may have to copy/paste from a table in a slide deck. So if you want to automate this hit dashboard, you’re going to have to somehow get the data into a consistent place.
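If the files do live on a shared drive, even this first step can be scripted. A hedged sketch, with hypothetical paths and a hypothetical file-naming convention:

```python
# A sketch of step one: sweep scattered result files into one place.
# Assumes results eventually land as CSVs under a shared drive; the
# mount point, glob pattern, and destination are all hypothetical.
from pathlib import Path
import shutil

SHARED_DRIVE = Path("/mnt/shared")          # hypothetical mount point
STAGING = Path("/data/hit_dashboard/raw")   # one consistent landing zone
STAGING.mkdir(parents=True, exist_ok=True)

for src in SHARED_DRIVE.rglob("*_results.csv"):
    # Prefix with the parent folder so files from different teams don't collide.
    dest = STAGING / f"{src.parent.name}__{src.name}"
    if not dest.exists():
        shutil.copy2(src, dest)
```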
But that turns out to be the easy part. We haven’t even dug into data formats yet. Some of the results are just a single number. Some of them are two or more numbers that only make sense together. Almost all of these numbers need to be normalized based on a positive and/or negative control, which may or may not be comparable across batches. And this is even before we get into concentration-response curves, outliers and IC50/EC50s.
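Here's roughly what the normalization step looks like for the simplest case, a single-number readout expressed as percent inhibition against per-plate controls. The column names and the direction of the signal are assumptions, and this deliberately stops short of curve fitting and IC50 estimation:

```python
# A sketch of control-based normalization, using percent inhibition as
# one common convention. Assumes each plate carries its own positive and
# negative control wells; column names are hypothetical.
import pandas as pd

def normalize_plate(plate: pd.DataFrame) -> pd.DataFrame:
    # Per-plate control means, so the normalization stays within-batch.
    neg = plate.loc[plate["well_type"] == "negative_control", "signal"].mean()
    pos = plate.loc[plate["well_type"] == "positive_control", "signal"].mean()
    plate = plate.copy()
    # Convention here: negative control = 0% inhibition, positive = 100%.
    # Flip the formula if your assay signal runs in the other direction.
    plate["pct_inhibition"] = 100 * (neg - plate["signal"]) / (neg - pos)
    return plate

# raw = pd.read_csv("plate_042.csv")  # hypothetical input file
# normalized = normalize_plate(raw)
```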
To be clear, these are all issues that can be overcome and that are regularly overcome one way or another. The problem is that it’s just complicated enough to require careful thought each time you add a new column to the “just a table”. If you have a more high-tech solution then the data scientist or engineer who knows how to do it becomes an unwitting gatekeeper. And if you try to replace them with a self-serve option, you run the risk of re-inventing Spotfire/Tableau/Excel.
So the much more common solution is that the bench scientist who needs that information in a hurry ends up manually copy/pasting it into Excel. (At a few startups I’ve talked to, it’s the CEO who does this.)
This gets the job done, which is why they keep doing it. But even if you overlook the amount of time it takes, the number of opportunities for error is astounding. It can’t be automated, so it will often be out of date. And, of course, there’s no way to verify the work, let alone any sense of reproducibility.
So yeah, the manual approach works. And there doesn’t seem to be an easy alternative. But if we’re going to build data-driven biotechs, we need to find a better way.