Who's left holding the (lab data) bag?

Mar 29, 2023

One of the three reasons I recently proposed for why biotech ML projects often get stuck at the proof of concept phase (Jupyter notebooks and slides, but little tangible impact) is that the data isn’t available or isn’t consistent enough. Number nine on the list of Reciprocal Development Principles (in my recently published mini-book, which you should download and read today) is that data should be stored in a FAIR (findable, accessible, interoperable, reusable) system starting as far upstream as possible. That would solve the problem, but this deceptively simple-sounding dictum ends up demanding a disproportionate amount of time and energy from many data teams.

I was recently chatting with Erik Reinertsen about why this matters so much, and why it often falls to data teams, rather than the lab teams who produce the data. I want to share what we settled on, which is that it comes down to who’s left holding the bag when it doesn’t happen.

In most biotech orgs, the bench teams control the overall experiment schedule and workflow, of which the analysis step is one part. If there’s a delay in getting data and metadata into a place and a form that the data team needs, this manifests as a delay in the analysis step. Which means the default assumption is that it’s your data team’s fault. Moreover, if the bench team thinks they sent you the data, but it’s not in a place or a form that you can actually use, the finger often ends up pointing back at you.

On top of all of this, you have the dynamic that data teams are often trying to introduce new and experimental capabilities. So if you can’t get the analysis done in time, the bench team can just do the experiment the old way and move on. The consequences of this for the bench team are fuzzy at best. For you, they’re terminal.

So yes, in an ideal world the bench team should be accountable for getting your team the data and metadata it needs in the form that will make your pipelines run smoothly. In practice, however, the incentives just aren’t there.

On the other hand, you understand the downstream needs and available technology much better than the bench teams. So maybe you can think of it as helping colleagues who need it, on something that you just happen to care about more than they do.

It may seem very far from the scientific impact that you’re looking for, but if you don’t take care of it, you can kiss that scientific impact goodbye.

Scaling Biotech is brought to you by Merelogic. We’ll help you turn your ML prototypes into tangible impact, whether it takes a few small tweaks to how your team operates or larger changes to your tools, infrastructure and projects. If you want to explore what this might look like for your team, send me an email at jesse@merelogic.net
