Case study: A Wet Lab to Digital Pipeline (Part 2)

Jul 27, 2022

In my last post, I described a scenario in which we needed a system for consistently collecting (meta)data in the wet lab and getting it in front of a data scientist. In this post, I want to discuss two different high-level approaches to getting there, and explain why the less intuitive one is actually more likely to succeed. Next week, I’ll go into the details of what this less intuitive approach might look like in practice.

Before we start, you may want to re-read my last post for context.

*** But first, a quick note: We're building a community of folks working on data/software teams embedded in larger biotech organizations (as opposed to selling tools/services to other companies). If you like this newsletter, there’s a good chance you’ll fit right in. Come join us on the #embedded-data-teams channel of BitsInBio Slack. ***

To solve the problem from last time, there are essentially three things we need to do:

1. Agree on what metadata is needed, and a standard format/schema in which it will be collected.
2. Implement software that can be used to input, manage and retrieve the metadata.
3. Train/convince the bench scientists to input the data in this agreed-on form, and the data scientists to retrieve it in this form.

There’s a logical progression from each step to the next, so the natural inclination would be to go through the three steps in order. But there’s a problem: Each of these steps can take a month or more, and for that whole time the current bad process will keep churning out lousy, inconsistent data.

Even step 1, which may seem easy, is often the hardest of the three. Many bench scientists won’t be able to enumerate all the different corner cases that the schema needs to cover, and if you’re trying to do this for multiple teams and programs, you may never finish.

In fact, this approach feels uncomfortably like the dreaded waterfall method from the dark ages of software engineering. And it comes with all the same issues and risks, particularly that you won’t know if you’re on course for success until you’re months into the process.

Instead, we need a way to quickly iterate through all three steps, building out the specification, the software and the process in parallel. Each iteration is a Minimum Viable Change: a small tweak to the software, the process, or both, that nudges them in the right direction while keeping the two aligned closely enough that nothing breaks.
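To make the idea a bit more concrete, here is a minimal sketch of what one such change could look like on the schema side, assuming the metadata is modeled as Python dataclasses. The field names (`sample_id`, `operator`, `cell_line`) and the `load_record` helper are hypothetical, chosen only for illustration, and not taken from any real system.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SampleMetadataV1:
    """First iteration: only the fields everyone already agrees on (hypothetical)."""
    sample_id: str
    operator: str


@dataclass
class SampleMetadataV2(SampleMetadataV1):
    """One small change: a single new optional field.

    The default value keeps records written under V1 loadable without edits.
    """
    cell_line: Optional[str] = None


def load_record(raw: dict) -> SampleMetadataV2:
    """Build a V2 record from a raw dict, ignoring keys the schema does not know yet."""
    known = {f.name for f in fields(SampleMetadataV2)}
    return SampleMetadataV2(**{k: v for k, v in raw.items() if k in known})


# A record captured before the change still loads; the new field is simply empty.
old = load_record({"sample_id": "S-001", "operator": "alice"})

# A record captured after the change carries the new field; unknown keys are dropped.
new = load_record(
    {"sample_id": "S-002", "operator": "bob", "cell_line": "HEK293", "notes": "free text"}
)
```

The point of the sketch is that the change is additive and optional: nobody at the bench has to change how they work on day one, and nothing downstream breaks while the new field is being adopted.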

Breaking this process up into a series of Minimum Viable Changes is no easy task, but next week I’ll look at some ways we might do it.

Scaling Biotech
