*** Before we get started, a quick plug: My webinar with Kaleidoscope’s Bogdan Knezevic is tomorrow, Thursday, October 24th at 2pm EST. We’re calling it Billion Dollar Decisions: How Kaleidoscope is Shaving Months Off Drug Development Timelines. It’s going to be really good and you can sign up here. ***
In the course of designing the Biotech Reference Stack, I had to decide how objective or opinionated to be. There was bound to be some bias in where the detail ended up, based on what I’ve seen and know. But there were also a couple of places where I added more intentional bias toward one of my soapbox topics: encouraging teams to be more deliberate in how they manage metadata.
One of these is the second-to-last component within Data Generation in the Records column: “Shared Staging Location for Plate Maps, Sample Sheets, etc.” This week, I want to explain why it’s there.
Most of the clients I’ve worked with want to have this metadata go directly from the place where it’s generated (ELN, LIMS, spreadsheet, etc.) to the dataset where it will be used. This makes sense as far as keeping things simple and making as few copies of the metadata as possible. But there’s one key reason I think it’s worth the extra step of moving it to an intermediate staging location:
Formalizing the handoff.
Typically, it’s a bench scientist who records this metadata, and typically they do it before the associated readout data is generated. Even if they’re updating the plate map or sample sheet as they’re creating the plates and samples, they should be done by the time the plate/flow cell/etc. goes into the instrument that will generate the readout. So what do they do with it then?
Well, if the readout is going to be analyzed by a different person (a computational biologist, data scientist, etc.), then they should ideally put the metadata in a place where this other person can find it as soon as the readout is complete.
But in even a small biotech lab, there are likely to be multiple bench teams with different ways of organizing their metadata. Keeping all these folks up to date with the organizational scheme for the datasets would be a huge effort for both them and the data team. Even if the data team trains every bench scientist on where to put the metadata, the scientists will still worry about getting it wrong. Similarly, the analyst is unlikely to know where to find the metadata in each bench team’s organizational scheme.
There’s a similar issue with the data itself, but this is usually more controlled because the instruments tend to put the readouts in a consistent place. So it’s much more common for data teams to spend days tracking down the metadata than the readouts themselves.
That’s why the Reference Stack deliberately calls out having a metadata staging area. This allows bench teams to organize their plate maps and sample sheets however they want while they’re in the lab, then copy them to a place with a simplified and consistent organizational scheme where the next team will be able to find them. And this should be done as soon as the metadata is complete, so everything will be ready to go when the readout is done.
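To make the handoff concrete, here is a minimal Python sketch of what “copy to a place with a simplified and consistent organizational scheme” could look like. Everything specific in it is an assumption for illustration, not something the Reference Stack prescribes: the `staging/<run_id>/<kind>.<ext>` layout, the two metadata kinds, and the function names `staged_path` and `stage_metadata` are all hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical metadata kinds; adjust to whatever your teams actually hand off.
ALLOWED_KINDS = {"plate_map", "sample_sheet"}


def staged_path(staging_root: Path, run_id: str, kind: str, suffix: str) -> Path:
    """Build the single agreed-upon location for one run's metadata file.

    Assumed layout (illustrative only): <staging_root>/<run_id>/<kind><suffix>
    """
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown metadata kind: {kind!r}")
    if not run_id or "/" in run_id or "\\" in run_id:
        raise ValueError(f"invalid run id: {run_id!r}")
    return staging_root / run_id / f"{kind}{suffix.lower()}"


def stage_metadata(src: Path, staging_root: Path, run_id: str, kind: str) -> Path:
    """Copy a finished metadata file from a bench team's folder into staging.

    The original file stays where the bench team keeps it; only a copy is
    placed in the shared location. Refuses to overwrite an existing file so
    that a completed handoff is not silently replaced.
    """
    dest = staged_path(staging_root, run_id, kind, src.suffix)
    if dest.exists():
        raise FileExistsError(f"{dest} is already staged")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest
```

With something like this, a bench scientist can keep a file named however they like in their own folder and call `stage_metadata(Path("My Plate Map FINAL v2.CSV"), Path("staging"), "run_0042", "plate_map")` once it is complete. The analyst then only needs the run ID and the kind of metadata to know where to look.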
There are still lots of details to figure out about the format and schema of this metadata. But the first step is being able to find it.
Thanks for reading this week’s Scaling Biotech! I really appreciate your continued support, and I read every comment and reply.
As a reminder, I offer several services to help connect biotech teams with tools, practices and expertise to make their organizations more data driven.
The Biotech Reference Stack is a website designed to help biotech data teams identify the tools they need and figure out how to put them together.
For help navigating the Reference Stack, sign up for a free consultation call to clarify a problem you're facing and identify the best options to evaluate.
Or if you’re building software that makes biotech more data driven, find out how to add your app to the Reference Stack.