Metadata is People

Feb 15, 2023

OK, I’m just going to say it: wrangling lab metadata is harder than wrangling lab data.

That’s right, you heard me. No matter how hard you might think it is to get data from microscopes and sequencers and qPCR machines to a place where you can use it, it’s nothing compared to getting clean metadata. It’s not even in the same ballpark.

In fact, it’s quite literally a different kind of problem.

To be clear about what I mean, let’s start with definitions, which are a bit different from how data and metadata work outside biotech. By “data”, I mean the readouts that come from instruments. By “metadata”, I mean the records of experiment design and implementation that are collected long before the readouts even start. Your analysis will mostly use the data. But to know how to set it up and how to interpret the results, you need the metadata.

Wrangling data is mostly a technical problem: You often need to deal with large volumes of it, and you sometimes have to deal with instruments that are (intentionally or unintentionally) designed so that getting at that data is unnecessarily hard. (I once heard a particularly frustrating instrument referred to as a glorified toaster.) But ultimately these are all technical problems, which means that they’re fun compared to what’s coming next.

Metadata, in the vast majority of cases, is collected by people, which means that wrangling it is a people problem. Smart people. Well-meaning people. People that you and I are proud to call our colleagues. But they’re still people. Which means that they have many priorities to juggle, sometimes misunderstand what’s expected of them, often make mistakes, and always take time to learn new things.

Metadata is people, so solving your metadata problems means wrangling people, not just code. And that’s ten times harder.

Scaling Biotech