In this ongoing series of development principles for biotech data teams, everything so far has focused on high-level ideas like setting objectives and communicating with other teams. This week’s post and the next few are going to get a bit more technical, starting with how we collect and manage data:
Information should be captured in a FAIR system as early as possible and anything derived from it should remain in a FAIR system.
If you’re in the target audience for this newsletter, you’ve probably heard of the FAIR data standards; FAIR stands for Findable, Accessible, Interoperable and Reusable. Even if you haven’t heard of them, they probably reflect how you’re already thinking about data. That’s because the FAIR standards aren’t meant to change how you do things; they’re meant to help you explain it to stakeholders.
So instead of discussing FAIR data today, I want to focus on two other aspects of this week’s principle: 1) collecting data as early (or as far upstream) as possible and 2) keeping the data inside the system once it’s collected.
Most wet lab teams are used to working with Excel sheets saved on laptop hard drives and collecting everything at the very end. Many ELNs and LIMS reinforce this habit. When scientists need to analyze data that lives in a shared system, they pull it out, do what they need to in Excel, then potentially upload it back. But for your data team to work effectively, you need the data as soon as possible, in a form you can trust. That means you need the wet lab teams to capture decisions and analysis as they happen (before they have time to forget) and in a consistent form.
I’ve written previously about how Excel can potentially be used for this, particularly as a prototyping tool. But to make this work (at least in the short term) you need deliberate processes and rules for how the data is collected and shared.
As soon as a scientist decides to do an experiment, there should be a way for them to begin recording and updating the experiment design. As they do the experiment, they should be able to record what changed. And any data that their instruments generate should be transferred to a central location with as little manual intervention as physically possible.
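To make that flow a little more concrete, here is a minimal Python sketch of what it could look like. Everything in it is an assumption for illustration: the `ExperimentRecord` class, its field names, the `record.json` file name and the idea of a shared `central_dir` folder are all hypothetical, not a description of any particular ELN or LIMS. The point is only that the record exists from the moment the experiment is planned, changes are appended rather than overwritten, and instrument files are copied to the central location with a checksum so later analysis can trust them.

```python
import hashlib
import json
import shutil
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ExperimentRecord:
    """Hypothetical record created when the experiment is first planned."""

    experiment_id: str
    scientist: str
    design: dict
    created_at: str = field(default_factory=_now)
    changes: list = field(default_factory=list)
    raw_files: list = field(default_factory=list)

    def record_change(self, description: str) -> None:
        # Append-only log: earlier entries are never edited or removed.
        self.changes.append({"at": _now(), "description": description})

    def ingest_file(self, source: Path, central_dir: Path) -> Path:
        # Copy the instrument output into the central location and store
        # a SHA-256 checksum so the copy can be verified later.
        dest_dir = central_dir / self.experiment_id
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / source.name
        shutil.copy2(source, dest)
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        self.raw_files.append(
            {"path": str(dest), "sha256": digest, "ingested_at": _now()}
        )
        return dest

    def save(self, central_dir: Path) -> Path:
        # Write the whole record as JSON next to the raw files.
        dest_dir = central_dir / self.experiment_id
        dest_dir.mkdir(parents=True, exist_ok=True)
        out = dest_dir / "record.json"
        out.write_text(json.dumps(asdict(self), indent=2))
        return out
```

A scientist (or, better, a script watching the instrument's output folder) would create the record when the experiment is designed, call `record_change` whenever the plan deviates, and call `ingest_file` for each file the instrument produces. In practice this role is more likely to be played by an ELN, a LIMS or a carefully structured spreadsheet than by hand-written code, but the shape of the data is the same.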
If you’ve followed the last few principles I wrote about, you should have a healthy, mutual relationship with the wet lab teams. Now you can put that relationship to use by helping them to establish the processes and adopt the tools that will allow them to capture FAIR data as far upstream as possible.
My favorite is “R1.2. (Meta)data are associated with detailed provenance.”
Provenance is key for both science and manufacturing.
-en z.