Why do scientists throw away data?
I’m an even-tempered person, so I never let on, but more times than I can count I’ve been shocked to discover lab processes that throw away data that could easily be stored and organized. The most recent instance was a process that exported only two or three features from image analysis software that produced hundreds, and aggregated the per-cell statistics down to per-well values.
Of course, the bench scientists had good reasons to do this: When you’re analyzing data in Excel, hundreds of columns and millions of rows are a problem. I have defended the use of Excel in specific circumstances, but this is not one of them.
With a different toolbox, you wouldn’t notice a difference between two columns or two hundred. And as more and more such toolboxes become easier to use (even for bench scientists) this question becomes even more aggravating. What’s stopping them?
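To make the point concrete, here is a minimal sketch of what that toolbox might look like (the column names are hypothetical, not from the original lab export): with a library like pandas, keeping every per-cell feature costs nothing extra at analysis time, because per-well summaries are one `groupby` away and the raw per-cell data survives for later questions.

```python
import pandas as pd

# Per-cell export: one row per cell, with as many feature columns
# as the imaging software produces. Two columns or two hundred --
# the code below is identical either way.
cells = pd.DataFrame({
    "well": ["A1", "A1", "A2", "A2"],
    "area": [110.0, 95.0, 130.0, 120.0],
    "intensity": [0.8, 0.7, 0.9, 0.85],
    # ...hundreds more feature columns in a real export
})

# Per-well values, computed on demand instead of baked into the export.
per_well = cells.groupby("well").agg(["mean", "count"])
print(per_well)
```

The aggregation step that the original process hard-coded into the export becomes a single reversible line, so nothing is thrown away.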
If you’ve been following this newsletter, you can probably guess my answer: Shared Mental Models. This issue hinges on two key parts of a shared task model: 1) the goals of the project and 2) a list of available tools and the conditions under which each is most suitable.
For a data scientist, a major goal of any data collection process is to create a dataset that can become a long-term resource. For a bench scientist, the primary goal is to answer an immediate question. And for a bench scientist who has done most of their analysis in Excel, the list of available tools is probably quite narrow.
You can get scientists to stop throwing away data in this one instance by handing them a better process and toolbox. But if you want them to start from a better process the next time they have an opportunity to throw away data, and the time after that, the only way is to build a better Shared Mental Model.