This is the fourth part of my series about development principles for embedded biotech data teams. The theme of the first three was that data teams should plan their work around a broader scope than just the technical results that we often default to. This week is about the hard decisions that we need to begin making once we understand this broader scope:
The simplest technical solution that will reliably meet scientific objectives should be chosen over complex or novel approaches with marginal improvements.
The failure mode this is addressing: choosing a more interesting technical solution to a problem, perhaps because it’s the “industry standard” or because it uses a technology, library, or language we want to learn, over a less interesting solution that would get the job done faster.
The problem is that a lot of these “industry standard” solutions are built for complexity and scale that are faced by the industry leaders in the tech sector, but not necessarily by your typical biotech lab. Think Kubernetes or Redshift. Your application doesn’t need to scale to billions of users if it’s only for your wet lab team. Your pipeline doesn’t need to be able to handle petabytes of data if your datasets are all measured in gigs.
If you do need to handle those sorts of technical requirements, or legitimately expect to in the future, then you should absolutely design for that scale. But often the tools that are designed for the largest possible scale come with overhead and complexity that aren’t worth it otherwise. A relational database is much easier to use than a distributed data warehouse if you only have millions of rows. A micro-services framework is going to be more headache than it’s worth for a single server and a few dozen users.
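To make the scale point concrete, here is a minimal, illustrative sketch using Python's standard-library sqlite3 (the table schema and row count are invented for illustration): a single-file relational database handles millions of rows on one machine with no cluster, no warehouse, and no operational overhead.

```python
import sqlite3

# A single-file (here, in-memory) SQLite database -- no cluster to manage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (sample_id INTEGER, assay TEXT, value REAL)"
)

# Insert a million rows; this takes seconds on an ordinary laptop.
rows = ((i, "od600", i * 0.001) for i in range(1_000_000))
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)

# An index makes point queries over millions of rows near-instant.
conn.execute("CREATE INDEX idx_sample ON measurements (sample_id)")
count, = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()
print(count)  # 1000000
```

If your datasets are measured in gigabytes, this kind of setup is often all the infrastructure the science needs.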
At the end of the day, the goal is to get to a solution that is good enough to drive the science, and get there soon enough to have an impact before the wet lab moves on to something else. If the cutting-edge technical solution is going to take twice as long to implement and three times as much effort to maintain, then it’s not going to get you there.
“Industry standard” is sometimes determined by industry marketing rather than user ROI. The Internet is a good example. The formally proposed “industry standard” was a seven-layer stack developed by the telecom industry and ISO/ANSI; TCP/IP was created bottom-up via RFCs by the IETF. TCP/IP was simpler, faster, and much cheaper to implement and deploy.
Standards can be 'de jure' or 'de facto', and the best standards get tried out. Operations groups in industry have a very different perspective: a) if it works, don't break it; b) tweaking is better than replacement; c) if a shiny new technology is that wonderful, let the CFO pay for it; d) always be able to roll back to the tried and true when disaster strikes.
Premature adoption can be fatal to the bottom line.