The problem with an AI computational biologist isn't the AI

Nov 13, 2024

*** A quick plug: I’m hosting a second webinar in one week, this time with Sphinx Bio’s Nicholas Larus-Stone. We’re calling it “Unlocking Biotech Data: How Sphinx uses AI to move biotechs faster and make better use of their data.” It’ll be on Thursday, November 21st at 2pm EST/11am PST and you can sign up here. ***

For the next few posts, I want to explore places in the Biotech Reference Stack where generative AI could have an impact. Last week, I sketched out how I plan to frame the discussion. This week, I want to kick things off by looking at a place where many people tend to start: computational biology/data science/analysis. But as I’ll explain below, I don’t think the bottleneck here is the AI.

In the Reference Stack, computational biology/data science/analysis is the second process in the Data Analysis module, titled “Transform/Analyze Data”. I think there are two main reasons folks tend to start looking here for applications of LLMs:

First, LLMs like ChatGPT are really good at writing code. There are already successful code-writing tools based on LLMs, so why not computational biology code?

Second, there is generally much more data to be analyzed than there are computational biologists to analyze it. It’s a big bottleneck. Plus, most computational biologists would rather work on more interesting problems and leave the simple analysis to someone else (or something else.)

So it seems like an obvious one, but there’s one trick. According to the framework I suggested last week, we’re looking for places where we can benefit from putting information into a requested form (code, in this case) that can be verified before it’s used.

The trick is who’s going to do the verification.

If it’s the computational biologists who are doing the verification, then that means they’re still in the loop - still a bottleneck. And it’s unclear if specialized tools would be any better than the existing LLM-based code tools, as long as it’s computational biologists using them.

Remember that those general-purpose code-writing LLMs are still meant to augment a coder who can sanity-check the output, fix any small errors, and intervene for bigger errors.

To actually get to a point where AI is fully replacing the computational biologists, we would need a way for the (non-computational) biologists to verify what the LLM is doing. But if they were willing/able to verify the code, they would know enough to write it, so they wouldn’t need the computational biologist in the first place.

So if someone’s going to build an LLM computational biologist, it can’t write general purpose code. It will need to write some kind of intermediary form that’s intuitive enough for biologists to understand, but can be translated into code - something like a low-code/no-code framework that the LLM can write and the biologist can verify.
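To make that idea a bit more concrete, here is a minimal sketch in Python of what such an intermediary form might look like. This is my own illustration, not an existing tool: the step names (`filter`, `mean_by`), the `STEPS` table, and the `describe` and `run` helpers are all hypothetical. The assumption is that the LLM only writes the `spec` (plain data, not code), the biologist reads the plain-English rendering, and a small set of pre-reviewed functions does the actual work.

```python
from statistics import mean

# Hypothetical, pre-reviewed implementations of each allowed step.
def _filter(rows, column, min_value):
    return [r for r in rows if r[column] >= min_value]

def _mean_by(rows, group, value):
    groups = {}
    for r in rows:
        groups.setdefault(r[group], []).append(r[value])
    return [{group: g, f"mean_{value}": mean(v)} for g, v in groups.items()]

# Hypothetical vocabulary: each step has a plain-English template and a function.
# Anything not listed here raises KeyError, so the spec cannot run arbitrary code.
STEPS = {
    "filter": ("Keep rows where {column} is at least {min_value}.", _filter),
    "mean_by": ("Average {value} within each {group}.", _mean_by),
}

def describe(spec):
    """Render the spec as numbered plain-English steps for a biologist to verify."""
    lines = []
    for i, step in enumerate(spec, 1):
        template, _ = STEPS[step["op"]]
        lines.append(f"{i}. " + template.format(**step["args"]))
    return "\n".join(lines)

def run(spec, rows):
    """Execute the verified spec using only the pre-reviewed step functions."""
    for step in spec:
        _, fn = STEPS[step["op"]]
        rows = fn(rows, **step["args"])
    return rows

# The part an LLM would write: data describing the analysis, not code.
spec = [
    {"op": "filter", "args": {"column": "quality", "min_value": 30}},
    {"op": "mean_by", "args": {"group": "condition", "value": "expression"}},
]

# Made-up example rows, purely for illustration.
rows = [
    {"condition": "treated", "quality": 35, "expression": 4.0},
    {"condition": "treated", "quality": 40, "expression": 6.0},
    {"condition": "control", "quality": 20, "expression": 9.0},
    {"condition": "control", "quality": 33, "expression": 2.0},
]

print(describe(spec))
print(run(spec, rows))
```

The point of the design is that the biologist verifies the two sentences from `describe`, not the Python. The hard part, of course, is picking a vocabulary of steps that covers enough real analysis to be worth it.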

Now, I’m fairly certain that as soon as you read “low-code/no-code”, at least half of you lost interest. And I get it - that’s the fad from a few years ago that we were all waiting to die.

Plus, boiling down enough computational biology into a low-code/no-code framework to be useful, AND making it intuitive enough for (non-computational) biologists to actually trust themselves to verify it, is a really, really hard problem. But if you believe the framework I suggested last week, then that’s the bottleneck to an AI computational biologist, not the AI itself.

Thanks for reading this week’s Scaling Biotech! I really appreciate your continued support, and I read every comment and reply.

As a reminder, I offer several services to help connect biotech teams with tools, practices and expertise to make their organizations more data driven.

The Biotech Reference Stack is a website designed to help biotech data teams identify the tools they need and figure out how to put them together.
For help navigating the Reference Stack, sign up for a free consultation call to clarify a problem you're facing and identify the best options to evaluate.
Or if you’re building software that makes biotech more data driven, find out how to add your app to the Reference Stack.
