LLMs should be orchestrators, not messaging busses

Jun 17, 2026

Ok, I promise I'll get back to writing about foundation models soon, but I fell down this MCP rabbit hole and I have a few more weeks of ideas I need to explore before I can move on.

For decades, the dream in data engineering/informatics circles has been to integrate all the different tools and software that a team uses into a single system that seamlessly and automatically transfers data across systems, replacing manual copy/paste spreadsheet analysis with reliable processes that leave an audit trail. REST APIs were supposed to make this possible, but in practice, integrations always turned out to be too complex and time consuming to make sense outside of larger teams/organizations with highly standardized processes. Most API integrations were just too brittle for the kind of exploratory work that defines early discovery.

Two weeks ago, I wrote about how MCP servers present an opportunity to fix all this by turning the LLM platform (such as Claude Code or one of the growing number of alternatives) into a messaging bus that coordinates between all these different tools with significantly more flexibility at a negligible technical cost. Then last week, I posted about some of the issues that could arise from using unreliable, hallucinatory LLMs to handle data that drives incredibly expensive decisions.

After thinking about it for another week, I’ve started forming an idea about how we could address this, and turn the software that exists for biology today into an ecosystem of LLM-connected tools that safely enable the dream. My hypothesis is that the LLM platform in this future can’t be a messaging bus exactly. It has to become an orchestrator that coordinates direct, deterministic interactions between the other components.

In the rest of this post, I want to explain what I mean by this.

The Vision

Imagine, if you will, your experiment has just completed and the data from the last instrument has been written to whatever system it goes into. You type into your LLM a brief description of the analysis you want done, or the question you want to answer and set it off:

First, it uses an MCP connection to your ELN to find the details of how the experiment was done. Then it uses another MCP integration to find the readout data that was just written. It uses a third MCP server to kick off the primary analysis that will boil the gigabytes of data down to a single table. Then it uses a fourth MCP connection to the knowledge graph to look up contextual information about the genes and proteins involved in the experiment. A call to a fifth MCP server merges everything into a form that’s ready for final interpretation.

In practice, this would probably be multiple prompts, with a big gap in the middle while primary analysis is running, but the steps would be the same. The important part is how the LLM coordinates between the different systems via their MCP connections.

Now, you could absolutely automate this kind of workflow without an LLM, using just REST APIs (assuming the systems all have reasonable APIs, which is sometimes true). However, to make it happen, someone with a very specific set of technical skills would need to predict more or less exactly what the workflow would look like, then spend a few days, if not weeks or months, getting it to work. Even if you have someone with that particular set of skills on your team, that’s significantly more work than… typing a vague description of what you want into an LLM and waiting a few minutes.

The Problem

The problem that I started exploring last week is that if all these systems only talk to the LLM through their MCP servers, then all the data that passes between the systems has to be encoded then decoded by the LLM. That’s a problem for a lot of reasons: It’s expensive (how many tokens is your 20GB fastq file?) and slow, and it pretty much guarantees that you’ll have errors.
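To put a rough number on that question, here is a back-of-envelope sketch. Both constants are assumptions I picked for illustration: about 4 bytes per token is a common rule of thumb for English text (sequence data may well tokenize differently), and the 200,000-token context window is a placeholder rather than a claim about any particular model.

```python
# Back-of-envelope estimate. Both constants below are assumptions.
BYTES_PER_TOKEN = 4.0      # rough rule of thumb for plain text
CONTEXT_WINDOW = 200_000   # placeholder context size, in tokens


def estimate_tokens(size_bytes: int, bytes_per_token: float = BYTES_PER_TOKEN) -> int:
    """Approximate how many tokens a file of the given size would occupy."""
    return int(size_bytes / bytes_per_token)


fastq_bytes = 20 * 10**9                      # the 20GB fastq file
tokens = estimate_tokens(fastq_bytes)
passes = -(-tokens // CONTEXT_WINDOW)         # ceiling division

print(f"{tokens:,} tokens, or {passes:,} full context windows")
```

Even if the real ratio is off by a large factor, the file is orders of magnitude bigger than anything that fits in a single prompt.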

So the hub-and-spoke model in which the LLM is the messaging bus isn’t going to work. Data should only pass between and through deterministic systems that keep audit trails, etc. LLMs can generate things like code and parameters, sure, but only things that a human could reasonably review.

Instead of pure hub-and-spoke, we need direct, deterministic connections between the spokes, orchestrated by the LLM. And that probably means old fashioned APIs.
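Here is a minimal sketch of what that could look like, using in-memory stand-ins for two spokes. Every name in it (`ELN`, `WorkflowRunner`, `export_sample_sheet`, `pull_sample_sheet`, the `EXP-001` identifier) is hypothetical and not taken from any real product or from the MCP specification. The only point is that the orchestrator handles a short, reviewable instruction, while the rows themselves move directly between two deterministic systems and leave an audit record.

```python
from dataclasses import dataclass, field


@dataclass
class ELN:
    """Hypothetical stand-in for an electronic lab notebook."""
    sheets: dict[str, list[dict]]

    def export_sample_sheet(self, experiment_id: str) -> list[dict]:
        # Deterministic read: returns the stored rows unchanged.
        return self.sheets[experiment_id]


@dataclass
class WorkflowRunner:
    """Hypothetical stand-in for a pipeline runner."""
    audit_log: list[str] = field(default_factory=list)
    inputs: dict[str, list[dict]] = field(default_factory=dict)

    def pull_sample_sheet(self, eln: ELN, experiment_id: str) -> int:
        # Spoke-to-spoke transfer: the rows never enter the LLM's context.
        rows = eln.export_sample_sheet(experiment_id)
        self.inputs[experiment_id] = rows
        self.audit_log.append(f"pulled {len(rows)} rows for {experiment_id} from ELN")
        return len(rows)


def orchestrate(instruction: dict, eln: ELN, runner: WorkflowRunner) -> str:
    """Carry out an instruction of the kind an LLM might emit.

    The instruction is a few short fields a human could review. Only a
    summary comes back to the orchestrator, not the data itself.
    """
    if instruction["action"] != "pull_sample_sheet":
        raise ValueError(f"unsupported action: {instruction['action']}")
    n = runner.pull_sample_sheet(eln, instruction["experiment_id"])
    return f"{n} rows transferred"


eln = ELN(sheets={"EXP-001": [{"sample": "S1", "gene": "TP53"},
                              {"sample": "S2", "gene": "BRCA1"}]})
runner = WorkflowRunner()
summary = orchestrate({"action": "pull_sample_sheet", "experiment_id": "EXP-001"},
                      eln, runner)
print(summary)
print(runner.audit_log)
```

In the messaging-bus version, the LLM would have to read those rows out of one tool call and write them back into another. Here it only ever sees the instruction and the summary.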

APIs Revisited

So sure, it sounds like we’re back to where we started - we still need to connect all our systems to each other with deterministic APIs. But what makes REST APIs brittle is that they’re designed to work with users who can’t translate what they want into technical terms. When an engineer has to do that translation in advance, you’re stuck with whatever they thought the users would need at the beginning.

In other words, the brittle nature of REST APIs is a requirement of the context in which they’ve always been used, not a technical limitation of deterministic APIs. In a context where users can do that technical translation on the fly, APIs can be designed to be flexible. In fact, flexible API standards such as GraphQL have existed for years, but never had widespread adoption because of this translation issue.

Today, LLMs can do the technical translation for users almost instantly.
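As a small illustration of what "flexible" means here, the sketch below builds a GraphQL query that asks for only the fields a particular question needs, rather than relying on an endpoint someone designed in advance. The schema (`experiment`, `samples`, and the field names) is made up for this example, and the field-name check is only a rough approximation of GraphQL's naming rules. A real client would more likely pass the ID as a GraphQL variable than embed it in the query text.

```python
import json


def build_query(experiment_id: str, sample_fields: list[str]) -> str:
    """Build a GraphQL query string requesting only the named sample fields."""
    for name in sample_fields:
        # Rough sanity check so arbitrary text cannot be spliced into the query.
        if not name.isidentifier():
            raise ValueError(f"not a valid field name: {name!r}")
    fields = " ".join(sample_fields)
    quoted_id = json.dumps(experiment_id)  # adds quotes and escapes
    return f"{{ experiment(id: {quoted_id}) {{ samples {{ {fields} }} }} }}"


# Two different questions, two different shapes of response, one API.
print(build_query("EXP-001", ["id", "concentration"]))
print(build_query("EXP-001", ["id", "gene", "plate", "well"]))
```

Choosing those field lists is exactly the translation step that used to need an engineer, and that an LLM can now do from a plain-language request.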

Hub and Spokes and APIs

So, here’s what I’m proposing: MCP servers will very soon give us lots of spokes that can be connected to whatever LLM hub you want. Almost every bio software company I’ve talked to in the last few weeks is building one, or has already built one. The rest have it on their roadmap.

Each of these individual spokes can decide on its own how it will communicate with the hub. The companies I’ve talked to are scrambling to figure that out, and that’s fine because LLMs are generally able to work with whatever they need to.

What’s missing is a way for the hub to tell the spokes how to interact with each other: Tell the workflow runner to pull the sample sheet from the ELN. Tell the visualization tool to pull the contextual information from the knowledge graph. And so on.

Doing this for a single pair of systems probably isn’t too bad. But if this dream is going to be a reality for everyone in biopharma, we need to be able to do it for every pair of systems (or close to every.)

In other words, we need the industry to agree on some kind of standard for what these appropriately flexible APIs between systems should be, and how the LLMs should communicate instructions for using those connections. (GraphQL is one option, but not the only one.)
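To make the second half of that concrete, here is one hypothetical shape such an instruction could take: a small JSON message that names a source, a target, and a resource, and gets validated before any system acts on it. The field names and the `REGISTRY` contents are assumptions for illustration only, not a proposal for the actual standard.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical registry: which resources each system is willing to serve.
REGISTRY = {
    "eln": {"sample_sheet"},
    "knowledge_graph": {"gene_context"},
}


@dataclass(frozen=True)
class TransferInstruction:
    source: str         # system that holds the data
    target: str         # system that should fetch it
    resource: str       # what to fetch
    experiment_id: str  # which experiment the request is about


def parse_instruction(raw: str) -> TransferInstruction:
    """Parse and validate JSON emitted by the LLM before anything runs."""
    # Missing or unexpected keys raise TypeError in the constructor.
    instruction = TransferInstruction(**json.loads(raw))
    if instruction.source not in REGISTRY:
        raise ValueError(f"unknown source system: {instruction.source}")
    if instruction.resource not in REGISTRY[instruction.source]:
        raise ValueError(f"{instruction.source} does not serve {instruction.resource}")
    return instruction


raw = ('{"source": "eln", "target": "workflow_runner", '
       '"resource": "sample_sheet", "experiment_id": "EXP-001"}')
instruction = parse_instruction(raw)
print(asdict(instruction))
```

A message this small is something a human could reasonably review, and a malformed or hallucinated one fails validation instead of silently moving the wrong data.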

On Standards and Adoption

Getting industries to adopt standards like this has always been incredibly difficult and many (most?) attempts at it have failed.

They fail for many reasons, from politics and perverse incentives to the technical cost of adoption. But often, they fail because the customers who could put the most pressure on developers just don’t see the potential value. If no one’s asking for it, why would software developers do it?

This is probably the biggest reason there was never (as far as I know) an attempt to build an interoperability standard for REST APIs in early discovery biopharma: integrations were always so high-cost and low-value that no one ever asked for it.

In this case, however, there’s a real chance that the cost will be low enough and the value clear enough that customers will start asking for this kind of standardization. There will still be the politics, and potentially the perverse incentives, but user demand can do a lot to overcome those.

Either way, that’s a topic for the future. I think I’ve accomplished my goal of explaining why LLMs should become orchestrators instead of messaging busses.

If you have opinions about this and want to talk about what might be involved in getting the industry to define and adopt a standard like this, I’d love to hear from you - leave a comment or send me an email at jesse@merelogic.net.

Thanks for reading! I help biopharma teams design, implement and maintain Data Operations Plans that extend the care and consistency they enforce inside the lab to how they handle data outside the lab. To learn more, send an email to data-ops@merelogic.net.

Scaling Biotech
