To connect a mental model of your lab’s digital twin to an implementation, you need to communicate the information in the model to a computer. I like to think about this problem in terms of the approach to information theory that Claude Shannon developed around entropy and lossless encoding. (Or at least, the vague understanding I have of this from a talk I once attended in grad school.) One of the core ideas, formalized in Shannon’s Source Coding Theorem, is that if you know something about the nature of the information that will be transmitted, you can design a language/encoding that is very efficient for that kind of information, with the trade-off that it will be inefficient for other kinds of information.
The simplest example is a binary signal that’s mostly 0s with occasional 1s: rather than transmit every 0 and 1, it’s more efficient to send the length of each run of 0s between the 1s. Shannon showed that this general idea applies to a much broader range of contexts. And the context where I see it popping up over and over is in programming languages and frameworks, thought of as ways of communicating algorithms to computers.
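To make the run-length idea concrete, here’s a minimal sketch (my own toy illustration, not a real codec or anything from Shannon’s work) of encoding a sparse binary signal as the lengths of its zero-runs:

```python
# Toy run-length encoder for a sparse binary signal (illustrative only).
# Sending [4, 6, 2] is much shorter than sending all 14 bits.

def encode_runs(bits):
    """Return the lengths of the zero-runs between 1s."""
    runs, count = [], 0
    for b in bits:
        if b == 0:
            count += 1
        else:
            runs.append(count)
            count = 0
    runs.append(count)  # trailing zeros after the last 1
    return runs

def decode_runs(runs):
    """Invert encode_runs: rebuild the original bit sequence."""
    bits = []
    for run in runs[:-1]:
        bits.extend([0] * run + [1])
    bits.extend([0] * runs[-1])
    return bits

signal = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
encoded = encode_runs(signal)        # [4, 6, 2]
assert decode_runs(encoded) == signal
```

The encoding is only a win because we assumed the signal is mostly 0s; for a dense signal, the same scheme would be longer than the raw bits, which is exactly the trade-off the theorem describes.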
For example, because SQL is designed around a limited set of functionality, you can encode algorithms that fit within those limits in a very compact way, making them both faster to write and easier to read and comprehend. If you define the same transformations in a more flexible form, such as pandas, it typically takes more characters and is often harder for a person to read. And even pandas (which can be thought of as a domain-specific language) is both more limited and more compact than pure Python implementing the same algorithm. On the other hand, if you try to use SQL for algorithms outside its target functionality, in particular if you have to write user-defined functions, it quickly gets messier than the equivalent pandas code.
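As a rough illustration of that spectrum, here’s the same aggregation (a hypothetical per-plate average; the table and column names are made up for this example) expressed three ways:

```python
# Hypothetical example: one aggregation, three levels of language scope.
import sqlite3
import pandas as pd

df = pd.DataFrame({
    "plate": ["A", "A", "B", "B"],
    "signal": [1.0, 3.0, 2.0, 4.0],
})

# 1. SQL: compact because the language is scoped to relational transformations.
con = sqlite3.connect(":memory:")
df.to_sql("results", con, index=False)
sql_means = con.execute(
    "SELECT plate, AVG(signal) FROM results GROUP BY plate"
).fetchall()

# 2. pandas: a bit more verbose, but still a domain-specific vocabulary.
pandas_means = df.groupby("plate")["signal"].mean()

# 3. Pure Python: fully general, so we spell out the bookkeeping ourselves.
totals, counts = {}, {}
for plate, signal in zip(df["plate"], df["signal"]):
    totals[plate] = totals.get(plate, 0.0) + signal
    counts[plate] = counts.get(plate, 0) + 1
python_means = {plate: totals[plate] / counts[plate] for plate in totals}
```

Each step down the list buys generality at the cost of verbosity, which is the same trade-off as in the encoding example above.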
From this perspective, the build vs buy dilemma comes down to the fact that neither option provides a language optimized for the right scope of functionality. Conventional programming languages (build) are optimized for an overly broad scope, covering everything from building a social network to doing statistics homework. ELN and LIMS systems (buy) are optimized for a narrow scope based on what labs needed a few decades ago.
In between these two is the recently expanding category of low-code development frameworks. The way they become “low code” is by reducing the scope of their target functionality so that you can communicate what you want in less code. In fact, they reduce it so much that most of the code becomes more like configuration, which is why it’s sometimes called “no code”. But the scope is still much broader than a pre-built ELN or LIMS. (And even broader functionality is usually available through a back door in which you just write code.)
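To show what “code that’s really configuration” might look like, here’s a purely hypothetical sketch; the schema and field names are invented for illustration and don’t come from any particular low-code framework:

```python
# Hypothetical "no code" declaration of a simple sample-tracking app.
# A low-code framework would generate the forms, tables, and API from this;
# the "back door" is where you drop into ordinary code for anything it can't express.
sample_tracking_app = {
    "entity": "Sample",
    "fields": [
        {"name": "sample_id", "type": "string", "required": True},
        {"name": "received_at", "type": "datetime"},
        {"name": "status", "type": "enum",
         "values": ["received", "in_prep", "sequenced"]},
    ],
    "views": [
        {"name": "Intake queue", "filter": "status == 'received'"},
        {"name": "Ready to ship", "filter": "status == 'sequenced'"},
    ],
}
```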
Of course, last week I expressed my concerns that low code often encourages users who don’t understand technical debt to design software. And I still stand by that. In fact, the thing I find most frustrating about low-code frameworks is that they’re very UI-heavy, which is great for novice users but actually makes things slower for the folks who like to learn keyboard shortcuts. (And these are the folks who tend to understand technical debt on a visceral level.) So I think what we really need are coder-friendly frameworks optimized for the scope of functionality you actually need to build a digital twin of the lab. This seems to be the implicit goal of many of the ad hoc approaches discussed in Kaleidoscope’s recent blog post on how they’ve seen biotech teams building tools. (Which you should read.) And it’s the goal of something I’ve been working on that I’ll talk about next time.
Scaling Biotech is brought to you by Merelogic. We design data models and infrastructure that help early-stage biotech startups turn their AI/ML prototypes into tangible impact. To learn more, send me an email at jesse@merelogic.net