This would probably be a great time for me to write about all the announcements that came out of JPM, but I’ve spent the last couple of weeks deep in the weeds figuring out what LLMs can and can’t do to help me build the Biopharma AI Landscape. So I wanted to quickly jot down some thoughts about that, and I’ll hopefully have something interesting to say about JPM news in the next few posts.
I think these limitations are very real, especially around instructions and task framing. I’ve been testing LLMs extensively in my own work and have run into many of the same issues you describe, particularly when the task drifts into pure recall or fuzzy categorization.
What’s helped me in some cases is being very explicit about the end goal and constraints, then reframing the task so the model is mostly translating structured inputs rather than inventing structure on its own. I’ve also experimented with letting the model run more independently after that framing, then treating its output as something to audit rather than accept.
Your breakdown of recall vs. translation vs. categorization really resonated, and I’m still working through similar challenges on the categorization side. I’d love to compare notes if you’re open to it; it feels like we’re circling the same problems from slightly different angles.
Hi, Harshil. I'd love to compare notes and hear about what you've been working on. Send me an email, and let's find a time: jesse@merelogic.net