Discussion about this post

Harshil Jain

I think these limitations are very real, especially around instructions and task framing. I’ve been testing LLMs extensively in my own work and have run into many of the same issues you describe, particularly when the task drifts into pure recall or fuzzy categorization.

What’s helped me in some cases is being very explicit about the end goal and constraints, then reframing the task so the model is mostly translating structured inputs rather than inventing structure on its own. I’ve also experimented with letting the model run more independently after that framing, then treating its output as something to audit rather than accept.

Your breakdown of recall vs. translation vs. categorization really resonated, and I'm still working through similar challenges on the categorization side. I'd love to compare notes if you're open to it; it feels like we're circling the same problems from slightly different angles.
