We’re currently building an AI coding product at Imbue. I think one important and undervalued direction for coding agents is human-in-the-loop disambiguation.
An annoyance with current AI-assisted coding tools like Copilot or Cursor is that you either have to review reams of generated code or tab-complete recklessly and risk losing your mental model of what you’re building. On the other end of the spectrum are fully agentic approaches like Devin, which require tediously writing huge prompts and anticipating all the design choices before letting an agent churn overnight. Sometimes, when I write a short prompt, Claude runs away with approach X when I meant a different high-level approach Y. These cases will become increasingly frustrating as test-time scaling with models like o1 continues and the average wait for a wrong result grows (even if overall accuracy improves).
I think these two problems are fundamentally related because creating software is a complex activity; ambiguities and decisions naturally arise as you implement and as you get deeper into a project. These decisions are expressed in detail as (generated) code, but whether or not approving them is left up to you, none of the tools on the market today provide adequate affordances for reviewing those decisions, let alone revising them.
It seems clear to me that something between Cursor and Devin would do a better job of consistently keeping humans in the driver’s seat. We’re building affordances for exactly that: automatically generating tests, running fine-tuned verifier models, suggesting test commands & fixes, and providing an editable list of the most salient assumptions that the LLM made when generating your code. (Maybe even visually diagramming your data model or program architecture!)
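To make the “editable list of assumptions” idea concrete, here is a minimal, hypothetical sketch; the class and field names are my own illustration of the general shape such a structure could take, not Imbue’s actual API:

```python
# Hypothetical sketch only: illustrative names and fields, not a real product API.
from dataclasses import dataclass, field


@dataclass
class Assumption:
    """One design choice the model made while generating code."""
    description: str       # e.g. "store sessions in Redis rather than Postgres"
    location: str          # file or symbol the assumption affects
    accepted: bool = True  # the human can toggle this before regeneration


@dataclass
class GenerationReview:
    """An editable list of the most salient assumptions behind a code change."""
    assumptions: list[Assumption] = field(default_factory=list)

    def rejected(self) -> list[Assumption]:
        # Rejected assumptions become targeted feedback for the next pass.
        return [a for a in self.assumptions if not a.accepted]
```

The point of a structure like this is that rejecting an assumption gives the agent targeted, structured feedback, instead of forcing the human to rewrite the whole prompt or wade through every line of generated code.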
Coding agents should automate tedious work, but more importantly, I believe they should allow us to consistently focus at a higher level of abstraction (“what is the behavior I want in my software?”). That does not necessarily mean hiding details.
“Honest reflections from coding with AI so far as a non-engineer:
It can get you 70% of the way there, but that last 30% is frustrating. It keeps taking one step forward and two steps backward with new bugs, issues, etc.
If I knew how the code worked I could probably fix it…”
— Peter Yang (@petergyang), December 1, 2024
Having a compressed understanding of how things work under the hood is crucial to avoiding the most common failure mode with AI coding tools today, which is getting stuck at 70%—the LLM can’t reason about how to refactor itself out of bad patterns or just doesn’t “get it,” and the human feels helpless because they need to digest so much previously unseen code to dig themselves out.
Zooming out, this philosophy is also in line with my belief that technology should empower humans, not disenfranchise them. One of my favorite use cases of ChatGPT is its ability to push back the boundary of unknown unknowns for me in targeted directions. Many have lamented the demise of the junior engineer role: will firms still hire for entry-level positions when a senior engineer with AI tools can replace an army of junior engineers? Hyperbolically speaking, others are concerned about a WALL-E version of the world where we gradually become dumb because technology does everything for us. This does not have to be the default path. Instead of just automating or replacing human work, AI coding agents can and should provide teachable moments, and in doing so, empower more people instead of fewer.