There’s a lot of focus right now on whether AI can write “perfect” code and what this will mean. As models get better and context windows get bigger, will code quality improve? Will we soon reach a point where AI produces production-ready software on the first try?
If the answer is "Yes, AI can get it right first time", then the path is obvious: give AI the perfect context, encode all our rules and standards, do the upfront planning on requirements and specifications, and then let a world-class agent output perfect code.
However, whilst context and planning are important, this is not enough. Even if AI outputs perfect code, the rest of your codebase won’t suddenly become perfect along with it.
Software lives in a dynamic ecosystem; code ages, dependencies drift, context changes. Something that looks great today can become outdated, insecure, or no longer fit for purpose a few months from now.
The productivity paradox is real
There’s a growing body of evidence that experienced developers aren’t always faster with AI tools. In some cases, they’re actually slower. I hear this directly in conversations with teams every week.
What I see is a big split. Small, greenfield teams on modern stacks can get incredible speedups. Two or three developers. Node, Python, React. Clean slate. AI feels magical there. But that’s not most of the world.
Most developers I talk to are working in large, long-lived codebases. Legacy systems. Internal libraries. Old frameworks. Constraints you can’t just rip out overnight. LLMs aren’t trained on that context, and they don’t magically absorb decades of architectural decisions.
So what happens in practice is this. AI generates code quickly. Humans spend their time reviewing it. Fixing edge cases. Correcting assumptions. Undoing drift. Flow gets broken constantly. Prompt. Wait. Review. Prompt again. Wait again. I hear this frustration over and over. One developer put it to me like this:
“I used to be a craftsman whittling away at a piece of wood. Now I feel like a factory manager at IKEA, shipping low-quality chairs.”
Faster, maybe. But far less satisfying. That’s not the productivity revolution people were promised.
Planning helps. It doesn’t solve everything
A common reaction to this is to say, “We just need better planning.” And yes, planning matters a lot.
Clear requirements, explicit constraints, and better upfront context all give AI a better chance of doing something sensible. But planning alone doesn’t fix the deeper issue, because software doesn’t stop evolving once a feature ships.
Requirements change. Teams learn new things. Dependencies go out of date. None of that stops just because you wrote a good plan. That’s where most AI tools still fall short. They treat development like a one-shot interaction instead of an ongoing process.
Maintenance is the work we keep ignoring
This is the part of software engineering we all know but try not to think about. Maintenance never ends. Libraries need upgrading. Frameworks deprecate APIs. Performance assumptions stop holding. Code that once made sense slowly turns into technical debt. Nobody loves this work. Nobody wakes up excited to upgrade Java or migrate Python 2 to Python 3. And yet, this is where huge amounts of engineering time still go.
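To make that drift concrete, here’s a minimal sketch, assuming a Python project: pip can already tell you, in machine-readable form, which dependencies have quietly fallen behind. The reporting here is illustrative, not a prescribed workflow.

```python
# Minimal sketch: surface dependency drift in a Python project.
# Uses pip's built-in JSON output; everything else is illustrative.
import json
import subprocess

def outdated_packages() -> list[dict]:
    """Ask pip which installed packages have newer releases available."""
    result = subprocess.run(
        ["pip", "list", "--outdated", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

if __name__ == "__main__":
    for pkg in outdated_packages():
        print(f"{pkg['name']}: {pkg['version']} -> {pkg['latest_version']}")
```

None of this is hard. It’s just relentless, and the report only gets longer every month you ignore it.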
Ironically, this is exactly the kind of work AI should be great at. Not replacing engineers and certainly not taking over creative problem-solving. But continuously improving, refactoring, and maintaining the systems we already have.
Right now, it’s often the opposite. AI does the fun part, and humans are left cleaning up after it. That’s backwards.
Trust, flow, and learning still matter
There’s another thing I worry about that doesn’t get talked about enough. Learning.
Too often, using AI today feels like being in the back seat of a Ferrari with broken steering. You’re moving fast, but you don’t really know where you’re going, and you’re not necessarily getting better along the way.
That’s a real problem, especially for junior developers, but it affects seniors too. Teams don’t just need output; they need understanding and shared context to be confident the system is behaving the way they expect.
Trust is earned slowly, flow is fragile and learning doesn’t happen when humans are reduced to passive reviewers.
The real work comes after the first draft
Instead of asking whether AI can get it right the first time, I think we should be asking something else. How do we build systems that assume AI will get things wrong, and then improve them safely over time?
That means planning that clarifies intent and trade-offs. It means execution that supports iteration without chaos, validation that builds confidence instead of fear, and continuous improvement that reduces drift rather than amplifying it.
AI isn’t a replacement for engineering judgment; it’s a multiplier, and like any multiplier, it will magnify whatever systems you put around it. If we want AI to actually help teams ship better software, we need to stop treating code generation as the finish line. The real work starts after the first draft.
I see the next phase of AI engineering becoming viable at scale by thinking about the system around the AI. This means thinking about how you scan and understand an existing codebase; how you define rules and intent; how you plan, execute, validate, and then keep improving things as the code inevitably changes over time.
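To sketch the shape of that loop: assume the first draft is wrong, validate it against something real, and feed failures back instead of starting over. Everything here is hypothetical scaffolding (the generate_change agent call, pytest standing in as the validator); it’s a shape, not an implementation.

```python
# A hedged sketch of the outer loop: plan, execute, validate, improve.
# Every function is a stand-in for whatever agent, test runner, and
# review process a team actually uses.
import subprocess

def generate_change(task: str, feedback: str | None) -> None:
    """Hypothetical: ask an AI agent to produce a code change for `task`,
    optionally informed by feedback from the previous attempt."""
    ...

def validate() -> tuple[bool, str]:
    """Run the real test suite; return (passed, output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout

def iterate(task: str, max_attempts: int = 3) -> bool:
    """Assume AI gets it wrong; loop until validation builds confidence."""
    feedback = None
    for _ in range(max_attempts):
        generate_change(task, feedback)
        passed, output = validate()
        if passed:
            return True          # confidence earned, not assumed
        feedback = output        # feed failures back instead of starting over
    return False                 # escalate to a human with full context
```

The point isn’t the code. It’s that the system expects imperfection and makes improvement routine, which is exactly what most AI tooling still lacks.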