Spec-Driven Development and the Real Work of Coding

Pick almost any thread about AI and software engineering, and you’ll find the same fight. Will AI replace developers, or won’t it? Tectonic shift, or parlor trick? Is it coming for the whole profession, or only the boilerplate?

I’ve read a lot of these. The people are sharp, the arguments are well-made, and the data flies around with conviction. I’m not going to tell you their conclusions are wrong. I’m going to question where they start, because everyone is arguing about whether a machine can do a job that almost nobody bothers to define. Get the starting point wrong, and every conclusion downstream inherits the error.

This isn’t a skeptic’s setup. AI can write code. On a clean, well-bounded problem, today’s models perform about as well as a strong junior engineer, sometimes better. I code with AI every day; I practically live in the prompt. That much is settled.

What you were doing all along

Let me start where the others don’t. Most people, plenty of engineers included, believe coding is about writing down an idea in a language the machine understands. Type the thought into the computer. Done.

It was never that simple. That definition of coding went unquestioned for decades, mostly because nobody had a reason to question it. Coding was simply what software engineers did. The word defined itself. Almost no one looked closely at what the act of coding involves, and we still don’t.

But watch what happens as you write code: you make a steady stream of small decisions no one specified in advance. Should a field in the database be allowed to be empty? What happens when a list is returned from a function call with nothing in it? How many times do we retry on error, and how long do we wait between tries? Is a particular edge case impossible, or just rare? Requirements rarely cover this level of detail. You decide quickly, usually without registering that you are deciding, and move on.
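To make that stream of decisions visible, here is a minimal sketch in Python. Everything in it is hypothetical: the function average_order_value, the fetch_totals data source, and the constant values are invented for illustration and come from no real system. What matters is the comments, each of which marks a call that a typical requirement ("show the average order value") would never spell out.

```python
import time
from typing import Callable, List, Optional

# Illustrative values only; each one is a decision nobody specified in advance.
MAX_ATTEMPTS = 3      # decision: how many tries before giving up?
WAIT_SECONDS = 0.5    # decision: how long to wait between tries?


def average_order_value(
    fetch_totals: Callable[[], List[Optional[float]]],
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Average the order totals returned by a (hypothetical) data source."""
    last_error: Optional[Exception] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            totals = fetch_totals()
            break
        except ConnectionError as exc:  # decision: which errors are worth retrying?
            last_error = exc
            if attempt < MAX_ATTEMPTS:
                sleep(WAIT_SECONDS)
    else:
        # decision: after the final failure, raise instead of returning a default
        raise RuntimeError("order totals unavailable") from last_error

    # decision: an empty field (None) is skipped, not counted as zero
    known = [t for t in totals if t is not None]
    # decision: an empty list means "no orders yet", so report 0.0 rather than fail
    if not known:
        return 0.0
    return sum(known) / len(known)
```

Six comments, six judgment calls, in about twenty lines. None of them is obviously right: skipping empty totals versus counting them as zero changes the number a business user sees, and a reasonable engineer could have gone either way on each.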

Making those decisions is the part of the job no one ever credited you for.

This is no hot take. It’s one of the oldest ideas in the field — that the real work of coding is the deciding, not the typing — and we mostly forgot it. In 1985, Peter Naur (the “N” in Backus-Naur Form, a Turing Award winner) wrote an essay called Programming as Theory Building. His claim: a program is not its source code. A program is a theory living in the minds of the people who built it: a working understanding of how the system behaves, why it behaves that way, and how it bends when reality pushes on it. The code is just that theory written down. Partial, and lossy.

Naur’s sharpest observation is the one every engineering leader has run up against: the theory can’t be reconstructed from artifacts alone. Hand a fresh team the full source and all the documentation, and they still won’t have the theory, because most of it was never written down. It lives in the decisions.

Fred Brooks said it from another angle a year later, in No Silver Bullet. The hardest single part of building software, he wrote, is “deciding precisely what to build.” He split software difficulty in two. Accidental complexity is the awkward syntax and tooling we impose on ourselves. Essential complexity is the irreducible work of figuring out what the system must do. Tools devour accidental complexity. AI is the best accidental-complexity killer we’ve ever built. But the essential part, deciding what to build decision by decision, stays exactly as hard as it always was.

And that essential, easily-missed work is precisely what people leave out when they define coding. The whole AI debate has been measuring the typing and ignoring the deciding.

Senior, junior, or AI?

What separates a senior engineer from a junior one? It isn’t raw code-writing. A good junior writes clean, working code. The difference shows up in the decisions behind each line. Every decision rests on two things: what you actually know about the project, and the gaps you fill when that knowledge runs out. Those gap fills are the assumptions — made on the fly and tuned to the specific project. Behind a given line, the senior fills the gap and is usually right. A hundred past projects taught them which corner is safe to cut and which one pages you at 3 AM. The junior fills the same gap with something shakier.

And the part that matters most: neither of them usually notices making the assumption at all. It’s just baked into how they decide. Same command of the language. Different judgment, and most of it invisible, even to the person exercising it.

Now look at AI through that lens. Trained on an enormous amount of public code, it writes like a capable junior. And exactly like a junior, it doesn’t stop to ask before resolving what nobody specified. It makes the call automatically, the instant the work demands it. It exercises judgment constantly, with every line. What it lacks is context: it wasn’t in the meetings, and it doesn’t know what the business is really trying to do. That gap is why it asks even fewer questions than a junior, and catches less.

The gap is typically wider on brownfield work. On a mature codebase, the human carries years of history the model lacks. But even on a brand-new project, the human still holds everything that never made it into writing: the goal behind the project, the constraint someone mentioned in a hallway, the reason a particular approach was abandoned. Greenfield or brownfield, the model decides with a fraction of what the human knows without even having to think about it.

So the real worry was never “can AI exercise judgment?” It’s “can you trust the judgment it’s already exercising?” Today, usually not. Not because the model isn’t smart enough, but because no one gave it enough of what the decision depends on: the constraints, the history, the ground truth, the reasons. Feed it more of those, and the decisions get better. A smarter model raises the floor, but it can’t read a constraint nobody wrote down. Drop the sharpest engineer you know into your codebase with none of its history, and they’ll guess as blindly as the model does. Weak reasoning was never the problem. The gap is knowledge: what the model was told before it had to make the call.

The dream

For most of software’s history, the industry was busy with a completely different problem: typing was too slow. And it kept solving that problem, with real success.

It started decades ago with autocomplete in the editor, the environment finishing the names of variables or functions as you typed. It got steadily smarter, until it was completing whole lines, then whole blocks. Useful. Incremental. Nobody’s job changed.

Then came vibe coding. You describe what you want in plain language, the model writes a chunk, you steer, you describe the next chunk. A real step up, but you stayed in the loop on every move, and the gains, though genuine, didn’t transform anyone’s output. The steering could get strange, too: you’d watch the model drift the wrong way and pull it back. Miss that moment, and a bad assumption slipped through and eroded the codebase over the following weeks, one of the most common complaints you’ll hear from people who tried it. And even when you steered the model well, what actually worked was telling it why: the context it lacked. That almost never got written down; the context died with the session, and the next time you had to supply the rationale from scratch.

Then agents learned to run on their own — for hours, planning, writing, testing, and opening a finished pull request. You point them at a goal and walk away. And the way you point them is a specification. Which is how we got to the term everyone’s using now: spec-driven development.

This dream is old, and it kept failing in the same place. In the 1980s, CASE tools promised to generate software straight from diagrams and specifications. Corporations and governments poured in billions, and by 1993 the U.S. General Accounting Office (today’s Government Accountability Office) reported “little evidence” they improved quality or productivity at all. In the 2000s, Model-Driven Architecture made the same promise with UML models. The days of hand-writing code were supposedly numbered. It produced database schemas and class skeletons cleanly enough. It never produced the part that decides what the application does.

Notice the pattern: the spec could capture the structure — a sliver of the judgment, never the why beneath it. Plenty for people to agree on what to build. Never enough for a machine to build it.

So why would anyone revive a dream that failed this reliably?

Two problems were mainly responsible for the failures of specification-first approaches. First, it was brutally hard to write a spec complete enough to drive the build. Second, someone had to read, interpret, and execute it — and no human reads a thousand-page spec without skimming.

AI ends that second problem outright: it doesn’t skim, and it’ll execute a spec at a level of detail no person would sit still for. That’s what gave the old idea its second life. The first problem, writing a complete and consistent spec, didn’t go anywhere. Hold that thought.

The new code

Take the idea from its most credible advocate. In mid-2025, OpenAI’s Sean Grove gave it a stage, a talk called The New Code, and put the thesis plainly: the specification, not the code, is the artifact that matters. Feed a spec to the model, and the code comes out the other end. Code is a lossy projection of the spec, the way a decompiled binary hands back its logic with every reason stripped away. “If you can communicate effectively,” Grove told the room, “you can program.” He’s right — as far as it goes.

And to his credit, Grove never pretends the spec writes itself. He calls writing it the new scarce skill, the one that now makes you the most valuable programmer in the room. The skill is only scarce because it’s hard; it tracks back to the unsolved problem we flagged earlier — writing a spec complete enough to build from.

What Grove didn’t tell you is how to actually write that spec. He only voiced a hope — that someday an IDE, an “integrated thought clarifier,” might pull the endless ambiguities out of you. That tool doesn’t exist yet. Without it, is spec-first just a fantasy? To answer that, think of what a spec is really made of: decisions.

We know more than we can tell

Look at how those decisions arrive. They don’t line up politely when you sit down to write the spec. They surface while you build, the moment you get cornered. “Wait — what should happen if both of these flags are set at once?” It never crossed your mind, or anyone’s, until the work forced it out of you. And when it surfaces, you do one of three things: answer it if you know, push it up to the PM or the architect if you don’t, or, most often and most quietly, make an assumption and keep moving.

There’s nothing sloppy about it. That’s just how expert knowledge works. In 1966 the philosopher Michael Polanyi put it in a line that has never been bettered: we know more than we can tell. The point isn’t that the knowledge can’t be written down. If it couldn’t, no spec would ever work, and they plainly do. It’s that you can’t summon it cold, before the situation calls it up. Human memory is associative: the right consideration surfaces when something in the work brushes against it, not when you sit in front of a blank page asking, “what am I forgetting?”

And a good share of a senior’s judgment fires below the level of words. Pure intuition, no rule attached. Turning that into an explicit line in a spec is hard enough. Harder still: you have to know the consideration is even there to write it down, and usually you don’t until reality trips it.

It’s the point almost everyone misses: a spec written cold is necessarily missing the very decisions that only the work can surface. This is the main reason CASE tools and UML initiatives failed — and incomplete requirements have consistently topped the list of failure causes since the 1990s.

Is spec-driven development just a fantasy waiting for the magic IDE to arrive? Can it be successfully practiced today? As it turns out — yes, with some help.

The missing piece

Let’s define the new deliverable for engineering work in the spec-first world. The spec isn’t the paperwork you do before the development. Writing the spec is the development.

That changes what you produce. When AI writes the code, the code is still the deliverable. It’s just no longer your deliverable. Yours is the spec. And because AI has no instincts, that spec has to carry everything that used to live in your head: the intent, the constraints, the edge cases, and the reason behind each one. That’s not a document a product manager throws over a wall. It’s the continuous work of the whole team (product, design, engineering, architects included) deciding together, on the record, what used to get decided alone and in silence.

The job doesn’t vanish; it shifts. You’re still at the keyboard, but you’re recording decisions now, not writing code.

But you can’t write that spec cold: the decisions don’t exist yet, they surface in the work. So how do you get them out? You hand the build to the machine — instructed not to plow ahead on its own guesses, but to flag every decision the spec left open. It runs, hits those gaps, and hands back not code but a gap report: each open decision weighed for business, technical, and user impact, and routed to whoever owns it — product calls to a PM, technical ones to an engineer. Unlike vibe coding, your answers are captured in the durable spec, not a transient prompt — and not just the verdict, but the rationale and the ground truths behind it. The inputs are checked for consistency against everything recorded earlier. A fresh dry run carries the fuller spec further, surfacing the next layer of issues.
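One way to picture a gap report and the durable spec it feeds is the sketch below. It is an illustration under loud assumptions, not the implementation of the skills described in this article: the names Gap, Decision, Spec, route, and OWNERS are all hypothetical, and the consistency check is a deliberately naive stand-in (it only rejects a verdict that contradicts an earlier one for the same decision).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Routing rule taken from the text: product calls go to a PM, technical ones to an engineer.
OWNERS = {"product": "PM", "technical": "engineer"}


@dataclass
class Gap:
    """One decision the spec left open, as surfaced by a dry run."""
    key: str                # stable identifier for the decision
    question: str           # the question the build ran into
    kind: str               # "product" or "technical"
    impact: Dict[str, str]  # business, technical, and user impact


@dataclass
class Decision:
    """An answer captured in the spec: the verdict plus the reason behind it."""
    key: str
    verdict: str
    rationale: str
    decided_by: str


def route(gaps: List[Gap]) -> Dict[str, List[Gap]]:
    """Group open decisions by whoever owns them."""
    routed: Dict[str, List[Gap]] = {}
    for gap in gaps:
        routed.setdefault(OWNERS[gap.kind], []).append(gap)
    return routed


@dataclass
class Spec:
    """The durable record that outlives any single prompt or session."""
    decisions: Dict[str, Decision] = field(default_factory=dict)

    def record(self, decision: Decision) -> None:
        # Naive consistency check: refuse a verdict that contradicts an earlier one.
        earlier = self.decisions.get(decision.key)
        if earlier is not None and earlier.verdict != decision.verdict:
            raise ValueError(
                f"{decision.key}: '{decision.verdict}' conflicts with "
                f"'{earlier.verdict}' recorded by {earlier.decided_by}"
            )
        self.decisions[decision.key] = decision

    def still_open(self, gaps: List[Gap]) -> List[Gap]:
        """Gaps from a report that no recorded decision has closed yet."""
        return [gap for gap in gaps if gap.key not in self.decisions]
```

The design point is the rationale field. A prompt carries the verdict and forgets the reason; a recorded decision keeps both, so the next dry run, and the next person, can reason from the why instead of guessing it again.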

A few rounds in, the gaps run dry, and the build comes out clean. The process extracts the same decisions from you that traditional manual coding does, but it captures the durable intent the agent can reason from, not a brittle rule that breaks when requirements shift. This is the “integrated thought clarifier” Grove wished for, and it doesn’t take a new IDE to get there. I know, because I’ve been building it.
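Why does it take a few rounds rather than one? Because answering a question tends to expose the questions hiding behind it. The toy model below shows that layering. It is a sketch built on assumptions: dry_run is a stand-in for a real agent run, and the question names and the FOLLOW_UPS table are invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical model: answering one question can expose follow-up questions.
FOLLOW_UPS: Dict[str, List[str]] = {
    "retry_policy": ["max_attempts", "wait_between_attempts"],
    "max_attempts": [],
    "wait_between_attempts": [],
    "both_flags_set": [],
}
ROOT_QUESTIONS = ["retry_policy", "both_flags_set"]


def dry_run(spec: Dict[str, str]) -> List[str]:
    """Stand-in for an agent run: list the open questions reachable right now."""
    open_questions: List[str] = []
    frontier = list(ROOT_QUESTIONS)
    while frontier:
        question = frontier.pop(0)
        if question in spec:
            # Already decided, so the build gets further and meets what lies behind it.
            frontier.extend(FOLLOW_UPS[question])
        else:
            open_questions.append(question)
    return open_questions


def refine(spec: Dict[str, str], answer: Callable[[str], str]) -> int:
    """Repeat dry runs until no gaps remain; return how many rounds had gaps."""
    rounds = 0
    while True:
        gaps = dry_run(spec)
        if not gaps:
            return rounds
        for question in gaps:
            spec[question] = answer(question)  # captured in the spec, not in a prompt
        rounds += 1
```

Starting from an empty spec, the first round can only surface the two root questions. The retry details stay invisible until retry_policy has an answer, which is the same reason you could not have listed them on a blank page.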

Not from the bleachers

I’m not new to this. I’ve spent years on declarative specifications — intent and rationale over imperative instructions. My latest development platform, Rishon, runs business applications directly from a formal, intent-bearing spec language, with no hand-written code and no visual design at all.

Outside Rishon, I’ve spent the last eight months turning a generic agentic environment (mostly Claude Cowork) into Grove’s missing IDE — a set of reusable AI skills that embody the full spec-driven development methodology. They do the unglamorous, decisive work: weeding out each assumption the moment it’s made, running coding dry-runs that expose gaps, maintaining the entire spec in a consistent state in real time, and keeping an audit trail of who decided what and when. Generic agent in, disciplined spec-writing environment out.

[Book cover: Agentic Spec-Driven Development. #1 Best Seller in Software Engineering, Amazon, June 2026]

I thought about shipping the skills as a package. But a black box is limiting and no fun — you should understand how the skills work, and bend them to your own needs. Instead, I wrote the method down as a book: Agentic Spec-Driven Development. It walks through each skill with examples so you can use them, extend them, and grow your own. I did it because all the papers and books I could find assume the spec is something you can just sit down and write, so none teach the how.

Could this be a real IDE someday? Probably. But the field is moving too fast to freeze the method into a product just yet, so simulating the IDE inside an agentic environment is the right move for now. The method’s skills become the IDE’s spec once we’re ready.

What’s next?

If you’re an engineer, then the way you do your job is definitely changing — but the job doesn’t disappear into thin air. It just can’t. The invisible part of the work, the one that holds the immense value, stays. But you have to keep yourself on the bleeding edge of a rapidly changing landscape — and it is far from easy.

The job of a product manager also changes — you now have to work more collaboratively with engineering. Good news: changes in business requirements are acted upon much faster, sometimes without involving engineers. As long as the spec remains consistent and complete, AI automatically assembles the new version of the product, and you can touch it right away.

Do you agree with my take? Want me to go deeper? Drop me a line — I read every one.

Cheers!

References

Peter Naur, “Programming as Theory Building”, Microprocessing and Microprogramming, 1985: a program is a theory in its builders’ minds; code and documents are necessary but insufficient.
Frederick P. Brooks Jr., “No Silver Bullet: Essence and Accident in Software Engineering”, 1986: essential vs. accidental complexity; “the hardest single part of building a software system is deciding precisely what to build.”
Michael Polanyi, The Tacit Dimension, 1966: “We know more than we can tell.” The foundational statement of tacit knowledge.
The Standish Group, CHAOS Report: incomplete requirements among the most-cited causes of project failure since the 1990s.
Computer-Aided Software Engineering (CASE) tools, 1980s–90s: promised to generate software from specifications and visual models; a 1993 report by the U.S. General Accounting Office (now the Government Accountability Office) found “little evidence” they improved software quality or productivity. (overview; “Why are CASE tools not used?”)
Model-Driven Architecture (OMG, 2001): promised working code generated from UML models; in practice generated schemas and class skeletons but not the behavior that decides what the application does. (systematic mapping study)
Sean Grove, “The New Code”, OpenAI, AI Engineer, 2025: specifications, not code, as the primary artifact; “if you can communicate effectively, you can program,” with code a “lossy projection” of the spec.
Anthropic, “Effective harnesses for long-running agents”, and Cognition’s Devin: agents that now plan, write, test, and open pull requests over multi-hour runs.
