Spec-Driven Development | Anatoly Volkhover

TL;DR: Spec-driven development is the old dream of describing what you want and letting the machine produce the software. The dream went unrealized for forty years; modern AI has finally unblocked it — though not in the way most people assume. I practice spec-driven development for a living — here’s how it works today, where I see it heading, and what I’m building next.

Spec-driven development (SDD) is not a new idea. It is one of the oldest dreams in engineering: describe what you want, and let the machine produce the software. The industry has chased it for decades — through 4GLs, CASE tools, and UML-driven modeling — and never made it real.

To different people, “describe what you want” always meant different things. To executives it was software development without the engineering bottleneck — repeatedly promised, never delivered. To product managers (PMs) it was wave after wave of tools that improved so little that, decades later, they still write requirements in Word. To engineers it was a parade of technologies that never panned out. Most don’t even remember the failed attempts; today we’re being sold the same dream in new packaging. I’ve been in the field long enough to know better, so before we assume this time is different, let’s dissect SDD for what it actually is.

Start with the spec

The term is simple: a specification is how two parties communicate formally and record their alignment. Party A writes down what they want and hands it to Party B. B reads it and asks questions. A revises until the questions stop. Only then can B safely act on it, inside the alignment they’ve reached. The spec is the durable record of that alignment.

In software, “the spec” has two incarnations. The PM writes the product spec (the PRD), and the architect writes the engineering spec. Both feed into coding. The engineering spec isn’t just a restatement of the PRD. It adds the architecture decisions and the technical specifics that pin down how the product must actually work, detail a PRD would never carry, while recasting the PRD’s intent in terms engineers find easier to act on. Each recasting risks drifting from the original intent. And when engineers talk about spec-driven development, they almost always mean it from the coder’s seat: give me the full spec of what to build, and I’ll build it.

Over the wall, and back again

Here is how the work has flowed for as long as anyone can remember. Stay with the chain — its length is the point.

An executive wants a system that moves a business objective, and explains it — usually verbally and informally, in a meeting — to the PM.
The PM writes up the features that system needs and hands the write-up to engineering.
The architect lays down the technical foundation those features will sit on. Now all that’s left is to code it.
The coder writes the code — lots of it.
The tester checks that the code matches what the PM wrote.
The PM finally plays with the actual system and, often for the first time, sees the gaps that were invisible on paper — gaps that could trace back to their own write-up, the architecture, or the code. Each gap restarts the cycle, on a smaller scope each time. Once it’s good enough, the PM takes it to the executive, who may be satisfied — or may send everyone back to the drawing board. The loop ends when the result is finally right, or when time, budget, or patience runs out.

Every handoff in that chain is a chance to lose or distort the original intent. The work moves mostly in one direction, and alignment does get checked — but by hand, at great effort, and never reliably. The people who could have caught a problem early are the last to see the result. A solo builder never needed this chain; it exists only because a team has to pass work from one person to the next, and until recently there was no better way to keep everyone aligned. Hold onto that picture — the fix is coming.

How the code quietly becomes the source of truth

Once the spec is coded and accepted, the code becomes the engineers’ source of truth, and it quietly frames how they approach everything that comes after. A new requirement arrives and the engineers reach straight for the code — how do we modify what’s already here? — rarely stepping back to consider a clean rebuild. A rebuild means a budget overrun; it means killing the thing they made — their baby — and it means admitting, out loud, that the earlier attempt missed. So the original spec freezes in time, and every change rides on a new, smaller spec describing only what’s different. This is spec-first development: a full spec goes to an engineer who codes it, and from then on every change is patched onto code that has become the single living truth.

Teams with deeper pockets go a step further. Every time a change is made, they loop back and update the original spec as well — the PM revising the PRD, the architect revising the engineering spec. This is known as spec-anchored development. It helps with onboarding and QA alike, because there is finally one up-to-date place to learn the state of the project and to test the build against. But it is costly and a little idealistic — nothing actually guarantees the spec matches the code, and the two drift further apart with every release.

Spec-as-code, and why it never worked before

There is one idea that promises to fix all of that at once: spec-as-code. If the spec truly holds everything needed to build, producing the code becomes pure automation — and automating the coding removes the lossy handoffs that plague the process. The spec-versus-code drift disappears as a side effect: the moment code is generated, it stops being the source of truth — the spec is — so there is nothing left to drift. This was the promise of CASE tools in the 1980s–90s and of model-driven architecture with UML in the 2000s. They did a good job capturing the structure of a system and could generate database schemas reliably, but they couldn’t describe the behavior — what the application actually does — because they relied on a formal language, and no formal language can capture everything from gaming to banking. The one place where a formal language does work is inside a narrow domain. For instance, my Rishon platform uses a domain-specific language (DSL) to do exactly that for business applications. In the general case, though, such a universal language is impossible to build — the variety of what software can do has no end — so the tools never closed the gap. Practitioners fell back on what they always used — English — but the informal nature of human language is precisely what leaves requirements incomplete, which has ranked among the most-cited causes of project failure for decades.

So the idea of spec-as-code fell out of favor, until modern AI rekindled it by removing the biggest blockers. First, you no longer need a formal grammar to write the spec; plain English is enough. Second, AI generates the code. For the first time, the automation that spec-as-code always assumed actually exists.

Why the common narrative doesn’t hold

So: point AI at the spec and let it write the code. As a story it’s clean, and on the surface it works. In practice, it rarely holds up — and the reason is easy to miss. To see it, you have to look closely at what coding actually is, and very few people do, coders included.

A programmer does not simply convert a spec into code. They could, if the spec were mathematically complete — but it never is. There are always gaps, smaller or larger depending on who wrote the spec. Those gaps surface only when someone actually builds and hits the question: how should this behave, exactly? The big, obvious gaps get escalated to the architect or the PM. The ones that appear insignificant to the coder get filled with a quick assumption, and the work moves on.

This is the part of the job nobody names: coding was never mostly typing — it was deciding, a steady stream of small calls the spec never made. Across the teams I’ve led, I’ve watched ungrounded guesses like these miscalculate sales tax, mishandle free gifts on returned purchases, and accrue incorrect interest on backdated financial transactions. Juniors guess wrong more often than seniors, which is why their code frequently passes spec-gated testing and fails much later in production.
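To make one of those small calls concrete, here is a minimal sketch. Take a requirement such as “a customer can return a purchase within 30 days of the purchase date.” It never says whether day 30 itself counts. The two function names below are hypothetical, and each encodes one reading a coder might pick without asking; neither is presented as the correct one.

```python
from datetime import date, timedelta

RETURN_WINDOW_DAYS = 30


def refundable_inclusive(purchased: date, returned: date) -> bool:
    # Reading A (an assumption): a return on day 30 is still accepted.
    elapsed = returned - purchased
    return timedelta(0) <= elapsed <= timedelta(days=RETURN_WINDOW_DAYS)


def refundable_exclusive(purchased: date, returned: date) -> bool:
    # Reading B (an assumption): the window closes as day 30 begins.
    elapsed = returned - purchased
    return timedelta(0) <= elapsed < timedelta(days=RETURN_WINDOW_DAYS)


purchased = date(2024, 3, 1)
boundary = purchased + timedelta(days=RETURN_WINDOW_DAYS)  # 2024-03-31

print(refundable_inclusive(purchased, boundary))  # True
print(refundable_exclusive(purchased, boundary))  # False
```

Both versions agree on every day except the boundary, so a test suite derived from the same spec can pass either one. The disagreement only shows up when a real customer returns something on day 30.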

Now put AI in the coder’s seat. It guesses just as readily, and gets it wrong for a sharper reason: the answers it needed were never in its training, and it was not in the meeting when the context was shared with the team.

When you’re vibe-coding and watching closely, you can still catch a bad assumption as it happens: not reliably, but it is possible. Try to live on the bleeding edge by handing the work to an autonomous agent, and that safety net is gone. If you’re counting on manual review to catch bad assumptions afterward, reconsider: reading someone else’s code is far harder than writing your own, and it doesn’t scale, which erases every bit of productivity promised by agentic AI coding. Going down this path lands us exactly where CASE and UML did: nowhere.

Making spec-as-code real

The way out isn’t in the coding. It is in the spec — not written off the top of someone’s head, but developed in an environment that agentic AI makes practical. Here is what that environment has to provide:

Intent over instruction. Record not just what was decided but why. A bare decision is rigid; the intent behind it lets the spec adapt when conditions change. Decisions are still captured — but always with their reasoning, so they can be questioned later instead of blindly followed.
Precise and comprehensive, in plain English. The spec stays human-readable but has to be exact and complete. People can’t do that reliably on their own, so the process has to help them get there first, then enforce that standard.
Always consistent. Any decision or fact that would contradict what’s already recorded is caught and resolved before it’s admitted to the spec, so the spec is never in an inconsistent state.
A workflow for compromises. Many decisions are forced by outside limits — a deadline, a vendor, a regulation. A compromise should never outlive its cause: once that limit is gone, the trade-off is flagged for another look, since it may no longer be needed.
Technical debt and rollout schedule inferred from business priorities. How much technical debt to take on, and how to sequence the rollout, are not purely technical calls; they follow from business priorities, risk, budget, and timing — reasoned out by weighing the pros and cons each option carries.
One shared environment. The spec is no longer built in rigid phases; it becomes the product of collaborative work — PMs, engineering, architects, and ideally executives — all contributing. The environment organizes the flow that replaces the chain of traditional handoffs.
Approval workflow. The decisions that carry weight — the ones that put revenue or legal liability on the line, or affect the architecture — have defined owners. The system knows who must sign off, so such a decision can’t be made quietly by whoever happens to hit it first.
Decisions surfaced early, by design. Those decisions don’t come out on their own; a deliberate process drives them out. The same questions that used to ambush the coder mid-build — or slip by entirely — are raised and answered while the spec is being written. This one is deceptively hard, and it is where I have personally burned the most midnight oil.
Built-in evals. Every behavior is recorded together with the check that proves it correct. It’s test-first thinking, moved from code to the spec. Some cases will always go unwritten, but the method makes capturing them the habit rather than the afterthought.
Non-functional and compliance requirements. Security, performance, data residency, and regulatory constraints are first-class decisions, not afterthoughts. They’re often negotiable and the trade-offs around them shift over time, so they’re recorded with owners and re-checked continuously, never settled once and forgotten.
Traceability and auditability. End to end, every behavior traces back to the decision behind it, and every decision back to who made it and why — so the reasoning lives in the record instead of in people’s memories.
Code as artifact. The spec is the source of truth, and the code is generated from it — most of it disposable, regenerated at will. Where a piece of code carries lasting consequences — a database schema, say, since changing it forces data migrations — that piece (in any form) becomes a decision in the spec, not left buried in the code.
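A few of the items above (intent over instruction, the approval workflow, and the workflow for compromises) can be sketched as a small data structure. This is only an illustration under stated assumptions, not the format used in practice: the `Decision` fields, the two helper functions, and the `D-900` record are all hypothetical. The `D-207` record borrows its wording from the refund example later in this article.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Decision:
    id: str
    statement: str
    rationale: str                    # the intent: why the decision was made
    owner: str                        # who must sign off
    approved_by: Optional[str] = None
    forced_by: Optional[str] = None   # outside limit behind a compromise, if any


def can_publish(decision: Decision) -> bool:
    """Admit a decision only with a rationale and its owner's sign-off."""
    return bool(decision.rationale.strip()) and decision.approved_by == decision.owner


def compromises_to_revisit(decisions: Iterable[Decision],
                           active_limits: Set[str]) -> List[str]:
    """Flag compromises whose forcing limit is no longer in effect."""
    return [d.id for d in decisions
            if d.forced_by is not None and d.forced_by not in active_limits]


gift = Decision(
    id="D-207",
    statement="On a refund, the customer keeps the promo gift.",
    rationale="The promo promised it unconditionally; dispute cost outweighs the leakage.",
    owner="jeff",
)
print(can_publish(gift))  # False: the owner has not signed off yet
gift.approved_by = "jeff"
print(can_publish(gift))  # True

# Hypothetical compromise, invented for this sketch.
nightly = Decision(
    id="D-900",
    statement="Sync inventory nightly instead of in real time.",
    rationale="The vendor API only offers a nightly export.",
    owner="architect",
    approved_by="architect",
    forced_by="vendor-nightly-export",
)
# The vendor limit is gone, so the trade-off is flagged for another look.
print(compromises_to_revisit([gift, nightly], active_limits=set()))  # ['D-900']
```

Keeping the rationale and the forcing limit as data, rather than as prose in a document, is what lets a tool question a decision later instead of blindly following it.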

This is not a complete list, but it is a start. I work this way today, from solo and small-team builds all the way to enterprise scale, on greenfield and brownfield projects alike.

How I practice this today

In my projects today, the spec is stored as an ontology graph — ground truths, decisions, preview artifacts, and more, all linked together — with protocols in place that keep every part consistent with the rest. Both the graph and the protocols are enforced today by the skills exercised by agentic AI: Claude, Codex, Antigravity, Cursor — whichever you prefer.
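As a rough illustration of what “a graph plus protocols” can mean, here is a toy version. It is an assumption-laden sketch, not the actual storage format: the `SpecGraph` class, the node kinds, the relation names `refines` and `verifies`, and the eval id `E-001` are all hypothetical. The ids `GT-031` and `D-207` are taken from the session shown below.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SpecGraph:
    nodes: Dict[str, str] = field(default_factory=dict)              # id -> kind
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)

    def add(self, node_id: str, kind: str) -> None:
        self.nodes[node_id] = kind

    def link(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def violations(self) -> List[str]:
        """Run two toy protocols: no dangling links, every decision has an eval."""
        problems = []
        for source, relation, target in self.edges:
            if source not in self.nodes or target not in self.nodes:
                problems.append(f"dangling link: {source} {relation} {target}")
        checked = {target for _, relation, target in self.edges
                   if relation == "verifies"}
        for node_id, kind in self.nodes.items():
            if kind == "decision" and node_id not in checked:
                problems.append(f"decision without eval: {node_id}")
        return problems


graph = SpecGraph()
graph.add("GT-031", "ground_truth")
graph.add("D-207", "decision")
graph.link("D-207", "refines", "GT-031")
print(graph.violations())  # ['decision without eval: D-207']

graph.add("E-001", "eval")  # hypothetical eval id
graph.link("E-001", "verifies", "D-207")
print(graph.violations())  # []
```

In this sketch a protocol is just a function that returns a list of problems; a real environment would presumably run many more such checks before admitting a change.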

The method is easier to show than to spell out in the abstract. What follows is an illustration: one exchange while developing a spec.

spec-session

anatoly ❯ new requirement: a customer can return a purchase for a refund within 30 days from the date of the purchase

Open decision: some orders include a free promo gift (BOGO). On a refund, does the customer keep it, return it, or have its value clawed back?

Why it matters — revenue: refund leakage · legal: clawback may breach the promo’s terms · UX: clawback drives chargebacks.

Owner: Product (jeff).

Pick one:

1 Queue the question to jeff

2 Drop the requirement

3 Something else

anatoly ❯ 1

the question is queued to jeff; you can work in another session while waiting

jeff answered:

D-207 — the customer keeps the gift. Rationale: the promo promised it unconditionally; dispute cost outweighs the leakage. Scope: promo gifts only, never discounts.

inconsistency — D-207 collides with GT-031 (“a refund restores inventory”): a gift the customer keeps must not return to stock.

Proposed fix: restore inventory for purchased items only.

Pick one:

1 Apply the fix

2 Drop the requirement

3 Something else

anatoly ❯ 1

The proposed fix has been applied.

Reconciled. The requirement is now consistent with the rest of the specification.

Automatic eval added: return + promo gift → gift kept, inventory unchanged

Pick one:

1 Publish the requirement

2 Drop the requirement

3 Something else

anatoly ❯ 1

The requirement R-204 is now published and live in the project spec. Ready to rebuild on your command.

You get the idea — this is the kind of interaction the spec-development AI skills create.
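The eval added at the end of that session can be pictured as an executable check. The sketch below is hypothetical throughout: the `Item` type, the `process_refund` function, and the SKUs are invented for illustration, and the real eval format is not shown in this article. It encodes the reconciled rule, which restores inventory for purchased items only and lets the customer keep the promo gift.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Item:
    sku: str
    promo_gift: bool = False


def process_refund(order: List[Item],
                   inventory: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Return the updated inventory and the SKUs the customer keeps."""
    updated = dict(inventory)  # work on a copy; the caller's dict is untouched
    kept = []
    for item in order:
        if item.promo_gift:
            kept.append(item.sku)  # D-207: the gift stays with the customer
        else:
            updated[item.sku] = updated.get(item.sku, 0) + 1  # restock purchased items
    return updated, kept


def eval_return_with_promo_gift() -> bool:
    """Eval: return + promo gift -> gift kept, gift inventory unchanged."""
    order = [Item("SHOE-42"), Item("SOCKS-FREE", promo_gift=True)]
    inventory = {"SHOE-42": 5, "SOCKS-FREE": 10}
    updated, kept = process_refund(order, inventory)
    return (kept == ["SOCKS-FREE"]
            and updated["SOCKS-FREE"] == 10
            and updated["SHOE-42"] == 6)


print(eval_return_with_promo_gift())  # True
```

Because the check is recorded with the requirement, any regenerated build can be tested against it without anyone having to remember why the rule exists.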

#1 Best Seller in Software Engineering, Amazon, Summer 2026

I developed a set of AI skills for spec-as-code work, and I could have shipped them as a black box — usable, but no real help in understanding the moving parts. So I wrote the method down instead — every piece laid out for you to see, learn, and adapt. Agentic Spec-Driven Development walks through every skill with worked examples.

The book isn’t the end of the road — it’s the beginning. I’m already building the next generation of the skills the method relies on, and I’ll release them when they’re ready.

What I’m building next

The journey doesn’t stop at instructions written for an AI to follow. Three moves are already underway.

From skills to tools. The next step replaces pure-AI skill instructions with a mix of agentic tools and plugins for the popular AI clients — Claude, Codex, Antigravity, Cursor. That turns today’s Git-based sharing into true team-wide collaboration, cuts token cost, improves reliability, and enables efficient multi-model routing. When it’s ready, it can be offered as a product.

Domain knowledgebases. In parallel, I see real value in knowledgebases that assist spec-building for select verticals — think of them as modules in an ERP system, where the spec configures a module instead of defining it from scratch. The same idea works for legacy codebases: the spec takes over the job once held by a veteran maintainer.

A runnable spec. The last move makes the spec not just codable but runnable — record a piece of spec and immediately see what it does, without waiting for a build. Getting there most likely means migrating the whole environment (Claude and the others) into a custom spec-focused agent — in effect, the spec-writing IDE Sean Grove imagined in 2025. That agent would interpret the spec and run the application it describes. A product manager could enter a new decision or a constraint and, moments later, get their hands on the working app it produces — then make a correction and try again. The turnaround stays immediate unless a change requires a new engineering decision — a judgment call, not implementation. The build is always automatic. In an established system most of those decisions are already on record, so a genuinely new one is rare. When one does surface, it goes onto the engineering queue, and the app is ready the moment engineering answers.
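The routing rule in that last move (rebuild immediately unless a new engineering decision is needed) can be sketched in a few lines. This is a guess at the shape of the logic, not a description of the planned product: the function names and the `ENG-` ids are hypothetical.

```python
from typing import Iterable, List, Set


def route_change(needed: Iterable[str], on_record: Set[str], queue: List[str]) -> str:
    """Return 'rebuild' if every needed engineering decision is on record, else 'queued'."""
    missing = [d for d in needed if d not in on_record]
    for decision_id in missing:
        if decision_id not in queue:  # avoid queuing the same question twice
            queue.append(decision_id)
    return "queued" if missing else "rebuild"


def record_answer(decision_id: str, on_record: Set[str], queue: List[str]) -> None:
    """Engineering answered: record the decision and clear it from the queue."""
    on_record.add(decision_id)
    if decision_id in queue:
        queue.remove(decision_id)


on_record = {"ENG-12"}  # hypothetical decisions already in the spec
queue: List[str] = []   # the engineering queue

print(route_change(["ENG-12"], on_record, queue))            # rebuild
print(route_change(["ENG-12", "ENG-40"], on_record, queue))  # queued
record_answer("ENG-40", on_record, queue)
print(route_change(["ENG-12", "ENG-40"], on_record, queue))  # rebuild
```

The more decisions an established system already has on record, the more often the first branch is taken, which is why the turnaround stays immediate in the common case.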

Building this from scratch would be an enormous undertaking, if not for the existing Rishon codebase, which already executes an ontology graph for the business domain. This gives me a strong place to start. A simple integration enables cloud deployment, a flexible-schema data store, role-based data security, interpretation of a formal spec, AI automations, and a polished user interface that goes far beyond the simple chat window typical of AI agents.

Bringing this to your team

Almost everyone shares the same need: working software assembled from a spec the whole team shapes, with AI doing the build, and very little ceremony in between. Getting there reliably is harder than it looks, and the guidance out there is mostly direction and motivational talks, not instruction. That gap is exactly why I locked myself away and wrote the book — the actual how-to, not another pep talk.

If your team is already practicing spec-driven development and just wants practical ideas, the book may be all you need — yours to put to work today. If you want hands-on help getting there faster and without the hiccups, I’m open to working with a few teams directly, as a fractional CTO or an advisor on AI adoption in engineering.

Either way, I wish you the best of luck with your projects.
