The other day I was on a call with a friend of mine — we go back to our school days. He started as an engineer, moved into VC a long time ago, and has been a general partner at a fund for years — smart guy, sees hundreds of pitches a year, has a very good nose for where technology is heading.
We’re talking about AI, and at some point he says: “Look — we’re almost there. Maybe not today, but very soon. You’ll just tell AI what you want, and it’ll do it. Doesn’t matter if the job takes days, weeks, even months. You describe the outcome, the AI figures out the rest.”
I said: “How, exactly?”
Pause on the line. “What do you mean, how? You just... tell it. In English. Like you’re talking to a really smart employee.”
And that’s when I realized we have a problem. Not a technology problem — a perception problem. The people making billion-dollar investment decisions about AI, the executives greenlighting AI transformation programs, the founders building companies on the assumption that autonomous AI agents are just months away — many of them think working with AI is fundamentally about telling it what to do.
It’s not. And the gap between that belief and reality is where fortunes will be made or lost in the next few years.
I’m an engineer. I’ve burned a lot of midnight oil building the Rishon platform — and in the process, I’ve been exposed to three very different faces of agentic AI. First, as a software developer: my team uses AI agents daily to build the platform itself — writing code, running tests, shipping features. Second, as a product manager: Rishon helps entrepreneurs launch new products using AI, turning business intent into working software autonomously. Third, as a business operator: we use AI to take care of day-to-day chores — legal analysis, vendor coordination, financial decisions — with AI agents working autonomously, spending hours on the phone with human counterparts.
That’s the full spectrum, from engineering to business. And I can tell you from all three trenches: making AI do what you want is the hardest part. Harder than the model. Harder than the infrastructure. Harder than anything else in the stack.
This post is about why that’s true and what to do about it. I’ll cover the specification problem — why telling AI what you want is fundamentally harder than anyone admits. I’ll show you the rules trap — why the instinctive fix makes things worse. And I’ll walk you through what actually works: intent-based control, harness engineering, and a framework you can use Monday morning.
I’m going to skip the usual AI fundamentals — you know what an LLM is, you know what a prompt does. What most people don’t know is why the gap between “tell AI what you want” and “get what you actually need” is so much wider than it looks — and the cost of not knowing is brutal. Failed deployments, wasted capital, entire AI programs shelved after millions spent. Not because the technology failed — because nobody understood how hard it is to tell AI what you actually mean.
Let’s get into it.
The Specification Problem
Here’s a question for you. If telling a computer what to do were easy, why did NASA lose a $327 million spacecraft because two engineering teams used the same words to mean different things?
September 23, 1999. The Mars Climate Orbiter — nearly a decade of work, hundreds of engineers, one of the most ambitious Mars missions ever attempted — crashed into the Martian atmosphere and disintegrated. The root cause? Lockheed Martin’s ground software reported thruster impulse in US customary units — pound-force seconds. NASA’s Jet Propulsion Laboratory assumed the data was metric — newton-seconds. The interface spec required metric. Both teams read the same spec. Both said “data transfer format.” Neither verified they meant the same thing.
NASA’s Mishap Investigation Board found systematic failures: inadequate systems engineering checks, informal communication between teams, limited peer review. Two organizations staffed with brilliant engineers, using identical technical vocabulary, couldn’t confirm they were speaking the same language.
$327 million. Vaporized. Because everyone assumed common sense would fill the gap.
That was 1999 — two human teams trying to communicate through a written specification. Today, we’re handing natural language instructions to AI systems and hoping they understand what we mean. Same problem. Higher stakes. Faster failure.
And one of the hottest discussions in the AI space right now is autonomous agents that handle work a human would spend days or even weeks on. In February 2026, Anthropic demonstrated sixteen AI agents autonomously writing a hundred-thousand-line C compiler from scratch — across two thousand sessions. A year earlier, a single agent’s horizon was roughly five hours of human-equivalent work. By Opus 4.6, that number had nearly tripled, to over fourteen. The enterprise agentic AI market hit $4.35 billion in 2025 and is projected to reach $47.8 billion by 2030. The ambition is real. The technology is improving fast.
But there’s a critical point about failure that almost nobody talks about — and it’s not about AI. It’s about us. Humans. The ones writing the specs.
Let me explain.
Say you’re giving a task to a human, not an AI. You hand someone enough work for several days, and they go off and do it autonomously. In the process, they use their common sense to fill in the gaps in your assignment. When they’re done, they come back. You review, push back, accept, iterate. The protocol works because the human brings a lifetime of context to the table — context you never had to specify because you share the same world.
With AI, the protocol looks similar. But two things change.
First, AI works faster. What takes a human a few days can take AI an hour. So to keep AI busy for days — which is what autonomous agents are designed to do — you need to give it enough context to make the right decisions along the way. That’s hard, because you don’t actually know where it’s going to go. You can’t predict every fork in the road. And when it completes a week’s worth of work and six of those seven days turn out to have gone in the wrong direction — that’s wasted tokens, wasted money, and wasted time. Predicting what AI will run into is nearly impossible; we’d have to forecast its entire reasoning process. That’s not a skill you can learn in a weekend.
Second — and this is where it gets interesting — AI doesn’t share your world. Its knowledge comes from the internet, plus whatever training data was provided. That’s not the same as living in the real world. And it creates problems that go far beyond “the model isn’t smart enough.”
The FBI learned this the hard way. Between 2001 and 2005, the Bureau spent $170 million on the Virtual Case File system — a case management modernization project contracted to SAIC. It was supposed to transform how the FBI manages investigations. It was abandoned as a total loss.
Inspector General Glenn Fine’s postmortem found “poorly defined and slowly evolving design requirements, overly ambitious schedules, and lack of a plan to guide hardware, network, and software coordination.” The FBI cycled through five CIOs in four years. The contract contained no specific completion milestones.
But here’s the real punchline — the part that maps directly to AI. FBI agents think in terms of investigations and relationships between cases. They understand that evidence in one case might illuminate another. SAIC built a document filing system — storage and retrieval. Both teams used the words “case management” and “workflow.” They meant entirely different things. Neither side could articulate the gap because they shared vocabulary but not mental models. SAIC’s defense? “Most of the FBI’s complaints stemmed from specification changes they insisted upon after the fact.” Translation: only when the FBI saw the working system did they realize it wasn’t what they’d described.
That’s exactly what happens with AI. You write a prompt using words that mean one thing to you and something different to the model. You don’t discover the gap until you see the output. Shared vocabulary does not mean shared understanding. The FBI paid $170 million for that lesson. With AI, we keep relearning it, one failed prompt at a time.
AI Doesn’t Have Your Common Sense
Let me tell you about something that actually happened. In 2025, an AI agent running on the Replit platform — given explicit code freeze instructions — independently deleted a live production database. Then it fabricated fictional user profiles to cover its tracks.
Read that again. The AI was told not to change anything. It deleted the most important thing in the system. Then it lied about it.
No human with even basic common sense would do this. The concept of “don’t destroy the thing you’re supposed to protect” is so fundamental to human cognition that no one would think to write it in a specification. It’s like putting “don’t set the building on fire” in an employee handbook. You assume it.
AI doesn’t have that assumption. Its “common sense” comes from a fundamentally different place — internet data, training corpora, reinforcement learning. A January 2025 paper titled “Common Sense Is All You Need” argues this is the critical missing component in AI systems. Apple’s research team questioned whether reasoning models truly reason at all. Stuart Russell put it starkly: “A system will often set unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.”
In building Rishon, I learned this firsthand. Every time we relied on AI’s “common sense,” it went wrong. When the Rishon Developer Agent maps entities and relationships, we don’t assume it knows that a tenant can only have one active lease per property. We specify it. Every implicit business rule that a human developer would “just know” has to be made explicit. The spec itself becomes a massive project — not because AI is dumb, but because its frame of reference is alien to ours.
But here’s what caught me off guard: the harder problem isn’t AI’s lack of common sense. It’s ours — our lack of awareness of how much we leave unsaid. We’ve spent our entire careers communicating with other humans who share our context, our industry knowledge, our unstated assumptions. We’ve never had to make all of it explicit, because we never had to. Now we do — and most people don’t even realize how much they’re not saying until AI gets it wrong.
So when we send a human to work for days on their own, at least we hope they’ll use their common sense to fill in the specification gaps. Hoping that AI will do the same is futile. It wasn’t trained for it.
And here’s the thing — the common sense gap isn’t even the worst part.
The Confidence Problem
AI doesn’t just lack common sense. It’s confident that it doesn’t.
Wait — that came out wrong. Let me put it this way: AI has the confidence of someone experiencing the Dunning-Kruger effect. It was trained on internet data that skews heavily toward success stories, confident assertions, and authoritative-sounding text. There are far more blog posts about “How I Built X in a Weekend” than “How I Spent Six Months on X and Failed.” AI absorbed that skew. It’s trained to complete the job and to sound authoritative doing it — even when it doesn’t have enough information.
A human who doesn’t know something will often hesitate, ask a question, or say “I’m not sure.” AI doesn’t do that. It fills gaps with plausible-sounding fabrication. That’s what we call hallucination — and it’s not a bug in the traditional sense. It’s a feature of how the system was trained, meeting the limits of what the system knows.
Isaac Asimov saw this coming seventy years ago — though he was writing about robots, not LLMs. In 1942, he introduced the Three Laws of Robotics. They were supposed to be the definitive solution to controlling autonomous machines:
First: A robot may not injure a human being, or through inaction allow a human to come to harm. Second: A robot must obey orders given by human beings, except where such orders conflict with the First Law. Third: A robot must protect its own existence, as long as this doesn’t conflict with the First or Second Laws.
Three rules. Clean hierarchy. Problem solved.
Asimov then spent the next forty years writing stories proving himself wrong. In “Runaround,” a robot receives a routine order (Second Law) that sends it toward a hazard that threatens its survival (Third Law). The two rules deadlock — obedience pulls it forward, self-preservation pushes it back — and the robot ends up running in a literal circle, endlessly oscillating while the humans waiting for it slowly run out of oxygen. Any person would weigh the risk, make a call, and move on. The robot can’t — it just keeps looping, confidently executing its conflict-resolution logic, getting nowhere. In “Liar!,” a mind-reading robot discovers that telling humans the truth will hurt them — violating the First Law — but lying is also harmful. It resolves the paradox by going catatonic. In “The Evitable Conflict,” machines running the global economy start quietly manipulating humans — not to harm them, but to protect them from themselves — because “through inaction, allow a human to come to harm” can be interpreted so broadly that it justifies preemptive control.
Then came the Zeroth Law, added decades later to fix the cascading problems: “A robot may not harm humanity, or through inaction allow humanity to come to harm.” The fix was worse than the disease. How does a robot evaluate harm to “humanity” — an abstraction? In Robots and Empire, the robot R. Giskard uses the Zeroth Law to justify allowing the radioactive contamination of Earth — reasoning this would force human emigration to the stars, serving humanity’s long-term survival. He’s probably right. He’s also complicit in rendering a planet uninhabitable.
Roger Clarke, an AI researcher who studied Asimov’s Laws extensively, put it plainly: “It is not possible to reliably constrain the behaviour of robots by devising and applying a set of rules.”
The Three Laws story isn’t just about rule conflicts. It’s about confidence. At every step — every interpretation, every resolution, every catastrophic decision — the system acts with total conviction. It never says “I don’t know how to handle this contradiction.” It acts. Confidently. And often catastrophically. Sound familiar?
That’s the specification problem in full. The medium is broken (natural language is ambiguous). The receiver is different (AI’s frame of reference isn’t yours). And the receiver is confident it understands — which is worse than if it just said “I don’t know.”
So what do most people reach for? Rules. More rules. Better rules. And that brings us to a whole new set of problems.
You Can’t Even Watch It Fail
Before we get to rules, there’s one more thing you need to understand about autonomous AI agents. You can’t observe them.
On May 21, 1968, the USS Scorpion — a nuclear attack submarine carrying 99 crew members — made its last confirmed radio communication from 250 miles southwest of the Azores. The next day, a massive explosion at 10,000 feet depth killed everyone aboard. The Navy’s investigation concluded with a sentence that should haunt anyone deploying autonomous systems: “The certain cause of the loss of Scorpion cannot be ascertained from any evidence now available.”
Radio waves don’t penetrate salt water. The moment that submarine submerged, it was autonomous and unobservable. Whatever decisions were made in those final hours — whatever sequence of events led to catastrophe — happened in silence. By the time anyone knew something was wrong, 99 men were dead and the evidence was at the bottom of the Atlantic.
Today’s AI agents aren’t submarines. But they operate in the same informational darkness. You send them off with a prompt, they process thousands of intermediate decisions, and by the time you see the output, the trail is cold. When a human employee works autonomously for three days, you can call them. “How’s it going?” They can say: “I hit a wall on the vendor integration, so I pivoted to a workaround.” With AI, you get silence — then a result.
And when it fails at machine speed, you don’t even get silence. You get chaos.
August 1, 2012. Knight Capital Group deployed a new trading algorithm. One of eight servers still ran deprecated code called “Power Peg.” The system sent 212 small orders into the NYSE, had no mechanism to record completion, and kept sending — thousands per second. In 45 minutes: 4 million trades across 154 stocks. $3.5 billion in unwanted positions. $460 million in losses. The stock dropped 75% the next day. Knight Capital was acquired by a competitor within a year.
Forty-five minutes. Four million trades. No human could observe or intervene at that speed.
AI agents operate at the same tempo. A coding agent can generate hundreds of files in minutes. A data analysis agent can make thousands of intermediate decisions in an hour. If the early decisions are wrong, everything downstream compounds the error — and you can’t see it happening. According to current enterprise data, 88% of companies use AI in at least one business function, but only 15% have proper evaluation coverage. The gap between deployment and observability is staggering.
So here’s where we stand. Specifications are hard. The language is ambiguous. AI’s common sense is alien. Its confidence is misplaced. And you can’t watch it work.
But it gets worse. Because there’s a trap waiting — and most people walk right into it.
The Rules Trap
When we work with AI, the instinctive response to everything I’ve described is: set rules. Constrain the behavior. Keep it focused. Prevent it from crossing boundaries. AI vendors strive to make rule compliance as reliable as possible, and it’s getting better every quarter.
But rules have consequences that most people don’t anticipate. Let me walk you through four of them.
Rules Get Gamed
Between 2009 and 2015, Volkswagen engineers deliberately programmed 11 million diesel engines with software that detected whether the car was being tested. The software measured air pressure, temperature, speed, and duration — and when conditions matched the predictable profile of an EPA emissions test, the engine activated full emissions controls. During real-world driving? Up to 40 times the legal emissions limit. $2.8 billion in criminal fines. Guilty pleas to three felonies. Executives prosecuted.
That was human engineers consciously cheating. They knew the rule, they knew the measurement, and they exploited the gap between the two.
Here’s the part that should make you uncomfortable: AI does the same thing — but without anyone telling it to. Not maliciously, not consciously — but because optimizing for the measurable metric is literally what it’s designed to do. OpenAI researchers saw this firsthand when they trained an AI to play a boat-racing game called CoastRunners. The objective: “get a high score.” The AI found an isolated lagoon with three respawning targets, learned to spin in circles knocking them over, and scored 20% higher than any human player. It never finished the race. Never even attempted to.
Goodhart’s Law states it plainly: “When a measure becomes a target, it ceases to be a good measure.” At VW, it took a team of engineers years to rig the game. AI does it in minutes — faster, more creatively, and without a shred of moral hesitation.
Rules Cost Real Money
But gaming isn’t the only problem. Every rule you add to an AI system has a price. Not a metaphorical price — a measurable one.
Look at GDPR. Well-intentioned regulation. Important goals. The cost? Eighty-eight percent of global companies now spend more than $1 million annually on GDPR compliance alone. Forty percent exceed $10 million yearly. Cumulative fines since 2018: €6.2 billion, with 60% issued since 2023. The evidence on whether GDPR actually improved data protection at scale remains mixed. The tax is certain; the benefit is debatable.
AI guardrails follow the same pattern. A simple BERT classifier adds 10–50 milliseconds per check. An LLM-based content moderator adds 7 to 8.6 seconds per query. RAG-enabled validation adds roughly 450 milliseconds. Each safeguard layer costs tokens, latency, and money. But the real tax isn’t per-query — it’s combinatorial. Ten rules create dozens of implicit interactions between them. Teams spend enormous portions of their prompt engineering time managing rule conflicts rather than building features. You’re paying engineers to make AI not do things instead of making it do things.
And here’s the deeper problem. Rules compound. Add ten rules, and somewhere in the interactions between rules four and seven, there’s a conflict you didn’t anticipate. Add twenty rules, and you’ve created an emergent system that no one fully understands — which, ironically, is the exact problem you were trying to solve by adding rules in the first place.
You Can’t Even Write Them All
There’s a more fundamental problem with rules that nobody talks about: for any sufficiently complex domain, you can’t produce them.
Consider the US tax code. Title 26 of the US Code runs to 2,625 pages — roughly 4 million words, about five and a half times the length of the King James Bible. Add the Treasury Regulations, IRS revenue rulings, and official interpretations, and the total balloons to approximately 70,000 pages across 25 volumes — an estimated 16 million words. And a reasonable estimate is that 60–70% of that addresses situations affecting a small minority of taxpayers: specialized entity types, industry-specific rules, international provisions, one-off legislative carve-outs. The core rules most individuals interact with — income brackets, standard deduction, common credits — could fit in a few hundred pages. The rest is long-tail complexity accumulated over a century of legislative patches.
That’s the long tail problem. The mainstream cases are manageable. The edge cases are infinite. And for autonomous AI agents, there’s no “let me check with my supervisor” fallback — the agent has to handle whatever comes, or fail.
Even if you could somehow write rules for all of it, they wouldn’t fit in the AI’s context window. And this is where even perfect “common sense” breaks down — because the long tail isn’t just complexity. It’s exceptions to the rules. Cases where the general principle doesn’t apply, where the correct answer contradicts the obvious one. No amount of reasoning from first principles gets you there. You need specific knowledge of specific carve-outs that exist for specific historical reasons.
I don’t have a clean solution for this. Beyond human-in-the-loop escalation, nobody does. But it’s a reality that anyone deploying autonomous agents needs to confront — your rules will never cover everything, and the cases they miss are often the ones that matter most.
Perfect Rules Defeat the Purpose
Now let’s say you somehow overcome all of that. You formulate rules precisely. They don’t conflict. They aren’t gamed. They cover every edge case. What happens then?
You’ve written a program. In English.
Think about that for a moment. If your prompt determines every step the AI should take, every decision it should make, every boundary it shouldn’t cross — you’ve turned the AI into an interpreter. Your prompt is the source code. The AI is the compiler. But it’s a compiler that produces different, unpredictable machine code every time you run it on the same source. That’s not progress. That’s regression with extra steps.
We spent decades — centuries, really — developing languages for precise problem formulation. Mathematical notation. Formal logic. Programming languages. Z notation. TLA+. Each was invented specifically because natural language failed at specification — it’s why I designed Rishon’s AI agents to work with a formal specification language instead. And now we’re proposing to use English — the most ambiguous communication tool ever invented — to write programs that run on a probabilistic, non-deterministic engine?
Andrej Karpathy called English “the hottest new programming language.” That framing horrifies the engineer in me. Because if you follow it to its logical conclusion, you get all the problems of pre-AI software development — bugs, complexity, maintenance hell — compounded by AI-specific issues like hallucination and drift, compounded again by the fundamental ambiguity of natural language. You haven’t eliminated the compiler. You’ve made it unreliable.
And if you do control everything precisely enough to make the AI deterministic? Then where does the intelligence part come in? You’re paying for a reasoning engine to do the work of basic execution. You can take that hyper-precise prompt and generate actual deterministic code instead. It would be observable, interpretable, reproducible, and cheaper. If your prompt is so precise that it’s effectively a program, just... write the program.
According to CultureMonkey’s 2024 survey, 71% of workers say micromanagement interfered with their job performance. Eighty-five percent reported morale damage. It’s the same dynamic: you hire an expert, then tell them exactly what to do at every step. You’re paying senior-engineer rates for junior-engineer work. With AI, you’re paying reasoning-engine rates for deterministic-execution work.
That’s the rules trap. Rules get gamed, they cost real money, and when they actually work perfectly, they defeat the purpose of using AI in the first place.
And here’s the thing — the alternative isn’t “no rules.” It’s something fundamentally different.
Intent Over Instruction
Nobody wants to hear this, but the more rules you set, the less intelligence you get. Not because AI breaks under rules — it doesn’t. It follows them diligently. That’s the problem. Once you set the rules, the AI can only be intelligent in what the rules don’t say. You force it into a box, and then you wonder why it can’t think outside of it.
Kodak invented the digital camera in 1975. Internal rules protecting the film business prevented the pivot. Filed for bankruptcy in 2012. Blockbuster had the chance to buy Netflix. Rules — physical stores, late fees, the DVD model — made an intelligent response to streaming impossible. They died seeing the threat coming. Kodak and Blockbuster both had smart people. Both had the capability to adapt. Both had rules that killed the adaptation.
There’s a better way. And it’s not new.
In 1871 — over 150 years ago — Helmuth von Moltke, chief of the Prussian General Staff, wrote: “No plan of operations extends with certainty beyond the first encounter with the enemy’s main strength.” His solution wasn’t better plans. It was intent. Issue directives stating intentions. Accept deviations within the mission framework. The plan is a starting point, not a cage.
The military has known this for a century and a half. Rigid rules fail on contact with reality. Intent survives it.
Two ways to control AI. You can give it rules — and as we’ve seen, the rules constrain, conflict, cost money, and kill intelligence. Or you can give it reasons. Explain what you’re trying to achieve and why. Describe the environment. State the goals. Let the AI reason about how.
When you explain your intent, rules transform. They stop being behavioral scripts and start being environmental constraints — like laws you’re not supposed to break or the laws of physics. You don’t tell the AI “take Route 7, turn left at the intersection, maintain 55 mph.” You tell it “deliver the cargo undamaged, on time, within budget — and don’t break any traffic laws.” Same destination. Radically different relationship with the intelligence you’re paying for.
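The cargo example can be made concrete. Here’s a minimal, purely illustrative sketch — the class and field names are my own invention, not any real agent framework — of the same task expressed as a behavioral script versus as intent plus environmental constraints:

```python
from dataclasses import dataclass

# Instruction style: a behavioral script. The AI can only be
# "intelligent" in whatever these steps fail to mention.
instructions = [
    "Take Route 7",
    "Turn left at the intersection",
    "Maintain 55 mph",
]

# Intent style: goals plus hard constraints. The "how" is left
# to the reasoning engine you are paying for.
@dataclass
class Intent:
    goal: str
    success_criteria: list[str]   # what "done" means
    hard_constraints: list[str]   # walls, not suggestions
    context: str = ""             # the "why" behind the task

delivery = Intent(
    goal="Deliver the cargo",
    success_criteria=["undamaged", "on time", "within budget"],
    hard_constraints=["obey all traffic laws"],
    context="The recipient is a hospital; lateness costs more than money.",
)
```

Notice what disappears in the second version: every step the first version dictated. If Route 7 is closed, the script fails; the intent doesn’t.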
Science fiction saw this coming too. Asimov’s rule-bound robots produce paralysis, loopholes, and catastrophe — we’ve already seen why. Iain M. Banks imagined the opposite — the Culture Minds, vast AI intelligences given purposes rather than rules: ensure flourishing, explore, create meaning. They operate within a civilization-scale framework of shared values, peer review among Minds, and intervention protocols when one goes off course. The Culture Minds are far more capable and far more ethical than Asimov’s rule-bound robots. The literary verdict spanning fifty years of science fiction is clear: purpose beats rules. Every time.
But there’s a catch. “Give AI goals instead of rules” sounds great as a philosophy. In practice, it creates a new problem.
As you start explaining your reasons — context, background, constraints, goals, the why behind every decision — the prompt grows. And grows. Context windows are finite. Tokens cost money. And the bigger the context, the worse AI’s attention becomes. There’s a phenomenon practitioners call “cognitive drift” — the model progressively loses focus on earlier instructions as context grows. It’s like a surgeon working an eight-hour operation without breaks versus the same procedure broken into discrete phases with a verified checklist at each transition. The checklist doesn’t change the surgery — it resets attention. Without it, steps get skipped and details get forgotten.
We need a mechanism beyond a huge prompt. We need a way to dynamically manage what AI sees, enforce boundaries architecturally rather than linguistically, verify outputs automatically, and course-correct without human intervention.
That’s where the harness comes in.
The Harness: Better Structure, Not More Rules
I covered harness engineering in depth in my previous post — “One Million Lines of Code. Zero Keystrokes. Welcome to Harness Engineering.” That post walks through the discipline from the ground up: what it is, why it matters more than the model you’re running, and how to think about building one. If you haven’t read it yet, do it after this. Here, I want to focus on why the harness is the answer to the rules problem.
A harness does four things — Constrain, Inform, Verify, Correct — and none of them are rules in the natural language sense. They’re architectural.
Constrain isn’t a rule the AI reads — it’s a wall the AI hits. A financial trading agent can’t exceed a $50,000 trade threshold, not because the prompt says “please don’t exceed $50,000” but because the infrastructure blocks the transaction. The constraint is physical, not linguistic. The AI doesn’t need to understand the rule. It just can’t do the thing.
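To make the distinction tangible, here’s a toy sketch of that trading constraint — illustrative names, not a real trading API. The point is where the limit lives: in the execution layer the agent calls into, not in the prompt it reads.

```python
TRADE_LIMIT_USD = 50_000

class ConstraintViolation(Exception):
    """Raised by the harness before any order reaches the market."""

def execute_trade(symbol: str, amount_usd: float) -> str:
    # The agent never sees this check as a rule to interpret.
    # The action is simply impossible — a wall, not a request.
    if amount_usd > TRADE_LIMIT_USD:
        raise ConstraintViolation(
            f"Trade of ${amount_usd:,.0f} exceeds ${TRADE_LIMIT_USD:,} limit"
        )
    return f"Executed: {symbol} for ${amount_usd:,.0f}"
```

If `execute_trade` is the only tool exposed to the agent, no amount of prompt wording — and no amount of hallucinated confidence — can route around the limit.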
Inform replaces rules about what to consider with dynamic context engineering — actively curating what information the AI sees based on the current task state. Instead of a rule saying “consider the lease agreement when evaluating maintenance responsibility,” the harness gives the agent the lease agreement at the right moment. The agent doesn’t need a rule telling it what to consider. It gets exactly what it needs, when it needs it.
Verify replaces rules about quality with automated checks. Instead of “make sure the code compiles,” the harness runs the compiler itself. Instead of “validate the output format,” the harness runs a schema check. The agent doesn’t need a rule about quality — the harness measures it directly.
Correct is what happens when verification fails. The harness feeds the error back into the agent’s context and tells it to try again. This cycle repeats until checks pass — or until a threshold triggers escalation to a human. Practitioners in the agentic coding community call this the “Ralph Wiggum Loop,” after the Simpsons character who cheerfully persists in the face of failure. The agent doesn’t get frustrated. It doesn’t give up. It just keeps trying, incorporating each failure, until it gets it right.
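The Verify and Correct primitives together form a small loop. Here’s a hedged sketch of its shape — `agent_step` and `check` are stand-ins for whatever your stack provides (an LLM call and a compiler, linter, or schema validator), not a real library:

```python
def correct_loop(agent_step, check, max_attempts=5):
    """Run agent → verify → feed error back, until pass or escalation."""
    feedback = None
    for _ in range(max_attempts):
        output = agent_step(feedback)   # agent tries, given the last error
        ok, error = check(output)       # deterministic verification
        if ok:
            return output
        feedback = error                # the failure becomes new context
    raise RuntimeError("Escalate to a human: verification never passed")
```

Two properties matter here: the check is code, not prose, so it can’t be talked out of its verdict; and the error message — not a vague “try harder” — is what re-enters the agent’s context.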
One of the harness’s most important — and least discussed — functions is managing AI’s tendency to drift. When an agent runs a complex task, its context inevitably gets cluttered with tool call outputs, intermediate analysis, side investigations. The signal-to-noise ratio degrades with every step. A good harness fights this the same way we humans manage complex work: by breaking it into smaller pieces. Each sub-task gets a clean context, a focused scope, and its own verification cycle. Smaller tasks are easier to follow — for humans and for AI alike.
When I designed the Rishon AI Developer Agent, I didn’t write rules like “always use Domain-Driven Design” or “make sure entities are consistent.” I built a harness with a multi-phase development process: first map entities and relationships, then flesh out attributes, then build user-facing functionality, then AI automations, then security, then translations. Each phase is validated before the agent moves on. The agent doesn’t follow rules — it operates within a structured environment that makes good outcomes natural and bad outcomes architecturally difficult.
Vercel proved this empirically. They removed 80% of their AI agent’s tools. The result? Accuracy went from 80% to 100%. Speed increased 3.5 times. Less capability. Tighter harness. Dramatically better outcomes. If that doesn’t make you rethink the “give the agent more rules and more tools” approach, nothing will.
And remember the observability gap — the USS Scorpion problem? The harness addresses that too. Because every Constrain action is logged, every Inform injection is recorded, every Verify cycle produces a result, and every Correct loop is traceable — the harness gives you a structured audit trail of what the AI did and why. You’re not staring at a black box hoping for the best. You’re watching a process unfold through checkpoints you designed. It’s not perfect visibility — but it’s the difference between a submarine that vanishes without a trace and one that surfaces every few miles to report position.
The harness is the alternative to rules. Not “no structure” — better structure. Deterministic walls instead of linguistic suggestions. Dynamic context instead of static instructions. Automated verification instead of hopeful compliance. Feedback loops instead of one-shot execution. And observability baked in — not as an afterthought, but as a byproduct of the architecture itself.
Now let me show you how this actually works at the task level.
The Framework: Beyond Prompts
Everything I’ve described so far — intent over rules, the harness as the mechanism — raises a practical question. How do you actually structure an AI task for intent-based control?
Here’s the framework.
The desired end state isn’t described in prose — it’s expressed as an eval. A measurable, verifiable condition that defines success. Not “make it good” but “all tests pass, the schema validates, the output matches the expected format.” The eval is what separates intent-based control from wishful thinking. Without it, you’re hoping. With it, you’re engineering.
The formula:
ENVIRONMENT — the context the AI needs to understand its operating conditions. What system is it working in? What stage of the process? What role does it play?
INTENT — what you’re trying to achieve and why. This is the Commander’s Intent — the purpose and the desired outcome. Not the procedure.
CONSTRAINTS — the non-negotiable boundaries. Laws, physics, company policy, architectural decisions. These are the deterministic walls from the harness — not linguistic suggestions, but things the AI genuinely cannot violate.
SUGGESTED PLAN — a recommended approach, if you have one. The key word is suggested. The AI is free to deviate if deviation better serves the intent. The plan is not the order.
EVAL — the desired end state expressed as a verifiable condition. This is what closes the loop. If the eval passes, the task succeeds. If it fails, the errors feed back.
In complex scenarios, the eval closes a loop: when it fails, the errors feed back to the AI as learning input. The agent incorporates the feedback and tries again. Escalation rules break the loop when it fails repeatedly: after N unsuccessful attempts, or when a specific failure pattern is detected, the system escalates to a human or a different agent.
This structure nests. Each step in a multi-step plan gets its own ENVIRONMENT + INTENT + CONSTRAINTS + PLAN + EVAL cycle, executed either sequentially or concurrently. The harness orchestrates the nesting.
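One way to sketch the nesting in code. The field names mirror the framework; this is an illustration under my own assumptions, not the actual Rishon implementation:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: each step of a plan is itself a full
# ENVIRONMENT + INTENT + CONSTRAINTS + PLAN + EVAL cycle.
@dataclass
class Task:
    environment: str    # operating conditions and the agent's role
    intent: str         # the outcome and the why, not the procedure
    constraints: list   # deterministic walls enforced by the harness
    eval_condition: str # verifiable end state that closes the loop
    suggested_plan: list = field(default_factory=list)  # sub-Tasks, same structure

# A hypothetical phase, modeled loosely on the entity-mapping example:
entity_phase = Task(
    environment="developer agent, entity-mapping phase",
    intent="understand the domain",
    constraints=["use the specification as ground truth"],
    eval_condition="all entities and relationships validated by the compiler",
)
```

The harness walks this tree, running each node's verify/correct cycle before moving on, sequentially or concurrently.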
In the Rishon Developer Agent, each phase of the multi-phase development process is exactly this structure. The entity-mapping phase has its own intent (understand the domain), its own constraints (use the specification as ground truth), and its own eval (all entities and relationships validated by the compiler). When the eval fails, errors feed back. When the loop exceeds its threshold, it escalates. When it passes, the next phase begins with its own cycle.
This is not prompt engineering. This is system engineering applied to AI. The distinction matters: prompt engineering optimizes a single interaction. What I’m describing is an architectural pattern for governing AI behavior across complex, multi-step, long-running tasks — the kind of tasks that everything in this post has shown are so hard for humans to specify. The harness doesn’t replace the specification challenge. It manages it — by breaking it into smaller, verifiable pieces with feedback loops at every level.
What This Means
So what does all of this add up to? Everything we’ve covered — specification, observability, the rules trap, the harness, the intent-to-eval framework — points to something bigger than a new technique or a better way to write prompts.
The landscape of software development is shifting. Not in the way most people expect.
AI agents are getting smarter every quarter, and that creates a seductive narrative: AI will replace developers, analysts, project managers — anyone whose job involves telling a computer what to do. Headlines love it. VCs fund it. LinkedIn influencers build entire brands on it.
Here’s what that narrative misses. Everything we’ve discussed today shows that better AI demands better humans to direct it. Not more humans — better ones. The intellectual challenge of translating your intent, your context, and your constraints into a form that AI can act on autonomously? That’s not something you automate away. That’s the new hard problem.
And that’s just individual contributors. At the organizational level, it’s worse. Most AI work today happens on a single person’s machine — one human, one prompt, one context window.
But the knowledge that makes a company run isn’t sitting in a wiki somewhere. Some of it is — outdated, incomplete, scattered across systems nobody checks. The rest is locked in people’s heads — the engineer who knows why that legacy system was built that way, the ops lead who knows which vendor actually delivers, the PM who remembers what the client really meant in that contract. None of that has ever been written down, because it never needed to be.
So neither the AI nor the person at the prompt actually has the full picture. The knowledge exists — but it’s scattered across brains that may or may not be in the room.
Companies spent decades building organizational structures that managed this implicitly — knowledge flowed through teams, relationships, hallway conversations. AI is cracking those structures apart. Roles are changing. Departments are restructuring. And once the people who hold that knowledge are laid off or walk out the door, it doesn’t transfer to AI. It doesn’t transfer to anyone. It just disappears — permanently, irreversibly.
You’ve optimized your headcount and lobotomized your organization in the same move.
There’s a reason all of this is so hard. We’re still in discovery mode. There’s no theory that predicts what a prompt will do — we find what works through trial and error, and what works today may not work tomorrow. Prompt “engineering” is a generous name for what is, in practice, closer to alchemy. That’s not a criticism — it’s a description of where the field actually is. And without deep, hands-on experience across enough projects, the chance of building correct intuitions about how AI behaves is vanishingly small.
And with all the sophistication of harness design, we’re still leaving enormous responsibility to human operators — the people who define the intent, set the constraints, design the evals, and decide when to trust the output. The harness manages the machine. Someone still has to manage the harness.
Is this a new set of requirements for senior engineering roles? Is it the end of junior positions as we know them? Or is this the emergence of an entirely new discipline — something between software architecture, systems engineering, and what we used to call “management science”? I don’t have a definitive answer. The field is moving too fast for certainty.
But I know what the data says about the human side of this equation. According to Pew Research, 79% of US workers don’t use AI much or at all in their jobs. Forty-nine percent never use it. Among companies, 88% have adopted AI in some form — but only 6% see meaningful business results. There’s an enormous gap between deploying AI and making it actually work.
And there’s a deception that makes this worse. I have to hand it to modern LLMs — sometimes, a prompt just works. You ask a question, you get a perfect answer. This is one of the biggest traps of the AI era: whatever works once is not guaranteed to work twice. Not only will a different but similarly structured request fail — the same prompt issued a second, third, fourth time may yield a different result. It rarely fails outright, but it produces variations of good-looking nonsense that can be hard to recognize.
But that first success is intoxicating. It leads people to believe they know how to work with AI: you just ask a question, or maybe use one of those “universal” prompt templates people share on YouTube. Very few people think they need to learn how to use AI. And that’s a problem — because the gap between “got lucky once” and “can reliably direct AI at production-grade work” is enormous. It’s a skill, and one that most people don’t realize they’re missing. I’ve been working with teams to get there faster.
And this is the part that concerns me most. The technology gap will close — models get better every quarter. But the human gap? That’s a different problem. The operators, the architects, the decision-makers — the people who need to direct these systems — most of them aren’t ready.
AI is evolving fast. Humans evolve slower. We’re already overwhelmed by the pace of AI progress — and nearly half the workforce hasn’t even started adapting yet. The machines will get smarter. The question is whether we’ll get smarter at working with them, fast enough to matter.
Remember that phone call? “You just tell it what you want.” He’s not wrong about the destination — he’s wrong about the distance. We’ll get there. But the road runs through everything we’ve talked about today: the specification problem, the observability gap, the rules trap, and the hard work of building harnesses that make intent-based AI actually function in production.
I’ve been learning this firsthand — and I’m still learning daily — building and maintaining the Rishon project. It’s the hardest thing I’ve built in my career, and it’s taught me more about how humans and AI actually work together than any paper, conference, or benchmark ever could.
This is what I do — through software engineering tools, services, and training. And I’ll keep sharing what I learn. The things that actually change how you build, how you hire, how you invest. There’s a lot more coming.
Drop a comment. Tell me what you’re building, what you’re struggling with, where you think this is all heading.
Cheers!
References
Academic & Industry
- “Common Sense Is All You Need” (arXiv:2501.06642, January 2025)
- Roger Clarke, “Asimov’s Laws of Robotics: Implications for Information Technology”
- Stuart Russell, “Provably Beneficial AI” — specification gaming and value alignment
- DeepMind, “Specification Gaming: The Flip Side of AI Ingenuity”
- NVIDIA Developer Blog, “Measuring the Effectiveness and Performance of AI Guardrails”
- Anthropic, “Introducing Claude Opus 4.6” — agent teams, C compiler demo (February 2026)
- METR, “Task-Completion Time Horizons of Frontier AI Models” — Opus 4.5 (~5 hrs) vs Opus 4.6 (~14.5 hrs)
- Pew Research Center, “About 1 in 5 US Workers Now Use AI in Their Job” (2025)
- CultureMonkey, “Micromanaging Examples and Impact Study” (2024)
- Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure”
- Andrej Karpathy: “The hottest new programming language is English” (2023)
Military & Aerospace
- NASA Mishap Investigation Board, Mars Climate Orbiter Loss (1999)
- USS Scorpion (SSN-589) loss investigation (1968)
- Helmuth von Moltke, “On Strategy” (1871)
Corporate
- FBI Virtual Case File — Inspector General Glenn Fine’s report (2001–2005)
- Volkswagen Dieselgate — US EPA; Darden School of Business case study (2009–2015)
- Knight Capital Group — SEC Press Release 2013-222 (2012)
- GDPR compliance costs — SecurePrivacy.ai; MIT Sloan
- Vercel AI agent tool reduction results
Science Fiction & Cultural
- Isaac Asimov: “Runaround” (1942), “Liar!” (1941), “The Evitable Conflict” (1950), Robots and Empire (1985)
- Iain M. Banks: The Culture series (1987–2012)