The Drift

A Conversation That Forgot Itself


Last month, something happened that struck me. Not in a meeting, even though that's where pretty much all such stories start. And not on paper either. It was in a chat window. There was an idea for a tool, typed into a prompt, and within minutes there was working code. Magic.

Then one thing needed to change. The AI changed it. And broke two other things. So the next prompt asked it to fix those. It fixed them. And introduced something new that nobody had asked for. Certainly no developer asked for it, because there is no developer here; the AI, apparently, is one. Yet three hours later, even though the code worked, the result didn't feel as satisfying as it probably should have. Somewhere during that token-eating session, the original goal had vanished.

Not because the AI was "bad". The result simply didn't match what had originally been described. With every correction, the conversation had drifted. The intent had shifted. The AI's understanding had shifted. And neither side noticed while it was happening.

Yet it felt like progress the entire time. So why was the result not satisfying?

Surfing the Drift

What was bothersome wasn't that it didn't work. It was that there was no way to tell when it stopped "working". It didn't feel broken. It felt plan-less. Like value was being generated, but there was no way to grab hold of it. No way to trace it back to anything, really.

If you've spent any time building with AI, you probably know this feeling, or at least the situation. The addictive loop of prompt, result, correction, prompt, result, correction. It's fast. It's exciting. And it works beautifully.

The industry calls it vibe coding: you describe what you want in natural language, an LLM (large language model) generates the code, and you iterate by feel rather than specification. Andrej Karpathy, the former head of AI at Tesla, coined the term and described it as "fully giving in to the vibes."

Vibe coding isn't bad. For prototypes, quick experiments, exploring what's possible, it's genuinely wonderful. It can look like the truest incremental approach: delivering value fast.

The problem is what happens when you keep going down that road.

Message in a Bottle

Here's a number that might surprise you: a GitHub analysis from December 2025 found that AI-co-authored code has 1.7 times more major issues than human-written code. And 2.74 times more security vulnerabilities. Not because AI writes bad code. Because the iterative correction loop, the thing that makes vibe coding feel so productive, introduces something that compounds with every turn.

Researchers call it semantic drift.

Drift is what happens when each small correction shifts the meaning of the conversation just slightly. Not enough to notice. But enough to accumulate. After a few dozen corrections, the AI isn't working on your original problem anymore. It's working on a mutated version of it. And the mutation happened so gradually that you both believe you're still on the same page.

There's a children's game most people know. One kid whispers a sentence to the next, who whispers it to the next, all the way around the circle. By the time it comes back, the sentence is something entirely different. It's hilarious at birthday parties. Less hilarious when it happens inside a language model.

A team at joshua8.ai ran exactly that experiment. They called it the LLM Telephone Game. They took a news article about a truck accident and ran it through 17 different language models, each paraphrasing the previous output. After 50 iterations, the truck accident had become a bus explosion. The facts didn't just get less precise. They transformed. They became something new that had never been true.
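
If you want a feel for how such an experiment is wired up, here's a minimal sketch in Python. The `paraphrase` function is a placeholder for a real LLM call (the original experiment rotated through 17 different models), and the word-overlap similarity score is a crude illustration, not what the team actually measured.

```python
def paraphrase(text: str) -> str:
    """Placeholder for an LLM call that rewrites `text` in its own words."""
    # In the real experiment, each iteration went through a different model.
    return text  # stub: a real run would return the model's paraphrase

def overlap(a: str, b: str) -> float:
    """Crude similarity: shared distinct words over all distinct words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

original = "A truck accident blocked the highway for three hours."
current = original
for i in range(1, 51):  # the joshua8.ai run used 50 iterations
    current = paraphrase(current)
    if i % 10 == 0:
        print(f"iteration {i:2d}: similarity to original = "
              f"{overlap(original, current):.2f}")
```

With a real model behind `paraphrase`, that similarity curve doesn't just dip. It keeps falling, which is the whole point: each step looks harmless, and only the accumulation is visible.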

That's semantic drift in its purest form. And it doesn't just happen in controlled experiments. A January 2026 study tested 847 simulated AI agent workflows and found that nearly 50% of all agents drifted significantly after 600 interactions. Between interaction 300 and 400, drift didn't slow down. It accelerated. The errors became part of the system's assumptions, making future corrections less effective, not more. The more you correct, the harder it gets to fix.

A related phenomenon is called context rot: across all 18 frontier models tested by Chroma Research, accuracy drops by 30% or more for information in the middle of the context window (the working memory an LLM uses during a conversation). The beginning stays. The end stays. The middle rots away.

So when you're deep into a vibe coding session and the AI seems to have forgotten something you said earlier, it probably has. What it didn't forget, though, is the error "you" made with it: a mistake that landed in the code stays in the project's context and gets inherited further. Plainly said, it means:

If the project has no structure, no clear relationships, no documented reasoning, the AI doesn't compensate for that. It absorbs it. Every ambiguity in your input becomes an assumption in its output. Every missing connection becomes a guess. And guesses, compounded over hundreds of interactions, become drift.

Karpathy Does and Don'ts

Andrej Karpathy, the person who coined "vibe coding," hasn't written a single line of code himself since December 2025. Instead, he directs up to 20 AI agents in parallel. His description of what he does all day: "Code's not even the right verb anymore. But I have to express my will to my agents for 16 hours a day."

Think about that for a moment. The person who named the most popular way of building with AI doesn't do it that way. What he actually does is plan. He specifies. He reviews. He judges what to delegate, how to describe it, and how to verify the result. He shifted from building to directing.

And he's not alone. Thoughtworks listed Spec-Driven Development (writing structured specifications before letting AI generate code) as a key emerging practice in 2025. GitHub released an open-source toolkit for it. Context engineering (the discipline of structuring information so an AI can work with it effectively, rather than just phrasing a good question) has quietly replaced prompt engineering as the thing practitioners actually care about.

Still, the loudest conversation is about speed. Ship faster. Build more. Prompt harder. But the people who build the most with AI have learned something quieter: the model isn't the bottleneck. The context you give it is.

Harder, Better, Faster, Stronger?

If you've read the earlier post on Addiction to Urgency, this might sound familiar. That post described a pattern in teams: the stress-performance loop. Work harder, ship faster, feel productive. The cortisol and dopamine hit of crunch mode. The team that can't slow down because slowing down feels like falling behind.

Vibe coding is the same pattern, just wearing a costume.

The prompt-correct-prompt loop gives you the same dopamine hit as crunch mode. It feels like velocity. The code appears. Things happen on screen. You're building! Except... you're actually surfing a really nice hormone-cocktail wave.

Both patterns share the same root: the system rewards speed, not direction. In crunch mode, the team gets praised for hours worked, not outcomes delivered. In vibe coding, you feel productive because something is appearing, not because something is right.

None of this means: stop, you're wrong. Drift surfing is genuinely fun, and there's a place for it. On a weekend, for a side project, for exploring what's possible, go ahead and surf.

But companies don't get to surf. A company that pays salaries, ships products, and carries responsibility for the people building them doesn't get to say "ah well, we'll figure it out as we go" and call it a strategy. That's not innovation. That's negligence with a tech veneer. The developers who inherit a drifted codebase are the ones who pay for it, in overtime, in frustration, in security issues, in cleaning up confidently wrong decisions that nobody can trace back to a reason.

If what you're building is supposed to become a product, something users depend on, something a team maintains, something that grows, then at some point someone has to come ashore. And ideally, before the drift becomes the architecture.

Shift Drift

Something shifts when a session with AI starts by describing why something should exist, not just what it should do. The drift shrinks dramatically. When the relationships between components are mapped out before asking for code, the AI makes fewer assumptions. When the conversation starts with structure instead of ending with it, corrections go down and the result matches the intent.

This isn't a new insight for anyone who has worked with teams. It's what a good user story map (a visual technique that lays out the user's journey horizontally and breaks each step into layers of detail) does. It's what a well-run refinement session (where a team breaks down upcoming work into concrete, buildable pieces) produces. The stakeholder need at the top, the requirements derived from it, the design decisions, the components, and then the code. A chain of causality, not a flat list of features.

The difference is that now, that chain isn't just for humans to follow. It's for the AI to follow, too.

When an LLM gets structured context with clear relationships, it doesn't need to guess what depends on what. It reads: this element exists because of that need. Changing it affects these downstream components. This is a must-have and that is a nice-to-have.
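
What that structured context can look like is easy to sketch. The element names and fields below are illustrative assumptions, not a standard or any specific tool's format; the point is only that every element carries its reason, its priority, and its downstream effects as data the model can read instead of guess.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    because_of: str    # the stakeholder need this traces back to
    priority: str      # "must-have" or "nice-to-have"
    affects: list[str] = field(default_factory=list)  # downstream components

# Hypothetical project elements, invented for the example.
context_map = [
    Element("password_reset",
            because_of="users get locked out and churn",
            priority="must-have",
            affects=["email_service", "audit_log"]),
    Element("dark_mode",
            because_of="requested in the last user survey",
            priority="nice-to-have",
            affects=["ui_theme"]),
]

for e in context_map:
    print(f"{e.name} exists because {e.because_of}; "
          f"changing it affects {', '.join(e.affects)} ({e.priority}).")
```

Whether this lives in dataclasses, YAML, or a story map on a wall matters less than the chain itself: need, reason, priority, dependency, all explicit.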

Less context that means more: not longer specifications, but denser ones, where every element carries its own reason for existing.

Here's the comparison that keeps surfacing: tokens (the units an LLM processes, and the units you get billed for) are waste in the Lean sense. Every token that doesn't carry meaning is, well, waste. A wall of text describing a feature's dependencies burns tokens and invites drift. A structured map, an actual plan, reduces that waste drastically.

To put it very clearly: with AI, planning has become the most important part. And planning is exactly the thing that many companies, whatever framework they work with, put aside and spend the bare minimum of time on.

AI Is Not Agile

Jim Highsmith, Co-Author of the Agile Manifesto

The step of extended backlog and project planning is most often skipped or poorly executed. Not because Agile doesn't value planning, but because organizations adopted the ceremonies and forgot the discipline underneath. Agile frameworks are, at their core, execution frameworks. Ways of working to deliver something in incremental steps. Execution. Not planning.

A few posts ago, this blog made the case that AI would transform how agile teams work. That still holds. But one part of the framing was wrong. The framing was "AI Agile": AI fits into agile ways of working. Adaptive, iterative, emergent. The more you work with LLMs, though, the clearer it gets that this isn't what makes them effective. Agile says "respond to change." Lean (the discipline of eliminating waste and pulling only what's needed, when it's needed) says "eliminate waste." AI doesn't respond to change very well. Not because it can't adjust to a changed requirement or opinion, but because every change is a potential drift event.

What AI does phenomenally well is execute clearly defined workflows with minimal waste. That's not Agile. That's Lean.

Skills are defined workflows. Tokens are waste. Context engineering is waste reduction. Structured input produces predictable output. Pull instead of push: the AI pulls what it needs from context instead of you pushing everything into a prompt and hoping.
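
As a rough illustration of pull over push, here's a sketch under the assumption that project context lives as small, addressable pieces. The store contents and the keyword-overlap relevance score are made up for the example; a real system would use an index or embeddings. The shape is what matters: the task pulls the few pieces it needs, and everything else never costs a token.

```python
# Hypothetical project context, split into small addressable pieces.
context_store = {
    "auth.flow":    "Login uses OAuth; reset flows must reuse the token service.",
    "email.policy": "Password reset emails go out through email_service, rate limited.",
    "ui.theme":     "Theme tokens live in ui_theme; components must not hardcode colors.",
}

def pull(task: str, store: dict[str, str], k: int = 2) -> list[str]:
    """Return the k store entries sharing the most words with the task."""
    task_words = set(task.lower().split())
    ranked = sorted(store.values(),
                    key=lambda text: len(task_words & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

task = "Implement the password reset flow and send the reset email."
for piece in pull(task, context_store):
    print(piece)  # only the relevant pieces get spent as tokens
```

Run it and the theme guideline never enters the prompt, because the task never asked for it. That's the pull: relevance decided by the work, not by whatever happened to be pasted in.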

The correction is therefore: AI is Lean. Not Agile.

Planning Ahead

AI doesn't replace planning. If anything, it punishes the absence of it harder than any team, manager, or deadline ever could. A team without a plan will waste time. But an AI without a plan wastes time, money, and generates confidently wrong results that someone has to untangle.

The people who get the most out of AI are the ones who invest in the thing that feels least like AI work: thinking about structure before typing a single prompt. Mapping the why before the what. Building context that holds up across corrections instead of hoping the conversation stays on track.

Drift is real. It's measurable. It accelerates. And it doesn't care whether you noticed it happening. The question isn't whether to plan. The question is what planning looks like when your most productive collaborator is a machine that can build anything you describe but can't remember why you described it that way.

That question is worth sitting with for a while — well, as long as you remember it.


Feel like surfing the Lean-AI wave? Book a free consultation call.
