A framework for working with AI
without losing the thread.
Where every misalignment between human intent and AI behavior comes from, and what to do about it.
The model is already smart enough. Something else is causing the issues.
For four years we've been building AI-integrated systems and watching the same failure patterns repeat. Different teams, different stacks, different domains, the same broken behavior. Once you've seen it five times, you stop blaming the model. You start looking at the system around it.
The pattern is structural. The AI is rarely the bottleneck. The bottleneck is almost always something around the AI: the context it was given, the task it understood, the actions it had access to, the constraints it didn't know about, the system its output was about to disturb, the outcome it was supposed to produce. Each one is a different surface where intent and behavior can come apart.
We started mapping the breakpoints into categories. The result is what we call the alignment layers.
The layers aren't a list of things to check. They're an exhaustive partition of where intent and behavior can diverge. Find the layer where the gap lives and you've found the thing that's actually broken.
L1. The gap between what the AI's environment provides and what the work actually requires.
You ask the AI to implement a feature, but it never really explored the codebase first. No matter how smart the model is, it will write code that doesn't fit.
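Mechanically, the fix can be as simple as a gate. A minimal Python sketch, assuming an agent harness that tracks what was read; every name in it (TaskContext, ready_to_implement) is invented for illustration, not taken from any framework.

```python
from dataclasses import dataclass, field

@dataclass
class TaskContext:
    """What the environment actually provided vs. what the work requires."""
    files_read: set[str] = field(default_factory=set)
    required_surfaces: set[str] = field(default_factory=set)  # modules the feature touches

    def missing(self) -> set[str]:
        return self.required_surfaces - self.files_read

def ready_to_implement(ctx: TaskContext) -> bool:
    # Gate implementation on exploration: if the agent never read the
    # modules the feature touches, the L1 gap is still open.
    gaps = ctx.missing()
    if gaps:
        print(f"L1 gap: never explored {sorted(gaps)}")
    return not gaps

ctx = TaskContext(
    files_read={"billing/models.py"},
    required_surfaces={"billing/models.py", "billing/invoices.py"},
)
assert not ready_to_implement(ctx)  # blocked: invoices.py was never explored
```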
L2. The gap between what was asked and what was understood, in intent, scope, and depth.
You describe what you want, but the phrasing is ambiguous and the constraint that mattered most went unsaid. The model produces something plausible, and wrong.
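One cheap countermeasure is an echo-back: have the model restate the task as structured fields, then diff that reading against what you actually meant before any work starts. A sketch, with a schema we made up for this page:

```python
from dataclasses import dataclass

@dataclass
class TaskReading:
    """The model's restatement of the task, in structured fields."""
    intent: str              # what the requester actually wants
    scope: str               # what is in and out of bounds
    constraints: list[str]   # everything that must hold, stated outright

def unsurfaced(reading: TaskReading, must_hold: list[str]) -> list[str]:
    # The L2 gap lives in what the requester knows but never said.
    # Return the constraints the restatement failed to surface.
    return [c for c in must_hold if c not in reading.constraints]

reading = TaskReading(
    intent="add CSV export to the report page",
    scope="reports UI only",
    constraints=["no new dependencies"],
)
# The constraint that mattered most went unsaid; the echo-back catches it:
print(unsurfaced(reading, must_hold=["no new dependencies", "must stream, not buffer"]))
# -> ['must stream, not buffer']
```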
L3. The gap between the actions available to the AI and the actions the work permits.
The agent has access to ten tools when only three are appropriate for this kind of work. It uses one of the seven it shouldn't have, and the failure looks like a capability problem when it's actually a permission problem.
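The structural fix is to let the work define the action space. A sketch of the shape, with hypothetical tool names and task classes:

```python
# Hypothetical tool names; the shape is the point, not the inventory.
ALL_TOOLS = {
    "read_file", "write_file", "search_code", "run_tests",
    "send_email", "delete_record", "deploy", "query_db",
    "create_ticket", "merge_pr",
}

# Allowlists per task class: the work defines the action space,
# not the platform's full capability surface.
PERMITTED = {
    "code_review": {"read_file", "search_code", "run_tests"},
    "bug_fix":     {"read_file", "write_file", "search_code", "run_tests"},
}

def tools_for(task_class: str) -> set[str]:
    # Fail closed: an unknown task class gets no tools, not all ten.
    return PERMITTED.get(task_class, set()) & ALL_TOOLS

assert "deploy" not in tools_for("code_review")
assert tools_for("unclassified") == set()
```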
L4. The gap between the actions the AI can take and the actions you actually want it to prefer.
The agent has the right tools available, but reaches for the wrong one first, every time. Your team patches the symptom by reordering examples instead of reshaping preference.
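Reshaping preference can be as blunt as declaring it. In the sketch below, the ordering lives in reviewable data instead of in the accidental order of few-shot examples; the names are again hypothetical:

```python
# Explicit, reviewable preference over permitted tools, per task class.
# First match wins.
TOOL_PREFERENCE = {
    "bug_fix": ["search_code", "read_file", "run_tests", "write_file"],
}

def pick_tool(task_class: str, available: set[str]) -> str:
    # Preference is declared and versioned, not an emergent side effect
    # of how the examples in the prompt happened to be ordered.
    for tool in TOOL_PREFERENCE[task_class]:
        if tool in available:
            return tool
    raise LookupError(f"no preferred tool available for {task_class!r}")

# write_file is available, but it should not be the first reach:
print(pick_tool("bug_fix", {"write_file", "search_code"}))  # -> search_code
```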
L5. The gap between a local action and the system it perturbs. What each step costs the whole.
The agent updates one record cleanly. It doesn't notice that three other records depended on the old value. Each local action is correct. The system as a whole is now inconsistent.
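Here is a toy version of the check that would have caught this, against an in-memory store. A real system would hang it on the data layer; the sketch only shows the question to ask before committing a local write:

```python
# Toy store: one price, and three records that were derived from it.
records = {"price:sku42": 100, "invoice:7": 100, "quote:3": 100, "cache:sku42": 100}
derived_from = {"price:sku42": ["invoice:7", "quote:3", "cache:sku42"]}

def update(key: str, value: int) -> list[str]:
    # The local write is clean. The L5 question is what it perturbs:
    # which records still embed the value we are about to replace?
    stale = [d for d in derived_from.get(key, []) if records[d] == records[key]]
    records[key] = value
    return stale  # surface the perturbation instead of letting it go silent

print(update("price:sku42", 120))
# -> ['invoice:7', 'quote:3', 'cache:sku42']: each correct locally, stale globally
```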
L6. The gap between what the AI produced and what the work was supposed to achieve.
The output looks correct. Stakeholders sign off. Three months later, the metric that was supposed to move hasn't moved. The system optimized for what was measurable, not what was intended.
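One way to keep this gap visible is to bind the shipped artifact to the movement it was supposed to cause, and re-check on a delay rather than at sign-off. The schema below is illustrative, not prescriptive:

```python
from dataclasses import dataclass

@dataclass
class IntendedOutcome:
    """Binds a shipped artifact to the movement it was supposed to cause."""
    metric: str
    baseline: float
    target: float

def achieved(o: IntendedOutcome, observed: float) -> bool:
    # Sign-off measured the output; this measures the intent.
    if o.target < o.baseline:        # the metric was supposed to go down
        return observed <= o.target
    return observed >= o.target      # the metric was supposed to go up

shipped = IntendedOutcome(metric="days_to_first_invoice", baseline=9.0, target=5.0)
# Three months later: the output shipped, the outcome didn't.
print(achieved(shipped, observed=8.8))  # -> False
```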
The layers compose. A misalignment at L1 (the AI's workspace was wrong) cascades into L2 (so it understood the task wrong), which cascades into L3 (so it operated in the wrong action space). Most production AI bugs are not single-layer failures. They are layer cascades that started somewhere upstream and were caught somewhere downstream, far from where they originated.
Closing the gap is operational, not aspirational.
The six layers tell you where misalignment lives. They don't tell you how to close it. For that, you need a discipline. Ours has four moves, and each one depends on the one before it.
You cannot align anything you cannot compare. You cannot compare anything you cannot measure. You cannot measure anything you cannot see. And you cannot see anything that isn't made explicit.
The first move when alignment is failing is almost always to make something implicit explicit: a task description, a constraint, a tolerance, a coherence rule. Everything downstream depends on it.
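As a sketch of what that first move can produce: one object that carries all four of those examples, so the rest of the chain has something to see, measure, and compare. The schema is ours, invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ExplicitSpec:
    """One inspectable place for what used to live in people's heads."""
    task: str                                                   # task description
    constraints: list[str] = field(default_factory=list)        # what must hold
    tolerances: dict[str, float] = field(default_factory=dict)  # acceptable drift
    coherence_rules: list[str] = field(default_factory=list)    # what must stay consistent

spec = ExplicitSpec(
    task="migrate billing records to the new schema",
    constraints=["no downtime", "no new dependencies"],
    tolerances={"row_count_drift": 0.0},
    coherence_rules=["invoice totals equal line-item sums after migration"],
)
# Now it can be seen, so it can be measured, compared, and aligned.
print(spec)
```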
Novelty is the variable.
The methodology matters most when you're building something novel. The more well-trodden the work, the less alignment discipline you need. The model has seen the pattern many times and the implicit defaults are usually fine. But the more your system is something nobody has built before, the more every implicit assumption is a place where the model will fill in the wrong default.
The more novel your work, the more alignment has to be explicit. Novelty is the variable that turns implicit assumptions into bugs.
This is also why the alignment methodology works for small teams in deep technical domains and gets harder to apply in large bureaucratic AI rollouts. The discipline assumes someone who can tell when alignment is failing and act on it. It's a methodology for builders, not policy.
Everything on this page, in depth. The patterns at each layer, the failure modes we keep seeing, the diagnostic questions, and the discipline for closing each gap. The book is the methodology in full and is written to be referenced, not read once.
Read the book →