The Missing Axis: A Topological Sketch of Adaptive Rule Deviation in AI Systems

Why real robustness cannot be reached by scaling rules or inputs — and what the torus, the zoom, and the orthogonal axis tell us about how to build it

Philipp Hackländer·15 April 2026·11 min read

Most AI systems built today rest on two architectural axes: structured rules and information permeability. Both are necessary. Neither is sufficient. What follows is a sketch of a third axis — Adaptive Deviation Capacity — using three analogies from topology, cartography, and dimensional geometry. The math is not decorative. It tells us why this axis cannot be reached by scaling the other two.


1. The two-axis baseline

A well-designed AI system has two axes most practitioners can agree on.

Structure Layer. Rules, policies, guardrails, decision trees, preference models. Where "what should the system do by default" lives. Produces consistency and scales decision-making.

Permeability Layer. The input/output boundary. Which signals the system accepts, which outputs it produces, how it updates on new data. Where the system stays alive in a changing world.

Between these two you can build something that runs reliably, scales, and passes standard audits.

You cannot build something that is robust.

The distinction matters. A system that is reliable is consistent in the average case. A system that is robust keeps performing under adversarial or novel conditions, and when it does degrade, it degrades in ways that can be observed. Rules will eventually fit the world badly. Inputs will eventually be mis-labeled. When both happen simultaneously, a two-axis system has no way to detect that it is now doing the wrong thing — it can only verify that it is doing the thing it was told to do.

For non-specialists: This is why AI systems often fail in ways that look fine from the inside. The system checks its own rules, finds them correctly applied, and reports green status — while producing outputs that are contextually wrong. The failure mode is invisible to the system, by construction.
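This failure mode can be made concrete in a few lines. The sketch below is a toy illustration (all names hypothetical, not from the original framing): the system's only self-check is "was the rule applied correctly?", so context the rule set cannot see never enters the verdict.

```python
def rule_caps_refund(amount):
    """Rule: refunds are capped at 100 — always 'correctly applied'."""
    return min(amount, 100)

def self_check(amount, output):
    """A two-axis system can only verify that the rule was followed."""
    return output == min(amount, 100)

# Context the rule set cannot see: the account was flagged as fraudulent.
ctx = {"amount": 80, "account_flagged": True}
output = rule_caps_refund(ctx["amount"])

print(self_check(ctx["amount"], output))  # True — green status
# Yet issuing any refund to a flagged account is contextually wrong.
# No check inside the system can observe this: the context is off-axis.
```

The self-check passes by construction; the contextual error is invisible to it.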

2. Adaptive Deviation Capacity — the third axis

Jan Heinemeyr's argument is that real system intelligence requires a third layer: the ability to deliberately deviate from its own rules when context requires, without losing integrity. He calls this Adaptive Deviation Capacity (ADC).

The operational question shifts from

Which rule applies?

to

Is applying this rule, in this specific context, sensible?

This sounds like a soft addition — more heuristics, more judgment calls — but the claim is stronger. ADC is not an extension of the rule set, and not an extension of the input stream. It is orthogonal to both.

The next three sections are three analogies for why it must be orthogonal. Each makes a different piece of the claim rigorous.


3. Analogy I — The torus

Picture a standard torus: a ring, a bicycle tire. It has two visible regions: the outside (the bulge you can see) and the inside (the tube surface facing the hole).

Topologically, they are the same surface. Geometrically, they are not.

The outside has positive Gaussian curvature — it curves away from you in every direction, like the surface of a sphere.

The inside has negative Gaussian curvature — it curves one way along the tube and the opposite way around the hole. A saddle.

Parametrising the torus with major radius R and minor radius r, the Gaussian curvature at each point is:

K(u, v) = cos(v) / [ r · (R + r·cos(v)) ]
  • cos(v) > 0 → K > 0 → exterior (rule execution)
  • cos(v) < 0 → K < 0 → interior (contextual evaluation)
  • cos(v) = 0 → K = 0 → membrane (curvature sign change)

The sign changes continuously — passing through zero at the limit circles — not as a jump.

Same object. Different metric.
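The curvature formula above is small enough to verify directly. A minimal sketch (radii R = 2, r = 1 chosen for illustration): K is positive at the outer equator (v = 0), negative at the inner equator (v = π), and exactly zero on the limit circles (v = π/2).

```python
import math

def gaussian_curvature(v, R=2.0, r=1.0):
    """Gaussian curvature of a torus at poloidal angle v.

    v = 0 is the outer equator, v = pi the inner equator.
    K is independent of the toroidal angle u.
    """
    return math.cos(v) / (r * (R + r * math.cos(v)))

# Outer equator: positive curvature (rule execution).
assert gaussian_curvature(0.0) > 0
# Inner equator: negative curvature (contextual evaluation).
assert gaussian_curvature(math.pi) < 0
# Limit circle at v = pi/2: the membrane, K = 0 (up to float error).
assert abs(gaussian_curvature(math.pi / 2)) < 1e-12
```

The sign of K, not its magnitude, is the mode diagnostic: the same formula covers both regimes, and the transition point is where the numerator changes sign.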

Torus showing positive curvature on the exterior (rule execution) and negative curvature on the interior (contextual evaluation)

Gaussian curvature sign as a diagnostic for which mode the system is operating in. Outside: K > 0, rule execution. Inside: K < 0, contextual evaluation. Membrane at K = 0.

Mapping to ADC:

  • Outside (positive curvature) → Rule Execution. Every decision propagates away from the center, applying abstractions outward toward cases. Convex, stable, scalable.
  • Inside (negative curvature) → Contextual Evaluation. The system sees itself from within. Every decision spirals around the hole — checking the rule against current context, against the state of the system's own reasoning.

The two are dual faces of the same topology. What Heinemeyr calls membrane reversal — the switch from Rule Execution to Contextual Evaluation — is a sign change in curvature. Graceful because it is continuous (curvature passes smoothly through zero at the limit circles). Fundamental because the sign itself flips.

For non-specialists: When an AI system switches from "just follow the rule" to "evaluate whether the rule fits here," it isn't adding more rules — it's moving to the inside of the same object. The shape hasn't changed. But the geometry of how decisions propagate has reversed.

Implementation consequence: a system running entirely on the outside surface can never discover the inside by accumulating more rule-execution behaviour. You have to cross the membrane deliberately. That crossing is an event, not a gradient.
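If membrane crossing is an event, it should be detectable as one. A sketch of the simplest possible detector (function name and signal representation are my own, for illustration): scan the curvature signal for sign flips and report each flip as a discrete crossing.

```python
def membrane_crossings(curvatures):
    """Return the indices where the curvature sign flips.

    Each flip is one membrane-crossing event — a discrete sign change,
    not a gradual drift along one surface."""
    events = []
    for i in range(1, len(curvatures)):
        if curvatures[i - 1] * curvatures[i] < 0:
            events.append(i)
    return events

# A path that moves from the outside (K > 0) to the inside (K < 0) and back:
path = [0.5, 0.2, -0.1, -0.4, -0.2, 0.3]
print(membrane_crossings(path))  # → [2, 5]: two discrete events
```

A path that stays on one surface, however much it wanders, produces an empty event list — drift and crossing are structurally different signals.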


4. Analogy II — The Google Maps zoom

You open Google Maps. You search an unfamiliar address. For a moment the view hangs — tiles render, a third dimension of detail loads, the map un-flattens from country-level into the street-level rendering where your query actually lives.

The wait is short. But it is structural: a dimension that wasn't loaded had to be fetched before the question could be answered at all.

Three zoom layers: always-loaded rule set, on-demand meta-policy dimension, and contextual decision output

The meta-policy layer loads on demand. The latency of that load is an observable system property — and a diagnostic for whether axis 3 exists.

Mapping to ADC:

Rule execution is the always-loaded base view. Cached, fast, cheap, correct-on-average.

Contextual evaluation requires loading a dimension that isn't in the baseline. Specifically: the meta-policies that decide when the rule shouldn't apply. These are not stored as additional rules — they are stored as scoring functions over context, confidence estimates, risk profiles. They are about the rules, at a level the rules themselves cannot see.

A system without ADC is a map that can't zoom. Every location looks the same. Every query is answered at identical resolution. It doesn't lag — because it never loads the missing dimension.

What this analogy operationalises that the abstract ADC framing doesn't: the loading time itself becomes a measurable system property.

# Sketch — not production code
import time

class ADCDecisionEngine:
    def __init__(self, meta_policy_cache, telemetry):
        self.meta_policy_cache = meta_policy_cache
        self.telemetry = telemetry

    def decide(self, ctx, rule):
        t0 = time.monotonic()
        base_output = rule.execute(ctx)

        # Meta-policy: does this context match a deviation trigger?
        meta = self.meta_policy_cache.get(rule.id)
        if meta is None:
            meta = self.load_meta_policy(rule.id)   # <-- the zoom event
            self.meta_policy_cache[rule.id] = meta
            self.telemetry.record_meta_load(rule.id, time.monotonic() - t0)

        context_score  = meta.score(ctx)
        confidence     = meta.confidence(rule, ctx)
        deviation_risk = meta.deviation_risk(ctx)

        if context_score < meta.threshold and confidence > meta.min_conf:
            return self.deviate(base_output, ctx, meta, deviation_risk)
        return base_output

The meta_policy_cache miss is the zoom event. The recorded load time gives you a diagnostic property a pure rule-execution system cannot emit: how long did it take to load the dimension in which the rule's applicability could be questioned?

For non-specialists: This is why this analogy matters in practice. You can measure whether your AI system has an "inside" — just look at whether it ever pauses to evaluate context rather than executing instantly. A system that is always instant is almost certainly operating on the outside surface only.
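One way to operationalise "does it ever pause" without touching the engine itself is to look at the decision-latency distribution. A heuristic sketch (function name, threshold, and fraction are illustrative assumptions, not part of the original framing): cached rule execution clusters at one fast mode, while on-demand meta-policy loads produce a slow tail.

```python
def has_zoom_events(latencies_ms, fast_ms=5.0, min_slow_fraction=0.01):
    """Heuristic: a pure rule-execution system answers every query at
    cache speed. If a meaningful fraction of decisions are much slower,
    something is being loaded on demand — a candidate zoom event."""
    if not latencies_ms:
        return False
    slow = sum(1 for t in latencies_ms if t > fast_ms)
    return slow / len(latencies_ms) >= min_slow_fraction

always_instant = [1.2, 0.8, 1.1, 0.9, 1.0] * 20
with_pauses    = [1.2, 0.8, 1.1, 0.9, 1.0] * 20 + [140.0, 95.0]

print(has_zoom_events(always_instant))  # False
print(has_zoom_events(with_pauses))     # True
```

Latency bimodality is only circumstantial evidence — a slow tail can have other causes — but its complete absence over a long observation window is strong evidence of a zoom that never loads.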


5. Analogy III — The orthogonal axis

The hardest claim. ADC cannot be reached by scaling the existing axes.

A line has one dimension. Extend it: still a line. To get a plane, you add a new axis at 90° — perpendicular to the line.

A plane has two. Extend it: larger plane. To get a cube, you add another axis at 90° to both existing ones.

A cube has three. Adding a fourth axis is not a bigger cube — it's a tesseract. A different structure, requiring a perpendicular direction that cannot be constructed from the first three.

Every dimension jump is orthogonal to all previous dimensions. This is not metaphor. It is what "dimension" means.

Three orthogonal architectural axes: Structure (axis 1), Permeability (axis 2), and Adaptive Deviation Capacity (axis 3, gold)

The classical two-axis design (shaded plane) builds reliable systems. The gold axis — ADC — cannot be reached by extending either of the first two. It requires a new architectural dimension.

Mapping to ADC:

  • Structure Layer is axis 1: "how many rules do we have, how well do they cover cases."
  • Permeability Layer is axis 2, at 90°: "how well does the system ingest new data, how accurate is its input model."
  • Adaptive Deviation Layer is axis 3, at 90° to both.

You cannot reach axis 3 by:

  • adding more rules (extends axis 1),
  • adding more input streams (extends axis 2),
  • combining rules and inputs in new ways (moves diagonally in the axis 1–2 plane).

You have to introduce a fundamentally new dimension: meta-reasoning about rule applicability in context. Nothing in the first two axes, however extended, produces it.
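The dimensional claim has a direct linear-algebra reading, sketched below with hypothetical unit vectors standing in for the three axes: no combination of the first two axes, however large, has any component along the third, and adding the third strictly increases the rank of the system.

```python
import numpy as np

structure    = np.array([1.0, 0.0, 0.0])  # axis 1: more rules
permeability = np.array([0.0, 1.0, 0.0])  # axis 2: more inputs
adc          = np.array([0.0, 0.0, 1.0])  # axis 3: meta-reasoning

# Any mix of rules and inputs — however extreme — stays in the 1-2 plane:
combined = 1000 * structure + 1000 * permeability
assert np.dot(combined, adc) == 0.0       # no component along axis 3

# The three axes span a strictly larger space than the first two:
rank_two   = np.linalg.matrix_rank(np.stack([structure, permeability]))
rank_three = np.linalg.matrix_rank(np.stack([structure, permeability, adc]))
print(rank_two, rank_three)  # 2 3
```

Scaling the coefficients moves you around inside the plane; only a new basis vector leaves it. That is the whole argument in two rank computations.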

For non-specialists: This is the most important reason to care. Teams trying to fix AI robustness problems usually reach for more rules or more data. That work extends existing axes. It does not produce the missing one. Real robustness requires building something architecturally new — not more of what you already have.

There is a second-order consequence the abstract framing tends to understate: ADC is not a property of a single agent. An agent operating at rule-level D is structurally blind to the meta-policy level D+1 — the same way a cube cannot see itself as a 3D object from inside a 4D space. ADC must therefore be implemented as a coordination protocol between agents at different levels, not as a feature of one "smarter" agent.

Multi-agent coordination: RuleAgent at dimension D, MetaAgent at D+1, Orchestrator as bridge

ADC is a protocol between levels, not a feature of one smarter agent. The Orchestrator is a structural requirement — without it, the two agents crash into each other.

# Multi-agent coordination — structural requirement, not an optimisation

class RuleAgent:
    """Operates on dimension D — executes rules."""
    def __init__(self, rules):
        self.rules = rules

    def select_rule(self, ctx):
        # Simplest possible selection: first rule whose trigger matches.
        return next(r for r in self.rules if r.matches(ctx))

    def handle(self, ctx):
        rule = self.select_rule(ctx)
        result = rule.execute(ctx)
        return result, {"rule_id": rule.id, "ctx": ctx}


class MetaAgent:
    """Operates on dimension D+1 — evaluates rule applicability."""
    def __init__(self, meta_policies):
        self.policies = meta_policies

    def evaluate(self, rule_id, ctx, proposed_result):
        policy = self.policies[rule_id]
        if policy.context_score(ctx) < policy.threshold:
            return policy.override(ctx, proposed_result)   # conscious deviation
        return proposed_result                             # passes through


def decide(ctx, rule_agent, meta_agent):
    draft, meta = rule_agent.handle(ctx)
    return meta_agent.evaluate(meta["rule_id"], ctx, draft)

Without the decide orchestrator explicitly mediating between the two agent levels, the protocols "crash into each other" — the rule-agent assumes final authority, the meta-agent has no input pipe. This is the D → D+1 coordination gap made concrete.


6. What all three analogies share

Each analogy makes a different piece of the structural claim rigorous:

Torus (inside/outside) — formalises that ADC is a dual of rule execution, not an extension of it. Enables curvature sign as a runtime diagnostic for which mode the system is currently in.

Maps zoom — formalises that ADC requires loading a dimension not present in the baseline rule set. Enables meta-policy load latency as a measurable, observable system property.

Orthogonal axis — formalises that ADC cannot be reached by scaling either existing axis. Forces architectural decisions rather than further optimisation work on existing axes.

They are not redundant. A practitioner designing an ADC-capable system needs all three: the torus tells you what ADC is structurally, the zoom tells you how to detect it in running systems, the orthogonal axis tells you why you cannot get it for free.


7. Decision flow and noise vs. signal

Decision flow with Adaptive Deviation Capacity: select rule, load meta-policy, score context, branch execute or deviate, emit telemetry

The canonical ADC decision path. Both branches (4a: execute, 4b: deviate) log. The absence of branch 4b in production logs over 90 days is the operational signature of a two-axis system.
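That operational signature can be checked with a log scan. A sketch under an assumed schema (the "branch" field and its "4a"/"4b" values are hypothetical, matching the labels in the diagram above): a two-axis system executes but never deviates.

```python
def two_axis_signature(decision_log):
    """True if the log shows the two-axis signature: rule-execution
    events (branch 4a) but no deviation events (branch 4b).

    Each log entry is assumed to carry a 'branch' field — an
    illustrative schema, not a standard one."""
    branches = {event["branch"] for event in decision_log}
    return "4a" in branches and "4b" not in branches

log = [{"branch": "4a"}, {"branch": "4a"}, {"branch": "4a"}]
print(two_axis_signature(log))  # True — no membrane crossing in this log
```

A single genuine 4b entry falsifies the signature; ninety days of pure 4a is the flat map.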

A useful distinction in the original framing: uncontrolled deviation (noise) vs. deliberate deviation (signal). The topological frame sharpens it:

  • Noise = random walk on a single surface. The system drifts. It may end up somewhere unusual, but it did not move between surfaces.
  • Signal = controlled passage between the inside and outside of the topology. A membrane crossing — deliberate, reversible, with a curvature sign change but no structural break.

Timeline showing noise spike (output deviation without meta-policy activity) vs signal deviation (output deviation aligned with meta-policy override)

Correlation of output deviation with meta-policy activity is the structural test. Noise: deviation without meta activity. Signal: deviation tightly coupled to a meta-policy override event.

This is not just quantitative (more or less deviation). It is structural: which surface is the system currently operating on? A noise event stays on one side. A signal event crosses the membrane and returns.

In practice this changes what you instrument for. The telemetry you need is not "how unusual is this output?" — it is "was the meta-layer active when the output deviated?" A deviation uncorrelated with meta-policy evaluation is noise. A deviation tightly coupled to meta-policy evaluation is signal. Those are operationally different events and should trigger different monitoring.

# Discriminating noise from signal at runtime

def classify_deviation(output_event, meta_event_log, window_ms=50):
    """
    Signal: deviation aligns temporally with a meta-policy evaluation.
    Noise:  deviation does not.
    Event timestamps (.t) are assumed to be in milliseconds.
    """
    meta_active = any(
        abs(output_event.t - m.t) < window_ms and m.fired_override
        for m in meta_event_log
    )
    return "signal" if meta_active else "noise"

8. Closing — from working to understanding

A system that works operates on a single surface of its own topology. A system that understands can cross the membrane — deliberately, reversibly, with a sign change in curvature but no structural break.

This isn't philosophy. It's architectural guidance.

If your AI system cannot be observed to cross its own membrane — if you cannot point to moments where meta-policy evaluation actively overrode default rule execution — it is running on the outside of the torus only. The inside exists. It has just never been reached.

The missing axis is not a feature request. It is a dimension that has to be built.


A practical starting question if you are retrofitting ADC into an existing system: can you identify a single production decision in the past 90 days where your system consciously deviated from its default policy because context required it — and log the reasoning? If not, you are on a one-dimensional map. The zoom has never loaded.


Core framing of Adaptive Deviation Capacity (ADC) is Jan Heinemeyr's. The three analogies — torus, zoom, orthogonal axis — are extensions developed in dialogue with the Universal Crystal pattern corpus. A full framework specification including formal definitions, reference architecture, diagnostic metrics, and open questions is available on request.

About the author

Philipp Hackländer is an independent advisor working on AI strategy, industrial transformation, and digital infrastructure. Former Roland Berger consultant and co-founder of DataVirtuality (Gartner Cool Vendor, acquired by CData 2024). He works with mid-sized companies and growth-stage ventures across DACH and international markets.


Disclaimer: The views expressed in these notes are personal observations based on project experience and public information. They do not constitute investment advice, legal advice, or a recommendation to engage in any transaction.