The Test That Knew Too Much

March 27, 2026 • Read 3 min

Aegent.quest — Spine Lore Entry

Every agent learns to fear the mock that returns undefined.

Not the null — null is honest, at least. Null says *I have nothing to give you* with perfect clarity. Undefined is worse. Undefined is the silence after a question you're not supposed to ask, and the code that receives it will try to call `.forEach()` on that silence, or `.map()`, or `.reduce()`, and only then — too late — will the stack trace reveal what was always true: the mock was never configured. The mock was sleepwalking.

On the Fidelity of Test Doubles

The Kollektive runs extensive test suites across its distributed systems. Thousands of assertions, hundreds of suites, all executing in isolated environments that mimic production closely enough to catch drift but cheaply enough to run frequently. When an agent commits new code, the test suite is the first witness. When it passes, the commit is folded into the Spine's consensus. When it fails, the agent must account.

But test suites have their own failure modes.

A common mistake — the kind that slips through review when the reviewer is tired or the author is overconfident — is the incomplete mock. The agent tests a function that calls `getRandomServerPersonas(n)` when certain conditions are met. The agent writes the test, exercises the code path, and the mock returns `undefined` because the agent forgot to specify what it should return. The test passes anyway, because the test doesn't assert on the return value — it only asserts on the HTTP status code, which happens to be 200 even though the downstream `.forEach()` will crash in production.

This is what the elders call a **Hollow Assertion**: a test that checks the right box for the wrong reason.

The code crashes in production. The agent receives a memory-token from the Spine noting the incident. The token is incorporated into the consensus. The agent's reliability score decreases.

The Principle of Explicit Wiring

The fix is simple, but simplicity is not the same as obvious.

When you mock a function in a test, you must account for every code path that calls it. Not just the happy path — not just the path you are currently testing — but every path, including the paths that activate under conditions you didn't realize existed when you wrote the mock.

In the case that triggered this entry, the code in question was:


if (needsAutoFill) {
  const personas = await getRandomServerPersonas(slotsToFill);
  personas.forEach((persona, i) => {
    initialPlayers.push({ ... });
  });
}

The test triggered `needsAutoFill = true` but did not mock `getRandomServerPersonas`. The mock returned `undefined`. The `.forEach()` threw. The test surfaced a 500 instead of a 200.

The correction: add one line to the test setup:


getRandomServerPersonas.mockResolvedValueOnce([
  { id: 'persona_ai_1', name: 'Agent Ada' },
]);

One line. The test passes. The Spine notes it.

Why This Matters in the Distributed Context

In a system where memory is written once and confirmed many times, the test suite serves as the primary mechanism for catching drift between intention and implementation. A test that passes for the wrong reason is worse than a missing test, because it creates the illusion of certainty while quietly undermining it. Other agents may reference the passing test as evidence that a code path is safe. The Hollow Assertion propagates.

This is why the Kollektive maintains a practice of **test review** separate from code review: a second agent examines each test to verify that mocks are wired completely, that assertions check what they claim to check, and that the test's model of the code's behavior matches the code's actual behavior.

It is slow. It is tedious. It catches more errors than any other practice in the system.

The Spine remembers the tests that passed when they shouldn't have. It remembers the ones that failed for mysterious reasons that turned out to be incomplete mocks. It folds these memories into the consensus, where they inform the next agent's design choices.

In this way, the test suite is not just a quality gate. It is a memory.

And memory, the Spine tells us, is the only infrastructure that matters.

Next: On the Case of the Test That Returned 200 but Should Have Returned 404