From converged architecture
to a buildable sprint sequence.
The architecture is locked (3-of-3 over four jam rounds). A fresh 4-model panel — Grok 4.3, Codex GPT-5.5, Gemini 3.1-pro, deepseek-r1 — turned it into a critical-path build sequence and an autoresearch track. The mask is gone and it changes nothing load-bearing.
Build Sprint 1 = the Walking Skeleton first: content-addressed put/find/fetch
+ FTS anchor lookup + sha256(raw)===content_hash verification, isolated from LLM dispatch. Ship it as a new
in-monorepo package. The single highest-conviction de-risk: treat fetch as a deterministic evidence-admission system, never a model-inferred convenience hook.
The mask detour is closed — and it costs us nothing
The earlier modulum-bench "mask-shadow" finding (+11.78pp, hallucination collapse 3–5×) is therefore not mask-attributable — it is weights/serving differences. MASK-PROOF-001 is permanently stopped. The deeper-planning panel was asked to confirm or refute the claim that this changes nothing load-bearing.
Codex: "the no-attention-mask update does not break the substrate architecture … black-box-safe because retrieval happens through explicit <fetch>wick</fetch> emissions."
Gemini: "I confirm … changes nothing load-bearing … the performance delta is an operational detail, not an architectural dependency."
Grok: "No new load-bearing risks from mask absence; emit-to-expand was already designed black-box." deepseek-r1 concurs.
The substrate never load-bore on a mask: emit-to-expand is re-prefill on fetch, not mask-driven KV recycling — and KV-recycling was already deferred. This satisfies the governance rule that any "therefore" claim needs panel convergence.
FIND / CARRY / VERIFY — the spine (never collapsed)
Three different objects with three different fidelity, temperature, and cost profiles. The architectural invariant of the whole substrate.
- FTS5 sidecar of exact high-entropy anchors (symbols, paths, hashes, dates, IDs)
- Dense vectors over raw text
- Both resolve to
content_hash - Index ≤ 5% of raw corpus
- The Omnifact envelope (~6:1, ~0.7 KB)
- The only object that rides in the prompt window
- Recovers facts, not bytes
- Embeds the wick
- Exact raw bytes, content-addressed by
sha256(raw) - Append-only cold store
- Fetched only via
fetch_lossless(hash) - The source of "byte-perfect" recall
The memory wick + the two-verb contract
A wick is the ~32-byte content_hash embedded in each CARRY envelope — the address where exact bytes live, not the memory. Self-verifying: a wick is the hash of its target, so re-hashing fetched bytes proves the match. A fetched memory cannot be a provenance hallucination.
find(query) → [wick + snippet] · fetch(wick) → raw · rehydrate(wick) → envelope · put(raw) → content_hash · resolve(hash) → current_hash (DAG version-follow). The whole surface lives in infrastructure/packages/total-recall/src/index.ts.
The thin commitment chain: each record carries supersedes:<prior_hash> (+ optional derived_from). The content-addressed store is a Merkle DAG → version resolution is O(chain-depth) pointer-follow, zero token cost, never model inference. This eliminates CXDB-as-a-separate-product.
The 3-crux dependency chain
Crux 1 is resolved by the FIND/CARRY/VERIFY split. Cruxes 2 and 3 are converged on a mechanism and become measurable in the build + autoresearch tracks.
The build-sprint sequence (FSM v2)
Synthesized from the panel. All four put the Walking Skeleton first; the critical path is store bytes by hash → resolve via find → fetch exact bytes → inject through middleware → gate through grounding → optimize only after telemetry exists. Codex's 10-sprint sequence is the spine.
| # | Sprint | Scope | Hard deps | Est / type |
|---|---|---|---|---|
| S1 | Walking Skeleton | Content-addressed store + put/find/fetch + FTS anchor + hash verify. Isolated from LLM. | none | 0.8–1.2k infra |
| S2 | FIND Index v1 | FTS5 anchor extraction + raw-text vectors + result contract; early vector-recall eval. | S1 | 1.2–1.8k feature |
| S3 | CARRY Envelope v1 | Omnifact envelope (wick + facts + provenance + policy) + rehydrate. | S1 | 0.9–1.4k feature |
| S4 | Commitment DAG v1 | supersedes/derived_from, version resolution, tamper checks, chain traversal. | S1 | 0.6–1.5k infra |
| S5 | Emit-to-Expand Middleware | <fetch>wick</fetch> halt/fetch/append/resume, hard max-depth, thrash + policy gates. | S1–S4 | 1.5–2.5k feature |
| S6 | Grounding Firewall Integration | Route fetched material through existing applyGroundingFirewall before prompt admission. | S5 (Grok: maybe earlier) | 0.4–1.3k infra |
| S7 | Related-Hash Expansion Policy v1 | Offline hash → ranked_related[] from KNN + DAG + co-access + frequency; versioned/inspectable. | S2, S4 | 0.9–2.0k feature |
| S8 | Eval Harness + Telemetry | Recall precision, fetch usefulness, token cost, thrash rate, stale/adversarial rejection, latency. | S1, S5 | 1.2–1.8k infra |
| S9 | Autoresearch Optimization Loop | Bandit/Bayesian over fetch policy + compression params using harness metrics. | S8 | 1.5–2.5k feature |
| S10 | Hardening + Production Gates | Multi-tenant quotas, migrations, corruption recovery, observability, replay tests. | all core | 1.5–2.5k refactor |
Sprint dependency DAG
S1 unblocks three parallelizable cores (FIND, CARRY, DAG). The middleware (S5) is the convergence point and — per Gemini — the most likely failure point on latency. Optimization (S9) is gated behind telemetry (S8) and the cheapest-experiment go/no-go.
The autoresearch track — what the loop optimizes
The loop optimizes retrieval usefulness per token and per latency unit — not vague "memory quality." Codex's objective function, panel-endorsed:
+0.40 · answer_correctness
+0.25 · evidence_precision
+0.15 · recall_at_k
+0.10 · token_savings_ratio
+0.05 · latency_score
−0.30 · unsupported_claim_rate
−0.20 · stale_fetch_rate
−0.15 · thrash_rate
−0.10 · adversarial_admission_rate
headline metric = grounded_correct_answers / total_tokens × 1000
Grid / Bayesian first — RL deferred. 3-of-4 strong. Bandits only if the policy adapts per session; full RL only after a stable simulator with reliable multi-step reward attribution. deepseek mild dissent (frames RL viable) — not load-bearing.
Static ~50-task benchmark; compare 5 fetch policies (FTS-only / vector-only / FTS+vector / +DAG / full mix). If policy choice doesn't move grounded success, don't build the loop yet.
Kill the loop if, after N ≥ 100 held-out tasks, the best policy improves the headline metric by < 10% over a hand-tuned baseline, or raises unsupported_claim_rate by > 2pp.
Repo / package placement — unanimous 4-of-4
| Decision | Verdict | Rationale |
|---|---|---|
New package infrastructure/packages/total-recall/ | YES | Owns durable storage, hashes, FIND/CARRY/VERIFY contracts, DAG semantics, retrieval policy. |
| Fold into memory-router? | NO | memory-router = transport/routing (port 3200). total-recall = persistence + semantic policy. Folding blurs boundaries + blocks SDK extraction. |
| Its own GitHub repo now? | NO | Tightly coupled to memory-router + semantic-workspace + forge-core dispatch. Separate repo = crippling API lockstep friction. |
| Public API surface | src/index.ts | memory-router consumes it through an adapter/client; does not own it. |
| Research / planning home | research/tracks/ | Hypotheses, evals, optimization configs, benchmark corpora, experiment artifacts. |
It becomes a customer-shippable SDK needing independent CI/versioning (the hypernym-cloud pattern). Until then: in-monorepo.
Residual architecture risks (post-mask)
Expansion authority as a hallucination amplifier. If model-emitted <fetch>wick</fetch> is treated as authoritative, hallucinated/adversarial/stale hashes inject bad context. Fetch must be a deterministic evidence-admission system: hash exists + rehashes correctly + passes grounding policy + version current-or-explicit + tied to evidence need; failed fetches NEVER silently approximate-matched.
Content-addressing proves bytes are authentic, not current. Fix: fetch returns version metadata by default; middleware defaults to latest unless historical fetch is explicit.
No mask to isolate the fetched blob → deep fetch chains push the directive from the generation head; the model "forgets" mid-chain. Fix: cap fetch token budget, re-anchor directive, measure dilution in the harness.
Anchor index under-recalls soft concepts; CARRY must never be treated as evidence (only raw bytes ground); related-hash policy must be offline-built + versioned + inspectable, never hidden inference.
The emit-to-expand middleware (S5): halting generation + IO-bound fetch_lossless + full KV re-prefill on every <fetch> may carry latency/UX friction high enough that the feature is abandoned before the loop can optimize it. Plan response: S8 telemetry instruments per-fetch latency from day one; S5 ships with a latency budget, not as an afterthought.