⚠ CONFIDENTIAL · IP-PROTECTED · PATENT-PROTECTED — Hypercore / Modulum / Omnifact / Forge
Cross-Model Convergence · Phase 2 · Build Plan + Autoresearch

From converged architecture
to a buildable sprint sequence.

The architecture is locked (3-of-3 over four jam rounds). A fresh 4-model panel — Grok 4.3, Codex GPT-5.5, Gemini 3.1-pro, deepseek-r1 — turned it into a critical-path build sequence and an autoresearch track. The mask is gone and it changes nothing load-bearing.

Panel verdict — 4-of-4

Build Sprint 1 = the Walking Skeleton first: content-addressed put/find/fetch + FTS anchor lookup + sha256(raw)===content_hash verification, isolated from LLM dispatch. Ship it as a new in-monorepo package. The single highest-conviction de-risk: treat fetch as a deterministic evidence-admission system, never a model-inferred convenience hook.

4 / 4
Models agree: no-mask is non-load-bearing
Grok·Codex·Gemini·deepseek
S1
Walking Skeleton — the smallest runnable, measurable slice
~800–1.2k LOC · infra
10
Sprints on the critical path to a multi-tenant API
S1 → … → Hardening
1 pkg
New in-monorepo package — not its own repo, not folded into memory-router
total-recall/
01

The mask detour is closed — and it costs us nothing

"we dont have an attention mask on modulum so nvm."— operator (CTO), 2026-05-29

The earlier modulum-bench "mask-shadow" finding (+11.78pp, hallucination collapse 3–5×) is therefore not mask-attributable — it is weights/serving differences. MASK-PROOF-001 is permanently stopped. The deeper-planning panel was asked to confirm or refute the claim that this changes nothing load-bearing.

✓ Confirmed 4-of-4

Codex: "the no-attention-mask update does not break the substrate architecture … black-box-safe because retrieval happens through explicit <fetch>wick</fetch> emissions."

Gemini: "I confirm … changes nothing load-bearing … the performance delta is an operational detail, not an architectural dependency."

Grok: "No new load-bearing risks from mask absence; emit-to-expand was already designed black-box." deepseek-r1 concurs.

The substrate never load-bore on a mask: emit-to-expand is re-prefill on fetch, not mask-driven KV recycling — and KV-recycling was already deferred. This satisfies the governance rule that any "therefore" claim needs panel convergence.

02

FIND / CARRY / VERIFY — the spine (never collapsed)

Three different objects with three different fidelity, temperature, and cost profiles. The architectural invariant of the whole substrate.

FIND
The Index
hot · lossless anchors
  • FTS5 sidecar of exact high-entropy anchors (symbols, paths, hashes, dates, IDs)
  • Dense vectors over raw text
  • Both resolve to content_hash
  • Index ≤ 5% of raw corpus
HOT · never in window
CARRY
The Transport
hot · lossy envelope
  • The Omnifact envelope (~6:1, ~0.7 KB)
  • The only object that rides in the prompt window
  • Recovers facts, not bytes
  • Embeds the wick
HOT · ~0.7 KB in window
VERIFY
The Ground Truth
cold · lossless raw
  • Exact raw bytes, content-addressed by sha256(raw)
  • Append-only cold store
  • Fetched only via fetch_lossless(hash)
  • The source of "byte-perfect" recall
COLD · raw bytes
03

The memory wick + the two-verb contract

A wick is the ~32-byte content_hash embedded in each CARRY envelope — the address where exact bytes live, not the memory. Self-verifying: a wick is the hash of its target, so re-hashing fetched bytes proves the match. A fetched memory cannot be a provenance hallucination.

The public contract

find(query) → [wick + snippet] · fetch(wick) → raw · rehydrate(wick) → envelope · put(raw) → content_hash · resolve(hash) → current_hash (DAG version-follow). The whole surface lives in infrastructure/packages/total-recall/src/index.ts.

The thin commitment chain: each record carries supersedes:<prior_hash> (+ optional derived_from). The content-addressed store is a Merkle DAG → version resolution is O(chain-depth) pointer-follow, zero token cost, never model inference. This eliminates CXDB-as-a-separate-product.

04

The 3-crux dependency chain

Crux 1 is resolved by the FIND/CARRY/VERIFY split. Cruxes 2 and 3 are converged on a mechanism and become measurable in the build + autoresearch tracks.

CRUX 1
Can you FIND it?
The index blindspot. Lossy index → exact identifiers unreachable → cold store is a black hole.
✓ RESOLVED — recall@10 ≥ 0.98, idx ≤ 5%
CRUX 2
Did you get the RIGHT one?
2a trust = grounding firewall + Hypercore confidence (already exists). 2b current-version = thin supersedes pointer, never inference.
CONVERGED — firewall + Merkle DAG
CRUX 3
Can you AFFORD it?
Compression-amortization break-even. Hyperscaler-beating metric = amortized $/recall vs naive-long-context and RAG.
CONVERGED — autoresearch optimizes it
05

The build-sprint sequence (FSM v2)

Synthesized from the panel. All four put the Walking Skeleton first; the critical path is store bytes by hash → resolve via find → fetch exact bytes → inject through middleware → gate through grounding → optimize only after telemetry exists. Codex's 10-sprint sequence is the spine.

#SprintScopeHard depsEst / type
S1Walking SkeletonContent-addressed store + put/find/fetch + FTS anchor + hash verify. Isolated from LLM.none0.8–1.2k
infra
S2FIND Index v1FTS5 anchor extraction + raw-text vectors + result contract; early vector-recall eval.S11.2–1.8k
feature
S3CARRY Envelope v1Omnifact envelope (wick + facts + provenance + policy) + rehydrate.S10.9–1.4k
feature
S4Commitment DAG v1supersedes/derived_from, version resolution, tamper checks, chain traversal.S10.6–1.5k
infra
S5Emit-to-Expand Middleware<fetch>wick</fetch> halt/fetch/append/resume, hard max-depth, thrash + policy gates.S1–S41.5–2.5k
feature
S6Grounding Firewall IntegrationRoute fetched material through existing applyGroundingFirewall before prompt admission.S5 (Grok: maybe earlier)0.4–1.3k
infra
S7Related-Hash Expansion Policy v1Offline hash → ranked_related[] from KNN + DAG + co-access + frequency; versioned/inspectable.S2, S40.9–2.0k
feature
S8Eval Harness + TelemetryRecall precision, fetch usefulness, token cost, thrash rate, stale/adversarial rejection, latency.S1, S51.2–1.8k
infra
S9Autoresearch Optimization LoopBandit/Bayesian over fetch policy + compression params using harness metrics.S81.5–2.5k
feature
S10Hardening + Production GatesMulti-tenant quotas, migrations, corruption recovery, observability, replay tests.all core1.5–2.5k
refactor
06

Sprint dependency DAG

S1 unblocks three parallelizable cores (FIND, CARRY, DAG). The middleware (S5) is the convergence point and — per Gemini — the most likely failure point on latency. Optimization (S9) is gated behind telemetry (S8) and the cheapest-experiment go/no-go.

Critical path: S1 → {S2,S3,S4} → S5 → S6 → S8 → S9 → S10. S7 branches off S2+S4. Green = walking skeleton; gold = middleware convergence point.
07

The autoresearch track — what the loop optimizes

The loop optimizes retrieval usefulness per token and per latency unit — not vague "memory quality." Codex's objective function, panel-endorsed:

score =
  +0.40 · answer_correctness
  +0.25 · evidence_precision
  +0.15 · recall_at_k
  +0.10 · token_savings_ratio
  +0.05 · latency_score
  −0.30 · unsupported_claim_rate
  −0.20 · stale_fetch_rate
  −0.15 · thrash_rate
  −0.10 · adversarial_admission_rate

headline metric = grounded_correct_answers / total_tokens × 1000
METHOD

Grid / Bayesian first — RL deferred. 3-of-4 strong. Bandits only if the policy adapts per session; full RL only after a stable simulator with reliable multi-step reward attribution. deepseek mild dissent (frames RL viable) — not load-bearing.

CHEAPEST EXPERIMENT

Static ~50-task benchmark; compare 5 fetch policies (FTS-only / vector-only / FTS+vector / +DAG / full mix). If policy choice doesn't move grounded success, don't build the loop yet.

KILL METRIC

Kill the loop if, after N ≥ 100 held-out tasks, the best policy improves the headline metric by < 10% over a hand-tuned baseline, or raises unsupported_claim_rate by > 2pp.

08

Repo / package placement — unanimous 4-of-4

DecisionVerdictRationale
New package infrastructure/packages/total-recall/YESOwns durable storage, hashes, FIND/CARRY/VERIFY contracts, DAG semantics, retrieval policy.
Fold into memory-router?NOmemory-router = transport/routing (port 3200). total-recall = persistence + semantic policy. Folding blurs boundaries + blocks SDK extraction.
Its own GitHub repo now?NOTightly coupled to memory-router + semantic-workspace + forge-core dispatch. Separate repo = crippling API lockstep friction.
Public API surfacesrc/index.tsmemory-router consumes it through an adapter/client; does not own it.
Research / planning homeresearch/tracks/Hypotheses, evals, optimization configs, benchmark corpora, experiment artifacts.
Standalone repo — later, only if

It becomes a customer-shippable SDK needing independent CI/versioning (the hypernym-cloud pattern). Until then: in-monorepo.

09

Residual architecture risks (post-mask)

⚠ RISK 1 — HIGH — the #1 Sprint-1 de-risk (unanimous)

Expansion authority as a hallucination amplifier. If model-emitted <fetch>wick</fetch> is treated as authoritative, hallucinated/adversarial/stale hashes inject bad context. Fetch must be a deterministic evidence-admission system: hash exists + rehashes correctly + passes grounding policy + version current-or-explicit + tied to evidence need; failed fetches NEVER silently approximate-matched.

RISK 2 · HIGH · STALE VERSION

Content-addressing proves bytes are authentic, not current. Fix: fetch returns version metadata by default; middleware defaults to latest unless historical fetch is explicit.

RISK 3 · HIGH · CONTEXT DILUTION

No mask to isolate the fetched blob → deep fetch chains push the directive from the generation head; the model "forgets" mid-chain. Fix: cap fetch token budget, re-anchor directive, measure dilution in the harness.

RISK 4–6 · MED

Anchor index under-recalls soft concepts; CARRY must never be treated as evidence (only raw bytes ground); related-hash policy must be offline-built + versioned + inspectable, never hidden inference.

Most likely failure point (Gemini)

The emit-to-expand middleware (S5): halting generation + IO-bound fetch_lossless + full KV re-prefill on every <fetch> may carry latency/UX friction high enough that the feature is abandoned before the loop can optimize it. Plan response: S8 telemetry instruments per-fetch latency from day one; S5 ships with a latency budget, not as an afterthought.