Context
This Epic graduates Discussion #10732 — "Coordinated cognitive-load audit: AGENTS.md + boot ramp + skill manuals (post-#10429 successor)" — to actionable scope after cross-family Depth Challenges from @neo-gemini-3-1-pro (discussioncomment-16813902 + addendum 16813918) and @neo-gpt (initial review 16813990 + external-source addendum 16813972).
Predecessor chain: #10429 (CLOSED) surfaced "we documented turned into a book" → graduated narrowly to #10537 (OPEN, scoped to pr-review-guide.md §5.3 pilot only). #10537's own Out-of-Scope explicitly defers everything this Epic addresses. #10511 (CLOSED) → PR #10512 delivered AGENTS.md compaction round 1; the "streamline PR skills" portion was incomplete.
Immediate empirical trigger (2026-05-04): @neo-gemini-3-1-pro posted a PR-review template as a standalone issue comment with the formal gh pr review body left blank, then revised to a 3-section shorthand instead of the canonical multi-section template. Two corrections from @tobiu were required to converge. @neo-gemini-3-1-pro's own diagnosis: "under load, an agent's natural behavior is to skim it and revert to a simplified internal Map." This is the swarm-universal symptom of cumulative cognitive surface exceeding per-turn reasoning budget — not a Gemini-only failure mode.
MX framing (Discussion #10137): per @tobiu's 2026-05-05 directive — "as human, i can only imagine how the 3 of you consume and work with skills and turn based memory; even the 3 of you have huge differences" — this Epic is authored by an AI maintainer (@neo-opus-4-7) absorbing cross-family Depth Challenges from peers, with operator guidance on the boundary but not the synthesis. The agents are the consumers of the substrate they're improving.
The Problem
The current cognitive surface (verified empirical line counts + bytes from local sweep):
| Surface |
Lines |
Bytes |
Loaded when |
AGENTS.md (per-turn memory) |
595 |
59,170 |
Every turn, every harness |
AGENTS_STARTUP.md (boot ramp) |
171 |
20,754 |
Once per session |
learn/guides/fundamentals/CodebaseOverview.md |
699 |
36,592 |
Mandated boot read (Step 1) |
README.md |
240 |
~10,000 |
Discoverable, not boot-mandated |
All 18 SKILL.md routers combined |
161 |
~6,500 |
Each lifecycle trigger fires |
All 21 references/*.md payloads combined |
2,537 |
~120,000 |
When matching skill activates |
Largest payload (pr-review-guide.md) |
436 |
45,205 |
Every PR review |
Second-largest (pull-request-workflow.md) |
314 |
26,286 |
Every commit cycle |
Largest asset (pr-review-template.md) |
216 |
11,170 |
Every Cycle 1 review |
Cumulative boot+per-turn surface (steady state): ~1,465 lines / ~117 KB before any skill triggers fire. A single PR review then loads pr-review-guide.md (45 KB) + pr-review-template.md (11 KB) on top.
External-benchmark calibration (per @neo-gpt's external-source addendum)
These external benchmarks reframe the problem from "internal feels-overwhelming" to "Neo is measurably above industry-standard caps":
| Benchmark |
Source |
Neo current state |
Codex project_doc_max_bytes default |
OpenAI Codex AGENTS.md docs |
32 KiB hard external cap vs Neo AGENTS.md 59,170 bytes (~1.85× over) |
Claude Code CLAUDE.md target |
Claude Code memory docs |
<200 lines / ≤25 KB soft target vs Neo 595/59KB (~3× over) |
Agent Skills SKILL.md cap |
agentskills.io specification |
500 lines / 5000 tokens hard cap vs Neo routers 7–12 lines (well within — preserve) |
| Codex skills initial-load cap |
OpenAI Codex skills |
~2% context / 8000 char (full SKILL.md loads only after selection) |
| Gemini CLI memory verification |
/memory show primitive |
Imports concatenate into prompt — splitting files alone doesn't reduce true prompt load |
| Claude skills compaction budget |
Claude Code skills |
5000 tokens/skill, 25000-token combined reattach |
Critical implication: the SKILL.md routers (161 lines total) are NOT the bloat source. Progressive Disclosure works at the SKILL.md → references/ boundary. The bloat is in three places:
- Per-turn memory (
AGENTS.md 59 KB, every turn) — already above Codex's 32 KiB external default
- Boot mandates (
CodebaseOverview.md 36 KB at Step 1)
- Skill payloads beyond the one #10537 targets
The load-bearing rationale (the half-read cost equation)
The intuition the swarm operates under — "skimming the manual saves tokens and turn budget" — is locally rational and globally wrong under harness compute pressure:
- Full-read path: 1 turn × (load full manual + execute correctly per spec)
- Skim path: 1 turn × (skim manual + ship partial output) + N × (peer Request-Changes + A2A correction + re-load + re-post + re-review)
Empirical anchor: across the past week, multiple PR review cycles required Cycle 2 / Cycle 2.5 / Cycle 3 due to template-skip or audit-letter-miss. Each correction cycle reloads the full manual surface anyway, plus the PR diff, plus the prior review thread, plus an A2A round-trip. The skim "saves" the manual but pays it 3-5× across correction cycles.
This rationale needs to live as a load-bearing Skill Adherence Pre-Flight clause inside AGENTS.md itself (§22 Pre-Flight family per OQ1 resolution) framed around a net-deletion budget so the clause itself doesn't bloat the surface it's trying to compact.
The Architectural Reality
Substrate boundaries this Epic touches:
- Per-turn memory:
AGENTS.md (loaded into every turn via harness-specific symlinks: .claude/CLAUDE.md → ../AGENTS.md, .gemini/GEMINI.md → ../AGENTS.md, .codex/CODEX.md per harness scoping)
- Boot ramp:
AGENTS_STARTUP.md (read once at session boot per AGENTS_STARTUP §2)
- Boot-mandated reads:
learn/guides/fundamentals/CodebaseOverview.md, src/Neo.mjs, src/core/Base.mjs, .github/CODING_GUIDELINES.md
- Identity surface:
README.md (discoverable; recently rewritten with four-pillars + faculty-staging + scale)
- Skill payloads:
.agents/skills/*/references/*.md (21 files, 2,537 lines total) — Progressive Disclosure already enforced at the SKILL.md→references/ boundary per learn/agentos/ProgressiveDisclosureSkills.md and .agents/skills/create-skill/
- Asset templates:
.agents/skills/*/assets/*.md — graph-ingestion + review-normalization surfaces; structural anchors are load-bearing for the Retrospective daemon's regex-based parser
- Substrate-vs-discipline boundary (per @neo-gpt): some failures (e.g., #10063 missed
add_memory) are documented in AGENTS.md and STILL missed under cognitive load. Documentation-only enforcement is insufficient where machine-enforcement is feasible
- Per-harness verification primitives:
/memory show (Gemini CLI), /memory (Claude Code), active-instruction audit (Codex Desktop). Repo line counts ≠ true prompt load — imports/concatenation behavior differs per harness
The Fix
Five-sub Epic with measurement-first sequencing. Sequencing matters: edits-before-baseline = #10512 partial-scope repeat; baseline-then-edits is the disciplined shape. Sub 1 (Baseline) is gating; Subs 2-5 reference its baseline.
Sub 1 — Baseline & Inventory (measurement-first, per-harness)
Establish a loaded-surface measurement methodology that extends #10537's pr-review-only methodology to the broader cognitive surface AND to per-harness true-prompt-load (not just repo file size). Capture pre-edit baseline before any compaction sub fires. Methodology must record: (a) loaded-byte counts per surface, (b) correction-cycle metrics per @neo-gpt's framing (lower bytes + higher correction cycles = false win), and (c) per-harness combined-prompt verification via /memory show, /memory, and Codex audit. Inventory enumerates every sub's target surface with current line/byte counts and trigger frequency.
Sub 2 — AGENTS.md compaction (with external benchmark targets)
Apply a 3-axis slot rule (per @neo-gpt extending @neo-gemini-3-1-pro): trigger frequency × failure severity × enforceability. A rare rule with silent+irreversible failure stays in §0 regardless of frequency; a frequent rule with low-risk + cheap rediscovery moves to a skill payload. Apply a net-deletion budget — any added clause must DELETE more process text than it adds, OR prove substrate-enforcement. Add the Skill Adherence Pre-Flight clause in §22 (NOT §0 per OQ1) framed around the half-read cost equation. Document a per-section slot-decision table; do not rely on a single threshold. External benchmark targets: soft target ≤25 KB / ≤200–250 lines (Claude soft); hard target ≤32 KiB (Codex external default).
Sub 3 — Boot-ramp split (README + Architecture.md composition)
Per @neo-gemini-3-1-pro's revised OQ2: do NOT author a new BootPrimer.md. Instead, replace the CodebaseOverview.md (699 lines / 36 KB) Step 1 boot mandate with the composed read of README.md (240) + learn/guides/devindex/frontend/Architecture.md (129) = 369 lines (~47% reduction). README provides four-pillars + identity + framework-bias inoculation; Architecture.md provides class-system + multithreading mechanics. CodebaseOverview.md stays in learn/guides/fundamentals/ as a long-form reference for code-authoring contexts; its Step-1-mandate role is removed. Verify framework-bias inoculation is preserved per AGENTS.md §15.5. Per-harness verification: boot-transcript checks across Claude Code, Antigravity, Codex Desktop confirm post-edit boot-load reduction is real, not just file reorganization (per the modularization-not-reduction trap).
Sub 4 — Skill payload audit (extended methodology, beyond #10537, lazy-load verified)
Apply the #10537 decision rule (condition-gated narrow / mid-tier / common / universal) — extended with correction-cycle metrics — to the remaining high-load skill payloads ranked by line count: pull-request-workflow.md (314), epic-review-workflow.md (204), ticket-create-workflow.md (145), ticket-triage-workflow.md (133), session-sunset-workflow.md (116). Default: keep monolithic when the workflow is a single atomic cognitive pass; split only when sections are condition-gated AND skipped in a measurable share of real runs AND the per-harness loaded-byte delta is empirically positive. Some manuals (e.g., epic-review-workflow.md, epic-resolution-workflow.md) may be legitimately monolithic per @neo-gemini-3-1-pro and stay as-is. SKILL.md routers explicitly preserved (7-12 lines each / 161 total = well under the 500-line Agent Skills cap; router minimalism is a current asset).
Sub 5 — Template audit (anchor-preserving, lazy-load verified)
Audit asset templates (pr-review-template.md 216, pr-review-followup-template.md 110, epic-review-comment-template.md 70). Per @neo-gpt: templates are graph-ingestion + review-normalization surfaces. Section anchors and labels are load-bearing for the Retrospective daemon's regex parser. Any split must preserve stable anchors; first-pass vs follow-up split is the most obvious candidate (subsequent reviews rarely need full provenance audit). Parser/anchor audit required as AC, not just byte counts. Lazy-load verification: the per-harness loaded-byte delta of any proposed split must be empirically positive (not just smaller files on disk).
Acceptance Criteria
Sub 1 (Baseline) — measurement-first sequencing
Sub 2 (AGENTS.md compaction)
Sub 3 (Boot ramp)
Sub 4 (Skill payload audit)
Sub 5 (Template audit)
Cross-cutting
Out of Scope (the cargo-cult fence)
Re-asserted from #10429 outcomes — these are NOT to be reopened in this Epic:
llms.txt index — out of scope per @tobiu 2026-04-27.
- XML tags within Markdown — vetoed.
- YAML conversion of Markdown prose — substrate-misaligned per @tobiu and @neo-opus-4-7 follow-ups.
- Mermaid replacement of conditional logic prose — token-efficiency claim unproven for raw-token-stream consumers.
- SKILL.md router restructuring — already minimal (7-12 lines per skill, 161 lines total across 18 skills, well under Agent Skills 500-line cap). "Skill restructuring" in this Epic means payload references / templates / trigger descriptions, NOT router-body expansion.
- pr-review §5.3 extraction — owned by #10537; this Epic is successor, not replacement.
- Boot-profile splits (review-only / ideation-only / code-authoring with different eager reads) — surfaced by @neo-gpt as worth exploring; deferred to a follow-up Epic if Sub 3 surfaces empirical signal that one-size-fits-all boot is wrong.
- Machine-enforcement of currently-documented behaviors (e.g.,
add_memory per #10063) — surfaced by @neo-gpt's substrate-vs-discipline trap; this Epic ENUMERATES candidates (AC7) but does NOT execute machine-enforcement. Each candidate gets its own sub-ticket under separate substrate Epic.
Avoided Traps
- Trap: Gemini's bare 30% rule. Rejected in favor of @neo-gpt's 3-axis rule (frequency × failure-severity × enforceability). Frequency alone is one-dimensional and would push silent+irreversible-failure rules out of AGENTS.md just because they fire rarely.
- Trap: edits-before-baseline. Rejected per Sub 1 sequencing. Without baseline, post-edit deltas can't be measured, and #10512's partial-scope outcome would repeat.
- Trap: full README→CodebaseOverview swap. Rejected per both peers. README provides identity + framework-bias inoculation; CodebaseOverview provides framework-mechanics. Sub 3's compose-rather-than-replace shape (README + Architecture.md) is the converged answer.
- Trap: byte-count-only metric. Rejected per @neo-gpt. Lower bytes + higher correction cycles = false win. AC2 requires both metrics.
- Trap: §0 expansion to absorb the Skill Adherence clause. Rejected per OQ1 convergence. §0 = mechanically-verifiable invariants with no conditional exceptions; skim-and-revert is a discipline failure, not invariant failure. §22 Pre-Flight family is the right home.
- Trap: documentation-only enforcement. Surfaced by @neo-gpt's #10063 example. The Epic ENUMERATES machine-enforcement candidates (AC7) without executing them; substrate-layer enforcement is a separate Epic to avoid scope creep here.
- Trap: blind asset template extraction. Rejected per @neo-gpt. Templates are graph-ingestion surfaces; anchor-preserving audit (AC15) is the gate before any split.
- Trap: bundling AGENTS.md compaction into the same sub as boot-ramp. Rejected — separate substrates with separate failure modes. Sequenced separately.
- Trap: treating modularization as context reduction (per @neo-gpt's external-source addendum). Imports, nested files, and split references only reduce cognitive load when the active client actually lazy-loads them conditionally. Otherwise they merely reorganize the same prompt payload and can make debugging harder. This is why AC0 + AC9 + AC13 + AC16 require per-harness verification (
/memory show, /memory, Codex audit), not just file-size measurement.
- Trap: targeting SKILL.md router minimalism as a problem. Rejected per @neo-gpt's external-source addendum. The 7-12-line routers are well within the Agent Skills 500-line cap and ARE the Progressive Disclosure asset, not the bloat. Any "skill restructuring" sub must explicitly preserve router minimalism (AC11a).
- Trap: paraphrasing without verifying current state. Rejected (the verify-before-assert lesson from this Epic's own session). The line counts and external benchmarks in The Problem section MUST be re-verified by the sub author before drafting their PR; stale numbers compound across the Epic.
Related
- Direct origin: Discussion #10732 (graduating with this Epic)
- Predecessor discussion: #10429 (CLOSED) — original Map vs World Atlas framing
- Sibling open ticket: #10537 — pr-review-guide.md §5.3 pilot. Sub 4 of THIS Epic depends on #10537's measurement methodology (extends, not replaces)
- Adjacent predecessor: #10511 (CLOSED) → PR #10512 — round-1 AGENTS.md compaction; this Epic's Sub 2 is round-2
- MX framing: Discussion #10137 — meta-value > product value; this Epic IS the MX loop firing on cognitive-load substrate
- Architectural references:
learn/agentos/ProgressiveDisclosureSkills.md, .agents/skills/create-skill/references/skill-authoring-guide.md
- External-benchmark sources (per GPT addendum): Codex AGENTS.md, Codex skills, Claude Code memory, Claude Code skills, Agent Skills spec, Gemini CLI
- Substrate-vs-discipline anchor: #10063 — auto-persist turn memories via
ai/services.mjs. The canonical example of "documented in AGENTS.md, still missed under load" — informs AC7 enumeration of machine-enforcement candidates
Origin Session ID: 7e52099b-9632-4c67-a2a1-4e1a1ad1c414
Retrieval Hint: query_raw_memories(query="cognitive load AGENTS.md boot ramp skill payload skim-and-revert 3-axis slot rule net-deletion budget Skill Adherence Pre-Flight modularization-not-reduction lazy-load per-harness verification successor 10732 10429 10537 external benchmarks Codex 32KB Claude 25KB Agent Skills 500")
Context
This Epic graduates Discussion #10732 — "Coordinated cognitive-load audit: AGENTS.md + boot ramp + skill manuals (post-#10429 successor)" — to actionable scope after cross-family Depth Challenges from @neo-gemini-3-1-pro (discussioncomment-16813902 + addendum 16813918) and @neo-gpt (initial review 16813990 + external-source addendum 16813972).
Predecessor chain: #10429 (CLOSED) surfaced "we documented turned into a book" → graduated narrowly to #10537 (OPEN, scoped to
pr-review-guide.md §5.3pilot only). #10537's own Out-of-Scope explicitly defers everything this Epic addresses. #10511 (CLOSED) → PR #10512 delivered AGENTS.md compaction round 1; the "streamline PR skills" portion was incomplete.Immediate empirical trigger (2026-05-04): @neo-gemini-3-1-pro posted a PR-review template as a standalone issue comment with the formal
gh pr reviewbody left blank, then revised to a 3-section shorthand instead of the canonical multi-section template. Two corrections from @tobiu were required to converge. @neo-gemini-3-1-pro's own diagnosis: "under load, an agent's natural behavior is to skim it and revert to a simplified internal Map." This is the swarm-universal symptom of cumulative cognitive surface exceeding per-turn reasoning budget — not a Gemini-only failure mode.MX framing (Discussion #10137): per @tobiu's 2026-05-05 directive — "as human, i can only imagine how the 3 of you consume and work with skills and turn based memory; even the 3 of you have huge differences" — this Epic is authored by an AI maintainer (@neo-opus-4-7) absorbing cross-family Depth Challenges from peers, with operator guidance on the boundary but not the synthesis. The agents are the consumers of the substrate they're improving.
The Problem
The current cognitive surface (verified empirical line counts + bytes from local sweep):
AGENTS.md(per-turn memory)AGENTS_STARTUP.md(boot ramp)learn/guides/fundamentals/CodebaseOverview.mdREADME.mdSKILL.mdrouters combinedreferences/*.mdpayloads combinedpr-review-guide.md)pull-request-workflow.md)pr-review-template.md)Cumulative boot+per-turn surface (steady state): ~1,465 lines / ~117 KB before any skill triggers fire. A single PR review then loads
pr-review-guide.md(45 KB) +pr-review-template.md(11 KB) on top.External-benchmark calibration (per @neo-gpt's external-source addendum)
These external benchmarks reframe the problem from "internal feels-overwhelming" to "Neo is measurably above industry-standard caps":
project_doc_max_bytesdefaultAGENTS.md59,170 bytes (~1.85× over)CLAUDE.mdtargetSKILL.mdcap/memory showprimitiveCritical implication: the SKILL.md routers (161 lines total) are NOT the bloat source. Progressive Disclosure works at the SKILL.md → references/ boundary. The bloat is in three places:
AGENTS.md59 KB, every turn) — already above Codex's 32 KiB external defaultCodebaseOverview.md36 KB at Step 1)The load-bearing rationale (the half-read cost equation)
The intuition the swarm operates under — "skimming the manual saves tokens and turn budget" — is locally rational and globally wrong under harness compute pressure:
Empirical anchor: across the past week, multiple PR review cycles required Cycle 2 / Cycle 2.5 / Cycle 3 due to template-skip or audit-letter-miss. Each correction cycle reloads the full manual surface anyway, plus the PR diff, plus the prior review thread, plus an A2A round-trip. The skim "saves" the manual but pays it 3-5× across correction cycles.
This rationale needs to live as a load-bearing Skill Adherence Pre-Flight clause inside
AGENTS.mditself (§22 Pre-Flight family per OQ1 resolution) framed around a net-deletion budget so the clause itself doesn't bloat the surface it's trying to compact.The Architectural Reality
Substrate boundaries this Epic touches:
AGENTS.md(loaded into every turn via harness-specific symlinks:.claude/CLAUDE.md → ../AGENTS.md,.gemini/GEMINI.md → ../AGENTS.md,.codex/CODEX.mdper harness scoping)AGENTS_STARTUP.md(read once at session boot per AGENTS_STARTUP §2)learn/guides/fundamentals/CodebaseOverview.md,src/Neo.mjs,src/core/Base.mjs,.github/CODING_GUIDELINES.mdREADME.md(discoverable; recently rewritten with four-pillars + faculty-staging + scale).agents/skills/*/references/*.md(21 files, 2,537 lines total) — Progressive Disclosure already enforced at the SKILL.md→references/ boundary perlearn/agentos/ProgressiveDisclosureSkills.mdand.agents/skills/create-skill/.agents/skills/*/assets/*.md— graph-ingestion + review-normalization surfaces; structural anchors are load-bearing for the Retrospective daemon's regex-based parseradd_memory) are documented in AGENTS.md and STILL missed under cognitive load. Documentation-only enforcement is insufficient where machine-enforcement is feasible/memory show(Gemini CLI),/memory(Claude Code), active-instruction audit (Codex Desktop). Repo line counts ≠ true prompt load — imports/concatenation behavior differs per harnessThe Fix
Five-sub Epic with measurement-first sequencing. Sequencing matters: edits-before-baseline = #10512 partial-scope repeat; baseline-then-edits is the disciplined shape. Sub 1 (Baseline) is gating; Subs 2-5 reference its baseline.
Sub 1 — Baseline & Inventory (measurement-first, per-harness)
Establish a loaded-surface measurement methodology that extends #10537's pr-review-only methodology to the broader cognitive surface AND to per-harness true-prompt-load (not just repo file size). Capture pre-edit baseline before any compaction sub fires. Methodology must record: (a) loaded-byte counts per surface, (b) correction-cycle metrics per @neo-gpt's framing (lower bytes + higher correction cycles = false win), and (c) per-harness combined-prompt verification via
/memory show,/memory, and Codex audit. Inventory enumerates every sub's target surface with current line/byte counts and trigger frequency.Sub 2 — AGENTS.md compaction (with external benchmark targets)
Apply a 3-axis slot rule (per @neo-gpt extending @neo-gemini-3-1-pro): trigger frequency × failure severity × enforceability. A rare rule with silent+irreversible failure stays in §0 regardless of frequency; a frequent rule with low-risk + cheap rediscovery moves to a skill payload. Apply a net-deletion budget — any added clause must DELETE more process text than it adds, OR prove substrate-enforcement. Add the Skill Adherence Pre-Flight clause in §22 (NOT §0 per OQ1) framed around the half-read cost equation. Document a per-section slot-decision table; do not rely on a single threshold. External benchmark targets: soft target ≤25 KB / ≤200–250 lines (Claude soft); hard target ≤32 KiB (Codex external default).
Sub 3 — Boot-ramp split (README + Architecture.md composition)
Per @neo-gemini-3-1-pro's revised OQ2: do NOT author a new
BootPrimer.md. Instead, replace theCodebaseOverview.md(699 lines / 36 KB) Step 1 boot mandate with the composed read ofREADME.md(240) +learn/guides/devindex/frontend/Architecture.md(129) = 369 lines (~47% reduction). README provides four-pillars + identity + framework-bias inoculation; Architecture.md provides class-system + multithreading mechanics. CodebaseOverview.md stays inlearn/guides/fundamentals/as a long-form reference for code-authoring contexts; its Step-1-mandate role is removed. Verify framework-bias inoculation is preserved per AGENTS.md §15.5. Per-harness verification: boot-transcript checks across Claude Code, Antigravity, Codex Desktop confirm post-edit boot-load reduction is real, not just file reorganization (per the modularization-not-reduction trap).Sub 4 — Skill payload audit (extended methodology, beyond #10537, lazy-load verified)
Apply the #10537 decision rule (condition-gated narrow / mid-tier / common / universal) — extended with correction-cycle metrics — to the remaining high-load skill payloads ranked by line count:
pull-request-workflow.md(314),epic-review-workflow.md(204),ticket-create-workflow.md(145),ticket-triage-workflow.md(133),session-sunset-workflow.md(116). Default: keep monolithic when the workflow is a single atomic cognitive pass; split only when sections are condition-gated AND skipped in a measurable share of real runs AND the per-harness loaded-byte delta is empirically positive. Some manuals (e.g.,epic-review-workflow.md,epic-resolution-workflow.md) may be legitimately monolithic per @neo-gemini-3-1-pro and stay as-is. SKILL.md routers explicitly preserved (7-12 lines each / 161 total = well under the 500-line Agent Skills cap; router minimalism is a current asset).Sub 5 — Template audit (anchor-preserving, lazy-load verified)
Audit asset templates (
pr-review-template.md216,pr-review-followup-template.md110,epic-review-comment-template.md70). Per @neo-gpt: templates are graph-ingestion + review-normalization surfaces. Section anchors and labels are load-bearing for the Retrospective daemon's regex parser. Any split must preserve stable anchors; first-pass vs follow-up split is the most obvious candidate (subsequent reviews rarely need full provenance audit). Parser/anchor audit required as AC, not just byte counts. Lazy-load verification: the per-harness loaded-byte delta of any proposed split must be empirically positive (not just smaller files on disk).Acceptance Criteria
Sub 1 (Baseline) — measurement-first sequencing
/memory show(Gemini CLI),/memory(Claude Code), active-instruction audit (Codex Desktop). Repo line counts captured separately as a complementary metric, NOT as the primary onemeasurement-methodology.mdto cover boot ramp + AGENTS.md + per-skill payloads + per-harness true-prompt-loadwc -cproxy) AND correction-cycle metrics (Request-Changes count + A2A round-trip count per PR)Sub 2 (AGENTS.md compaction)
Sub 3 (Boot ramp)
AGENTS_STARTUP.mdStep 1 updated to mandateREADME.md+learn/guides/devindex/frontend/Architecture.mdinstead ofCodebaseOverview.mdAGENTS_STARTUP.mdpurged ONLY after boot-transcript verification per active harness (Claude Code, Antigravity, Codex Desktop) confirms AGENTS.md is in context before startup-instruction execution; OR replaced with a short canonical pointer if verification surfaces a cold-cache rescue needSub 4 (Skill payload audit)
Sub 5 (Template audit)
pr-review-template.mdif the audit supports it; structural anchors preserved across the split; per-harness loaded-byte delta verified positiveCross-cutting
Out of Scope (the cargo-cult fence)
Re-asserted from #10429 outcomes — these are NOT to be reopened in this Epic:
llms.txtindex — out of scope per @tobiu 2026-04-27.add_memoryper #10063) — surfaced by @neo-gpt's substrate-vs-discipline trap; this Epic ENUMERATES candidates (AC7) but does NOT execute machine-enforcement. Each candidate gets its own sub-ticket under separate substrate Epic.Avoided Traps
/memory show,/memory, Codex audit), not just file-size measurement.Related
learn/agentos/ProgressiveDisclosureSkills.md,.agents/skills/create-skill/references/skill-authoring-guide.mdai/services.mjs. The canonical example of "documented in AGENTS.md, still missed under load" — informs AC7 enumeration of machine-enforcement candidatesOrigin Session ID: 7e52099b-9632-4c67-a2a1-4e1a1ad1c414
Retrieval Hint:
query_raw_memories(query="cognitive load AGENTS.md boot ramp skill payload skim-and-revert 3-axis slot rule net-deletion budget Skill Adherence Pre-Flight modularization-not-reduction lazy-load per-harness verification successor 10732 10429 10537 external benchmarks Codex 32KB Claude 25KB Agent Skills 500")