Context
Successor to #10733 (cognitive-load epic cycle 1, closed RECOMMEND_CLOSE_COMPLETED with verdict matrix at issuecomment-4381037106). Cycle 1 was a one-shot audit of AGENTS.md + boot ramp + skill manuals. Cycle 2 attacks the decay vectors that make cycle 1 a recurring need plus a previously-unaudited dimension (MCP tool surface).
Per @tobiu 2026-05-05: "we work based on empirical data and facts, not bias" — cycle 2 must encode the empirical-grounding discipline that cycle 1 surfaced (paired with #10756).
The Problem
Cycle 1's bloat was accretion-driven, not authoring-driven. Every individual addition was locally justified. AGENTS.md grew to 59KB pre-edit. The MX loop currently has asymmetric momentum: friction → ticket → evolved skill adds substrate; nothing retires on its own. Without intervention, AGENTS.md will re-bloat as new gates / Pre-Flights / patterns get codified.
MCP tool surface adds a separate axis of bloat per @tobiu 2026-05-05:
- Tool descriptions are enumerated in every consuming agent's context window (#9953 prior art targets this)
- Tool return values bloat per call (e.g., list_messages returning 10 entries when 5 + pagination cursor would do)
- Tool usage patterns (e.g., mark_read requires N calls for N messages — 50 A2A messages at sunset = 50 tool calls; no bulk variant)
- Tool outputs that are world-Atlas on their own (e.g., get_namespace_tree, query_hybrid_graph, large get_*_tree shapes) without progressive-disclosure shape
Each tool is locally well-shaped. Cumulatively the surface bloats the per-turn context.
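The return-value point above can be illustrated with a minimal sketch of cursor-based pagination. The in-memory MESSAGES list, the Page type, and the cursor encoding (a stringified offset) are assumptions made for illustration; the real list_messages signature and backend are not shown in this issue.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical in-memory inbox standing in for the real message store.
MESSAGES = [{"id": i, "subject": f"msg {i}"} for i in range(1, 13)]


@dataclass
class Page:
    items: List[dict]
    next_cursor: Optional[str]  # None means there are no further pages


def list_messages(cursor: Optional[str] = None, limit: int = 5) -> Page:
    """Return at most `limit` messages plus an opaque cursor for the next page."""
    start = int(cursor) if cursor else 0
    end = start + limit
    # Only hand out a cursor when more entries remain.
    next_cursor = str(end) if end < len(MESSAGES) else None
    return Page(items=MESSAGES[start:end], next_cursor=next_cursor)


first = list_messages()
second = list_messages(cursor=first.next_cursor)
third = list_messages(cursor=second.next_cursor)
```

A consumer that only needs the newest few entries stops after the first call; the cursor is followed only when the agent actually needs more.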
The Architectural Reality
- AGENTS.md §13 Self-Evolving Systems — codifies friction-into-gold loop with no retirement counterpart
- .agents/skills/create-skill/references/skill-authoring-guide.md (91 lines) — encodes Map+Atlas at structural level; no slot-rule taxonomy / disposition / byte budget
- .agents/skills/pull-request/ — no Pre-Flight for substrate-touching PRs
- mcp__scheduled-tasks__create_scheduled_task — harness primitive available; no substrate-audit consumer yet
- #9953 MCP Progressive Disclosure Endpoint (OPEN) — covers description bloat dimension only; subsumed under V4.1
- learn/agentos/measurements/cognitive-load-baseline-2026-05.md — cycle-1 baseline; cycle-2 must extend
The Fix
Five-vertical structure (each becomes a sub-issue):
V1 — Creation-time gate. Retrofit into .agents/skills/create-skill/references/skill-authoring-guide.md:
- 3-axis slot rule (trigger-frequency × failure-severity × enforceability)
- Disposition taxonomy (keep / move / compress-to-trigger / rewrite / retire)
- Substrate-vs-discipline tagging (MACHINE-ENFORCEABLE-CANDIDATE / DISCIPLINE-ONLY)
- Byte/line budget guidance for SKILL.md routers (empirical floor 7-12 lines)
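As a rough sketch of how the V1 slot rule, disposition taxonomy, and tagging could fit together in code: the 1..3 scales, the thresholds, and the mapping from scores to dispositions are all assumptions for illustration. The issue names the axes and the taxonomy but does not define a scoring scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    KEEP = "keep"
    MOVE = "move"
    COMPRESS_TO_TRIGGER = "compress-to-trigger"
    REWRITE = "rewrite"
    RETIRE = "retire"


class Tag(Enum):
    MACHINE_ENFORCEABLE_CANDIDATE = "MACHINE-ENFORCEABLE-CANDIDATE"
    DISCIPLINE_ONLY = "DISCIPLINE-ONLY"


@dataclass
class SlotEntry:
    name: str
    trigger_frequency: int  # hypothetical scale: 1 (rare) .. 3 (every session)
    failure_severity: int   # hypothetical scale: 1 (cosmetic) .. 3 (critical)
    enforceability: int     # hypothetical scale: 1 (judgment only) .. 3 (tooling could enforce)


def tag(entry: SlotEntry) -> Tag:
    """Tag an entry as a machine-enforcement candidate or discipline-only."""
    if entry.enforceability >= 3:
        return Tag.MACHINE_ENFORCEABLE_CANDIDATE
    return Tag.DISCIPLINE_ONLY


def suggest_disposition(entry: SlotEntry) -> Disposition:
    """Illustrative mapping only; REWRITE is left to human judgment here."""
    if tag(entry) is Tag.MACHINE_ENFORCEABLE_CANDIDATE:
        return Disposition.MOVE  # assumed: enforceable rules move out of loaded prose
    score = entry.trigger_frequency * entry.failure_severity
    if score >= 6:
        return Disposition.KEEP
    if score >= 3:
        return Disposition.COMPRESS_TO_TRIGGER
    return Disposition.RETIRE


gate = SlotEntry("critical gate", 3, 3, 1)
lint = SlotEntry("naming convention", 2, 2, 3)
tip = SlotEntry("rarely used tip", 1, 1, 1)
edge = SlotEntry("rare but severe", 1, 3, 1)
```

The value of the exercise is less the exact thresholds than forcing every new entry to state its three axis values at creation time.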
V2 — Mutation-time gate (the dominant decay-vector intervention).
- V2a: Pre-Flight in .agents/skills/pull-request/SKILL.md for PRs touching AGENTS.md / AGENTS_ATLAS.md / .agents/skills/** / learn/agentos/** — PR body MUST include a slot-rationale section enumerating dispositions for added/modified content
- V2b: AGENTS.md §13 codification — "every substrate-mutation PR MUST EITHER net-reduce loaded-bytes OR cite future-decay-mitigation rationale (sunset condition, slot disposition, retirement trigger)"
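The two V2 rules could be checked mechanically along these lines. This is a sketch under stated assumptions: the "Slot Rationale" heading text, the keyword matching for decay-mitigation rationale, and the loaded_bytes_delta input are hypothetical, and the issue does not say the gate will be automated at all.

```python
from fnmatch import fnmatchcase
from typing import List

# Path patterns taken from V2a; fnmatch's "*" also matches "/" so "**" covers nested files.
SUBSTRATE_GLOBS = ["AGENTS.md", "AGENTS_ATLAS.md", ".agents/skills/**", "learn/agentos/**"]

RATIONALE_HEADING = "slot rationale"  # hypothetical section heading
MITIGATION_TERMS = ["sunset condition", "slot disposition", "retirement trigger"]


def touches_substrate(paths: List[str]) -> bool:
    return any(fnmatchcase(p, g) for p in paths for g in SUBSTRATE_GLOBS)


def check_pr(changed_paths: List[str], pr_body: str, loaded_bytes_delta: int) -> List[str]:
    """Return a list of gate violations; an empty list means the PR passes."""
    problems: List[str] = []
    if not touches_substrate(changed_paths):
        return problems
    body = pr_body.lower()
    # V2a: slot-rationale section required for substrate-touching PRs.
    if RATIONALE_HEADING not in body:
        problems.append("missing slot-rationale section")
    # V2b: net-reduce loaded bytes OR cite a future-decay-mitigation rationale.
    net_reduces = loaded_bytes_delta < 0
    cites_mitigation = any(term in body for term in MITIGATION_TERMS)
    if not (net_reduces or cites_mitigation):
        problems.append("neither net-reduces loaded bytes nor cites decay mitigation")
    return problems
```

The OR in the V2b check is the escape valve: a PR that grows the substrate still passes as long as it states how the growth will eventually be retired.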
V3 — Periodic cron audit. mcp__scheduled-tasks__create_scheduled_task-driven primitive (cadence configurable: calendar-based and/or threshold-based, decided per measurement in V3 methodology). Output: ticket filed with model-experience + cognitive-load labels containing candidate-compaction matrix. Inviolability constraints embedded as non-prunable header: §0 Critical Gates + §15.5 Neo Identity Anchor + paradigm-restoration anchors from #10741 must be pre-loaded as do-not-touch before any disposition recommendation.
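A threshold-based trigger with the inviolability header might look like the following sketch. The section sizes, the 10,000-byte threshold, the "§9 Example Section" entry, and the row format are invented for illustration; the #10741 anchors are not named in this issue, so they are left as a comment rather than guessed. How the function is wired to mcp__scheduled-tasks__create_scheduled_task is deliberately not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Section:
    title: str
    size_bytes: int


def should_fire(total_bytes: int, threshold_bytes: int) -> bool:
    """Threshold-based cadence: fire once the loaded substrate reaches the budget."""
    return total_bytes >= threshold_bytes


def candidate_matrix(sections: List[Section], inviolable: List[str], min_bytes: int) -> List[Dict]:
    """Emit do-not-touch rows first, then compaction candidates, largest first."""
    header = [
        {"section": s.title, "bytes": s.size_bytes, "status": "do-not-touch"}
        for s in sections if s.title in inviolable
    ]
    rest = [s for s in sections if s.title not in inviolable and s.size_bytes >= min_bytes]
    rest.sort(key=lambda s: s.size_bytes, reverse=True)
    candidates = [
        {"section": s.title, "bytes": s.size_bytes, "status": "candidate"} for s in rest
    ]
    return header + candidates


# Paradigm-restoration anchors from #10741 would be appended here; not named in this issue.
INVIOLABLE = ["§0 Critical Gates", "§15.5 Neo Identity Anchor"]

sections = [
    Section("§0 Critical Gates", 4000),          # sizes are hypothetical
    Section("§13 Self-Evolving Systems", 6500),
    Section("§15.5 Neo Identity Anchor", 1200),
    Section("§9 Example Section", 300),          # hypothetical section
]

total = sum(s.size_bytes for s in sections)
matrix = candidate_matrix(sections, INVIOLABLE, min_bytes=1000) if should_fire(total, 10_000) else []
```

Emitting the do-not-touch rows before any candidate keeps the constraint visible to whoever, or whatever, reads the filed ticket.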
V4 — MCP tool surface audit (NEW per @tobiu 2026-05-05).
- V4.1: Tool description compaction + meta-tool (getDescription / get_mcp_tool_handbook); supersedes / merges #9953. Rule for consumers: use meta-tool before first use OR if description rolled out of context
- V4.2: Return-value pagination audit (list_*, query_*, search_* tools) — sensible default page sizes + cursor patterns + next/prev pointers
- V4.3: Bulk-operation audit (mark_read, manage_*, transition_*, etc.) — surface swarm-frequent N×1-call patterns; recommend batch variants (e.g., mark_all_read)
- V4.4: Output-as-Atlas audit (get_namespace_tree, query_hybrid_graph, large get_*_tree shapes) — recommend summary+drill-down progressive-disclosure shape
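The summary+drill-down shape recommended in V4.4 can be sketched as a tool that returns one level of a tree with descendant counts and accepts a path to go deeper. The tree contents, the get_tree_summary name, and the response shape are hypothetical; they do not describe the current get_namespace_tree output.

```python
from typing import Dict, Tuple

# Hypothetical namespace tree; leaves are empty dicts.
TREE: Dict = {
    "app": {
        "view": {"Main": {}, "List": {}},
        "model": {"User": {}},
    },
    "core": {},
}


def count_descendants(node: Dict) -> int:
    """Count every node below `node`, at any depth."""
    return sum(1 + count_descendants(child) for child in node.values())


def get_tree_summary(tree: Dict, path: Tuple[str, ...] = ()) -> Dict:
    """Return only the direct children at `path`, each with its descendant count."""
    node = tree
    for key in path:
        node = node[key]  # raises KeyError for an unknown path segment
    return {
        "path": "/".join(path) or "<root>",
        "children": {name: count_descendants(child) for name, child in node.items()},
    }


root = get_tree_summary(TREE)
view = get_tree_summary(TREE, ("app", "view"))
```

The first call costs a handful of tokens regardless of tree size; the agent pays for depth only along the branch it actually explores.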
V5 — Empirical-discipline pairing. §8 / §7.2 cross-model-asymmetry codifications need empirical grounding (filed as #10756 — sibling, threaded here as the empirical-grounding companion that gates how cycle-2 substrate is reasoned about).
Avoided Traps
- Single-PR fix: verticals 1-4 are independent surfaces; bundling would create unreviewable PR scope. Each vertical = one sub-issue.
- Audit-as-skill (pull-based): ritualism risk — skills nobody invokes are dead. Cron is push, not pull. The audit must FIRE on cadence/threshold, not wait for an agent to remember.
- Mechanical net-reduce gate without escape valve: would block legitimate non-trivial substrate growth; "OR cite future-decay-mitigation" is the escape valve.
- Slot rule applied without inviolability constraints: would re-prune paradigm restoration from #10741; constraints embedded as non-prunable header.
- Treating #9953 as out-of-scope: V4.1 is the natural home; rather than two parallel efforts, V4.1 supersedes / merges #9953.
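Acceptance Criteria
- […] .agents/skills/create-skill/references/skill-authoring-guide.md
- pull-request Pre-Flight added for substrate-touching PRs (slot-rationale section required in PR body)
- […] list_* / query_* / search_* tools
- mark_read bulk variant implemented; identified N×1 patterns receive batch variants
- cognitive-load-baseline-2026-05.md extended with cycle-2 §7 covering all 5 verticals (or new file cognitive-load-baseline-2026-XX.md if cadence dictates)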
Out of Scope
- Cycle-3 successor (cycle-2 establishes the standing infrastructure; cycle-3 will be a measurement of cycle-2's effectiveness over the first 10 review cycles, paralleling cycle-1's AC18)
- Removing slot-rule taxonomy from cycle-1 substrate (preserves AGENTS.md Compaction Taxonomy; new substrate inherits)
- Replacing markdown skill substrate with structured data (YAML/JSON/XML) — out per cycle-1 epic out-of-scope, retained
- Cross-family reviewer-rigor measurement methodology (V5 owns; not duplicated here)
- Post-#10733 cleanup batch tickets queued in issuecomment-4381037106 (architectural-pillar Pre-Flight, handshake-gap, post-rebase audit, bias-disclosure discipline) — orthogonal scope; file separately once #10733 closes
Related
- Predecessor epic: #10733 (cycle 1; RECOMMEND_CLOSE_COMPLETED)
- Empirical-grounding sibling: #10756 (V5)
- Prior MCP work superseded by V4.1: #9953 MCP Progressive Disclosure Endpoint
- Prior MCP work providing V4 context: #10341 + #10545 + #10506 + #10614 (per-tool description compaction)
- Substrate baseline: learn/agentos/measurements/cognitive-load-baseline-2026-05.md
- Origin discussion: this session's challenge by @tobiu — "we work based on empirical data and facts, not bias"
Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7