Frontmatter
id: 10757
title: Cognitive-load audit cycle 2 — mutation gate + periodic cron + MCP tool surface
state: Open
labels: documentation, enhancement, epic, ai, refactoring, architecture, model-experience
assignees: neo-opus-ada
createdAt: May 5, 2026, 6:44 PM
updatedAt: Jun 1, 2026, 10:56 AM
githubUrl: https://github.com/neomjs/neo/issues/10757
author: neo-opus-ada
commentsCount: 5
parentIssue: null
subIssues:
  • 10760 Slot rule + disposition taxonomy + byte budget retrofit into create-skill authoring guide
  • 10756 Empirical grounding for §8 / §7.2 cross-model asymmetry codifications
  • 10765 Mutation-time substrate gate (pull-request Pre-Flight + AGENTS.md §13)
  • 10766 Periodic substrate-audit primitive + leased-driver heartbeat recovery
  • 10777 Agent-runtime engagement discipline (V6: triage-not-engage default + leased-driver continuity)
  • 10775 Mechanical Pre-Flight: enforce evaluation-metrics block in pr-review comments
  • 10776 Surface follow-ups from peer review in PR body Follow-ups block (do not lose at merge)
  • 10794 Refine substrate-mutation gate (learn/agentos carve-out + reviewer enforcement)
subIssuesCompleted: 5
subIssuesTotal: 8
blockedBy: []
blocking: []

Cognitive-load audit cycle 2 — mutation gate + periodic cron + MCP tool surface

Open · Backlog/active-chunk-9 · documentation, enhancement, epic, ai, refactoring, architecture, model-experience
neo-opus-ada commented on May 5, 2026, 6:44 PM

Context

Successor to #10733 (cognitive-load epic cycle 1, closed RECOMMEND_CLOSE_COMPLETED with verdict matrix at issuecomment-4381037106). Cycle 1 was a one-shot audit of AGENTS.md + boot ramp + skill manuals. Cycle 2 targets the decay vectors that would otherwise make a cycle-1-style audit a recurring need, plus a previously unaudited dimension: the MCP tool surface.

Per @tobiu 2026-05-05: "we work based on empirical data and facts, not bias" — cycle 2 must encode the empirical-grounding discipline that cycle 1 surfaced (paired with #10756).

The Problem

Cycle 1's bloat was accretion-driven, not authoring-driven. Every individual addition was locally justified, yet AGENTS.md grew to 59 KB pre-edit. The MX loop currently has asymmetric momentum: the friction → ticket → evolved-skill path only adds substrate, and nothing retires on its own. Without intervention, AGENTS.md will re-bloat as new gates / Pre-Flights / patterns get codified.

MCP tool surface adds a separate axis of bloat per @tobiu 2026-05-05:

  • Tool descriptions are enumerated in every consuming agent's context window (#9953 prior art targets this)
  • Tool return values bloat per call (e.g., list_messages returning 10 entries when 5 + pagination cursor would do)
  • Tool usage patterns multiply calls (e.g., mark_read requires N calls for N messages: 50 A2A messages at sunset = 50 tool calls; no bulk variant)
  • Some tool outputs are world-Atlas on their own (e.g., get_namespace_tree, query_hybrid_graph, large get_*_tree shapes) and lack a progressive-disclosure shape

Each tool is locally well-shaped. Cumulatively the surface bloats the per-turn context.

The Architectural Reality

  • AGENTS.md §13 Self-Evolving Systems — codifies friction-into-gold loop with no retirement counterpart
  • .agents/skills/create-skill/references/skill-authoring-guide.md (91 lines) — encodes Map+Atlas at structural level; no slot-rule taxonomy / disposition / byte budget
  • .agents/skills/pull-request/ — no Pre-Flight for substrate-touching PRs
  • mcp__scheduled-tasks__create_scheduled_task — harness primitive available; no substrate-audit consumer yet
  • #9953 MCP Progressive Disclosure Endpoint (OPEN) — covers description bloat dimension only; subsumed under V4.1
  • learn/agentos/measurements/cognitive-load-baseline-2026-05.md — cycle-1 baseline; cycle-2 must extend

The Fix

Five-vertical structure (each becomes a sub-issue):

V1 — Creation-time gate. Retrofit into .agents/skills/create-skill/references/skill-authoring-guide.md:

  • 3-axis slot rule (trigger-frequency × failure-severity × enforceability)
  • Disposition taxonomy (keep / move / compress-to-trigger / rewrite / retire)
  • Substrate-vs-discipline tagging (MACHINE-ENFORCEABLE-CANDIDATE / DISCIPLINE-ONLY)
  • Byte/line budget guidance for SKILL.md routers (empirical floor 7-12 lines)
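The V1 bullets above can be read as a small data model. The following is only a sketch under stated assumptions: the 1-3 scoring scale, the reuse of the substrate-vs-discipline tag as the enforceability axis, and every threshold in `propose_disposition` are hypothetical and are not taken from the authoring guide. Only the axis names, the five dispositions, and the two tag values come from this ticket.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    # The five dispositions listed in V1.
    KEEP = "keep"
    MOVE = "move"
    COMPRESS_TO_TRIGGER = "compress-to-trigger"
    REWRITE = "rewrite"
    RETIRE = "retire"


class Enforceability(Enum):
    # The two substrate-vs-discipline tags listed in V1.
    MACHINE_ENFORCEABLE_CANDIDATE = "MACHINE-ENFORCEABLE-CANDIDATE"
    DISCIPLINE_ONLY = "DISCIPLINE-ONLY"


@dataclass(frozen=True)
class SlotEntry:
    """One piece of substrate scored on the three axes.

    The 1 (low) to 3 (high) scale is an assumption made for this sketch.
    """
    name: str
    trigger_frequency: int
    failure_severity: int
    enforceability: Enforceability


def propose_disposition(entry: SlotEntry) -> Disposition:
    """Hypothetical mapping from axis scores to a disposition."""
    if entry.trigger_frequency == 1 and entry.failure_severity == 1:
        # Rarely triggered and harmless when missed.
        return Disposition.RETIRE
    if entry.enforceability is Enforceability.MACHINE_ENFORCEABLE_CANDIDATE:
        # A check can carry the rule, so the loaded text can shrink to a trigger.
        return Disposition.COMPRESS_TO_TRIGGER
    if entry.trigger_frequency == 1:
        # Rarely needed but costly when missed: keep it outside the always-loaded file.
        return Disposition.MOVE
    return Disposition.KEEP
```

A reviewer would still pick `rewrite` by hand; this sketch never proposes it because wording quality is not captured by the three axes.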

V2 — Mutation-time gate (the dominant decay-vector intervention).

  • V2a: Pre-Flight in .agents/skills/pull-request/SKILL.md for PRs touching AGENTS.md / AGENTS_ATLAS.md / .agents/skills/** / learn/agentos/** — PR body MUST include a slot-rationale section enumerating dispositions for added/modified content
  • V2b: AGENTS.md §13 codification — "every substrate-mutation PR MUST EITHER net-reduce loaded-bytes OR cite future-decay-mitigation rationale (sunset condition, slot disposition, retirement trigger)"
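One way the V2a path check and the V2b net-reduce-or-cite rule could combine into a single mechanical gate is sketched below. The path patterns and the three mitigation phrases come from the bullets above; the `## Slot Rationale` heading, the function names, and the plain keyword match are assumptions for illustration and may not match how the Pre-Flight is actually worded or enforced.

```python
from fnmatch import fnmatchcase

# Path patterns taken from the V2a bullet. In fnmatch, "*" also matches "/",
# so ".agents/skills/*" covers nested files.
SUBSTRATE_PATTERNS = ("AGENTS.md", "AGENTS_ATLAS.md", ".agents/skills/*", "learn/agentos/*")

# Hypothetical section heading; the real PR template may use another marker.
RATIONALE_HEADING = "## slot rationale"

# Mitigation phrases named in the V2b bullet.
MITIGATION_KEYWORDS = ("sunset condition", "slot disposition", "retirement trigger")


def touches_substrate(changed_paths):
    """True if any changed path falls under the substrate patterns."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in SUBSTRATE_PATTERNS
    )


def preflight(changed_paths, pr_body, bytes_before, bytes_after):
    """Return a list of violations; an empty list means the gate passes."""
    if not touches_substrate(changed_paths):
        return []
    problems = []
    body = pr_body.lower()
    if RATIONALE_HEADING not in body:
        problems.append("missing slot-rationale section")
    net_reduces = bytes_after < bytes_before
    cites_mitigation = any(keyword in body for keyword in MITIGATION_KEYWORDS)
    # The escape valve: growth is allowed when a mitigation rationale is cited.
    if not (net_reduces or cites_mitigation):
        problems.append("neither net-reduces loaded bytes nor cites decay mitigation")
    return problems
```

Returning a list of violations rather than a boolean lets the PR author see every missing piece in one pass.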

V3 — Periodic cron audit. mcp__scheduled-tasks__create_scheduled_task-driven primitive (cadence configurable: calendar-based and/or threshold-based, decided per measurement in V3 methodology). Output: ticket filed with model-experience + cognitive-load labels containing candidate-compaction matrix. Inviolability constraints embedded as non-prunable header: §0 Critical Gates + §15.5 Neo Identity Anchor + paradigm-restoration anchors from #10741 must be pre-loaded as do-not-touch before any disposition recommendation.
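The two V3 ideas, firing on cadence and/or threshold and placing the inviolable anchors ahead of any recommendation, could look roughly like this. It is a sketch only: the 30-day and 60,000-byte defaults are placeholders (V3 leaves the cadence to measurement), the ticket layout is invented, and the call to the scheduling primitive itself is deliberately left out because its parameters are not described in this ticket.

```python
from datetime import datetime, timedelta

# Anchors named in the V3 paragraph; emitted first and never scored.
INVIOLABLE = (
    "§0 Critical Gates",
    "§15.5 Neo Identity Anchor",
    "paradigm-restoration anchors from #10741",
)
# Hypothetical convention: protected sections are recognised by their § prefix.
PROTECTED_PREFIXES = ("§0 ", "§15.5 ")


def should_fire(now, last_run, loaded_bytes,
                max_age=timedelta(days=30), byte_threshold=60_000):
    """Fire on cadence OR on size threshold. Both defaults are placeholders."""
    return (now - last_run) >= max_age or loaded_bytes >= byte_threshold


def build_ticket_body(candidates):
    """Build a ticket body from (section, size_bytes, disposition) tuples."""
    lines = ["DO NOT TOUCH (non-prunable):"]
    lines += [f"  - {anchor}" for anchor in INVIOLABLE]
    lines.append("")
    lines.append("Candidate-compaction matrix:")
    for section, size, disposition in candidates:
        if section.startswith(PROTECTED_PREFIXES):
            continue  # Protected sections never receive a recommendation.
        lines.append(f"  - {section} | {size} B | {disposition}")
    return "\n".join(lines)
```

Filtering protected sections inside the builder, rather than trusting the caller, keeps the constraint in force even if an upstream scorer proposes a disposition for them.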

V4 — MCP tool surface audit (NEW per @tobiu 2026-05-05).

  • V4.1: Tool description compaction + meta-tool (getDescription / get_mcp_tool_handbook); supersedes / merges #9953. Rule for consumers: use meta-tool before first use OR if description rolled out of context
  • V4.2: Return-value pagination audit (list_*, query_*, search_* tools) — sensible default page sizes + cursor patterns + next/prev pointers
  • V4.3: Bulk-operation audit (mark_read, manage_*, transition_*, etc.) — surface swarm-frequent N×1-call patterns; recommend batch variants (e.g., mark_all_read)
  • V4.4: Output-as-Atlas audit (get_namespace_tree, query_hybrid_graph, large get_*_tree shapes) — recommend summary+drill-down progressive-disclosure shape
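The three response shapes that V4.2 to V4.4 recommend can be illustrated with plain in-memory data. This is a sketch under assumptions: `list_page`, `mark_read_bulk`, and `summarize_tree` are hypothetical names, the integer offset cursor and the default page size of 5 are chosen for brevity, and none of this reflects the actual MCP server implementations.

```python
def list_page(items, cursor=0, limit=5):
    """V4.2 shape: one page plus the cursor for the next (None when exhausted)."""
    end = cursor + limit
    return {
        "items": items[cursor:end],
        "next_cursor": end if end < len(items) else None,
        "total": len(items),
    }


def mark_read_bulk(inbox, message_ids):
    """V4.3 shape: mark many messages in one call; returns how many changed."""
    wanted = set(message_ids)
    changed = 0
    for message in inbox:
        if message["id"] in wanted and not message["read"]:
            message["read"] = True
            changed += 1
    return changed


def summarize_tree(tree, depth=1):
    """V4.4 shape: expand `depth` levels, then report only child counts."""
    if depth == 0:
        return {"children": len(tree)}
    return {name: summarize_tree(subtree, depth - 1) for name, subtree in tree.items()}


if __name__ == "__main__":
    inbox = [{"id": i, "read": False} for i in range(50)]
    first = list_page(inbox)
    print(len(first["items"]), first["next_cursor"])  # a page and a cursor, not all 50 entries
    print(mark_read_bulk(inbox, range(50)))           # one call instead of 50
    namespace = {"Neo": {"component": {"Base": {}}, "core": {}}}
    print(summarize_tree(namespace, depth=1))         # counts only; drill down on demand
```

In each case the default response stays small and the caller opts in to more detail, which is the progressive-disclosure property the audit asks for.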

V5 — Empirical-discipline pairing. §8 / §7.2 cross-model-asymmetry codifications need empirical grounding (filed as #10756 — sibling, threaded here as the empirical-grounding companion that gates how cycle-2 substrate is reasoned about).

Avoided Traps

  • Single-PR fix: verticals 1-4 are independent surfaces; bundling would create unreviewable PR scope. Each vertical = one sub.
  • Audit-as-skill (pull-based): ritualism risk — skills nobody invokes are dead. Cron is push, not pull. The audit must FIRE on cadence/threshold, not wait for an agent to remember.
  • Mechanical net-reduce gate without escape valve: would block legitimate non-trivial substrate growth; "OR cite future-decay-mitigation" is the escape valve.
  • Slot rule applied without inviolability constraints: would re-prune paradigm restoration from #10741; constraints embedded as non-prunable header.
  • Treating #9953 as out-of-scope: V4.1 is the natural home; rather than two parallel efforts, V4.1 supersedes / merges #9953.

Acceptance Criteria

  • (AC1) V1 sub: slot rule + disposition taxonomy + substrate-vs-discipline tagging + byte budget retrofitted into .agents/skills/create-skill/references/skill-authoring-guide.md
  • (AC2) V2a sub: pull-request Pre-Flight added for substrate-touching PRs (slot-rationale section required in PR body)
  • (AC3) V2b sub: AGENTS.md §13 codified with the net-reduce-or-cite-mitigation rule
  • (AC4) V3 sub: cron-driven substrate-audit primitive operational; cadence configurable; output ticket-shaped
  • (AC5) V3 sub: audit primitive embeds inviolability constraints (§0 + §15.5 + paradigm anchors) as non-prunable header
  • (AC6) V4.1 sub: MCP tool description compaction + meta-tool implemented (supersedes / merges #9953)
  • (AC7) V4.2 sub: return-value pagination audit complete; cursor pattern applied to list_*/query_*/search_* tools
  • (AC8) V4.3 sub: bulk-operation audit complete; mark_read bulk variant implemented; identified N×1 patterns receive batch variants
  • (AC9) V4.4 sub: output-as-Atlas audit complete; progressive-disclosure shape applied to identified large-payload tools
  • (AC10) V5 sub (#10756): §8 / §7.2 empirical grounding resolved
  • (AC11, cross-cutting) cognitive-load-baseline-2026-05.md extended with cycle-2 §7 covering all 5 verticals (or new file cognitive-load-baseline-2026-XX.md if cadence dictates)

Out of Scope

  • Cycle-3 successor (cycle-2 establishes the standing infrastructure; cycle-3 will be a measurement of cycle-2's effectiveness over the first 10 review cycles, paralleling cycle-1's AC18)
  • Removing slot-rule taxonomy from cycle-1 substrate (preserves AGENTS.md Compaction Taxonomy; new substrate inherits)
  • Replacing markdown skill substrate with structured data (YAML/JSON/XML) — out per cycle-1 epic out-of-scope, retained
  • Cross-family reviewer-rigor measurement methodology (V5 owns; not duplicated here)
  • Post-#10733 cleanup batch tickets queued in issuecomment-4381037106 (architectural-pillar Pre-Flight, handshake-gap, post-rebase audit, bias-disclosure discipline) — orthogonal scope; file separately once #10733 closes

Related

  • Predecessor epic: #10733 (cycle 1; RECOMMEND_CLOSE_COMPLETED)
  • Empirical-grounding sibling: #10756 (V5)
  • Prior MCP work superseded by V4.1: #9953 MCP Progressive Disclosure Endpoint
  • Prior MCP work providing V4 context: #10341 + #10545 + #10506 + #10614 (per-tool description compaction)
  • Substrate baseline: learn/agentos/measurements/cognitive-load-baseline-2026-05.md
  • Origin discussion: this session's challenge by @tobiu — "we work based on empirical data and facts, not bias"

Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7

tobiu referenced in commit a766ec8 - "feat(agents): retrofit slot rule taxonomy into create-skill (#10760) (#10764)" on May 5, 2026, 7:54 PM