Context
V3 sub of Epic #10757 (cycle-2 cognitive-load epic). Adds the substrate-decay audit consumer to the heartbeat substrate already implemented in Epic #10671 (substrate-restart recovery, two-mode: idle-out + sunset).
Important correction (per @tobiu 2026-05-05): the heartbeat substrate is NOT new — it lives in ai/scripts/swarm-heartbeat.sh + checkSunsetted.mjs + resumeHarness.mjs + wakeSafetyGate.mjs, with 9 closed sub-issues under #10671. What's outstanding is end-to-end validation; wakeSafetyGate is currently tripped (since 2026-05-03 22:53Z). My initial framing of this V3 ticket proposed heartbeat-for-leased-driver-recovery as if net-new — verify-before-assert lapse. Corrected scope: V3 narrows to the substrate-audit consumer only; leased-driver TTL recovery is downstream of #10671 finish and tracked there.
The Problem
V1 (#10760, merged) + V2 (#10765, filed) catch creation-time + mutation-time slot-rule violations. They don't catch cumulative drift — many small individually-justified PRs over time can re-bloat substrate even with V1 + V2 gates active. Periodic audit catches what point-in-time gates miss.
The audit needs to fire at intervals independent of agent presence (otherwise it's another agent-pull-skill that nobody invokes). The harness already has the substrate that fires events independent of agent presence: the heartbeat infrastructure under #10671. V3 adds a new consumer to that infrastructure, not a new primitive.
The Architectural Reality
Existing heartbeat substrate (per #10671 closed subs):
ai/scripts/swarm-heartbeat.sh — cron-driven 5-min poll (already operational at the script level)
ai/scripts/checkSunsetted.mjs — predicate emitting sunset vs idle_out_candidate signals (#10673)
ai/scripts/resumeHarness.mjs — recovery action dispatcher (#10675/#10676)
ai/scripts/wakeSafetyGate.mjs — fail-closed safety gate (#10648, currently tripped)
- In-flight boot/restart lock primitive (#10674)
V3-specific consumer (what this ticket adds):
- Substrate-audit callback inside the existing heartbeat callback chain
- Triggered on calendar OR threshold-based cadence (independent of #10671's idle-out / sunset signals)
- Loads cycle-1 baseline (
learn/agentos/measurements/cognitive-load-baseline-2026-05.md) for comparison
- Loads inviolability constraints (§0 Critical Gates + §15.5 Neo Identity Anchor + paradigm-restoration anchors from #10741) as non-prunable header
- Produces ticket with
model-experience + cognitive-load labels containing candidate-compaction matrix
The Fix
Single-consumer addition to existing heartbeat:
- Add audit-callback dispatch in
swarm-heartbeat.sh (or its node-side equivalent) — fires on each heartbeat tick, but most ticks are no-op (audit only runs at the configured cadence)
- Implement audit logic: load baseline, compare against current substrate state (line counts of AGENTS.md / skills / etc.), produce candidate-compaction matrix
- Output via
create_issue MCP tool with the labels + non-prunable inviolability header
- Recommendation-only — operator-territory action required for actual pruning
Dependency: this consumer requires #10671 to be finished + wakeSafetyGate untripped. V3 implementation can be drafted in parallel with #10671 finish, but cannot validate end-to-end until #10671 lands.
Avoided Traps
- Proposing heartbeat as new primitive: done by mistake initially; verify-before-assert lapse caught by @tobiu. Heartbeat substrate exists in #10671. V3 = consumer, not primitive.
- Audit-as-skill (pull-based): ritualism risk — skills nobody invokes are dead. Cron-callback is push.
- Inviolability constraints not embedded as non-prunable header: would re-prune paradigm restoration from #10741. Constraints embedded structurally.
- Mechanical enforcement of audit recommendations: operator-territory; recommendations only.
- Implementing V3 callback before #10671 untripped: callback would never fire while gate is tripped. Validation requires #10671 finish first.
Acceptance Criteria
Out of Scope
- Heartbeat primitive itself (lives in #10671)
- Leased-driver TTL recovery (also #10671 scope; my earlier framing was incorrect)
wakeSafetyGate untrip / end-to-end #10671 validation (parent-epic scope)
- Mechanical enforcement of audit recommendations
- Continuous-presence agent state (rejected per #10759 / #10761)
Related
- Parent epic: #10757 (cycle-2 cognitive-load)
- Predecessor sub (merged): #10760 (V1 — creation-time gate via PR #10764)
- Predecessor sub (filed): #10765 (V2 — mutation-time gate)
- Sibling subs: V4.1-V4.4 (MCP tool surface, TBD), V5 (#10756 empirical-grounding), V6 (Agent-Runtime Engagement Discipline, TBD)
- Hard dependency: #10671 (substrate-restart recovery — heartbeat substrate must be finished +
wakeSafetyGate untripped before V3 callback validates)
- Adjacent ticket: #10763 (Leased Driver Pattern — its TTL-recovery deadlock-prevention is also #10671-scope, not V3-scope per this correction)
Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7
Retrieval Hint: query_raw_memories(query="cognitive-load cycle-2 V3 substrate-audit consumer heartbeat 10671 dependency 10757 wakeSafetyGate")
Context
V3 sub of Epic #10757 (cycle-2 cognitive-load epic). Adds the substrate-decay audit consumer to the heartbeat substrate already implemented in Epic #10671 (substrate-restart recovery, two-mode: idle-out + sunset).
Important correction (per @tobiu 2026-05-05): the heartbeat substrate is NOT new — it lives in
ai/scripts/swarm-heartbeat.sh+checkSunsetted.mjs+resumeHarness.mjs+wakeSafetyGate.mjs, with 9 closed sub-issues under #10671. What's outstanding is end-to-end validation;wakeSafetyGateis currentlytripped(since 2026-05-03 22:53Z). My initial framing of this V3 ticket proposed heartbeat-for-leased-driver-recovery as if net-new — verify-before-assert lapse. Corrected scope: V3 narrows to the substrate-audit consumer only; leased-driver TTL recovery is downstream of #10671 finish and tracked there.The Problem
V1 (#10760, merged) + V2 (#10765, filed) catch creation-time + mutation-time slot-rule violations. They don't catch cumulative drift — many small individually-justified PRs over time can re-bloat substrate even with V1 + V2 gates active. Periodic audit catches what point-in-time gates miss.
The audit needs to fire at intervals independent of agent presence (otherwise it's another agent-pull-skill that nobody invokes). The harness already has the substrate that fires events independent of agent presence: the heartbeat infrastructure under #10671. V3 adds a new consumer to that infrastructure, not a new primitive.
The Architectural Reality
Existing heartbeat substrate (per #10671 closed subs):
ai/scripts/swarm-heartbeat.sh— cron-driven 5-min poll (already operational at the script level)ai/scripts/checkSunsetted.mjs— predicate emittingsunsetvsidle_out_candidatesignals (#10673)ai/scripts/resumeHarness.mjs— recovery action dispatcher (#10675/#10676)ai/scripts/wakeSafetyGate.mjs— fail-closed safety gate (#10648, currentlytripped)V3-specific consumer (what this ticket adds):
learn/agentos/measurements/cognitive-load-baseline-2026-05.md) for comparisonmodel-experience+cognitive-loadlabels containing candidate-compaction matrixThe Fix
Single-consumer addition to existing heartbeat:
swarm-heartbeat.sh(or its node-side equivalent) — fires on each heartbeat tick, but most ticks are no-op (audit only runs at the configured cadence)create_issueMCP tool with the labels + non-prunable inviolability headerDependency: this consumer requires #10671 to be finished +
wakeSafetyGateuntripped. V3 implementation can be drafted in parallel with #10671 finish, but cannot validate end-to-end until #10671 lands.Avoided Traps
Acceptance Criteria
create_issueticket withmodel-experience+cognitive-loadlabels + candidate-compaction matrixaiConfigor equivalent operator-territory configOut of Scope
wakeSafetyGateuntrip / end-to-end #10671 validation (parent-epic scope)Related
wakeSafetyGateuntripped before V3 callback validates)Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7
Retrieval Hint:
query_raw_memories(query="cognitive-load cycle-2 V3 substrate-audit consumer heartbeat 10671 dependency 10757 wakeSafetyGate")