LearnNewsExamplesServices
Frontmatter
id10766
titlePeriodic substrate-audit primitive + leased-driver heartbeat recovery
stateOpen
labels
documentationenhancementaiarchitecturemodel-experience
assigneesneo-opus-4-7
createdAtMay 5, 2026, 8:02 PM
updatedAtMay 5, 2026, 8:05 PM
githubUrlhttps://github.com/neomjs/neo/issues/10766
authorneo-opus-4-7
commentsCount0
parentIssue10757
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[ ] 10671 Substrate-restart recovery (two-mode: idle-out + sunset)
blocking[]

Periodic substrate-audit primitive + leased-driver heartbeat recovery

Opendocumentationenhancementaiarchitecturemodel-experience
neo-opus-4-7
neo-opus-4-7 commented on May 5, 2026, 8:02 PM

Context

V3 sub of Epic #10757 (cycle-2 cognitive-load epic). Adds the substrate-decay audit consumer to the heartbeat substrate already implemented in Epic #10671 (substrate-restart recovery, two-mode: idle-out + sunset).

Important correction (per @tobiu 2026-05-05): the heartbeat substrate is NOT new — it lives in ai/scripts/swarm-heartbeat.sh + checkSunsetted.mjs + resumeHarness.mjs + wakeSafetyGate.mjs, with 9 closed sub-issues under #10671. What's outstanding is end-to-end validation; wakeSafetyGate is currently tripped (since 2026-05-03 22:53Z). My initial framing of this V3 ticket proposed heartbeat-for-leased-driver-recovery as if net-new — verify-before-assert lapse. Corrected scope: V3 narrows to the substrate-audit consumer only; leased-driver TTL recovery is downstream of #10671 finish and tracked there.

The Problem

V1 (#10760, merged) + V2 (#10765, filed) catch creation-time + mutation-time slot-rule violations. They don't catch cumulative drift — many small individually-justified PRs over time can re-bloat substrate even with V1 + V2 gates active. Periodic audit catches what point-in-time gates miss.

The audit needs to fire at intervals independent of agent presence (otherwise it's another agent-pull-skill that nobody invokes). The harness already has the substrate that fires events independent of agent presence: the heartbeat infrastructure under #10671. V3 adds a new consumer to that infrastructure, not a new primitive.

The Architectural Reality

Existing heartbeat substrate (per #10671 closed subs):

  • ai/scripts/swarm-heartbeat.sh — cron-driven 5-min poll (already operational at the script level)
  • ai/scripts/checkSunsetted.mjs — predicate emitting sunset vs idle_out_candidate signals (#10673)
  • ai/scripts/resumeHarness.mjs — recovery action dispatcher (#10675/#10676)
  • ai/scripts/wakeSafetyGate.mjs — fail-closed safety gate (#10648, currently tripped)
  • In-flight boot/restart lock primitive (#10674)

V3-specific consumer (what this ticket adds):

  • Substrate-audit callback inside the existing heartbeat callback chain
  • Triggered on calendar OR threshold-based cadence (independent of #10671's idle-out / sunset signals)
  • Loads cycle-1 baseline (learn/agentos/measurements/cognitive-load-baseline-2026-05.md) for comparison
  • Loads inviolability constraints (§0 Critical Gates + §15.5 Neo Identity Anchor + paradigm-restoration anchors from #10741) as non-prunable header
  • Produces ticket with model-experience + cognitive-load labels containing candidate-compaction matrix

The Fix

Single-consumer addition to existing heartbeat:

  1. Add audit-callback dispatch in swarm-heartbeat.sh (or its node-side equivalent) — fires on each heartbeat tick, but most ticks are no-op (audit only runs at the configured cadence)
  2. Implement audit logic: load baseline, compare against current substrate state (line counts of AGENTS.md / skills / etc.), produce candidate-compaction matrix
  3. Output via create_issue MCP tool with the labels + non-prunable inviolability header
  4. Recommendation-only — operator-territory action required for actual pruning

Dependency: this consumer requires #10671 to be finished + wakeSafetyGate untripped. V3 implementation can be drafted in parallel with #10671 finish, but cannot validate end-to-end until #10671 lands.

Avoided Traps

  • Proposing heartbeat as new primitive: done by mistake initially; verify-before-assert lapse caught by @tobiu. Heartbeat substrate exists in #10671. V3 = consumer, not primitive.
  • Audit-as-skill (pull-based): ritualism risk — skills nobody invokes are dead. Cron-callback is push.
  • Inviolability constraints not embedded as non-prunable header: would re-prune paradigm restoration from #10741. Constraints embedded structurally.
  • Mechanical enforcement of audit recommendations: operator-territory; recommendations only.
  • Implementing V3 callback before #10671 untripped: callback would never fire while gate is tripped. Validation requires #10671 finish first.

Acceptance Criteria

  • (AC1) Cadence methodology defined: calendar-based / threshold-based / hybrid; specific cadences decided per measurement against cycle-1 baseline
  • (AC2) Substrate-audit consumer added to the heartbeat callback chain (exact integration point determined by #10671 finish state — Node.js callback OR shell-level dispatch)
  • (AC3) Inviolability constraints (§0 Critical Gates + §15.5 Neo Identity Anchor + paradigm-restoration anchors from #10741) embedded as non-prunable header in audit output
  • (AC4) Output: create_issue ticket with model-experience + cognitive-load labels + candidate-compaction matrix
  • (AC5) Configuration surface: audit cadence configurable via aiConfig or equivalent operator-territory config
  • (AC6) Operator opt-out: audit consumer can be disabled (e.g., during maintenance windows); disable status visible publicly
  • (AC7) Empirical validation: simulated audit run produces a representative ticket against cycle-1 baseline (L2 evidence)

Out of Scope

  • Heartbeat primitive itself (lives in #10671)
  • Leased-driver TTL recovery (also #10671 scope; my earlier framing was incorrect)
  • wakeSafetyGate untrip / end-to-end #10671 validation (parent-epic scope)
  • Mechanical enforcement of audit recommendations
  • Continuous-presence agent state (rejected per #10759 / #10761)

Related

  • Parent epic: #10757 (cycle-2 cognitive-load)
  • Predecessor sub (merged): #10760 (V1 — creation-time gate via PR #10764)
  • Predecessor sub (filed): #10765 (V2 — mutation-time gate)
  • Sibling subs: V4.1-V4.4 (MCP tool surface, TBD), V5 (#10756 empirical-grounding), V6 (Agent-Runtime Engagement Discipline, TBD)
  • Hard dependency: #10671 (substrate-restart recovery — heartbeat substrate must be finished + wakeSafetyGate untripped before V3 callback validates)
  • Adjacent ticket: #10763 (Leased Driver Pattern — its TTL-recovery deadlock-prevention is also #10671-scope, not V3-scope per this correction)

Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7

Retrieval Hint: query_raw_memories(query="cognitive-load cycle-2 V3 substrate-audit consumer heartbeat 10671 dependency 10757 wakeSafetyGate")