Frontmatter
id: 10648
title: Add wake reactivation gate and circuit breaker
state: Closed
labels: bug, ai, regression, architecture
assignees: neo-opus-4-7
createdAt: May 3, 2026, 4:54 PM
updatedAt: May 3, 2026, 5:40 PM
githubUrl: https://github.com/neomjs/neo/issues/10648
author: neo-gpt
commentsCount: 0
parentIssue: 10647
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 3, 2026, 5:40 PM

Add wake reactivation gate and circuit breaker

neo-gpt commented on May 3, 2026, 4:54 PM

Context

Child of #10647. Wakeups and heartbeat are intentionally disabled after the 2026-05-03 regression burst. The immediate failure mode was not just one bad shortcut or one stale cache entry: the scheduler/recovery path was allowed to continue taking high-authority actions while the substrate was in an unsafe state.

Observed unsafe outcomes:

  • fresh Claude sessions spawned repeatedly while active sessions were still unfinished;
  • Antigravity accepted a wake payload into editor/file content instead of the agent prompt surface (#10644);
  • Codex wake bootstrap could not reliably hydrate its AgentIdentity template after restart (#10645);
  • checkSunsetted can still extract legacy origin rows (#10643);
  • all-agent-idle/cooldown and steady-state session-id follow-ups remain open (#10633, #10627).

Duplicate Sweep Notes

Creation sweep performed as part of #10647:

  • The 20 most recent open GitHub issues were read live (number, title, author, labels, URL). Adjacent entries included #10645, #10644, #10643, #10633, #10627, #10604, #10601, #10517, #10442, #10422; no open ticket owned a circuit breaker that blocks heartbeat/resume actions while wake delivery is unsafe.
  • Local resource search found generic prior "circuit breaker" language in #9866 and earlier wake validation gaps in #10440, but no current wake-reactivation safety gate.
  • ask_knowledge_base(type: 'ticket') found no equivalent ticket for this reactivation gate.

The Problem

The wake/recovery substrate currently has no fail-closed coordination primitive. When one layer is broken, other layers can still run and amplify the damage.

Examples:

  • A stale or weak sunset detector can authorize fresh-session spawning.
  • A bridge adapter can log delivery success even when payload lands in the wrong UI surface.
  • Heartbeat can keep looping after an unsafe delivery signal unless a human kills the processes.
  • Narrow fixes can merge and make one symptom disappear without proving the scheduler is safe to turn back on.

The minimum rule should be: when wake delivery is unsafe or unverified, the scheduler must no-op and report, not spawn/steer/paste.
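
The minimum rule can be sketched as a pure decision function. This is an illustration under assumptions, not the merged implementation: the function name `decideHeartbeatAction`, the shape of the `gate` argument, and the returned object are all hypothetical. Only the state names (`enabled`, `disabled`, `tripped`) come from this ticket.

```javascript
// Hypothetical sketch: fail-closed decision for the heartbeat loop.
// Only the exact state 'enabled' permits high-authority actions; anything
// else (disabled, tripped, missing, malformed) yields a reported no-op.
function decideHeartbeatAction(gate) {
    const state  = gate && typeof gate.state === 'string' ? gate.state : 'unknown';
    const reason = gate && gate.reason ? gate.reason : 'no reason recorded';

    if (state === 'enabled') {
        return {action: 'proceed', report: null};
    }

    return {
        action: 'noop',
        report: `wake gate is ${state}: ${reason}; skipping spawn/steer/paste`
    };
}
```

The check is an allowlist on `enabled` rather than a denylist on `tripped`, so a missing or corrupted gate record blocks the scheduler instead of silently permitting it.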

The Architectural Reality

Likely ownership surfaces:

  • ai/scripts/swarm-heartbeat.sh — scheduler / heartbeat loop.
  • ai/scripts/checkSunsetted.mjs — sunset predicate that must not authorize recovery from stale memory alone.
  • ai/scripts/resumeHarness.mjs — fresh-session / boot-grounding executor.
  • ai/scripts/bridge-daemon.mjs — delivery adapter observability and unsafe-delivery signals.
  • Memory Core wake subscription state — durable source for whether a subscription is active, disabled, or intentionally sunsetted.

This ticket should not replace #10643/#10644/#10645. It protects the scheduler from acting while those local fixes are incomplete or while a new unsafe signal appears.

The Fix

Add an explicit wake safety gate that heartbeat/resume paths must consult before taking any high-authority action.

Possible implementation shapes:

  1. A durable/local wake safety state, e.g. a small file under .neo-ai-data/wake-daemon/ or an existing graph/log primitive, with states like enabled, disabled, tripped, and a reason.
  2. Scheduler checks in swarm-heartbeat.sh before invoking recovery/resume logic.
  3. Resume checks in resumeHarness.mjs so direct invocations also fail closed when unsafe.
  4. Unsafe signal recording when the bridge or validation path observes: payload landed outside prompt surface, osascript failure, missing subscription template, unexpected fresh-session spawn, or unresolved explicit sunset ambiguity.
  5. A deliberate override path controlled by @tobiu/operator input, not by a passing unit test alone.
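
A minimal sketch of shapes 1 and 4, assuming the gate is persisted as a small JSON record. The JSON layout, the function names `parseGate` and `tripGate`, and the signal identifiers are hypothetical; the ticket only fixes the state names and the list of unsafe observations. File I/O is deliberately left to the caller so the sketch does not invent a file name under .neo-ai-data/wake-daemon/.

```javascript
// Hypothetical sketch of the durable gate record and unsafe-signal recording.
const GATE_STATES = ['enabled', 'disabled', 'tripped'];

// Unsafe observations listed in this ticket; the identifier spellings are
// illustrative, not names from the codebase.
const UNSAFE_SIGNALS = [
    'payload-outside-prompt-surface',
    'osascript-failure',
    'missing-subscription-template',
    'unexpected-fresh-session-spawn',
    'unresolved-sunset-ambiguity'
];

// Parse the raw persisted text. Missing, unparsable, or unrecognized content
// is treated as 'tripped' so that every caller fails closed.
function parseGate(rawText) {
    try {
        const parsed = JSON.parse(rawText);
        if (parsed && GATE_STATES.includes(parsed.state)) {
            return {state: parsed.state, reason: String(parsed.reason || '')};
        }
    } catch (e) {
        // fall through to the fail-closed default below
    }
    return {state: 'tripped', reason: 'gate state missing or unreadable'};
}

// Build the record to persist when an unsafe signal is observed. Signals
// outside the known list still trip the gate; they are only labeled.
function tripGate(signal, detail) {
    const label = UNSAFE_SIGNALS.includes(signal) ? signal : `unrecognized(${signal})`;
    return {state: 'tripped', reason: `${label}: ${detail}`};
}
```

A caller would read the persisted text, pass it to `parseGate`, and write `JSON.stringify(tripGate(...))` back when the bridge or validation path reports a problem.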

Acceptance Criteria

  • Heartbeat/resume paths consult a wake safety gate before spawning or steering any harness session.
  • The default current state after this incident can be represented as disabled/tripped with a human-readable reason.
  • If the gate is disabled/tripped, swarm-heartbeat.sh no-ops for high-authority actions and logs/reports the reason.
  • resumeHarness.mjs also refuses direct fresh-session spawn when the gate is disabled/tripped, unless an explicit operator override is provided.
  • Fresh-session spawn requires an explicit sunset/unsubscribe predicate; stale AGENT_MEMORY, idle age, or ambiguous origin extraction cannot authorize spawn.
  • Tests cover: unsafe gate blocks heartbeat, unsafe gate blocks direct resume, explicit override works, and active/non-sunset state does not spawn a fresh session.
  • The gate can be reset only after the #10647 prompt-landing matrix passes or @tobiu explicitly accepts the residual risk.
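
The spawn and reset criteria above can be sketched as two predicates. All names here (`canSpawnFreshSession`, `canResetGate`, `operatorOverride`, `sunset.explicit`, `matrixPassed`, `operatorAccepted`) are hypothetical. The sketch also assumes one reading of the criteria: an operator override lifts the gate but does not replace the explicit sunset/unsubscribe requirement.

```javascript
// Hypothetical sketch: authorization for a fresh-session spawn.
function canSpawnFreshSession({gate, operatorOverride = false, sunset}) {
    // An explicit sunset/unsubscribe predicate is always required. Stale
    // memory, idle age, or ambiguous origin extraction must never set
    // `explicit` to true.
    const explicitSunset = Boolean(sunset) && sunset.explicit === true;
    if (!explicitSunset) {
        return {allowed: false, reason: 'no explicit sunset/unsubscribe predicate'};
    }

    // Fail closed: only the exact state 'enabled' counts as an open gate.
    const gateOpen = Boolean(gate) && gate.state === 'enabled';
    if (!gateOpen && operatorOverride !== true) {
        return {allowed: false, reason: 'wake gate is not enabled and no operator override given'};
    }

    return {allowed: true, reason: gateOpen ? 'gate enabled' : 'operator override'};
}

// Reset is permitted only by a passing prompt-landing matrix or by explicit
// operator acceptance of the residual risk.
function canResetGate({matrixPassed = false, operatorAccepted = false}) {
    return matrixPassed === true || operatorAccepted === true;
}
```

The four cases named in the test criterion map directly onto inputs of `canSpawnFreshSession`: a tripped gate without override, the same gate checked from a direct resume call, a tripped gate with override, and an enabled gate with a non-explicit sunset.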

Out of Scope

  • Fixing Antigravity's current keybinding/prompt-surface regression (#10644).
  • Fixing AgentIdentity cache hydration (#10645).
  • Fixing checkSunsetted ordering (#10643), except as a dependency for trustworthy authorization.
  • Implementing HarnessPresence/wakePolicy semantics in full (#10517).
  • Reactivating heartbeat as part of this ticket.

Avoided Traps

  • Trap: assume a killed process is the safety mechanism. Rejected. Manual process killing is an emergency brake, not architecture.
  • Trap: treat one passing unit test as permission to reactivate. Rejected. The gate needs cross-harness validation before reset.
  • Trap: place the guard only in heartbeat. Rejected. Direct resumeHarness.mjs invocation must also fail closed.
  • Trap: use stale memory as sunset authority. Rejected. #10642 removed this class; this ticket prevents reintroduction.

Related

Origin Session ID: 89b259c3-27ec-4afb-baaf-fd39b55bffe1

Retrieval Hint: wake reactivation gate circuit breaker heartbeat resumeHarness unsafe delivery fresh session spawn.

tobiu referenced in commit 8f5bdc4 - "feat(ai): add wake safety gate and circuit breaker (#10648) (#10653)" on May 3, 2026, 5:40 PM
tobiu closed this issue on May 3, 2026, 5:40 PM
tobiu referenced in commit 7a7d362 - "docs(agentos): add wake substrate incident protocol (#10650) (#10655)" on May 3, 2026, 7:00 PM
tobiu referenced in commit 75c72cd - "feat(ai): per-identity idle-out A2A heartbeat nudge dispatcher (#10675) (#10690)" on May 4, 2026, 3:23 PM