Context
Child of #10647. Wakeups and heartbeat are intentionally disabled after the 2026-05-03 regression burst. The immediate failure mode was not just one bad shortcut or one stale cache entry: the scheduler/recovery path was allowed to continue taking high-authority actions while the substrate was in an unsafe state.
Observed unsafe outcomes:
- fresh Claude sessions spawned repeatedly while active sessions were still unfinished;
- Antigravity accepted a wake payload into editor/file content instead of the agent prompt surface (#10644);
- Codex wake bootstrap could not reliably hydrate its AgentIdentity template after restart (#10645);
- checkSunsetted can still extract legacy origin rows (#10643);
- all-agent-idle/cooldown and steady-state session-id follow-ups remain open (#10633, #10627).
Duplicate Sweep Notes
Creation sweep performed as part of #10647:
- The 20 most recent open GitHub issues were read live, including number, title, author, labels, and URL. Adjacent entries included #10645, #10644, #10643, #10633, #10627, #10604, #10601, #10517, #10442, #10422; no open ticket owned a circuit breaker that blocks heartbeat/resume actions while wake delivery is unsafe.
- Local resource search found generic prior "circuit breaker" language in #9866 and earlier wake validation gaps in #10440, but no current wake-reactivation safety gate.
- ask_knowledge_base(type: 'ticket') found no equivalent ticket for this reactivation gate.
The Problem
The wake/recovery substrate currently has no fail-closed coordination primitive. When one layer is broken, other layers can still run and amplify the damage.
Examples:
- A stale or weak sunset detector can authorize fresh-session spawning.
- A bridge adapter can log delivery success even when the payload lands in the wrong UI surface.
- Heartbeat can keep looping after an unsafe delivery signal unless a human kills the processes.
- Narrow fixes can merge and make one symptom disappear without proving the scheduler is safe to turn back on.
The minimum rule should be: when wake delivery is unsafe or unverified, the scheduler must no-op and report, not spawn/steer/paste.
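A minimal sketch of that rule as a pure decision function, assuming the gate state is available as a plain object with a state and a reason. The name evaluateWakeGate and the object shape are hypothetical, not existing code in ai/scripts/. Only an explicit "enabled" permits action, so a missing or unrecognized state fails closed.

```javascript
// Hypothetical gate check; evaluateWakeGate is not an existing helper in ai/scripts/.
// Only an explicit "enabled" state permits spawn/steer/paste. Anything else
// (disabled, tripped, missing, or an unrecognized value) fails closed.
function evaluateWakeGate(gate) {
  if (!gate || typeof gate !== 'object') {
    return { allowed: false, reason: 'wake gate state missing or unreadable' };
  }
  if (gate.state === 'enabled') {
    return { allowed: true, reason: null };
  }
  if (gate.state === 'disabled' || gate.state === 'tripped') {
    return { allowed: false, reason: `wake gate ${gate.state}: ${gate.reason ?? 'no reason recorded'}` };
  }
  return { allowed: false, reason: `unknown wake gate state: ${String(gate.state)}` };
}

// The caller reports and no-ops instead of acting.
const decision = evaluateWakeGate({ state: 'tripped', reason: 'payload landed outside prompt surface' });
if (!decision.allowed) {
  console.log(`heartbeat no-op: ${decision.reason}`);
  // prints "heartbeat no-op: wake gate tripped: payload landed outside prompt surface"
}
```

Treating unknown values as unsafe means a typo in the stored state, or a future state the scheduler does not understand yet, cannot silently re-enable recovery.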
The Architectural Reality
Likely ownership surfaces:
- ai/scripts/swarm-heartbeat.sh — scheduler / heartbeat loop.
- ai/scripts/checkSunsetted.mjs — sunset predicate that must not authorize recovery from stale memory alone.
- ai/scripts/resumeHarness.mjs — fresh-session / boot-grounding executor.
- ai/scripts/bridge-daemon.mjs — delivery adapter observability and unsafe-delivery signals.
- Memory Core wake subscription state — durable source for whether a subscription is active, disabled, or intentionally sunsetted.
This ticket should not replace #10643/#10644/#10645. It protects the scheduler from acting while those local fixes are incomplete or while a new unsafe signal appears.
The Fix
Add an explicit wake safety gate that heartbeat/resume paths must consult before taking any high-authority action.
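One way to make every high-authority call site consult the gate is a small wrapper that the heartbeat path and resumeHarness.mjs could share. This is a sketch under assumptions: withWakeGate, readGate, and the action name are hypothetical. The gate is read fresh on every call rather than cached, and a reader failure is treated as unsafe.

```javascript
// Hypothetical shared guard; withWakeGate and readGate are illustrative names,
// not existing APIs. readGate is injected so the heartbeat path and direct
// resumeHarness.mjs invocations can share one check. It is called on every
// attempt and never cached, so a trip recorded between ticks applies at once.
function withWakeGate(readGate, actionName, action, log = console.log) {
  let gate;
  try {
    gate = readGate();
  } catch (err) {
    // An unreadable gate is itself an unsafe signal: fail closed.
    const detail = err instanceof Error ? err.message : String(err);
    log(`[wake-gate] ${actionName} skipped: gate unreadable (${detail})`);
    return { ran: false, reason: 'gate unreadable' };
  }
  if (!gate || gate.state !== 'enabled') {
    const reason = gate ? `${gate.state}: ${gate.reason ?? 'no reason recorded'}` : 'gate missing';
    log(`[wake-gate] ${actionName} skipped: ${reason}`);
    return { ran: false, reason };
  }
  return { ran: true, result: action() };
}

// Usage: with a tripped gate the spawn callback is never invoked.
const trippedGate = () => ({ state: 'tripped', reason: 'unexpected fresh-session spawn' });
withWakeGate(trippedGate, 'spawnFreshSession', () => {
  throw new Error('must not run while the gate is tripped');
});
// logs "[wake-gate] spawnFreshSession skipped: tripped: unexpected fresh-session spawn"
```

Injecting the reader keeps both entry points on the same check, so a direct resumeHarness.mjs invocation cannot bypass what the heartbeat respects.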
Possible implementation shapes:
- A durable/local wake safety state, e.g. a small file under .neo-ai-data/wake-daemon/ or an existing graph/log primitive, with states such as enabled, disabled, and tripped, plus a recorded reason.
- Scheduler checks in swarm-heartbeat.sh before invoking recovery/resume logic.
- Resume checks in resumeHarness.mjs so direct invocations also fail closed when unsafe.
- Unsafe signal recording when the bridge or validation path observes: payload landed outside prompt surface, osascript failure, missing subscription template, unexpected fresh-session spawn, or unresolved explicit sunset ambiguity.
- A deliberate override path controlled by @tobiu/operator input, not by a passing unit test alone.
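A sketch of the durable state and its trip/reset transitions, assuming the state is stored as JSON (for example in a file under .neo-ai-data/wake-daemon/; this ticket does not name the file). parseGateState, tripGate, resetGate, and the signal identifiers are hypothetical; the identifiers mirror the unsafe signals listed above. File I/O is left to the caller so the transitions can be read on their own.

```javascript
// Hypothetical identifiers for the unsafe signals listed in this ticket.
const UNSAFE_SIGNALS = new Set([
  'payload_outside_prompt_surface',
  'osascript_failure',
  'missing_subscription_template',
  'unexpected_fresh_session_spawn',
  'sunset_ambiguity',
]);

const VALID_STATES = new Set(['enabled', 'disabled', 'tripped']);

// Parse stored gate JSON. Malformed or unknown content yields "tripped", never "enabled".
function parseGateState(text) {
  try {
    const parsed = JSON.parse(text);
    if (parsed && VALID_STATES.has(parsed.state)) return parsed;
    return { state: 'tripped', reason: 'invalid gate state on disk' };
  } catch {
    return { state: 'tripped', reason: 'unparseable gate state on disk' };
  }
}

// Record an unsafe signal. Unrecognized signals still trip the gate; an
// already-tripped gate keeps its first reason so the root cause is not overwritten.
function tripGate(current, signal, detail, now = new Date()) {
  if (current.state === 'tripped') return current;
  const label = UNSAFE_SIGNALS.has(signal) ? signal : `unclassified:${signal}`;
  return { state: 'tripped', reason: `${label}: ${detail}`, trippedAt: now.toISOString() };
}

// Re-enable only when an explicit operator identity is supplied.
function resetGate(current, { operator, note } = {}) {
  if (!operator) {
    throw new Error('wake gate reset requires an explicit operator');
  }
  return { state: 'enabled', reason: note ?? null, resetBy: operator, previousReason: current.reason ?? null };
}

// Usage with JSON text; reading and writing the file is left to the caller.
let gate = parseGateState('{"state":"enabled"}');
gate = tripGate(gate, 'osascript_failure', 'example detail');
console.log(gate.state); // tripped
gate = resetGate(gate, { operator: 'tobiu', note: 'cross-harness validation passed' });
console.log(gate.state, gate.resetBy); // enabled tobiu
```

Keeping the first trip reason stops a later, noisier signal from hiding the root cause, and requiring an operator on reset keeps a passing unit test alone from re-enabling wakeups.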
Acceptance Criteria
- When the gate is disabled/tripped, swarm-heartbeat.sh no-ops for high-authority actions and logs/reports the reason.
- resumeHarness.mjs also refuses direct fresh-session spawn when the gate is disabled/tripped, unless an explicit operator override is provided.
- AGENT_MEMORY, idle age, or ambiguous origin extraction cannot authorize spawn.
Out of Scope
- Fixing Antigravity's current keybinding/prompt-surface regression (#10644).
- Fixing AgentIdentity cache hydration (#10645).
- Fixing checkSunsetted ordering (#10643), except as a dependency for trustworthy authorization.
- Implementing HarnessPresence/wakePolicy semantics in full (#10517).
- Reactivating heartbeat as part of this ticket.
Avoided Traps
- Trap: assume a killed process is the safety mechanism. Rejected. Manual process killing is an emergency brake, not architecture.
- Trap: treat one passing unit test as permission to reactivate. Rejected. The gate needs cross-harness validation before reset.
- Trap: place the guard only in heartbeat. Rejected. Direct resumeHarness.mjs invocation must also fail closed.
- Trap: use stale memory as sunset authority. Rejected. #10642 removed this class; this ticket prevents reintroduction.
Related
Origin Session ID: 89b259c3-27ec-4afb-baaf-fd39b55bffe1
Retrieval Hint: wake reactivation gate circuit breaker heartbeat resumeHarness unsafe delivery fresh session spawn.