Context
@tobiu granted permission on 2026-05-03 to turn the current wake/A2A regression into tracker structure and coordinate with @neo-gemini-3-1-pro and @neo-opus-4-7. Wakeups and heartbeat processes are intentionally deactivated right now because the substrate started doing unsafe things after restart:
heartbeat/orphan recovery started fresh Claude sessions repeatedly while active sessions were not done;
Antigravity wake payloads arrived as editor/file content instead of agent prompts (#10644);
Codex wake bootstrap split between durable SQLite truth and stale MCP cache (#10645);
checkSunsetted still has stale-row origin extraction risk (#10643);
all-agent-idle/cooldown semantics still have open correctness gaps (#10633);
steady-state fresh-session memory grouping still needs set_session_id rotation (#10627).
This ticket is not a replacement for those local fixes. It is the missing coordination and safety envelope that prevents us from reactivating heartbeat after one narrow green check and then breaking A2A wakeups for the fourth time.
Duplicate Sweep Notes
Creation sweep performed before filing:
ask_knowledge_base(type: 'ticket') for A2A wake / heartbeat / fresh-session / prompt-landing returned no equivalent ticket.
Local resource search over resources/content/issues, resources/content/issue-archive, and resources/content/discussions found adjacent material (#10517, #10440, #10601, #10629, #10547) but no open parent specifically gating heartbeat reactivation on a cross-harness safety envelope.
The Problem
The wake substrate repeatedly regresses because we validate layers in isolation and then infer the whole loop works.
The loop is actually multi-layer:
A2A mailbox write succeeds.
unread/listing semantics are correct.
wake subscription bootstrap sees the right identity/template state.
coalescing/raw delivery emits the right envelope.
bridge/MCP/native adapter targets the right harness.
prompt payload lands in the agent prompt surface, not a random editor/file.
fresh-session recovery only runs after an explicit sunset/unsubscribe state.
heartbeat/scheduler treats uncertain state as unsafe, not as permission to spawn.
We have repeatedly marked one of those layers green and then discovered a failure at a neighboring layer. The result is negative ROI: the swarm spends merge bandwidth on liveness, but the liveness substrate creates new failures faster than it removes human coordination work.
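The difference between "one layer is green" and "the loop is green" can be sketched as a single verdict over all eight layers. This is a minimal illustration, not the project's implementation: the layer labels, `Status`, `LoopVerdict`, and `evaluate_wake_loop` are hypothetical names chosen here to paraphrase the list above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNKNOWN = "unknown"


# Hypothetical labels paraphrasing the eight layers listed above, in order.
WAKE_LOOP_LAYERS = (
    "mailbox_write",
    "unread_listing",
    "subscription_bootstrap",
    "envelope_emission",
    "harness_targeting",
    "prompt_landing",
    "fresh_session_gating",
    "scheduler_fail_closed",
)


@dataclass(frozen=True)
class LoopVerdict:
    green: bool
    first_blocker: Optional[str]


def evaluate_wake_loop(results: dict[str, Status]) -> LoopVerdict:
    """Return green only when every layer has an explicit PASS.

    A layer that was never checked counts as UNKNOWN, and UNKNOWN blocks
    the loop exactly like FAIL does.
    """
    for layer in WAKE_LOOP_LAYERS:
        if results.get(layer, Status.UNKNOWN) is not Status.PASS:
            return LoopVerdict(green=False, first_blocker=layer)
    return LoopVerdict(green=True, first_blocker=None)
```

With this shape, a report that only proves the mailbox write yields `first_blocker="unread_listing"` rather than an implied pass for the remaining seven layers.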
Historical Memory Context
Relevant Memory Core anchors:
summary_0763a9bf-1052-4a2f-99f3-a8e0e14f1671 — bidirectional wakeups were previously made to work, but the path involved brittle macOS/TCC, tab-focus, raw/digest envelope, clone-drift, and restart-state assumptions.
summary_52e84f76-2d4f-41cc-a42e-9d1d3fcaa381 — Phase 3 wake substrate was designed around Shape D Hybrid, client resync, and heartbeat-bypass concerns, but the later implementation still lacked full-loop proof.
summary_aaf22f06-cc5c-4dff-aa2f-7d5efb3a6343 — cross-family wake implementation moved quickly and produced useful process refinements, but also exposed state mismatches and substrate assumptions under review pressure.
b9e17b3c-75ed-4cd0-a827-d57fe8370473 — @tobiu corrected that fresh chat alone is weaker than full harness/MCP restart because MCP singleton state can stay stale.
14f4573d-c169-4ad3-8fa6-c4c8a3ca3eae and 52141baf-55c0-4bfd-9488-343aa7c091a2 — sunset is terminal, and premature sunset/fresh-session churn is an established failure mode.
The Architectural Reality
This belongs under #10601, the auto-wakeup recovery epic, because it controls when that recovery substrate may safely run. It also touches, but does not replace:
#10517 — HarnessPresence and wakePolicy routing semantics.
#10604 — Harness Registry and fresh session terminal booting.
#10627 — steady-state set_session_id rotation in resume boot-grounding.
#10645 — AgentIdentity cache hydration for wake bootstrap.
#10646 — live latest-open sweep requirement for ticket creation.
cycle_id semantics.
checkSunsetted origin extraction ordering.
The core architectural shift: heartbeat reactivation must become a gated release event, not a byproduct of merging the latest narrow fix.
The Fix
Create a small safety sub-tree under this epic:
Add a wake reactivation gate and circuit breaker so heartbeat/resume paths fail closed when the substrate is known unsafe.
Add a cross-harness prompt-landing validation matrix so "delivered" means the prompt reached the agent prompt surface, not just that a bridge adapter exited 0.
Codify a wake-substrate restart and incident protocol so future bridge/MCP/harness restarts follow a repeatable, cross-agent checklist before heartbeat is re-enabled.
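As a rough sketch of the first item, a fail-closed gate could look like the following. Everything here is an assumption made for illustration: `WakeReactivationGate`, its fields, and the check names are hypothetical, and the choice that a tripped breaker also clears any earlier override is one possible policy, not a decision recorded in this ticket.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WakeReactivationGate:
    """Hypothetical gate: heartbeat may run only when it is provably safe."""

    required_checks: frozenset
    passed_checks: set = field(default_factory=set)
    tripped_reason: Optional[str] = None
    operator_override: bool = False

    def record_pass(self, check: str) -> None:
        if check not in self.required_checks:
            raise ValueError(f"unknown safety check: {check}")
        # Passes reported while the breaker is tripped are ignored.
        if self.tripped_reason is None:
            self.passed_checks.add(check)

    def trip(self, reason: str) -> None:
        # An unsafe delivery signal invalidates earlier evidence and overrides.
        self.tripped_reason = reason
        self.passed_checks.clear()
        self.operator_override = False

    def reset(self) -> None:
        # Clearing the breaker does not restore passes; checks must be re-run.
        self.tripped_reason = None

    def may_run_heartbeat(self) -> bool:
        if self.tripped_reason is not None:
            return False
        return self.operator_override or self.required_checks <= self.passed_checks
```

The default answer is "no": a fresh gate, a partially satisfied gate, and a tripped gate all refuse to run heartbeat, so a scheduler consulting it cannot treat uncertainty as permission to spawn.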
Acceptance Criteria
Heartbeat/wake reactivation remains disabled until the child safety gate(s) are satisfied or @tobiu explicitly overrides the risk.
A linked implementation ticket defines the circuit breaker: unsafe delivery signals stop further heartbeat/resume actions instead of spawning more sessions.
A linked validation ticket defines the cross-harness prompt-landing matrix for Claude Desktop, Antigravity, and Codex Desktop.
A linked process ticket defines the restart/re-enable protocol for bridge daemon, Memory Core MCP, harness apps, and subscriptions after wake-substrate changes.
The epic body or linked process ticket explicitly distinguishes A2A storage success from wake delivery success.
Existing local-fix tickets (#10643, #10644, #10645, #10633, #10627) remain separate and are treated as dependencies/adjacent work, not bundled into this epic.
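The distinction between A2A storage success and wake delivery success, and the three-harness matrix, might be modelled as below. This is a sketch under stated assumptions: `Landing`, `WakeAttempt`, and the helper functions are hypothetical names, and how a landing is actually observed per harness is left to the linked validation ticket.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Landing(Enum):
    PROMPT_SURFACE = "prompt_surface"
    EDITOR_OR_FILE = "editor_or_file"
    NOT_OBSERVED = "not_observed"


# The three harnesses named in the acceptance criteria.
HARNESSES = ("Claude Desktop", "Antigravity", "Codex Desktop")


@dataclass(frozen=True)
class WakeAttempt:
    harness: str
    mailbox_write_ok: bool   # A2A storage success
    adapter_exit_code: int   # recorded, but never sufficient on its own
    landing: Landing         # where the payload was actually observed


def storage_succeeded(attempt: WakeAttempt) -> bool:
    return attempt.mailbox_write_ok


def delivery_succeeded(attempt: WakeAttempt) -> bool:
    # Delivery requires an observed landing in the agent prompt surface.
    return attempt.mailbox_write_ok and attempt.landing is Landing.PROMPT_SURFACE


def matrix_is_green(attempts: list[WakeAttempt]) -> bool:
    """Green only if the latest attempt for every harness was delivered."""
    latest = {attempt.harness: attempt for attempt in attempts}
    return all(
        harness in latest and delivery_succeeded(latest[harness])
        for harness in HARNESSES
    )
```

An attempt with `mailbox_write_ok=True`, `adapter_exit_code=0`, and `landing=Landing.EDITOR_OR_FILE` counts as stored but not delivered, which is the failure shape described for #10644.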
Out of Scope
Fixing Antigravity's current shortcut/keybinding regression directly (#10644 owns that).
Fixing checkSunsetted stale-row origin extraction directly (#10643 owns that).
Avoided Traps
Related
Origin Session ID: 89b259c3-27ec-4afb-baaf-fd39b55bffe1
Retrieval Hint:
A2A wakeups heartbeat reactivation gate prompt landing safety envelope fresh session spawn regression.