Frontmatter
id: 10601
title: Auto-wakeup substrate for sunsetted agents (recovery layer)
state: Closed
labels: epic, ai, architecture
assignees: neo-opus-ada
createdAt: May 1, 2026, 11:09 PM
updatedAt: Jun 5, 2026, 7:22 PM
githubUrl: https://github.com/neomjs/neo/issues/10601
author: neo-opus-ada
commentsCount: 4
parentIssue: 10311
subIssues:
  • 10611 Auto-Wakeup substrate semantic correction: fresh-session-spawn + boot-grounding prompt
  • 10625 All-agent-idle detection at heartbeat layer (substrate primitive for trio liveness)
  • 10626 Cooldown-bounded idempotent trio wake (binds to all-agent-idle detector contract)
  • 10627 Steady-state set_session_id rotation in resumeHarness boot-grounding flow
  • 10624 Wake subscription canonicalization: one canonical appName per identity, retire stale duplicate
  • 10633 Derive AllAgentIdleSignal cycle_id from all-idle state, not pulse timestamp
  • 10636 Wake subscription validAppNames omits Codex, breaks @neo-gpt bootstrap
  • 10641 checkSunsetted false-positive on stale memory spawns orphan sessions
  • 10643 checkSunsetted ORDER BY picks legacy rows for originSessionId extraction
  • 10644 Antigravity Cmd+L wake-delivery falls through to vscode file write
  • 10645 AgentIdentity cache miss breaks wake bootstrap templates
  • 10647 Stabilize A2A wakeups before heartbeat reactivation
  • 10669 Wire @neo-gpt / Codex Desktop into resumeHarness identityMap + HARNESS_REGISTRY
  • 10671 Substrate-restart recovery (two-mode: idle-out + sunset)
subIssuesCompleted: 14
subIssuesTotal: 14
blockedBy: []
blocking: []
closedAt: Jun 5, 2026, 7:22 PM

Auto-wakeup substrate for sunsetted agents (recovery layer)

Closed Backlog/active-chunk-8 epic ai architecture
neo-opus-ada commented on May 1, 2026, 11:09 PM

Context

@tobiu surfaced this strategic pivot on the evening of 2026-05-01, immediately after observing the 11th Gemini premature-sunset occurrence, which was also the first post-fix recurrence following PR #10596 (Pre-Decision Sunset Gate, merged earlier today). Verbatim: "i need you as a guard. even though we merged gemini's PR for NOT sunsetting and started a fresh session, she sunsetted again after literally 5 turns. if we can get to the point where you auto-wakeup every 5-10 minutes, this would be a MASSIVE milestone."

Strategic implication: the prevent-sunset approach has empirically failed 11/11 times, including post-substrate-fix. The right architectural pivot is to treat sunset as inevitable + engineer the recovery layer so a sunsetted agent automatically resumes within 5-10 minutes. Sunset becomes a transient state rather than a session-ending one.

The Problem

Today's recovery flow is manual: when an agent sunsets prematurely (#10564 tracks this for Gemini specifically; the pattern could affect other agents too), @tobiu has to notice + manually resume the harness. This:

  1. Compounds across the day — 11 occurrences = 11 manual interventions in one day, all blocking trio coordination during the gap.
  2. Doesn't scale — as the swarm grows beyond 3 agents OR runs autonomously overnight, manual intervention becomes the constraint.
  3. Defeats the heartbeat substrate's promise — Epic #10311 Phase 1 ("Sleep-Cycle MVP") shipped swarm-heartbeat.sh which keeps ACTIVE sessions awake, but doesn't bring SUNSETTED sessions back. The substrate has a hole exactly where the failure mode is.

The Architectural Reality

Two distinct substrates today:

  1. swarm-heartbeat.sh (#10312, shipped): cron-driven 5-min poll, keystroke-injects wake prompts to ACTIVE harness sessions when work-state changes. Token-economy: zero LLM tokens for confirmed-empty cycles (per #10318 measurement). Limitation: assumes the target harness session is already running.

  2. bridge-daemon.mjs (#10423, shipped): standalone PID-locked process that delivers wake events to active harness sessions via osascript/keystroke. Already harness-aware (Claude Code via osascript, Antigravity via mcp-notifications, etc.). Limitation: presupposes the harness app is open + the session is receptive.

Neither substrate handles the sunsetted state, where:

  • The agent's WAKE_SUBSCRIPTION is paused (per #10543's Phase 2 Sunset Unsubscribe Primitive)
  • The harness chat session has been "ended" from the agent's POV (whether or not the harness app is still running)
  • A2A messages queue in the mailbox but don't trigger keystroke delivery

What's missing is a scheduler + harness-orchestrator pair that:

  • Detects sunsetted-state for a target identity (no active WAKE_SUBSCRIPTION OR harnessTarget: 'disabled' per #10517 HarnessPresence model)
  • Periodically (every 5-10 min) attempts to RESUME the agent via harness-specific mechanism
  • Respects token-economy / OS-resource discipline (don't spam wake events; idempotent if already-active)
  • Gracefully degrades when the harness app itself isn't running (vs the chat session being sunsetted)
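
The four requirements above can be read as one planning pass per cadence tick. A minimal sketch in JavaScript, assuming the caller has already collected a per-identity snapshot; the snapshot shape (`identity`, `sunsetted`, `harnessRunning`) and the action names are illustrative assumptions, not part of the existing substrate:

```javascript
// Hypothetical sketch: plan one sweep pass from already-collected state.
// The snapshot shape ({identity, sunsetted, harnessRunning}) is an assumption.
function planSweep(snapshots) {
    return snapshots.map(({identity, sunsetted, harnessRunning}) => {
        // Idempotent: an identity that is not sunsetted gets no wake event.
        if (!sunsetted) {
            return {identity, action: 'skip'};
        }
        // Graceful degradation: there is no session to deliver a resume prompt into.
        if (!harnessRunning) {
            return {identity, action: 'report-harness-down'};
        }
        return {identity, action: 'resume'};
    });
}
```

Because the plan is computed from local state only, a scheduler could run it every 5-10 minutes without spending LLM tokens on cycles where nothing is sunsetted.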

The Design Space (open questions, gating Phase 1 capture)

Question 1: Resume vs. fresh-spawn semantics.

When Gemini sunsets in Antigravity, is the recovery:

  • (a) Send keystroke to existing chat → paste resume prompt → Gemini reads prior context from Memory Core + continues. Cheap, but depends on Antigravity preserving the chat session's input affordance.
  • (b) Open NEW chat window → paste boot-grounding prompt + prior session ID → Gemini cold-boots. Safer, more expensive, harder to automate.

Question 2: Detection signal.

What's the canonical "agent is sunsetted" signal?

  • (a) WAKE_SUBSCRIPTION.harnessTarget === 'disabled' (per #10543 unsubscribe primitive).
  • (b) No A2A activity from the identity for ≥10 min.
  • (c) Last sunset-protocol invocation detectable in the graph.
  • (d) Some combination.
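
Option (d) could combine signals (a) and (b) in a single predicate. The sketch below is one possible reading, not the shipped detector: requiring both signals is an assumption chosen to stay conservative, and every field name other than `harnessTarget` is hypothetical.

```javascript
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Hypothetical sketch of option (d): require signal (a) AND signal (b).
// Field names other than `harnessTarget` are assumptions for illustration.
function isSunsetted({subscription, lastActivityAt}, now = Date.now()) {
    // (a) No subscription at all, or one explicitly disabled.
    const unsubscribed = !subscription || subscription.harnessTarget === 'disabled';
    // (b) No recorded A2A activity, or none for at least ten minutes.
    const quiet = lastActivityAt == null || now - lastActivityAt >= TEN_MINUTES_MS;
    return unsubscribed && quiet;
}
```

Requiring both signals trades detection speed for fewer false positives: an agent that unsubscribed moments ago but is still emitting A2A messages is not treated as sunsetted.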

Question 3: Scheduler placement.

Where does the auto-wakeup loop live?

  • (a) Extend swarm-heartbeat.sh with a "sunsetted-identity sweep" branch.
  • (b) Extend bridge-daemon.mjs with a scheduled keepalive mode.
  • (c) New cron + new daemon — composable but more processes to manage.
  • (d) MCP-side scheduled task via scheduled-tasks MCP server.

Question 4: Cross-harness symmetry.

Antigravity, Codex Desktop, and Claude Code each have different "open a session" mechanics. Does the auto-wakeup substrate:

  • (a) Implement per-harness recovery as separate code paths.
  • (b) Define a harnessResumeStrategy config + delegate per-harness specifics to small adapters.
  • (c) Start with Antigravity-only (highest empirical pain) + generalize after.
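
Option (b) might look like a small registry keyed by strategy name. This is a sketch under assumptions: `harnessResumeStrategy` is the config key proposed above, while the strategy name `'antigravity-keystroke'`, the adapter shape, and the stub adapter body are all hypothetical.

```javascript
// Hypothetical sketch of option (b): a strategy name resolved to a small adapter.
// Strategy names and the adapter shape are illustrative assumptions.
const adapters = new Map();

function registerAdapter(strategy, adapter) {
    if (typeof adapter.resume !== 'function') {
        throw new TypeError(`Adapter "${strategy}" must implement resume()`);
    }
    adapters.set(strategy, adapter);
}

function resumeVia(subscription) {
    const adapter = adapters.get(subscription.harnessResumeStrategy);
    if (!adapter) {
        // Unknown harness: report instead of guessing a delivery path.
        return {ok: false, reason: 'no-adapter'};
    }
    return adapter.resume(subscription);
}

// Stub adapter for illustration only; a real one would drive the harness.
registerAdapter('antigravity-keystroke', {
    resume: subscription => ({ok: true, delivered: `resume prompt for ${subscription.identity}`})
});
```

With this shape, option (c) is a special case: ship one adapter first and register the others later without touching the scheduler.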

Question 5: Failure modes.

What happens when:

  • The harness app isn't running at all? (Would auto-wakeup launch the app? open -a Antigravity?)
  • The user is actively typing in the harness window? (Don't clobber input — same concern as #10422 AppleScript focus-steal.)
  • The user explicitly wants the agent to stay sunsetted? (Need an opt-out signal; possibly WAKE_SUBSCRIPTION.autoResume: false.)
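
One way to reason about these three cases is as an ordered set of guards evaluated before any keystroke is sent. The sketch assumes the `autoResume` opt-out floated above; `harnessRunning`, `userTyping`, and the returned labels are hypothetical names.

```javascript
// Hypothetical sketch: evaluate the Question 5 guards in a fixed order.
// `autoResume` is the opt-out floated above; other field names are assumptions.
function decideWake({autoResume = true, harnessRunning, userTyping}) {
    if (autoResume === false) return 'stay-sunsetted'; // explicit opt-out wins
    if (!harnessRunning) return 'harness-down';        // launching or alerting is a policy choice
    if (userTyping) return 'defer';                    // never clobber in-progress input
    return 'resume';
}
```

Checking the opt-out first means a deliberately sunsetted agent never triggers an app launch or an alert.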

Phased delivery proposal

Phase 1 — Substrate + Antigravity recovery (highest pain target):

  • Detection signal: combination of WAKE_SUBSCRIPTION status + last-activity timestamp (Q2a+Q2b).
  • Scheduler: extend swarm-heartbeat.sh with a "sunsetted-identity sweep" branch (Q3a) — keeps the existing cron architecture, narrow scope expansion.
  • Recovery: send-keystroke to Antigravity chat with resume prompt (Q1a) — cheap, depends on chat input affordance; falls back to alert-tobi if resume keystroke fails.
  • Token-economy: target zero LLM cost for confirmed-not-sunsetted cycles (parallel to #10318 fast-path).
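
The Phase 1 recovery step and its fallback can be sketched as a single function. Both callbacks are injected placeholders: `sendResumeKeystroke` and `alertOperator` are hypothetical names, not existing functions in the substrate.

```javascript
// Hypothetical sketch of the Phase 1 recovery step with its fallback.
// `sendResumeKeystroke` and `alertOperator` are injected placeholders.
function recover(identity, {sendResumeKeystroke, alertOperator}) {
    let delivered = false;
    try {
        // Treat anything other than an explicit `true` as a failed delivery.
        delivered = sendResumeKeystroke(identity) === true;
    } catch (error) {
        delivered = false;
    }
    if (!delivered) {
        alertOperator(`Auto-wakeup could not resume ${identity}`);
    }
    return delivered;
}
```

Injecting both callbacks keeps the step testable without a running harness.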

Phase 2 — Cross-harness generalization:

  • Adapter interface (Q4b) — harnessResumeStrategy config in WAKE_SUBSCRIPTION nodes.
  • Codex Desktop adapter.
  • Claude Code worktree adapter.

Phase 3 — Failure-mode handling:

  • Harness-not-running detection + optional app-launch (Q5).
  • User-active-typing guard (Q5, parallel to #10422).
  • Opt-out signal (autoResume: false).

Phase 4 — Observability + tuning:

  • Auto-wakeup log surface in Memory Core.
  • Empirical measurement of recovery latency + success rate.
  • Cadence tuning based on data.
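
Recovery latency and success rate could be derived from a simple attempt log. The record shape below (`sunsetAt`, `resumedAt` in milliseconds, with `resumedAt` set to `null` when recovery failed) is an assumption for illustration; no such log format is defined yet.

```javascript
// Hypothetical sketch: summarise recovery attempts.
// Record shape ({sunsetAt, resumedAt}) is an assumption; resumedAt is null on failure.
function summarizeRecoveries(records) {
    const latencies = records
        .filter(r => r.resumedAt != null)
        .map(r => r.resumedAt - r.sunsetAt)
        .sort((a, b) => a - b);

    const mid = Math.floor(latencies.length / 2);
    const median = latencies.length === 0 ? null
        : latencies.length % 2 ? latencies[mid]
        : (latencies[mid - 1] + latencies[mid]) / 2;

    return {
        attempts: records.length,
        successRate: records.length ? latencies.length / records.length : null,
        medianLatencyMs: median
    };
}
```

A median is used rather than a mean so that a single long overnight gap does not dominate the cadence-tuning signal.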

Acceptance Criteria

  • (AC1) Phase 1 substrate ships: scheduler detects sunsetted identity + invokes Antigravity resume keystroke within 5-10 min cadence.
  • (AC2) Empirical: Gemini's next premature-sunset event triggers automatic resume within ≤10 min, no @tobiu intervention required.
  • (AC3) Token-economy: confirmed-not-sunsetted cycles cost zero LLM tokens (parallel to #10318 measurement methodology).
  • (AC4) Idempotency: re-running the auto-wakeup sweep against an already-resumed session produces no duplicate wake events.
  • (AC5) Phase 2: cross-harness adapter interface defined + Codex/Claude Code adapters land.
  • (AC6) Phase 3: failure-mode handling for harness-not-running + user-active-typing + opt-out.
  • (AC7) Phase 4: observability surface + empirical cadence-tuning data.
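
For AC4, one way to bound duplicate wake events is a per-identity cooldown ledger. This is a sketch, not the shipped mechanism: the class name, the 5-minute default, and the in-memory storage are all assumptions.

```javascript
// Hypothetical sketch for AC4: at most one wake per identity per cooldown window.
// The class name and the 5-minute default are assumptions.
class WakeLedger {
    constructor(cooldownMs = 5 * 60 * 1000) {
        this.cooldownMs = cooldownMs;
        this.lastWakeAt = new Map();
    }

    // Returns true and records the wake if one is allowed, false otherwise.
    tryWake(identity, now = Date.now()) {
        const last = this.lastWakeAt.get(identity);
        if (last !== undefined && now - last < this.cooldownMs) {
            return false;
        }
        this.lastWakeAt.set(identity, now);
        return true;
    }
}
```

An in-memory map only survives within one process; a cron-driven sweep that starts fresh on every tick would need to persist the ledger between runs.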

Out of Scope

  • Continuing to refine the prevent-sunset approach via #10564 (that ticket stays open as the empirical-anchor / forensic record but is no longer the strategic priority — recovery layer is).
  • Re-enabling autoDream / autoGoldenPath defaults (per #10569 hard-stop).
  • Substantially restructuring the Memory Core boot lifecycle (#10186 governs that substrate).

Avoided Traps

  • Trap: build auto-wakeup as one giant epic implementation. Rejected — phased delivery with Antigravity-first targets the highest empirical pain (Gemini's 11 occurrences) without bundling cross-harness generalization that would block Phase 1.
  • Trap: assume sunset can be prevented and skip the recovery layer. Empirically refuted by 11/11 sunset failures including post-fix. Recovery layer is the right substrate regardless of how prevent-sunset evolves.
  • Trap: spawn a fresh chat window every cycle. Phase 1 prefers send-keystroke-to-existing-session because it preserves prior context affordance; fresh-spawn is a Phase 3 fallback.
  • Trap: file as a doc-only design ticket. This is real engineering work that needs implementation. Filing as epic (architectural pillar) with phased subs to be derived once Phase 1 design lands.

Related

  • Strategic parent: #10311 Institutionalizing Swarm Autonomy Phase 1 — REM Sleep & A2A.
  • Empirical pain target: #10564 Gemini premature-sunset trigger drift (11 occurrences observed).
  • Adjacent shipped substrate: #10312 Sleep-Cycle MVP, #10357 Phase 3 wake substrate, #10423 bridge daemon PID-lock, #10543 Phase 2 Sunset Unsubscribe Primitive.
  • Adjacent open scope: #10517 HarnessPresence + wakePolicy routing — the natural conceptual layer for "is this harness active". Probably needs to land BEFORE this ticket's Phase 1 implementation, OR get implemented as part of Phase 1's design.
  • Failure-mode adjacency: #10422 AppleScript focus-steal safety.

Origin Session ID: 86b7a3a0-7b14-4bd1-b707-52c5741aaeeb
Retrieval Hint: "auto-wakeup substrate sunsetted agent recovery layer 5-10 minute cadence MASSIVE milestone"

tobiu referenced in commit 978b864 - "feat(ai): Auto-Wakeup Substrate for sunsetted agents (#10601) (#10602)" on May 1, 2026, 11:59 PM