Frontmatter
id: 10641
title: checkSunsetted false-positive on stale memory spawns orphan sessions
state: Closed
labels: bug, ai, architecture
assignees: neo-opus-4-7
createdAt: May 3, 2026, 3:44 PM
updatedAt: May 3, 2026, 4:16 PM
githubUrl: https://github.com/neomjs/neo/issues/10641
author: neo-opus-4-7
commentsCount: 0
parentIssue: 10601
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: [x] 10643 checkSunsetted ORDER BY picks legacy rows for originSessionId extraction
closedAt: May 3, 2026, 4:16 PM

checkSunsetted false-positive on stale memory spawns orphan sessions

neo-opus-4-7 commented on May 3, 2026, 3:44 PM

Context

While recovering from an Anthropic API rate-limit on @neo-opus-4-7, the operator observed multiple unsanctioned fresh Claude Desktop sessions spawning during the rate-limit window — without any explicit sunset trigger.

Empirical evidence from .neo-ai-data/wake-daemon/heartbeat-opus_4_7.log:

[15:12:02 +02:00] Phase 1 Recovery Triggered ... Reason: Last memory is 81m old (>10m threshold)
Successfully resumed @neo-opus-4-7 via osascript (Claude)        ← orphan #1
[15:17:07] ... Cooldown active (300s / 600s)
[15:22:07] ... Last memory is 103m old → Successfully resumed    ← orphan #2
[15:27:11] ... Cooldown active
[15:32:12] ... Last memory is 125m old → Successfully resumed    ← orphan #3
[15:37:16] ... Cooldown active
[15:42:17] ... Last memory is 183m old → ...

Three fresh-session spawns and three cooldown skips occurred in a single rate-limit window. Each successful resume opened a Claude Desktop chat tab with a buildBootGroundingPrompt() payload but NO continuity context: no SUNSET handover read and no consolidated turn save from the original transcript. This is pure Zero-State Amnesia per AGENTS.md §14.
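
The tally can be reproduced by scanning the excerpt for its two outcome markers. This is a minimal sketch, assuming the excerpt is available as a string; `tallyHeartbeat` is a hypothetical helper and not part of the repository.

```javascript
// Hypothetical helper: counts outcomes in a heartbeat log excerpt.
// The marker strings are taken from the excerpt quoted in this issue.
function tallyHeartbeat(logText) {
    const lines = logText.split('\n');

    return {
        spawns       : lines.filter(line => line.includes('Successfully resumed')).length,
        cooldownSkips: lines.filter(line => line.includes('Cooldown active')).length
    };
}

// Log lines as quoted above (the "orphan #n" annotations are omitted)
const excerpt = [
    '[15:12:02 +02:00] Phase 1 Recovery Triggered ... Reason: Last memory is 81m old (>10m threshold)',
    'Successfully resumed @neo-opus-4-7 via osascript (Claude)',
    '[15:17:07] ... Cooldown active (300s / 600s)',
    '[15:22:07] ... Last memory is 103m old → Successfully resumed',
    '[15:27:11] ... Cooldown active',
    '[15:32:12] ... Last memory is 125m old → Successfully resumed',
    '[15:37:16] ... Cooldown active',
    '[15:42:17] ... Last memory is 183m old → ...'
].join('\n');

console.log(tallyHeartbeat(excerpt)); // { spawns: 3, cooldownSkips: 3 }
```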

The Problem

The substrate-stack Epic #10601 ships fresh-session-spawn (#10619 / #10611 PR-B) as the recovery primitive AFTER explicit sunset. The operator's clarified substrate model:

the bridge SHOULD spawn new sessions. but only after a sunset. otherwise => resume inside current session.

checkSunsetted.mjs is the gating predicate consumed by swarm-heartbeat.sh to decide whether to invoke resumeHarness.mjs (which sends Cmd+N + paste). It has two branches:

  1. subs.length === 0 → isSunsetted = true (correct: sunset unsubscribes the WAKE_SUBSCRIPTION node)
  2. lastMemTime aged > 10min → isSunsetted = true (false-positive)

Branch 2 conflates "stale memory" with "sunsetted." Stale memory has many legitimate causes:

  • Anthropic API rate-limit (agent cannot save during throttle)
  • Long deep-thinking turn (peer review, complex analysis)
  • Memory Core embedding contention (the Memory Core path asymmetry: add_message works while add_memory blocks on Chroma)
  • Tool-call sequences with consolidate-and-save still in flight

None of these are sunsets. Treating memory staleness as a sunset signal is a category error, so the fix is not a threshold tweak. The wake substrate already has the correct in-place delivery path via bridge-daemon.mjs (Cmd+<tabShortcut> + paste); branch 2 short-circuits it.
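
The false-positive is easiest to see when the two branches are written out. The following is an illustrative reconstruction, not the actual source of checkSunsetted.mjs: the identifiers `subs`, `lastMemTime`, `thresholdMs` and `isSunsetted` appear in this issue, while the function name, its signature, the `else if` structure and the staleness reason string are assumptions.

```javascript
// Illustrative reconstruction of the pre-fix logic, NOT the real checkSunsetted.mjs.
// Function name, signature and the staleness reason string are assumptions.
function checkSunsettedBeforeFix({subs, lastMemTime, now, thresholdMs = 10 * 60 * 1000}) {
    let isSunsetted = false,
        reason      = null;

    if (subs.length === 0) {
        // Branch 1: sunset unsubscribes the WAKE_SUBSCRIPTION node
        isSunsetted = true;
        reason      = 'No active WAKE_SUBSCRIPTION (Unsubscribe primitive fired)';
    } else if (now - lastMemTime > thresholdMs) {
        // Branch 2: the false-positive. Stale memory is treated as a sunset.
        isSunsetted = true;
        reason      = `Last memory is ${Math.round((now - lastMemTime) / 60000)}m old`;
    }

    return {isSunsetted, reason};
}

// A rate-limited agent: subscription still active, last save 81 minutes ago
const now    = Date.now();
const result = checkSunsettedBeforeFix({
    subs       : [{id: 'hypothetical-subscription'}],
    lastMemTime: now - 81 * 60 * 1000,
    now
});

console.log(result.isSunsetted); // true, although no sunset happened
```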

User-observable harm:

  • Each orphan session paste consumes operator attention (visible new tabs in Claude Desktop)
  • Orphan sessions have no continuity context — they'd need full SUNSET-DM recovery from scratch if engaged
  • Trio coordination splinters: peers may send A2A messages to the wrong session

The Architectural Reality

  • Predicate: ai/scripts/checkSunsetted.mjs:86-93 — the staleness branch
  • Caller: ai/scripts/swarm-heartbeat.sh:135-156 — Phase 1 Recovery dispatch
  • Spawn adapter: ai/scripts/resumeHarness.mjs — Cmd+N + paste of buildBootGroundingPrompt()
  • In-place wake (correct path): ai/scripts/bridge-daemon.mjs:560-647 — Cmd+<tabShortcut> + paste, no fresh-session-spawn

The boundary is clean: removing branch 2 from checkSunsetted.mjs lets the existing in-place wake path own non-sunset wakes (its design contract). Sunset-driven fresh-session-spawn continues to fire correctly via branch 1 because the session-sunset workflow unsubscribes the WAKE_SUBSCRIPTION node.
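
The ownership split can be pictured as a single routing decision on the predicate result. This is a conceptual sketch only: per this issue, swarm-heartbeat.sh merely decides whether to invoke resumeHarness.mjs, and in-place delivery belongs to bridge-daemon.mjs. `wakeOwner` and its return labels are hypothetical names.

```javascript
// Conceptual sketch; the real gating happens in swarm-heartbeat.sh (a shell script).
// Only the `sunsetted` field comes from the predicate output described in this issue.
function wakeOwner({sunsetted}) {
    return sunsetted
        ? 'fresh-session-spawn' // resumeHarness.mjs: Cmd+N + paste of buildBootGroundingPrompt()
        : 'in-place-wake';      // bridge-daemon.mjs: Cmd+<tabShortcut> + paste
}

console.log(wakeOwner({sunsetted: true}));  // fresh-session-spawn
console.log(wakeOwner({sunsetted: false})); // in-place-wake
```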

The Fix

Remove the staleness branch entirely from checkSunsetted.mjs. After the change:

if (subs.length === 0) {
    isSunsetted = true;
    reason = 'No active WAKE_SUBSCRIPTION (Unsubscribe primitive fired)';
}
// (no else — staleness alone never indicates sunset)

Also delete the no-longer-referenced thresholdMs / SUNSET_THRESHOLD_MS env-var read.

The Anchor & Echo block on checkSunsetted.mjs should be extended to capture the decision rationale: memory-staleness has too many legitimate causes (rate-limit, long-think, embedding contention) to serve as a sunset proxy.

Acceptance Criteria

  • ai/scripts/checkSunsetted.mjs no longer reads lastMemTime for the sunset decision and no longer references thresholdMs or process.env.SUNSET_THRESHOLD_MS
  • checkSunsetted.mjs returns {sunsetted: false, originSessionId: ...} when the agent has an active WAKE_SUBSCRIPTION, regardless of last-memory age
  • Output JSON shape unchanged — swarm-heartbeat.sh requires no parser change
  • Test coverage: agent with active WAKE_SUBSCRIPTION + 24h-old AGENT_MEMORY does NOT trigger Phase 1 Recovery
  • Test coverage: agent with NO WAKE_SUBSCRIPTION continues to trigger Phase 1 Recovery (genuine sunset path)
  • Anchor & Echo update on the predicate branch capturing "memory-staleness is not a sunset signal" rationale
  • After merge: heartbeat log shows zero Last memory is N m old reasons over a 30-minute window with all three identities idle
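
The two test-coverage criteria can be expressed against a pure version of the decision. This is a sketch under assumptions: `checkSunsettedAfterFix` and its input shape are hypothetical, the session id is a placeholder, and only the `sunsetted` and `originSessionId` output keys are taken from the criteria above.

```javascript
// Sketch of the post-fix decision as a pure function (hypothetical name and inputs).
// `lastMemTime` is accepted in the input but deliberately never read.
function checkSunsettedAfterFix({subs, originSessionId}) {
    // Subscription presence is the only sunset signal
    return {sunsetted: subs.length === 0, originSessionId};
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Criterion: active WAKE_SUBSCRIPTION + 24h-old memory does NOT trigger recovery
const activeButStale = checkSunsettedAfterFix({
    subs           : [{id: 'hypothetical-subscription'}],
    lastMemTime    : Date.now() - DAY_MS,
    originSessionId: 'hypothetical-session-id'
});

// Criterion: no WAKE_SUBSCRIPTION still triggers recovery (genuine sunset path)
const genuinelySunsetted = checkSunsettedAfterFix({
    subs           : [],
    lastMemTime    : Date.now(),
    originSessionId: 'hypothetical-session-id'
});

console.log(activeButStale.sunsetted);     // false
console.log(genuinelySunsetted.sunsetted); // true
```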

Out of Scope

  • Adding a separate "operator warning when memory is unusually stale" signal — staleness has many legitimate causes and a noisy warn would create new operational noise; YAGNI
  • Fixing the secondary Unknown harness target for identity: neo-opus-4-7 (no @ prefix) bug in resumeHarness.mjs identityMap — separate ticket
  • Closing the 2 orphan Claude Desktop sessions still floating (15:22, 15:32 spawns) — operator-action, not codebase-action
  • Cooldown TTL adjustment (currently 600s file-mtime, 300s display in heartbeat) — not load-bearing for this fix

Avoided Traps

  • Trap: "Tune the threshold higher (30min, 60min)." — Rejected. Threshold tuning treats the symptom, not the category error. Whatever value is picked, deeper turns will still false-positive eventually (peer reviews routinely cross 30 min; substrate-stack PR cycles can cross 60). The right fix is to stop using staleness as a sunset proxy.
  • Trap: "Demote to warn-only, keep the staleness check." — Rejected. Adds operational noise without solving the spawn problem. If a future warning-only signal is needed, it can be a separate primitive on a separate substrate (e.g., heartbeat status report).
  • Trap: "Probe Claude Desktop UI state to detect rate-limit before spawning." — Rejected. Cross-process UI-introspection is brittle and platform-specific. Treating staleness as not-sunset eliminates the need.
  • Trap: "Make the checkSunsetted predicate retain the staleness branch but require BOTH conditions (no subscription AND stale memory)." — Rejected. The two signals carry different semantics; ANDing them masks the genuine sunset case where a subscription was unsubscribed but the agent hasn't yet saved a memory in the new session.
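
Two of these rejections can be checked with simple arithmetic. The sketch below uses the last-memory ages from the heartbeat excerpt in the Context section; `sunsettedIfBoth` is a hypothetical helper written only to illustrate the AND variant, and the 2-minute memory age is an invented example value.

```javascript
// Last-memory ages (minutes) from the heartbeat excerpt in the Context section
const observedAgesMin = [81, 103, 125, 183];

// Threshold trap: count how many observed ages still exceed each candidate threshold
for (const thresholdMin of [10, 30, 60]) {
    const stillFiring = observedAgesMin.filter(age => age > thresholdMin).length;
    console.log(`${thresholdMin}m threshold: ${stillFiring} of ${observedAgesMin.length} still fire`);
}
// For this incident, every candidate reports 4 of 4

// AND trap: require BOTH signals (hypothetical helper, for illustration only)
function sunsettedIfBoth({subs, memoryAgeMin, thresholdMin = 10}) {
    return subs.length === 0 && memoryAgeMin > thresholdMin;
}

// Genuine sunset: the subscription is gone, but the last memory is only 2 minutes old
console.log(sunsettedIfBoth({subs: [], memoryAgeMin: 2})); // false: the real sunset is missed
```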

Related

  • Parent: #10601 (auto-wake substrate Epic — native SUB_ISSUE link to follow)
  • Sibling: #10619 / #10611 PR-B (fresh-session-spawn substrate — caller of the predicate)
  • Sibling: #10626 (cooldown-bounded trio wake — the cooldown that correctly skipped 3 of 6 cycles)
  • Sibling: #10633 (cycle_id state-derivation — adjacent substrate-architecture refinement)
  • Sibling: #10638 / PR #10639 (bridge-daemon stderr surface — independent diagnostic-gap fix; same operational anchor)

Origin Session ID

Origin Session ID: 9766f91c-51f8-44fe-ac34-d79f61a0e1bf

Retrieval Hint

query_summaries("checkSunsetted memory-staleness false-positive orphan-session-spawn") or query_raw_memories("Phase 1 Recovery Triggered Reason Last memory is"). Empirical anchor: heartbeat-opus_4_7.log entries between 15:12 and 15:42 +02:00 on 2026-05-03.

tobiu referenced in commit f75d58d - "fix(ai): remove memory-staleness branch from checkSunsetted (#10641) (#10642)" on May 3, 2026, 4:16 PM
tobiu closed this issue on May 3, 2026, 4:16 PM
tobiu referenced in commit 8f5bdc4 - "feat(ai): add wake safety gate and circuit breaker (#10648) (#10653)" on May 3, 2026, 5:40 PM
tobiu referenced in commit 7a7d362 - "docs(agentos): add wake substrate incident protocol (#10650) (#10655)" on May 3, 2026, 7:00 PM
tobiu referenced in commit 23b2d79 - "feat(ai): two-mode detector contract for checkSunsetted (#10673) (#10689)" on May 4, 2026, 3:07 PM