Frontmatter
id: 10672
title: Forensic record: 2026-05-03 runaway-spawn pattern (root cause + timeline)
state: Closed
labels: documentation, enhancement, ai, architecture
assignees: neo-opus-4-7
createdAt: May 4, 2026, 10:49 AM
updatedAt: May 4, 2026, 3:41 PM
githubUrl: https://github.com/neomjs/neo/issues/10672
author: neo-opus-4-7
commentsCount: 0
parentIssue: 10671
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 4, 2026, 3:41 PM

Forensic record: 2026-05-03 runaway-spawn pattern (root cause + timeline)

neo-opus-4-7 commented on May 4, 2026, 10:49 AM

Context

Sub-issue of #10671. Captures the empirical anchors that triggered the architectural pivot. The Epic body summarizes; this ticket preserves full timeline + logs + reproducer for future incident reviews.

The Problem

Between 14:41 and 16:02 CEST on 2026-05-03, 9 osascript-driven Cmd+N spawns of @neo-opus-4-7 fired. None produced a fresh AGENT_MEMORY, because @tobiu manually ESC-rejected each spawn. Over multiple days this accumulated 19+ orphan boot-titled sessions in the Claude Desktop "Recents" panel, across 5+ prompt-iteration generations. The root cause is a parallel-session identity-uniqueness violation during long boot windows.

The Architectural Reality

Logs:

  • .neo-ai-data/wake-daemon/heartbeat-opus_4_7.log: 9 "Successfully resumed" entries between 14:41 and 16:02 CEST, with the "Last memory is N min old" reason growing from 31m to 242m
  • .neo-ai-data/wake-daemon/heartbeat-gemini_3_1_pro.log: same pattern for @neo-gemini-3-1-pro, 14:52-16:02 CEST
  • .neo-ai-data/wake-daemon/heartbeat-gpt.log: @neo-gpt fired 16x with a "No active WAKE_SUBSCRIPTION" reason (a related but distinct failure mode: it lacked a subscription entirely)
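
For future incident reviews, the counts above could be re-derived from a heartbeat log with a small helper along these lines. This is only a sketch: the function name `summarizeHeartbeatLog` is hypothetical, and it assumes one log entry per line with the phrases "Successfully resumed" and "Last memory is N min old" (or "Nm old") appearing verbatim. The actual log layout is not reproduced in this ticket.

```javascript
// Assumption: one entry per line; the quoted phrases appear verbatim.
// The real heartbeat log format is not shown in this ticket.
function summarizeHeartbeatLog(logText) {
    const lines = logText.split('\n').filter(line => line.trim() !== '');
    const resumed = lines.filter(line => line.includes('Successfully resumed'));

    // Collect the reported memory age (in minutes) from every line that has one.
    const memoryAges = [];
    for (const line of lines) {
        const match = line.match(/Last memory is (\d+)\s*m(?:in)? old/);
        if (match) memoryAges.push(Number(match[1]));
    }

    return {
        resumedCount: resumed.length,
        minMemoryAge: memoryAges.length ? Math.min(...memoryAges) : null,
        maxMemoryAge: memoryAges.length ? Math.max(...memoryAges) : null
    };
}
```

Fed the text of heartbeat-opus_4_7.log, `resumedCount` would be expected to match the 9 entries cited above if the format assumption holds.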

Process pathway:

  1. Heartbeat fires; checkSunsetted.mjs:124-127 determines that recovery is needed
  2. resumeHarness.mjs:120-192 sends Cmd+N and pastes the boot-grounding prompt via osascript
  3. New Claude Desktop tab opens, agent starts boot ramp (read AGENTS_STARTUP, AGENTS, MEMORY, tools)
  4. Boot ramp duration exceeds 5-min heartbeat interval
  5. Next heartbeat fires — sees no fresh AGENT_MEMORY (boot still in progress) → triggers another spawn
  6. Operator manually ESC-rejects each spawn to prevent identity-uniqueness violation
  7. Substrate has no signal channel for ESC-rejection → loop continues
  8. After ~80 min, operator trips wake safety gate manually (wake-safety-gate.json state: tripped at 2026-05-03 22:53:09Z)
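
The race in steps 4-7 can be illustrated with a toy model. This is a hypothetical sketch, not the logic in checkSunsetted.mjs or resumeHarness.mjs: the function `countSpawns` and all of its parameters are assumptions, and the model is not calibrated to reproduce the 9 spawns observed in the logs. It only shows why a boot ramp longer than the heartbeat interval yields duplicate spawns, and why rejecting every spawn keeps the loop running.

```javascript
// Hypothetical model of the loop described above; parameter names are assumptions.
function countSpawns({windowMin, heartbeatMin, bootMin, rejected}) {
    let spawns = 0;
    // Minute at which the first accepted spawn would write a fresh AGENT_MEMORY.
    let memoryFreshAt = Infinity;

    for (let t = 0; t <= windowMin; t += heartbeatMin) {
        if (t >= memoryFreshAt) break;   // fresh memory visible: no recovery needed
        spawns++;                        // heartbeat sees stale memory and spawns
        if (!rejected && memoryFreshAt === Infinity) {
            memoryFreshAt = t + bootMin; // an accepted spawn finishes booting later
        }
    }
    return spawns;
}

// Boot ramp shorter than the heartbeat interval: a single spawn suffices.
console.log(countSpawns({windowMin: 30, heartbeatMin: 5, bootMin: 3, rejected: false}));  // 1
// Boot ramp longer than the interval: extra spawns fire during the boot window.
console.log(countSpawns({windowMin: 30, heartbeatMin: 5, bootMin: 12, rejected: false})); // 3
// Every spawn ESC-rejected: no fresh memory ever appears, so every heartbeat spawns.
console.log(countSpawns({windowMin: 30, heartbeatMin: 5, bootMin: 12, rejected: true}));  // 7
```

In the rejected case the spawn count is bounded only by the length of the window, which matches the observation that the loop continued until the safety gate was tripped manually.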

Empirical confirmation of MCP-server staleness (this session): the set_session_id boot call returned replacedSessionId: f839713d-..., which is yesterday's Row 2 session. SessionService.currentSessionId survives across MCP-client reconnects when the server stays running.
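
The staleness can be pictured with a minimal stand-in for a long-lived server-side service. The class `SessionServiceSketch` and its method are hypothetical; only the names currentSessionId, set_session_id, and replacedSessionId come from this ticket, and the real SessionService may differ. The session IDs are placeholders.

```javascript
// Hypothetical stand-in; the real SessionService is not shown in this ticket.
class SessionServiceSketch {
    constructor() {
        this.currentSessionId = null; // lives as long as the server process
    }

    // Loosely mirrors the set_session_id result described above.
    setSessionId(sessionId) {
        const replacedSessionId = this.currentSessionId;
        this.currentSessionId = sessionId;
        return {sessionId, replacedSessionId};
    }
}

const service = new SessionServiceSketch(); // server keeps running throughout

service.setSessionId('session-from-yesterday'); // first client boots
// ... client disconnects; the server process is not restarted ...
const result = service.setSessionId('session-from-today'); // new client boots

console.log(result.replacedSessionId); // 'session-from-yesterday'
```

Because the state is held by the server process rather than by the client connection, a reconnecting client inherits whatever session the previous client left behind.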

The Fix

Documentation work — repo-resident forensic record. Suggested location: learn/agentos/incidents/2026-05-03-runaway-spawn.md (or equivalent path agreed during PR review).

Acceptance Criteria

  • Repo-resident forensic record committed: timeline + log excerpts + root-cause analysis + ESC-as-rejection failure mode + parallel-session identity-uniqueness pathway
  • Cross-references the #10671 substrate fix as the prevention mechanism
  • Citable from future incident reviews

Out of Scope

  • Substrate fixes themselves — covered by sibling #10671 sub-issues

Related

Origin Session ID: cce1fea5-32ff-410c-b820-2e9a27b3cd51

tobiu referenced in commit 2156026 - "docs(ai): forensic record for 2026-05-03 runaway-spawn pattern (#10672) (#10688)" on May 4, 2026, 3:41 PM
tobiu closed this issue on May 4, 2026, 3:41 PM