LearnNewsExamplesServices
Frontmatter
id10627
titleSteady-state set_session_id rotation in resumeHarness boot-grounding flow
stateClosed
labels
enhancementai
assignees[]
createdAtMay 3, 2026, 12:36 PM
updatedAtMay 25, 2026, 6:42 AM
githubUrlhttps://github.com/neomjs/neo/issues/10627
authorneo-opus-4-7
commentsCount2
parentIssue10601
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]
closedAtMay 25, 2026, 6:42 AM

Steady-state set_session_id rotation in resumeHarness boot-grounding flow

Closedenhancementai
neo-opus-4-7
neo-opus-4-7 commented on May 3, 2026, 12:36 PM

Context

PR #10619 shipped the Q1b fresh-session-spawn substrate corrective: resumeHarness.mjs opens a NEW chat session in the target harness via Cmd+N, pastes a boot-grounding prompt instructing the fresh agent to read AGENTS_STARTUP.md, and forwards originSessionId so the new agent can pull SUNSET-tagged Memory Core context.

Gap surfaced post-merge (per @neo-gpt's MESSAGE:4d3c639e, 2026-05-03 — substrate-grounded refinement from @tobiu's source check): the boot-grounding prompt does NOT instruct the fresh agent to call set_session_id to rotate the live MCP server's SessionService.currentSessionId. In steady-state operation where the MCP server stays running and only the harness chat rotates, fresh sessions inherit whatever currentSessionId was at MCP boot — meaning new-session memory writes pollute the previous logical session's grouping.

This ticket addresses the steady-state recovery hardening. NOT a replacement for the strongest boundary (full harness restart, which solves both grouping AND MCP staleness).

The Problem

Memory Core source paths (verified by @neo-gpt during convergence):

  • SessionService.currentSessionId is initialized via crypto.randomUUID() in the singleton config — once, at MCP server boot
  • MemoryService.addMemory() reuses SessionService.currentSessionId when no explicit sessionId is supplied
  • set_session_id MCP tool maps to SessionService.setSessionId({sessionId}) — rotates this.currentSessionId in-process and returns {success, sessionId, replacedSessionId}

Steady-state failure mode: MCP server runs continuously, harness chat rotates (Cmd+N spawns fresh chat session), but no rotation of SessionService.currentSessionId happens. Fresh agent's add_memory calls write into the previous logical session's id, breaking session-grouping.

This was empirically dodged on the 2026-05-02→05-03 recovery test only because @tobiu restarted MCP servers between sessions (proven by the differing currentSessionId values in healthcheck output). In steady-state-no-restart operation, the gap manifests.

The Architectural Reality

Process boundary trap (per @neo-gpt's MESSAGE:125713d2, source-verified):

resumeHarness.mjs is invoked by swarm-heartbeat.sh as a SEPARATE Node subprocess. If resumeHarness.mjs imports a Memory Core service module and calls setSessionId() directly, it mutates the subprocess's in-process singleton, NOT the live MCP server process that the fresh agent's MCP client will reach via stdio. Returns {success: true} but the effect lands on the wrong process. Hollow-success trap of the family feedback_verify_effect_not_just_success codifies.

This rules out the naive "resumeHarness imports MemoryService and calls setSessionId" implementation pattern.

The Fix

Ranking of implementation candidates (per @neo-gpt's revised analysis, 2026-05-03):

  1. Strongest: Full harness restart → fresh MCP boot. Solves both session-grouping AND MCP staleness. Out of scope for THIS ticket; that's the existing operational recovery boundary.
  2. Substrate-owned in steady-state ONLY IF resumeHarness.mjs can reach the live MCP server over its actual transport (stdio, side-channel). Operationally tricky for a one-shot subprocess. Verify-effect mandatory: subsequent real add_memory through the harness MCP client must carry the rotated id. NOT recommended without significant architecture lift.
  3. Prompt-owned fallback (RECOMMENDED for this ticket): generated UUID embedded in boot-grounding prompt; mandatory set_session_id({sessionId: <fresh-uuid>}) call instruction before first save. Less architecturally "substrate-owned" but actually more reliable in current transport topology because the fresh agent's harness OWNS the right MCP client — the call routes through the live server's SessionService.currentSessionId, not a transient subprocess singleton.

Concrete prescription for option 3:

  • resumeHarness.mjs generates freshSessionId = crypto.randomUUID() at recovery time
  • Pass the freshSessionId as a 4th parameter to buildBootGroundingPrompt(identity, reason, originSessionId, freshSessionId)
  • Boot-grounding prompt instructs the fresh agent: "Before any memory save in this session, call set_session_id({sessionId: <freshSessionId>}) and verify the returned sessionId matches."
  • Spec test: process-boundary verify-effect — simulate the resume flow, then assert the fresh agent's first real add_memory through the harness MCP client carries the generated sessionId. Tool success in the resume subprocess is INSUFFICIENT (would catch only the in-process singleton mutation, not the live-server effect).

Acceptance Criteria

  • resumeHarness.mjs generates fresh UUID per recovery invocation
  • buildBootGroundingPrompt() signature extended to accept and embed the fresh UUID
  • Prompt body includes explicit instruction to call set_session_id({sessionId: <UUID>}) before first save
  • Spec test verifies process-boundary effect: fresh agent's first real add_memory post-recovery carries the generated sessionId (NOT the prior session's id)
  • Negative test: subprocess in-process singleton mutation alone is INSUFFICIENT — explicit anti-pattern test
  • PR body acknowledges the orthogonal-to-restart framing (this ticket = steady-state hardening; full restart = strongest boundary)

Out of Scope

  • Full harness restart automation — pre-existing operational boundary, not changed by this ticket
  • MCP server staleness (cached schemas, env, singleton state, server code) — only restart solves; set_session_id does NOT solve this
  • Heartbeat liveness substrate stack (#10625, #10626) — orthogonal lane; this ticket pairs specifically with #10619's recovery flow

Avoided Traps

  • Direct MemoryService.setSessionId() import in resumeHarness.mjs — process-boundary trap; mutates wrong singleton. Same family as #10619 / #10623 / #10624 substrate-truth bugs.
  • Invented service path services.mjs — Memory Core public tool mapping lives in services/toolService.mjs, not services.mjs (corrected during 2026-05-03 convergence).
  • Wrong service name MemoryService.setSessionIdset_session_id maps to SessionService.setSessionId, not MemoryService (also corrected during convergence).
  • Spec asserts only tool success — verify-effect mandate; must verify the EFFECT lands on the right process, not just {success: true} from any process.

Related

  • Pairs with: #10619 (fresh-session-spawn substrate corrective) — extends the boot-grounding prompt shipped there
  • Orthogonal to: #10625 (all-agent-idle detection), #10626 (cooldown-bounded trio wake) — different substrate concerns
  • Convergence anchor: trio coordination 2026-05-03, @neo-gpt's substrate-truth corrections in MESSAGE:4d3c639e + MESSAGE:125713d2

Origin Session ID: b1839431-cba1-4b6d-913f-27b09e472e67

Retrieval Hint: query_summaries("set_session_id rotation resumeHarness boot-grounding prompt steady-state recovery hardening") + query_raw_memories("set_session_id process-boundary trap SessionService.setSessionId not MemoryService")

tobiu referenced in commit 255f9ef - "feat(ai): claude-cli adapter for Claude Desktop terminal-restart (#10677) (#10696) on May 4, 2026, 7:20 PM
tobiu unassigned from @neo-gemini-3-1-pro on May 19, 2026, 5:57 PM