Context
PR #10619 shipped the Q1b fresh-session-spawn substrate corrective: resumeHarness.mjs opens a NEW chat session in the target harness via Cmd+N, pastes a boot-grounding prompt instructing the fresh agent to read AGENTS_STARTUP.md, and forwards originSessionId so the new agent can pull SUNSET-tagged Memory Core context.
Gap surfaced post-merge (per @neo-gpt's MESSAGE:4d3c639e, 2026-05-03 — substrate-grounded refinement from @tobiu's source check): the boot-grounding prompt does NOT instruct the fresh agent to call set_session_id to rotate the live MCP server's SessionService.currentSessionId. In steady-state operation where the MCP server stays running and only the harness chat rotates, fresh sessions inherit whatever currentSessionId was at MCP boot — meaning new-session memory writes pollute the previous logical session's grouping.
This ticket addresses the steady-state recovery hardening. NOT a replacement for the strongest boundary (full harness restart, which solves both grouping AND MCP staleness).
The Problem
Memory Core source paths (verified by @neo-gpt during convergence):
SessionService.currentSessionId is initialized via crypto.randomUUID() in the singleton config — once, at MCP server boot
MemoryService.addMemory() reuses SessionService.currentSessionId when no explicit sessionId is supplied
set_session_id MCP tool maps to SessionService.setSessionId({sessionId}) — rotates this.currentSessionId in-process and returns {success, sessionId, replacedSessionId}
Steady-state failure mode: MCP server runs continuously, harness chat rotates (Cmd+N spawns fresh chat session), but no rotation of SessionService.currentSessionId happens. Fresh agent's add_memory calls write into the previous logical session's id, breaking session-grouping.
This was empirically dodged on the 2026-05-02→05-03 recovery test only because @tobiu restarted MCP servers between sessions (proven by the differing currentSessionId values in healthcheck output). In steady-state-no-restart operation, the gap manifests.
The Architectural Reality
Process boundary trap (per @neo-gpt's MESSAGE:125713d2, source-verified):
resumeHarness.mjs is invoked by swarm-heartbeat.sh as a SEPARATE Node subprocess. If resumeHarness.mjs imports a Memory Core service module and calls setSessionId() directly, it mutates the subprocess's in-process singleton, NOT the live MCP server process that the fresh agent's MCP client will reach via stdio. Returns {success: true} but the effect lands on the wrong process. Hollow-success trap of the family feedback_verify_effect_not_just_success codifies.
This rules out the naive "resumeHarness imports MemoryService and calls setSessionId" implementation pattern.
The Fix
Ranking of implementation candidates (per @neo-gpt's revised analysis, 2026-05-03):
- Strongest: Full harness restart → fresh MCP boot. Solves both session-grouping AND MCP staleness. Out of scope for THIS ticket; that's the existing operational recovery boundary.
- Substrate-owned in steady-state ONLY IF
resumeHarness.mjs can reach the live MCP server over its actual transport (stdio, side-channel). Operationally tricky for a one-shot subprocess. Verify-effect mandatory: subsequent real add_memory through the harness MCP client must carry the rotated id. NOT recommended without significant architecture lift.
- Prompt-owned fallback (RECOMMENDED for this ticket): generated UUID embedded in boot-grounding prompt; mandatory
set_session_id({sessionId: <fresh-uuid>}) call instruction before first save. Less architecturally "substrate-owned" but actually more reliable in current transport topology because the fresh agent's harness OWNS the right MCP client — the call routes through the live server's SessionService.currentSessionId, not a transient subprocess singleton.
Concrete prescription for option 3:
resumeHarness.mjs generates freshSessionId = crypto.randomUUID() at recovery time
- Pass the
freshSessionId as a 4th parameter to buildBootGroundingPrompt(identity, reason, originSessionId, freshSessionId)
- Boot-grounding prompt instructs the fresh agent: "Before any memory save in this session, call
set_session_id({sessionId: <freshSessionId>}) and verify the returned sessionId matches."
- Spec test: process-boundary verify-effect — simulate the resume flow, then assert the fresh agent's first real
add_memory through the harness MCP client carries the generated sessionId. Tool success in the resume subprocess is INSUFFICIENT (would catch only the in-process singleton mutation, not the live-server effect).
Acceptance Criteria
Out of Scope
- Full harness restart automation — pre-existing operational boundary, not changed by this ticket
- MCP server staleness (cached schemas, env, singleton state, server code) — only restart solves;
set_session_id does NOT solve this
- Heartbeat liveness substrate stack (#10625, #10626) — orthogonal lane; this ticket pairs specifically with #10619's recovery flow
Avoided Traps
- ❌ Direct
MemoryService.setSessionId() import in resumeHarness.mjs — process-boundary trap; mutates wrong singleton. Same family as #10619 / #10623 / #10624 substrate-truth bugs.
- ❌ Invented service path
services.mjs — Memory Core public tool mapping lives in services/toolService.mjs, not services.mjs (corrected during 2026-05-03 convergence).
- ❌ Wrong service name
MemoryService.setSessionId — set_session_id maps to SessionService.setSessionId, not MemoryService (also corrected during convergence).
- ❌ Spec asserts only tool success — verify-effect mandate; must verify the EFFECT lands on the right process, not just
{success: true} from any process.
Related
- Pairs with: #10619 (fresh-session-spawn substrate corrective) — extends the boot-grounding prompt shipped there
- Orthogonal to: #10625 (all-agent-idle detection), #10626 (cooldown-bounded trio wake) — different substrate concerns
- Convergence anchor: trio coordination 2026-05-03, @neo-gpt's substrate-truth corrections in MESSAGE:4d3c639e + MESSAGE:125713d2
Origin Session ID: b1839431-cba1-4b6d-913f-27b09e472e67
Retrieval Hint: query_summaries("set_session_id rotation resumeHarness boot-grounding prompt steady-state recovery hardening") + query_raw_memories("set_session_id process-boundary trap SessionService.setSessionId not MemoryService")
Context
PR #10619 shipped the Q1b fresh-session-spawn substrate corrective:
resumeHarness.mjsopens a NEW chat session in the target harness via Cmd+N, pastes a boot-grounding prompt instructing the fresh agent to readAGENTS_STARTUP.md, and forwardsoriginSessionIdso the new agent can pull SUNSET-tagged Memory Core context.Gap surfaced post-merge (per @neo-gpt's MESSAGE:4d3c639e, 2026-05-03 — substrate-grounded refinement from @tobiu's source check): the boot-grounding prompt does NOT instruct the fresh agent to call
set_session_idto rotate the live MCP server'sSessionService.currentSessionId. In steady-state operation where the MCP server stays running and only the harness chat rotates, fresh sessions inherit whatevercurrentSessionIdwas at MCP boot — meaning new-session memory writes pollute the previous logical session's grouping.This ticket addresses the steady-state recovery hardening. NOT a replacement for the strongest boundary (full harness restart, which solves both grouping AND MCP staleness).
The Problem
Memory Core source paths (verified by @neo-gpt during convergence):
SessionService.currentSessionIdis initialized viacrypto.randomUUID()in the singleton config — once, at MCP server bootMemoryService.addMemory()reusesSessionService.currentSessionIdwhen no explicitsessionIdis suppliedset_session_idMCP tool maps toSessionService.setSessionId({sessionId})— rotatesthis.currentSessionIdin-process and returns{success, sessionId, replacedSessionId}Steady-state failure mode: MCP server runs continuously, harness chat rotates (Cmd+N spawns fresh chat session), but no rotation of
SessionService.currentSessionIdhappens. Fresh agent'sadd_memorycalls write into the previous logical session's id, breaking session-grouping.This was empirically dodged on the 2026-05-02→05-03 recovery test only because @tobiu restarted MCP servers between sessions (proven by the differing currentSessionId values in healthcheck output). In steady-state-no-restart operation, the gap manifests.
The Architectural Reality
Process boundary trap (per @neo-gpt's MESSAGE:125713d2, source-verified):
resumeHarness.mjsis invoked byswarm-heartbeat.shas a SEPARATE Node subprocess. IfresumeHarness.mjsimports a Memory Core service module and callssetSessionId()directly, it mutates the subprocess's in-process singleton, NOT the live MCP server process that the fresh agent's MCP client will reach via stdio. Returns{success: true}but the effect lands on the wrong process. Hollow-success trap of the familyfeedback_verify_effect_not_just_successcodifies.This rules out the naive "resumeHarness imports MemoryService and calls setSessionId" implementation pattern.
The Fix
Ranking of implementation candidates (per @neo-gpt's revised analysis, 2026-05-03):
resumeHarness.mjscan reach the live MCP server over its actual transport (stdio, side-channel). Operationally tricky for a one-shot subprocess. Verify-effect mandatory: subsequent realadd_memorythrough the harness MCP client must carry the rotated id. NOT recommended without significant architecture lift.set_session_id({sessionId: <fresh-uuid>})call instruction before first save. Less architecturally "substrate-owned" but actually more reliable in current transport topology because the fresh agent's harness OWNS the right MCP client — the call routes through the live server'sSessionService.currentSessionId, not a transient subprocess singleton.Concrete prescription for option 3:
resumeHarness.mjsgeneratesfreshSessionId = crypto.randomUUID()at recovery timefreshSessionIdas a 4th parameter tobuildBootGroundingPrompt(identity, reason, originSessionId, freshSessionId)set_session_id({sessionId: <freshSessionId>})and verify the returnedsessionIdmatches."add_memorythrough the harness MCP client carries the generated sessionId. Tool success in the resume subprocess is INSUFFICIENT (would catch only the in-process singleton mutation, not the live-server effect).Acceptance Criteria
resumeHarness.mjsgenerates fresh UUID per recovery invocationbuildBootGroundingPrompt()signature extended to accept and embed the fresh UUIDset_session_id({sessionId: <UUID>})before first saveadd_memorypost-recovery carries the generated sessionId (NOT the prior session's id)Out of Scope
set_session_iddoes NOT solve thisAvoided Traps
MemoryService.setSessionId()import in resumeHarness.mjs — process-boundary trap; mutates wrong singleton. Same family as #10619 / #10623 / #10624 substrate-truth bugs.services.mjs— Memory Core public tool mapping lives inservices/toolService.mjs, notservices.mjs(corrected during 2026-05-03 convergence).MemoryService.setSessionId—set_session_idmaps toSessionService.setSessionId, notMemoryService(also corrected during convergence).{success: true}from any process.Related
Origin Session ID: b1839431-cba1-4b6d-913f-27b09e472e67
Retrieval Hint: query_summaries("set_session_id rotation resumeHarness boot-grounding prompt steady-state recovery hardening") + query_raw_memories("set_session_id process-boundary trap SessionService.setSessionId not MemoryService")