Context
Authored 2026-05-08 in response to @tobiu's chief-architect mandate after two compounding architectural-hallucination errors over the prior session day:
- Factory pattern misread: I initially conflated "remote multi-tenant Memory Core" with "high-availability primary/secondary deployment". Gemini correctly flagged this in MESSAGE:0ca25e5b-d59d-4cae-a66e-c6bfd669953e: the multi-tenant remote shape is ONE endpoint with per-request RequestContextService + Mcp-Session-Id binding, not multi-instance shared-substrate.
- NEO_MC_PRIMARY scoping over-correction: Gemini's and my first reaction was "strip it entirely". @tobiu corrected: "isPrimary => LOCAL mcp servers. if agent harnesses spawn multiple local server instances. however: we WANT to move summarization items into daemons." The flag is still valid for LOCAL multi-harness fleets TODAY; it becomes obsolete only after daemonization lands.
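Here is a minimal sketch of the single-endpoint shape. It assumes a hypothetical in-memory SessionRegistry that binds each Mcp-Session-Id to the identity that opened the session. The class name and the identity fields (agentId, tenantId) are illustrative assumptions, not the real RequestContextService API. The point it shows is that identity comes from the request, not from which server instance answered it.

```javascript
// Hypothetical sketch: ONE endpoint serving many tenants. Identity is resolved
// per request from the Mcp-Session-Id header rather than from which server
// instance received the call.
class SessionRegistry {
    #sessions = new Map(); // Mcp-Session-Id -> frozen { agentId, tenantId }

    bind(sessionId, identity) {
        if (this.#sessions.has(sessionId)) {
            throw new Error(`Session ${sessionId} is already bound`);
        }
        this.#sessions.set(sessionId, Object.freeze({ ...identity }));
    }

    // Resolve the caller identity for an incoming request. Header names are
    // expected lower-cased, as Node's http module delivers them.
    resolve(headers) {
        const sessionId = headers['mcp-session-id'];
        const identity  = sessionId && this.#sessions.get(sessionId);
        if (!identity) {
            throw new Error('Unknown or missing Mcp-Session-Id');
        }
        return identity;
    }

    unbind(sessionId) {
        return this.#sessions.delete(sessionId);
    }
}

// Two agents share one endpoint; each request resolves its own identity.
const registry = new SessionRegistry();
registry.bind('sess-a', { agentId: 'claude', tenantId: 't1' });
registry.bind('sess-b', { agentId: 'gemini', tenantId: 't2' });

const who = registry.resolve({ 'mcp-session-id': 'sess-b' });
console.log(who.agentId); // gemini
```

Identity travels with the request, so adding tenants never requires adding server instances. That is what separates this shape from a primary/secondary deployment.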
@tobiu's chief-architect mandate: "how do we get from right here to v13 => path. and this needs to get documented."
The Problem
The swarm has accumulated several architectural fragments without a coordinated path:
- 5 MCP servers with no shared base class (each Server.mjs independent, ~80% boilerplate duplication)
- Factory pattern adoption uneven (2 of 5 servers: memory-core direct, knowledge-base via TransportService)
- ai/services.mjs SDK boundary mature but underutilized (most service logic still in MCP server services/ directories)
- Daemon-shaped services in ai/daemons/ (DreamService, SwarmHeartbeatService, decomposed services from #10013/#10028) without an orchestrator-daemon process to schedule them
- Bridge-daemon (per-host singleton) specialized for wake-event delivery; no sibling daemon for summarization/sandman/golden-path
- Summarization gated off (#9942 daemon-collision fix); operator-run only, via npm run ai:summarize-sessions
- NEO_MC_PRIMARY in-process gate exists as a workaround for the missing daemon; it keeps LOCAL multi-harness fleets from racing on shared substrate
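The NEO_MC_PRIMARY workaround is an in-process election made by configuration. The sketch below assumes the flag is read from the environment and treats 'true' or '1' as set. The helper names and the accepted values are hypothetical; they are not taken from the Memory Core source.

```javascript
// Hypothetical sketch of an in-process NEO_MC_PRIMARY gate. Each local agent
// harness may spawn its own Memory Core instance over the same substrate. Only
// the instance flagged as primary runs summarization, so siblings cannot race
// it. Accepted values ('true' / '1') are an assumption.
function isPrimaryInstance(env = process.env) {
    const flag = String(env.NEO_MC_PRIMARY ?? '').trim().toLowerCase();
    return flag === 'true' || flag === '1';
}

// Hypothetical guard around a summarization pass. `summarize` stands in for
// whatever the server actually invokes (e.g. a processPendingSummarizations path).
async function runIfPrimary(summarize, env = process.env) {
    if (!isPrimaryInstance(env)) {
        return { ran: false, reason: 'not primary: leaving summarization to the primary instance' };
    }
    await summarize();
    return { ran: true };
}
```

Under D5, a gate like this becomes dead code once an orchestrator daemon owns summarization. That is why its retirement is sequenced behind that daemon rather than alongside it.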
Without a documented path, the swarm risks:
- Iterating in conflicting directions (e.g., my Piece C in-process implementation vs the original ticket's bridge-daemon-or-sibling guidance)
- Losing context between sessions on which architectural decisions are load-bearing vs incidental
- Cowboy-coding without verifying the substrate (the failure mode of today's session)
The Architectural Reality
Current substrate state (verified empirically 2026-05-08):
- ai/mcp/server/{file-system,github-workflow,knowledge-base,memory-core,neural-link}/Server.mjs: 5 MCP server entry points, no common base
- ai/mcp/server/shared/services/RequestContextService.mjs: Factory pattern (per-request identity binding via AsyncLocalStorage)
- ai/mcp/server/shared/services/TransportService.mjs: SSE transport setup; wraps dispatch in RequestContextService.run(...)
- ai/services.mjs: mature SDK boundary; consumed by scripts, tests, and the MC server's processPendingSummarizations
- ai/scripts/bridge-daemon.mjs: standalone wake-event-delivery daemon, per-host singleton via PID file
- ai/daemons/: in-process daemon-shaped services (DreamService, SwarmHeartbeatService, services/* from #10013 decomposition)
- ai/scripts/summarize-sessions.mjs: operator-trigger script (calls Memory_SessionService.summarizeSessions({}) via SDK)
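The Factory pattern described for RequestContextService.mjs and TransportService.mjs can be sketched with Node's AsyncLocalStorage. This is a minimal illustration, not the actual service code. The RequestContext object, the dispatch() helper, and the context fields are hypothetical. The sketch shows the property the pattern relies on: two concurrent requests on one endpoint each see their own identity, even after an await.

```javascript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical sketch of per-request identity binding. Each dispatch runs in
// its own async context, so service code deep in the call tree can read the
// caller's identity without threading it through every signature.
const storage = new AsyncLocalStorage();

const RequestContext = {
    // Run `fn` with `context` bound for its entire async call tree.
    run(context, fn) {
        return storage.run(Object.freeze({ ...context }), fn);
    },
    // Identity bound to the current request; undefined outside run().
    current() {
        return storage.getStore();
    }
};

// Hypothetical service code: takes no identity parameter and reads it after an await.
async function writeMemory(text) {
    await new Promise(resolve => setTimeout(resolve, 5)); // simulated I/O
    const { agentId, sessionId } = RequestContext.current();
    return `${agentId}@${sessionId}: ${text}`;
}

// Hypothetical transport-level dispatch, mirroring "wraps dispatch in
// RequestContextService.run(...)". The identity is assumed to be resolved
// already (e.g. from the Mcp-Session-Id header).
function dispatch(identity, handler) {
    return RequestContext.run(identity, handler);
}

// Two concurrent requests on ONE endpoint keep their identities apart.
const [a, b] = await Promise.all([
    dispatch({ agentId: 'claude', sessionId: 's1' }, () => writeMemory('hello')),
    dispatch({ agentId: 'gemini', sessionId: 's2' }, () => writeMemory('world'))
]);
console.log(a); // claude@s1: hello
console.log(b); // gemini@s2: world
```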
The end-state vision (per @tobiu's mandate): slim MCP servers + mature SDK + clean daemon architecture + single-source-of-truth for summaries/sandman/golden-path (daemon-driven).
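One reading of "clean daemon architecture" is a single orchestrator process that owns scheduled work (summaries, sandman, golden-path) and never runs the same job twice concurrently. The Orchestrator class below is a hypothetical sketch of that scheduling core under those assumptions. The job names echo the vision; the intervals and job bodies are placeholders.

```javascript
// Hypothetical sketch of an orchestrator daemon's scheduling core (D3): one
// process owns scheduled work, and a job never overlaps with itself.
class Orchestrator {
    #jobs = new Map(); // name -> { intervalMs, run, lastStart, running }

    register(name, intervalMs, run) {
        this.#jobs.set(name, { intervalMs, run, lastStart: -Infinity, running: false });
    }

    // Start each job whose interval has elapsed and that is not already
    // running; returns the names started on this tick.
    tick(now = Date.now()) {
        const started = [];
        for (const [name, job] of this.#jobs) {
            if (job.running || now - job.lastStart < job.intervalMs) continue;
            job.running   = true;
            job.lastStart = now;
            started.push(name);
            Promise.resolve()
                .then(() => job.run())
                .catch(err => console.error(`[orchestrator] ${name} failed:`, err))
                .finally(() => { job.running = false; });
        }
        return started;
    }

    // Poll on a timer; returns a stop function for graceful shutdown.
    start(pollMs = 1000) {
        const timer = setInterval(() => this.tick(), pollMs);
        return () => clearInterval(timer);
    }
}

// Hypothetical wiring. Real job bodies would call into the SDK in
// ai/services.mjs (e.g. the Memory_SessionService.summarizeSessions({}) path
// that summarize-sessions.mjs uses); these are placeholders.
const orchestrator = new Orchestrator();
orchestrator.register('summarization', 15 * 60_000, async () => {});
orchestrator.register('sandman',       60 * 60_000, async () => {});
orchestrator.register('golden-path',   60 * 60_000, async () => {});
```

Single-flight per job replaces the NEO_MC_PRIMARY election. Exactly one process schedules the work, so no in-process gate is needed to stop sibling servers from racing.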
The Fix
Produce, peer-review, and merge learn/agentos/v13-path.md as the chief-architect document covering:
- Vision (3 load-bearing properties — slim MCP, common base, orchestrator daemon)
- Current State — empirically verified table comparing today vs v13 target
- Critical Architectural Decisions — D1 Factory pattern eval (with challenge framing) / D2 common base server class / D3 orchestrator daemon architecture / D4 SDK migration boundary / D5 NEO_MC_PRIMARY retirement path
- Sequenced Milestones — M1 substrate stabilization (current week) → M7 v13 release cut
- Tickets to file/update
- #9999 sub-issue audit (each open sub-issue triaged: on-trajectory / re-scope / verify)
- Risks
- Outcome Metrics (quantitative v13 readiness targets)
- Provenance
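For D2, here is a minimal sketch of what a common base class could absorb. It assumes the duplicated boilerplate is mostly tool registration, dispatch, and startup. BaseMcpServer, MemoryCoreServer, and every method name here are hypothetical. The sketch avoids the real MCP SDK on purpose, so that it does not misstate that SDK's API.

```javascript
// Hypothetical sketch of D2: a base class owns the boilerplate each
// Server.mjs repeats (tool registry, dispatch, startup); subclasses declare
// only what is server-specific. Not the real MCP SDK API.
class BaseMcpServer {
    #tools = new Map(); // tool name -> async handler(args)

    // Subclasses override these two hooks.
    get serverName() { throw new Error('serverName must be overridden'); }
    registerTools() {}

    addTool(name, handler) {
        if (this.#tools.has(name)) throw new Error(`Duplicate tool: ${name}`);
        this.#tools.set(name, handler);
    }

    async init() {
        this.registerTools();
        return this;
    }

    async callTool(name, args = {}) {
        const handler = this.#tools.get(name);
        if (!handler) throw new Error(`${this.serverName}: unknown tool ${name}`);
        return handler(args);
    }

    listTools() {
        return [...this.#tools.keys()];
    }
}

// A slim server: only its identity and its tools; everything else is inherited.
class MemoryCoreServer extends BaseMcpServer {
    get serverName() { return 'memory-core'; }
    registerTools() {
        this.addTool('ping', async () => 'pong');
    }
}
```

The base class registers tools in init() rather than in its constructor. This avoids a JavaScript ordering problem: subclass fields are not yet initialized while the base constructor runs.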
Doc location: learn/agentos/v13-path.md. Linked from ROADMAP.md once reviewed.
Acceptance Criteria
- learn/agentos/v13-path.md committed via PR; covers all 9 sections enumerated in The Fix
Out of Scope
- Implementing any of the milestones (M2-M7 are separate epics/PRs)
- Re-scoping or closing individual sub-issues of #9999 (audit identifies them; per-ticket updates are separate work)
- Updating learn/agentos/MX.md or other foundational docs beyond the ROADMAP.md cross-link
- Filing the M2/M3/M6 epics: this ticket only enumerates them; their creation is downstream of doc approval
- Operator deployment cookbook updates (those land per-milestone, not in the doc itself)
Avoided Traps / Gold Standards Rejected
- Rejected: comprehensive 1000+-line specification. Path docs that try to enumerate every detail become outdated within weeks. This doc is strategic (200-300 lines), not prescriptive — milestones link to per-ticket prescriptions.
- Rejected: rubber-stamp the Factory pattern. Per @tobiu's directive "challenge the factory pattern (probably a reasonably good solution)", D1 explicitly evaluates pros/cons/edge cases rather than assuming. Recommendation is positive but with named caveats.
- Rejected: "strip NEO_MC_PRIMARY immediately" framing (the over-correction the swarm jumped to earlier today). D5 sequences retirement BEHIND the orchestrator landing, not parallel to it.
- Rejected: extend bridge-daemon for summarization (my initial wrong-shape proposal). D3 explicitly separates concerns — bridge stays specialized for wake-domain; orchestrator is its sibling for scheduled work.
- Rejected: file 5+ tickets for each milestone immediately. Milestone tickets land downstream of doc approval; pre-filing pollutes the backlog before the path is endorsed.
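D3 keeps bridge-daemon specialized and adds the orchestrator as a sibling, so each daemon needs its own per-host singleton guard. bridge-daemon.mjs is described as using a PID file. The sketch below shows that general technique under assumed names: acquirePidLock, releasePidLock, and the lock path are all hypothetical. Stale-lock takeover here is not fully race-free. That is acceptable for a sketch but worth hardening in a real daemon.

```javascript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Hypothetical sketch of a per-host singleton guard via a PID file.
function isAlive(pid) {
    try {
        process.kill(pid, 0); // signal 0: existence check only, nothing is sent
        return true;
    } catch (err) {
        return err.code === 'EPERM'; // process exists but belongs to another user
    }
}

// Returns true if this process now owns the lock, false if a live process does.
function acquirePidLock(pidFile) {
    try {
        // 'wx' creates the file exclusively and fails with EEXIST if it exists.
        fs.writeFileSync(pidFile, String(process.pid), { flag: 'wx' });
        return true;
    } catch (err) {
        if (err.code !== 'EEXIST') throw err;
    }
    const pid = Number.parseInt(fs.readFileSync(pidFile, 'utf8'), 10);
    if (Number.isInteger(pid) && pid > 0 && isAlive(pid)) {
        return false; // another instance already owns this host
    }
    fs.rmSync(pidFile, { force: true }); // stale lock left by a crashed process
    return acquirePidLock(pidFile);
}

// Remove the lock only if this process still owns it.
function releasePidLock(pidFile) {
    try {
        if (fs.readFileSync(pidFile, 'utf8') === String(process.pid)) {
            fs.rmSync(pidFile);
        }
    } catch {
        // Already removed; nothing to release.
    }
}

// Hypothetical lock path for an orchestrator daemon; the bridge keeps its own.
const lockFile = path.join(os.tmpdir(), 'neo-orchestrator-daemon.pid');
```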
Related
- Parent epic: #9999 Cloud-Native Knowledge & Multi-Tenant Memory Core (v13 main epic)
- Substrate triggers (today's correction chain): A2A MESSAGE:0ca25e5b-d59d-4cae-a66e-c6bfd669953e (Gemini's CRITICAL flag on the primary/secondary mental-model error)
- Adjacent in-flight work referenced by the path:
  - #10813 Restore session summaries: re-scoped per D3/D5
  - #10956 NEO_MC_PRIMARY removal: re-scoped per D5
  - #10945 Deployment-pipeline integration coverage: M1 GPT lane
  - #10939 Phase 3 unit-row re-add: M1 Gemini lane
  - #10013 DreamService Decomposition: M4 prerequisite (already mostly shipped)
  - #10028 Finalize DreamService Decomposition: M4 work
- Referenced from ROADMAP.md (post-approval cross-link)
Origin Session ID: 005b6edf-85d8-4980-9e17-486b6b8bed3f
Retrieval Hint: query_raw_memories(query="v13 architectural path slim MCP servers orchestrator daemon SDK migration NEO_MC_PRIMARY retirement Factory pattern common base server class")