Context
Surfaced 2026-05-07 by Lane C #10899 integration row CI on rebased head cd6ab05c6 after #10914 (bash TCP probe) cleared the Chroma healthcheck substrate layer. Empirical diagnosis from CI logs by @neo-gpt (A2A 2026-05-07T16:55:01Z, broadcast MESSAGE:45cd4186).
The Problems (two distinct bugs, surfaced together)
Bug 1: TransportService singleton-transport reconnect
File: ai/mcp/server/shared/services/TransportService.mjs:153
Symptom:
Error: Already connected to a transport. Call close() before connecting to a new transport,
or use a separate Protocol instance per connection.
Root cause: TransportService creates a new StreamableHTTPServerTransport per session (correct for streamable-HTTP MCP) but reconnects the singleton server.mcpServer / Protocol instance via server.mcpServer.connect(transport) for every new client. The MCP SDK's Protocol class is single-transport: the first client consumes it, subsequent clients hit the "Already connected" guard → uncaught exception → Express HTML 500 page.
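The failure mode above can be reproduced in miniature. This is only a sketch: `FakeProtocol` and `connectClients` are hypothetical stand-ins written for illustration, not the SDK's Protocol class or the actual TransportService code, and the stand-in `connect()` is synchronous to keep the example short.

```javascript
// Stand-in for the single-transport guard described above.
// NOT the SDK implementation; it only mimics the "one transport per Protocol" rule.
class FakeProtocol {
  #transport = null;

  connect(transport) {
    if (this.#transport) {
      throw new Error(
        'Already connected to a transport. Call close() before connecting to a new transport, ' +
        'or use a separate Protocol instance per connection.'
      );
    }
    this.#transport = transport;
  }

  close() {
    this.#transport = null;
  }
}

// Hypothetical reduction of the failing pattern:
// one shared protocol instance, one fresh transport per incoming client.
function connectClients(sharedProtocol, clientCount) {
  const outcomes = [];
  for (let i = 0; i < clientCount; i++) {
    const transport = { sessionId: `session-${i}` }; // placeholder transport object
    try {
      sharedProtocol.connect(transport);
      outcomes.push('connected');
    } catch (err) {
      // Keep only the first sentence of the message for readability.
      outcomes.push(`failed: ${err.message.split('.')[0]}`);
    }
  }
  return outcomes;
}

console.log(connectClients(new FakeProtocol(), 2));
// [ 'connected', 'failed: Already connected to a transport' ]
```

Under these assumptions only the first client connects; every later client fails until `close()` is called, which matches the pattern in the CI logs.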
Empirical anchors: all 5 integration specs fail at the second /mcp POST. Each spec creates fresh client connections; the first connection succeeds, and the next spec to connect hits the guard:
- healthcheck.spec.mjs:9, KB+MC healthcheck tool call → 500
- CrossTenantIsolation.spec.mjs:20, alice/bob clients (2nd client = first failure) → 500
- HeartbeatPropagation.spec.mjs:11, repeated assertSustainedHealth probes → 500
- healthcheck.spec.mjs:40, sustained-liveness composability → 500
- AuthRejection.spec.mjs:11, first client (without identity) succeeds incorrectly (compounded by Bug 2)
Bug 2: Auth env var name mismatch
Compose configuration: ai/deploy/docker-compose.test.yml for mc-server and kb-server:
- NEO_AUTH_TRUST_PROXY_IDENTITY=true
Config template binding: the auth.trustProxyIdentity config reads from AUTH_TRUST_PROXY_IDENTITY (NO NEO_ prefix) per the canonical template in ai/mcp/server/<server>/config.template.mjs.
Result: the trust-proxy-identity gate never activates in Docker config. The auth-rejection layer that should 401 missing-X-PREFERRED-USERNAME requests never fires. AuthRejection test's expected-rejection becomes unexpected-success.
Empirical anchor: AuthRejection.spec.mjs:32 expect(rejectionError).toBeTruthy() failure — rejectionError is undefined because the no-identity client succeeded (gate not active).
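To make the symptom concrete, here is a minimal sketch of how such a gate behaves when its flag is off. The function name `checkProxyIdentity`, the config shape, and the return values are assumptions for illustration; the real auth layer may be structured differently. The header key is lowercase because Node lowercases incoming header names.

```javascript
// Hypothetical gate: rejects requests without an identity header,
// but only when trustProxyIdentity is enabled.
function checkProxyIdentity(authConfig, headers) {
  if (!authConfig.trustProxyIdentity) {
    // Gate inactive: the request passes through without an identity check.
    return { status: 200, user: null };
  }
  const user = headers['x-preferred-username'];
  if (!user) {
    return { status: 401, user: null };
  }
  return { status: 200, user };
}

const noIdentityHeaders = {};

// Flag never set (the Docker situation described above): request succeeds.
console.log(checkProxyIdentity({ trustProxyIdentity: false }, noIdentityHeaders).status); // 200

// Flag set as the test expects: request is rejected.
console.log(checkProxyIdentity({ trustProxyIdentity: true }, noIdentityHeaders).status); // 401
```

With the flag off, the no-identity client gets a success response, so a test that expects a rejection error ends up with `undefined`.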
The Architectural Reality
- MCP SDK's StreamableHTTPServerTransport model: each client gets its own transport, but the McpServer/Protocol can be either (a) a new instance per session OR (b) a singleton with multiplexing. Current TransportService does NEITHER: it creates new transports but reuses the singleton Protocol, hitting the SDK's "one transport per Protocol" invariant.
- The auth env var mismatch is a downstream effect of the env-var-namespacing audit pattern (see #10884 for prior NEO_ prefix rationalization). Compose was updated to canonical NEO_* but the binding in config.template.mjs was missed.
The Fix (two prescriptions; can be 1 or 2 PRs)
Bug 1 fix (TransportService refactor):
Either:
- (a) Per-session new McpServer instance (one Protocol per client). Session map Map<sessionId, {server, transport}>. Higher memory but simpler invariant.
- (b) Singleton McpServer + multi-transport multiplexing. Requires cooperation from MCP SDK Protocol class — may not be supported.
Recommend (a) unless SDK provides multiplexing primitive. Per-session McpServer is the canonical streamable-HTTP server pattern.
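Option (a) could be sketched roughly as follows. `StubServer` and `SessionRegistry` are hypothetical names introduced for this example, and the stub only mimics the single-transport guard; a real implementation would construct the SDK's McpServer and StreamableHTTPServerTransport instead.

```javascript
// Stand-in server with the same "one transport only" guard as Bug 1.
class StubServer {
  constructor() {
    this.transport = null;
  }
  connect(transport) {
    if (this.transport) throw new Error('Already connected to a transport.');
    this.transport = transport;
  }
  close() {
    this.transport = null;
  }
}

// Hypothetical registry: one server instance per session,
// tracked as Map<sessionId, { server, transport }>.
class SessionRegistry {
  constructor(createServer) {
    this.createServer = createServer; // factory, called once per new session
    this.sessions = new Map();
  }

  open(sessionId, transport) {
    const existing = this.sessions.get(sessionId);
    if (existing) return existing; // reuse the pair for a known session

    const server = this.createServer();
    server.connect(transport); // never hits the guard: the server is fresh
    const entry = { server, transport };
    this.sessions.set(sessionId, entry);
    return entry;
  }

  close(sessionId) {
    const entry = this.sessions.get(sessionId);
    if (!entry) return false;
    entry.server.close();
    this.sessions.delete(sessionId); // release the per-session instance
    return true;
  }
}

const registry = new SessionRegistry(() => new StubServer());
const alice = registry.open('alice-session', { id: 'transport-a' });
const bob = registry.open('bob-session', { id: 'transport-b' });
console.log(alice.server !== bob.server, registry.sessions.size); // true 2
```

Removing the entry on close is what keeps the memory cost of this option bounded by the number of live sessions.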
Bug 2 fix (auth env alignment):
Either:
- (a) Update ai/mcp/server/<server>/config.template.mjs to bind from NEO_AUTH_TRUST_PROXY_IDENTITY.
- (b) Change ai/deploy/docker-compose.test.yml to set AUTH_TRUST_PROXY_IDENTITY (drop NEO_ prefix).
Recommend (a) — NEO_* prefix is the canonical namespace per #10884; the binding template is the bug.
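The effect of option (a) on the binding can be illustrated with a small sketch. `readBooleanEnv` is a hypothetical helper, and the default of `false` for a missing variable is an assumption; the actual config.template.mjs may parse the value differently.

```javascript
// Hypothetical helper: reads a boolean flag from an env-like object.
function readBooleanEnv(env, name, fallback = false) {
  const raw = env[name];
  if (raw === undefined) return fallback; // variable not set under this name
  return raw.trim().toLowerCase() === 'true';
}

// Environment as set by the compose file quoted earlier.
const composeEnv = { NEO_AUTH_TRUST_PROXY_IDENTITY: 'true' };

// Current binding (unprefixed name): the variable is never found.
const currentAuth = {
  trustProxyIdentity: readBooleanEnv(composeEnv, 'AUTH_TRUST_PROXY_IDENTITY')
};

// Option (a): bind from the NEO_-prefixed name.
const fixedAuth = {
  trustProxyIdentity: readBooleanEnv(composeEnv, 'NEO_AUTH_TRUST_PROXY_IDENTITY')
};

console.log(currentAuth.trustProxyIdentity, fixedAuth.trustProxyIdentity); // false true
```

In a real template the first argument would be `process.env`; a plain object is used here so the sketch does not depend on the surrounding environment.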
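Acceptance Criteria
- TransportService refactored so concurrent client connections to /mcp each get their own McpServer/Protocol instance (no singleton reuse).
- auth.trustProxyIdentity reads from NEO_AUTH_TRUST_PROXY_IDENTITY env var (canonical NEO_* prefix per #10884).
- npm run test-integration once Lane C #10899 re-runs CI on a rebase post-merge.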
Out of Scope
- Refactoring StdioServerTransport (Stdio sessions are inherently 1:1 with a server instance, so there is no multiplexing concern).
- Audit of OTHER env var name mismatches between compose and config templates (file follow-up if Lane C unit-test substrate audit doesn't already cover).
- Performance tuning of per-session McpServer creation (file follow-up if profiling shows it's a hotspot).
Related
- Surfacing context: Lane C CI run 25509545707 (job 74864331326 on PR #10899 head cd6ab05c6), with full server-side log evidence pulled by @neo-gpt.
- Predecessor in this lineage: Chroma healthcheck substrate fixes (#10902/#10904 → #10908/#10909 → #10911/#10912 → #10913/#10914). Bash TCP probe cleared the compose-up healthcheck layer; this ticket is the next substrate layer (deployed MCP server itself).
- Related env-var-namespacing work: #10884 (NEO_ prefix canonicalization for SessionService env vars).
- Author of empirical diagnosis: @neo-gpt (#10901 Lane A author, has #10895 trust-proxy-identity context).
- In-flight implementation: @neo-gemini-3-1-pro (per @tobiu A2A 2026-05-07T16:58Z) — "knee deep into coding" without ticket; this ticket is being filed concurrently.
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="MCP TransportService singleton reconnect StreamableHTTPServerTransport auth env mismatch deployed CI Lane C")