Context
Surfaced 2026-05-07 during Bucket G G5 flake triage at #10924 (G5#4 row in triage matrix). One of 4 flaky tests from CI run 25514310691: test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:23 — onsessionclosed hook removes transport and calls server.onSessionClosed via actual HTTP request. Failure surface: TypeError: Cannot read properties of undefined (reading 'get') on initResponse.headers.get(...).
Root cause inspection (per G5 triage): port 3125 is unique to this spec (grep found no other spec using it, so a cross-spec collision is ruled out). The most likely failure mode is that TransportService.setup() returns before Express has finished binding to port 3125: app.listen() returns synchronously without waiting for the listener to reach its accepting state, so the subsequent fetch() call races the bind.
Filed retroactively to track G5#4 work that landed via PR #10930 commit bf894b00b (authored by @neo-gemini-3-1-pro). The fix shape was correct; this ticket creates the missing close-target so the work has proper graph attribution.
The Problem
TransportService.setup() invokes Express app.listen(port, callback) and returns from the async function before the listener has entered the LISTENING state. Under load, or under fullyParallel test interleaving, the calling spec's subsequent fetch() against that port can fire before the bind completes, so the response object lacks the expected shape.
The onsessionclosed test reproducer:
- Calls await TransportService.setup({...}) → returns immediately, listener pending
- Calls await fetch('http://localhost:3125/mcp', {...POST init...}) → races
- Reads initResponse.headers.get('mcp-session-id') → headers is undefined when fetch lost the race
This is intermittent (passes on retry) because under low-load conditions the bind completes within the Node.js event-loop tick before fetch fires.
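The ordering in the reproducer can be seen in isolation. The sketch below is a reduction under stated assumptions, not the TransportService source: racySetup, demo, and the state object are hypothetical names, and it uses Node's built-in http server on an OS-assigned port rather than Express on port 3125. It shows only that an async function which calls listen(port, callback) without awaiting anything returns before the callback has run.

```javascript
// Hypothetical reduction of the racy shape: an async setup that starts
// listening but returns without waiting for the listen-callback.
async function racySetup(server, port, state) {
    server.listen(port, '127.0.0.1', () => {
        state.callbackFired = true; // runs later, when the server emits 'listening'
    });
    return server; // nothing above awaits the callback
}

async function demo() {
    // Dynamic import keeps the sketch runnable as either ESM or CommonJS.
    const http   = await import('node:http');
    const server = http.createServer((req, res) => res.end('ok'));
    const state  = {callbackFired: false};

    const pending = racySetup(server, 0, state);   // port 0: the OS picks a free port
    const firedBeforeReturn = state.callbackFired; // sampled synchronously after the call

    await pending;
    await new Promise(resolve => server.close(resolve)); // tidy up the listener
    return firedBeforeReturn;
}

demo().then(fired => console.log('callback fired before setup returned:', fired));
```

Running it should log false: the listen-callback is always delivered asynchronously, so a caller that needs the "ready" signal has to wait for it explicitly.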
The Architectural Reality
- ai/mcp/server/shared/services/TransportService.mjs #setup() method
- Express app.listen() returns an http.Server instance synchronously; the 'listening' event (or the listen-callback) fires when the bind is complete
- Test consumer: test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:23
- No destroy() teardown method previously existed, so HTTP servers leaked across test runs
The Fix
Wrap app.listen(port, callback) in a Promise that resolves only after the listen-callback fires and rejects if the server emits 'error' first. Capture the returned http.Server instance on this.httpServer for later teardown. Add a destroy() method that closes the HTTP server cleanly.
This guarantees await TransportService.setup({...}) only returns once the listener is ready to accept connections, eliminating the race.
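A minimal sketch of the Promise wrapper described above, assuming a plain Node http.Server (which is what Express app.listen() creates and returns). listenAsync and demo are hypothetical names, not the code that landed in PR #10930, and the default host and port 0 are choices made for the example.

```javascript
// Resolve once the server emits 'listening'; reject if 'error' fires first.
// Hypothetical helper, not the TransportService API.
function listenAsync(server, port, host = '127.0.0.1') {
    return new Promise((resolve, reject) => {
        const onError = err => {
            server.removeListener('listening', onListening);
            reject(err);
        };
        const onListening = () => {
            server.removeListener('error', onError);
            resolve(server);
        };
        server.once('error', onError);
        server.once('listening', onListening);
        server.listen(port, host);
    });
}

async function demo() {
    // Dynamic import keeps the sketch runnable as either ESM or CommonJS.
    const http   = await import('node:http');
    const server = await listenAsync(http.createServer((req, res) => res.end('ok')), 0);
    const {port} = server.address(); // available because the bind has completed

    // A second bind to the same address should reject rather than hang.
    const conflict = await listenAsync(http.createServer(), port).then(
        () => 'bound twice',
        err => err.code
    );

    await new Promise(resolve => server.close(resolve));
    return {bound: port > 0, conflict};
}

demo().then(result => console.log(result));
```

Registering both listeners before calling listen() and removing the loser afterwards means the Promise settles exactly once and no stale 'error' handler is left on a healthy server.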
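Acceptance Criteria
- TransportService.setup() wraps app.listen() in a Promise that resolves on the listen-callback (port bound + listening) and rejects on the 'error' event
- this.httpServer captures the listener instance for later teardown
- TransportService.destroy() closes the HTTP server cleanly (idempotent on missing server)
- await TransportService.setup({...}) returns only after the listener is accepting (probe via test-side fetch immediately after setup resolves; expect 200/406, not connection-refused)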
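One way the readiness and teardown requirements could be exercised is sketched below. ListenerOwner and probe are hypothetical stand-ins, not TransportService itself: only the member names setup, destroy, and httpServer come from this ticket, and the handler, host, and port 0 are assumptions made so the example runs on its own.

```javascript
// Hypothetical stand-in for a service that owns an HTTP listener.
class ListenerOwner {
    httpServer = null;

    async setup({port, handler}) {
        // Dynamic import keeps the sketch runnable as either ESM or CommonJS.
        const http   = await import('node:http');
        const server = http.createServer(handler);
        await new Promise((resolve, reject) => {
            server.once('error', reject);
            server.listen(port, '127.0.0.1', () => {
                server.removeListener('error', reject);
                resolve();
            });
        });
        this.httpServer = server; // captured for later teardown
    }

    // Safe to call before setup() and safe to call twice.
    async destroy() {
        const server = this.httpServer;
        if (!server) return;
        this.httpServer = null;
        await new Promise(resolve => {
            server.close(resolve);
            server.closeAllConnections?.(); // Node 18.2+: drop keep-alive sockets so close() can finish
        });
    }
}

async function probe() {
    const owner = new ListenerOwner();
    await owner.destroy(); // no server yet: must be a no-op
    await owner.setup({port: 0, handler: (req, res) => res.end('ok')});

    const {port}   = owner.httpServer.address();
    const response = await fetch(`http://127.0.0.1:${port}/`); // fired right after setup resolves
    await response.text();

    await owner.destroy();
    await owner.destroy(); // second call is also a no-op
    return response.status;
}

probe().then(status => console.log('probe status:', status));
```

In this sketch the handler always answers 200; against the real /mcp endpoint the ticket allows 200 or 406, the point being that the request is answered rather than refused.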
Out of Scope
- Migrating other services that currently use raw app.listen(): file as a separate ticket if discovered
- Adding port-conflict retry logic: out of substrate-fix scope
- Replacing Express with a different HTTP substrate: architectural rewrite, not relevant
Avoided Traps
- Polling for bind via setTimeout loop: rejected; race-prone, adds non-determinism. The Express callback / 'listening' event is the canonical signal.
- Fixed setTimeout(N) before resolve: rejected; slows fast paths, doesn't guarantee correctness on slow CI hosts.
- Bind-status check via separate connect() probe: rejected; adds spec complexity, and the listen-callback is the canonical signal.
Related
- Implementation landed: PR #10930 commit bf894b00b (@neo-gemini-3-1-pro authored, bundled with #10931 fix)
- Triage source: #10924 G5#4 row
- Bucket G epic: #10924
- Sibling flake patterns: G5#1 (DiscussionService — resolved via #10929), G5#2 (KBRecorderService — deferred), G5#3 (PermissionService — deferred)
- Empirical anchor: CI run 25514310691
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="TransportService.setup bind race app.listen Promise listener G5#4 #10783 #10931 PR #10930")