Context
Surfaced 2026-05-08 during PR #10933 (Phase 3 unit-row re-add) CI run 25524203756. The test test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:23 — onsessionclosed hook removes transport and calls server.onSessionClosed via actual HTTP request flakes with:
TypeError: Cannot read properties of undefined (reading 'get')
at: const sessionId = initResponse.headers.get('mcp-session-id'); // line 69
This is the SAME flake originally classified as G5#4 in #10924 triage, and supposedly resolved by #10932 → PR #10930 commit bf894b00b (Promise-wrapped app.listen() + finally restore + new deterministic regression test in setup() awaits listener accept-state before resolving (#10932)).
The fix did land in dev — verified ai/mcp/server/shared/services/TransportService.mjs:224-241 has the new Promise wrap + httpServer capture + destroy method. Yet the flake persists in CI (workers:1 substrate).
The Problem
The setup() await-bind fix was correct for the bind-race surface I tested (single fetch immediately after setup → no ECONNREFUSED). My new regression test [TransportService.spec:94 setup() awaits listener accept-state ... (#10932)](commit 79fa24e67) passes deterministically.
But the original flake's surface is different: it does await TransportService.setup(...) THEN await fetch(/mcp, POST init) and expects initResponse.headers to be defined. Empirically initResponse.headers is undefined — meaning either:
initResponse is not a real Response object (impossible per fetch spec UNLESS fetch is mocked/replaced)
- OR there's a different race surface where the listener accepts the connection but Express middleware chain isn't fully wired before responding
Hypothesis: under workers:1, a sibling spec runs first and replaces global.fetch (or some other process-level state) without restoring it. When TransportService.spec:23 runs and calls fetch, it gets a stub/mock that returns a Response-like object without headers.
The Architectural Reality
test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:23-92 — the flaky test (uses port 3125, real fetch)
test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:94+ — my new regression-guard test (uses port 3126, real fetch — passes deterministically)
ai/mcp/server/shared/services/TransportService.mjs:224-241 — Promise-wrapped app.listen() (already fixed per #10932)
The asymmetry between the two tests: original uses port 3125 + 50-line POST-DELETE init; my new test uses port 3126 + simple GET. Either the POST-DELETE flow itself has a race (Express middleware setup race on /mcp POST handler) OR a sibling spec is interfering.
The Fix (TBD via investigation)
Investigation steps:
- Run
TransportService.spec.mjs in isolation with workers:1 — does it flake? If YES → race intra-spec (Express middleware not fully wired). If NO → cross-spec interference.
- If cross-spec: identify which sibling replaces
global.fetch or holds state on port 3125
- If intra-spec: add
await new Promise(r => setTimeout(r, 50)) post-setup AS DIAGNOSTIC ONLY to confirm timing-shape, then look for the actual unwired-middleware lag
Acceptance Criteria
Out of Scope
- Re-architecting Express middleware setup (out of substrate-fix scope)
- Refactoring the test to avoid POST-DELETE flow (the test SHAPE is correct; it tests real onsessionclosed hook semantics)
Avoided Traps
- Skip-guard via NEO_TEST_SKIP_CI: applied as immediate ship-the-PR move on PR #10933, NOT the proper fix. The skip preserves the unit-suite gate while this investigation lands.
- Bumping retries to 5: papers over without addressing root cause.
Related
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="TransportService bind race residual second surface workers 1 flake PR 10933 #10932 followup")
Context
Surfaced 2026-05-08 during PR #10933 (Phase 3 unit-row re-add) CI run 25524203756. The test
test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:23 — onsessionclosed hook removes transport and calls server.onSessionClosed via actual HTTP requestflakes with:TypeError: Cannot read properties of undefined (reading 'get') at: const sessionId = initResponse.headers.get('mcp-session-id'); // line 69This is the SAME flake originally classified as G5#4 in #10924 triage, and supposedly resolved by #10932 → PR #10930 commit
bf894b00b(Promise-wrappedapp.listen()+ finally restore + new deterministic regression test insetup() awaits listener accept-state before resolving (#10932)).The fix did land in dev — verified
ai/mcp/server/shared/services/TransportService.mjs:224-241has the new Promise wrap + httpServer capture + destroy method. Yet the flake persists in CI (workers:1 substrate).The Problem
The
setup()await-bind fix was correct for the bind-race surface I tested (single fetch immediately after setup → no ECONNREFUSED). My new regression test [TransportService.spec:94setup() awaits listener accept-state ... (#10932)](commit 79fa24e67) passes deterministically.But the original flake's surface is different: it does
await TransportService.setup(...)THENawait fetch(/mcp, POST init)and expectsinitResponse.headersto be defined. EmpiricallyinitResponse.headersisundefined— meaning either:initResponseis not a real Response object (impossible per fetch spec UNLESSfetchis mocked/replaced)Hypothesis: under
workers:1, a sibling spec runs first and replacesglobal.fetch(or some other process-level state) without restoring it. When TransportService.spec:23 runs and callsfetch, it gets a stub/mock that returns a Response-like object withoutheaders.The Architectural Reality
test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:23-92— the flaky test (uses port 3125, real fetch)test/playwright/unit/ai/mcp/server/shared/services/TransportService.spec.mjs:94+— my new regression-guard test (uses port 3126, real fetch — passes deterministically)ai/mcp/server/shared/services/TransportService.mjs:224-241— Promise-wrappedapp.listen()(already fixed per #10932)The asymmetry between the two tests: original uses port 3125 + 50-line POST-DELETE init; my new test uses port 3126 + simple GET. Either the POST-DELETE flow itself has a race (Express middleware setup race on
/mcpPOST handler) OR a sibling spec is interfering.The Fix (TBD via investigation)
Investigation steps:
TransportService.spec.mjsin isolation withworkers:1— does it flake? If YES → race intra-spec (Express middleware not fully wired). If NO → cross-spec interference.global.fetchor holds state on port 3125await new Promise(r => setTimeout(r, 50))post-setup AS DIAGNOSTIC ONLY to confirm timing-shape, then look for the actual unwired-middleware lagAcceptance Criteria
workers:1npm run test-unitinvocations on CI substrateOut of Scope
Avoided Traps
Related
Origin Session ID:
7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571Retrieval Hint:
query_raw_memories(query="TransportService bind race residual second surface workers 1 flake PR 10933 #10932 followup")