Context
Surfaced 2026-05-08 during PR #10940 (TestLifecycleHelper substrate migration) Cycle 1 review. Empirical full-unit-suite run (CI=true npm run test-unit, matches CI workers:1 substrate) showed test/playwright/unit/ai/mcp/server/github-workflow/PullRequestService.spec.mjs:250 flakes — passes on retry. This is a NEW surface, not in the prior Bucket G G5 triage matrix.
The Problem
Test getPullRequestDiff (#10748) › file parameter filters the diff output (line 250) flakes under workers:1. Specific failure shape needs empirical capture from a fresh CI run + log inspection — my Cycle 1 review ran the suite but the flake's first-attempt error wasn't pulled into evidence (Playwright reports flakes as "passed eventually" in summary; the failed attempt's trace is in test-results/ but not pulled).
Triage hypotheses (not yet empirically verified):
- Singleton state pollution from sibling github-workflow specs — DiscussionService.spec, ConfigCompleteness.spec, IssueService.spec, etc. all live under
ai/mcp/server/github-workflow and may share GraphqlService or similar singleton state. Same root-cause class as G5 patterns elsewhere.
- Race in GraphqlService.query mocking — the test likely overrides
GraphqlService.query for the test scenario; under fullyParallel-ish ordering with workers:1, sibling spec's cleanup may not restore the original query binding cleanly.
- Genuine race in
getPullRequestDiff file-parameter filtering logic — if hypothesis 1+2 both fail, this is an actual production race.
The Architectural Reality
test/playwright/unit/ai/mcp/server/github-workflow/PullRequestService.spec.mjs:250 — the flaky test (getPullRequestDiff #10748 file parameter filters the diff output)
ai/mcp/server/github-workflow/services/PullRequestService.mjs#getPullRequestDiff — the production target (added in #10748)
- Sibling github-workflow specs that share GraphqlService:
DiscussionService.spec.mjs (G5#1, fixed via #10929)
ConfigCompleteness.spec.mjs (G4 cascade, fixed via #10929)
IssueService.spec.mjs (Zod validator surface)
The Fix (TBD via investigation)
Investigation steps:
- Pull the actual failure trace from a CI run —
gh run view ... --log-failed for the first-attempt error
- Categorize: GraphqlService mock leak vs new substrate vs production race
- Implement targeted fix; coordinate with @neo-gpt if it's another github-workflow Config namespace cascade (his G4 lane)
Skip-guarded in Phase 3 PR (forthcoming) per NEO_TEST_SKIP_CI pattern; this ticket tracks the proper investigation + fix.
Acceptance Criteria
Out of Scope
- Migrating PullRequestService.spec to TestLifecycleHelper unconditionally (unknown if applicable until hypothesis is identified)
- Rewriting
getPullRequestDiff (out of scope unless hypothesis 3 holds)
Avoided Traps
- Pre-emptive TestLifecycleHelper migration: rejected without empirical hypothesis evidence. Migration is the right shape for singleton-state-pollution but unnecessary if the shape is mock-leak or production race.
- Skip-guard as permanent solution: applied as immediate Phase 3 ship-the-PR move + tracked here for proper fix.
Related
- Surfacing CI run: my empirical CI=true full-suite run on PR #10940 c.7f454ee2 (1055 passed, 4 flaky)
- Cycle 1 review citation: PR #10940 commentId IC_kwDODSospM8AAAABBodQ9A — listed as new surface not in prior triage
- Production target: #10748 (
getPullRequestDiff introduced)
- Sibling github-workflow flake patterns: #10929 G4 namespace collision (resolved), G5#1 cascade (resolved)
- Bucket G epic: #10924
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="PullRequestService getPullRequestDiff file parameter flake workers 1 #10940 CI 25542845735")
Context
Surfaced 2026-05-08 during PR #10940 (TestLifecycleHelper substrate migration) Cycle 1 review. Empirical full-unit-suite run (
CI=true npm run test-unit, matches CI workers:1 substrate) showedtest/playwright/unit/ai/mcp/server/github-workflow/PullRequestService.spec.mjs:250flakes — passes on retry. This is a NEW surface, not in the prior Bucket G G5 triage matrix.The Problem
Test
getPullRequestDiff (#10748) › file parameter filters the diff output(line 250) flakes under workers:1. Specific failure shape needs empirical capture from a fresh CI run + log inspection — my Cycle 1 review ran the suite but the flake's first-attempt error wasn't pulled into evidence (Playwright reports flakes as "passed eventually" in summary; the failed attempt's trace is in test-results/ but not pulled).Triage hypotheses (not yet empirically verified):
ai/mcp/server/github-workflowand may share GraphqlService or similar singleton state. Same root-cause class as G5 patterns elsewhere.GraphqlService.queryfor the test scenario; under fullyParallel-ish ordering with workers:1, sibling spec's cleanup may not restore the originalquerybinding cleanly.getPullRequestDifffile-parameter filtering logic — if hypothesis 1+2 both fail, this is an actual production race.The Architectural Reality
test/playwright/unit/ai/mcp/server/github-workflow/PullRequestService.spec.mjs:250— the flaky test (getPullRequestDiff #10748 file parameter filters the diff output)ai/mcp/server/github-workflow/services/PullRequestService.mjs#getPullRequestDiff— the production target (added in #10748)DiscussionService.spec.mjs(G5#1, fixed via #10929)ConfigCompleteness.spec.mjs(G4 cascade, fixed via #10929)IssueService.spec.mjs(Zod validator surface)The Fix (TBD via investigation)
Investigation steps:
gh run view ... --log-failedfor the first-attempt errorSkip-guarded in Phase 3 PR (forthcoming) per NEO_TEST_SKIP_CI pattern; this ticket tracks the proper investigation + fix.
Acceptance Criteria
CI=true npm run test-unitinvocationsOut of Scope
getPullRequestDiff(out of scope unless hypothesis 3 holds)Avoided Traps
Related
getPullRequestDiffintroduced)Origin Session ID:
7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571Retrieval Hint:
query_raw_memories(query="PullRequestService getPullRequestDiff file parameter flake workers 1 #10940 CI 25542845735")