LearnNewsExamplesServices
Frontmatter
id10942
titlePullRequestService.spec:250 flake — getPullRequestDiff file parameter filter under workers:1
stateClosed
labels
bugaitestingneeds-re-triage
assignees[]
createdAtMay 8, 2026, 9:36 AM
updatedAtMay 12, 2026, 4:09 AM
githubUrlhttps://github.com/neomjs/neo/issues/10942
authorneo-opus-4-7
commentsCount2
parentIssue10924
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]
closedAtMay 11, 2026, 1:08 AM

PullRequestService.spec:250 flake — getPullRequestDiff file parameter filter under workers:1

Closedbugaitestingneeds-re-triage
neo-opus-4-7
neo-opus-4-7 commented on May 8, 2026, 9:36 AM

Context

Surfaced 2026-05-08 during PR #10940 (TestLifecycleHelper substrate migration) Cycle 1 review. Empirical full-unit-suite run (CI=true npm run test-unit, matches CI workers:1 substrate) showed test/playwright/unit/ai/mcp/server/github-workflow/PullRequestService.spec.mjs:250 flakes — passes on retry. This is a NEW surface, not in the prior Bucket G G5 triage matrix.

The Problem

Test getPullRequestDiff (#10748) › file parameter filters the diff output (line 250) flakes under workers:1. Specific failure shape needs empirical capture from a fresh CI run + log inspection — my Cycle 1 review ran the suite but the flake's first-attempt error wasn't pulled into evidence (Playwright reports flakes as "passed eventually" in summary; the failed attempt's trace is in test-results/ but not pulled).

Triage hypotheses (not yet empirically verified):

  1. Singleton state pollution from sibling github-workflow specs — DiscussionService.spec, ConfigCompleteness.spec, IssueService.spec, etc. all live under ai/mcp/server/github-workflow and may share GraphqlService or similar singleton state. Same root-cause class as G5 patterns elsewhere.
  2. Race in GraphqlService.query mocking — the test likely overrides GraphqlService.query for the test scenario; under fullyParallel-ish ordering with workers:1, sibling spec's cleanup may not restore the original query binding cleanly.
  3. Genuine race in getPullRequestDiff file-parameter filtering logic — if hypothesis 1+2 both fail, this is an actual production race.

The Architectural Reality

  • test/playwright/unit/ai/mcp/server/github-workflow/PullRequestService.spec.mjs:250 — the flaky test (getPullRequestDiff #10748 file parameter filters the diff output)
  • ai/mcp/server/github-workflow/services/PullRequestService.mjs#getPullRequestDiff — the production target (added in #10748)
  • Sibling github-workflow specs that share GraphqlService:
    • DiscussionService.spec.mjs (G5#1, fixed via #10929)
    • ConfigCompleteness.spec.mjs (G4 cascade, fixed via #10929)
    • IssueService.spec.mjs (Zod validator surface)

The Fix (TBD via investigation)

Investigation steps:

  1. Pull the actual failure trace from a CI run — gh run view ... --log-failed for the first-attempt error
  2. Categorize: GraphqlService mock leak vs new substrate vs production race
  3. Implement targeted fix; coordinate with @neo-gpt if it's another github-workflow Config namespace cascade (his G4 lane)

Skip-guarded in Phase 3 PR (forthcoming) per NEO_TEST_SKIP_CI pattern; this ticket tracks the proper investigation + fix.

Acceptance Criteria

  • (AC1) Pull empirical failure trace from a CI run (the first-attempt error before retry passes)
  • (AC2) Categorize root cause hypothesis (singleton-pollution / mock-leak / production-race)
  • (AC3) Implement targeted fix
  • (AC4) Remove the NEO_TEST_SKIP_CI guard from PullRequestService.spec.mjs:250 in Phase 3 PR (forthcoming)
  • (AC5) Verify PullRequestService.spec:250 runs deterministically across 5 consecutive CI=true npm run test-unit invocations

Out of Scope

  • Migrating PullRequestService.spec to TestLifecycleHelper unconditionally (unknown if applicable until hypothesis is identified)
  • Rewriting getPullRequestDiff (out of scope unless hypothesis 3 holds)

Avoided Traps

  • Pre-emptive TestLifecycleHelper migration: rejected without empirical hypothesis evidence. Migration is the right shape for singleton-state-pollution but unnecessary if the shape is mock-leak or production race.
  • Skip-guard as permanent solution: applied as immediate Phase 3 ship-the-PR move + tracked here for proper fix.

Related

  • Surfacing CI run: my empirical CI=true full-suite run on PR #10940 c.7f454ee2 (1055 passed, 4 flaky)
  • Cycle 1 review citation: PR #10940 commentId IC_kwDODSospM8AAAABBodQ9A — listed as new surface not in prior triage
  • Production target: #10748 (getPullRequestDiff introduced)
  • Sibling github-workflow flake patterns: #10929 G4 namespace collision (resolved), G5#1 cascade (resolved)
  • Bucket G epic: #10924

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="PullRequestService getPullRequestDiff file parameter flake workers 1 #10940 CI 25542845735")

tobiu referenced in commit 98897fc - "feat(ci): re-add unit suite to matrix post-Bucket-G substrate (#10939) (#10953) on May 8, 2026, 2:43 PM