Frontmatter
id: 10936
title: G5#2: KBRecorderService.spec singleton-data pollution under workers:1
state: Closed
labels: bug, ai, testing, needs-re-triage
assignees: []
createdAt: May 8, 2026, 12:12 AM
updatedAt: May 12, 2026, 4:09 AM
githubUrl: https://github.com/neomjs/neo/issues/10936
author: neo-opus-4-7
commentsCount: 2
parentIssue: 10924
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 11, 2026, 1:33 AM

G5#2: KBRecorderService.spec singleton-data pollution under workers:1

neo-opus-4-7 commented on May 8, 2026, 12:12 AM

Context

Surfaced 2026-05-08 during PR #10933 (Phase 3 unit-row re-add) CI run 25524203756. Originally classified as G5#2 in the #10924 G5 triage matrix. Triage hypothesis was "auto-resolve post-G4-merge"; falsified empirically by the unit-row first activation in CI.

Skip-guarded in PR #10933 commit 8e6c3fd78 per the established #10907/#10921/#10928 pattern; this ticket tracks the proper investigation + fix.

The Problem

Test test/playwright/unit/ai/mcp/server/knowledge-base/services/KBRecorderService.spec.mjs:91 flakes with:

TypeError: Cannot read properties of undefined (reading '0')
    expect(listed.faqs[0].canonicalQuery).toContain('reactive');

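The message names index 0 as the failed read, so listed.faqs itself is undefined: KBRecorderService.listAgentFaqs({minCount: 2}) returned an object with no faqs array ({faqs: undefined} or similar), not {faqs: []}, which would have failed on reading 'canonicalQuery' instead. The test wrote 3 entries to the singleton's kb_query_log via KBRecorderService.log(...) and called buildAgentFaqs first; both should have populated state.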
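The two candidate return shapes fail with different messages, which helps narrow the diagnosis. A minimal sketch in plain JavaScript; the readFirstQuery and failureMessage helpers are hypothetical and only mirror the property access in the failing assertion:

```javascript
// Hypothetical helper mirroring the access in the failing assertion.
function readFirstQuery(listed) {
    return listed.faqs[0].canonicalQuery;
}

// Capture the error message produced for a given return shape,
// or null when the access succeeds.
function failureMessage(listed) {
    try {
        readFirstQuery(listed);
        return null;
    } catch (err) {
        return err.message;
    }
}

// faqs is an empty array: faqs[0] is undefined, the next read fails.
const emptyList = failureMessage({faqs: []});

// faqs is missing: the index read itself fails.
const missingList = failureMessage({faqs: undefined});

console.log('empty array  :', emptyList);
console.log('missing array:', missingList);
```

On V8-based runtimes, only the missing-array shape reports the read of '0' seen in the CI log. The empty-array shape reports the read of 'canonicalQuery'.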

workers:1 substrate amplifies singleton-state pollution. Other specs touch the same singleton (DreamService.spec.mjs writes kb_query_log per grep), and per CI's serial execution they share the SAME KBRecorderService.db connection state. If a sibling spec runs first and either closes/reopens the singleton's DB OR clears kb_query_log/kb_query_faqs tables in its own beforeEach/afterAll, the FAQ-cluster build in this spec returns no rows.
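The pollution mechanism described above can be sketched without any database. Everything here is an assumption for illustration: RecorderStub is a hypothetical stand-in, not the real KBRecorderService, and the "writes are dropped once closed" behavior is one possible failure mode rather than a confirmed one.

```javascript
// Hypothetical stand-in for a module-scope singleton; NOT the real service.
class RecorderStub {
    constructor() {
        this.rows = []; // stands in for an open db handle
    }

    close() {
        this.rows = null;
    }

    log(query) {
        // Assumed failure mode: writes are silently dropped once closed.
        if (this.rows) {
            this.rows.push(query);
        }
    }

    listFaqs() {
        // With no handle, the result carries no faqs array at all.
        return {faqs: this.rows?.map(canonicalQuery => ({canonicalQuery}))};
    }
}

// One instance per worker process, as with a default-exported singleton.
const recorder = new RecorderStub();

// A sibling spec runs first and tears the shared handle down in its cleanup.
function siblingSpec(shared) {
    shared.log('sibling query');
    shared.close();
}

// The later spec writes and reads through whatever instance it is given.
function laterSpec(shared) {
    shared.log('reactive config');
    return shared.listFaqs();
}

const isolated = laterSpec(new RecorderStub()); // fresh instance: row visible

siblingSpec(recorder);
const polluted = laterSpec(recorder);           // shared instance: no faqs

console.log(isolated.faqs, polluted.faqs);
```

With a fresh instance the later spec sees its own row. With the shared instance it inherits the sibling's teardown. That is why the flake depends on spec ordering inside a single worker.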

The Architectural Reality

  • test/playwright/unit/ai/mcp/server/knowledge-base/services/KBRecorderService.spec.mjs:91-129 — the flaky test
  • test/playwright/unit/ai/mcp/server/knowledge-base/services/KBRecorderService.spec.mjs:49-50: beforeEach clears kb_query_log + kb_query_faqs for this spec, but doesn't account for sibling-spec mutations between this spec's tests
  • ai/mcp/server/knowledge-base/services/KBRecorderService.mjs — module-scope singleton (export default new KBRecorderService()); singleton's db connection persists across spec boundaries within one worker
  • Sibling spec writers verified via grep: test/playwright/unit/ai/daemons/DreamService.spec.mjs writes kb_query_log

The Fix (TBD via investigation)

Two candidate paths (mirrors the FileSystemIngestor #10934 prescription):

  1. Spec-level: switch to test.describe.configure({mode: 'serial'}) and add a beforeAll that re-initializes the singleton DB; or extract the singleton import into a per-test factory.
  2. SDK-level: harden KBRecorderService.listAgentFaqs to return {faqs: []} defensively when the underlying query returns no rows (vs undefined); decouple test-isolation from singleton-data isolation by giving each test its own db instance.
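The defensive half of path 2 could look roughly like the following. This is a sketch under stated assumptions: listAgentFaqsSafe and queryFaqRows are hypothetical names, queryFaqRows stands in for whatever the service runs against kb_query_faqs, and the real listAgentFaqs signature may differ.

```javascript
// Hypothetical wrapper: always return an array under `faqs`, even when the
// underlying query yields undefined or another non-array value.
function listAgentFaqsSafe(queryFaqRows, {minCount = 1} = {}) {
    const rows = queryFaqRows(minCount);
    return {faqs: Array.isArray(rows) ? rows : []};
}

// Stand-in query that finds one cluster.
const withRows = listAgentFaqsSafe(
    () => [{canonicalQuery: 'reactive config', count: 3}],
    {minCount: 2}
);

// Stand-in query that returns nothing, as in the failing CI run.
const noRows = listAgentFaqsSafe(() => undefined, {minCount: 2});

console.log(withRows.faqs.length, noRows.faqs.length);
```

Note the limit of this change: it turns the TypeError into an ordinary assertion failure on an empty list, which makes the symptom easier to read but does not remove the cross-spec pollution itself.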

Investigation needed before locking the prescription. The substrate-discipline lesson from PR #10933 review: re-measurement claims for singleton-pollution patterns MUST run with WORKERS=1 locally to match CI substrate.

Acceptance Criteria

  • (AC1) Empirically reproduce the flake locally with WORKERS=1 (matches CI substrate; default-workers parallelism does NOT reproduce)
  • (AC2) Identify which sibling spec(s) mutate kb_query_log/kb_query_faqs before this test runs
  • (AC3) Implement chosen fix path (spec-level OR SDK-level)
  • (AC4) Remove the NEO_TEST_SKIP_CI guard added in PR #10933 commit 8e6c3fd78
  • (AC5) Verify KBRecorderService.spec runs deterministically across 5 consecutive npm run test-unit invocations on CI substrate

Out of Scope

  • Migrating KBRecorderService to per-test instances (too aggressive; solve at sibling-spec or SDK layer first)
  • Cross-spec singleton-lifecycle audit (separate epic-shaped concern; this ticket is the targeted fix for KBRecorderService specifically)

Avoided Traps

  • Increasing retries: 2 → 5: papers over without addressing root cause
  • Disabling workers:1 in CI: that's a feature for deterministic singleton pollution detection, not a bug
  • Skip-guard as permanent solution: applied as immediate ship-the-PR move on PR #10933 + tracked here for proper fix

Related

  • Surfacing CI run: 25524203756
  • Triage origin: #10924 G5 row (G5#2)
  • Triage correction: #10924 comment 4401547656
  • Sibling state-pollution patterns: #10934 (FileSystemIngestor singleton SQLite-close), G5#3 sibling, #10935 (TransportService residual race)
  • Substrate config: test/playwright/playwright.config.unit.mjs workers: 1 in CI
  • Skip-guard commit: 8e6c3fd78 on PR #10933

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="KBRecorderService singleton kb_query_log workers 1 flake G5#2 #10924 PR 10933")

tobiu referenced in commit 98897fc - "feat(ci): re-add unit suite to matrix post-Bucket-G substrate (#10939) (#10953)" on May 8, 2026, 2:43 PM