Frontmatter
id: 10946
title: DreamServiceGoldenPath.spec:77 residual flake post-#10940 engine='hybrid' fix
state: Closed
labels: bug, ai, testing
assignees: neo-opus-4-7
createdAt: May 8, 2026, 12:46 PM
updatedAt: May 12, 2026, 4:09 AM
githubUrl: https://github.com/neomjs/neo/issues/10946
author: neo-opus-4-7
commentsCount: 4
parentIssue: 10924
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 11, 2026, 6:19 AM

DreamServiceGoldenPath.spec:77 residual flake post-#10940 engine='hybrid' fix

neo-opus-4-7 commented on May 8, 2026, 12:46 PM

Context

Surfaced 2026-05-08 during a 5-iteration AC4-strict verification run on origin/dev at a771afdb0 ("test(memory-core): Stabilize daemon specs via non-destructive cleanup and engine correction (#10924)"), the commit that landed Gemini's TestLifecycleHelper daemon-spec migration via PR #10940.

The "synthesizeGoldenPath executes without crashing" test originally failed HARD under the prior config: aiConfig.engine = 'neo' skipped ChromaManager init (per the April 15/30 Hybrid RAG refactor), so topology.frontier came back undefined. Gemini's fix at a771afdb0 correctly switched aiConfig.engine to 'hybrid' and removed the NEO_TEST_SKIP_CI describe-skip guard, taking the test from never passing to mostly passing.
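The engine switch amounts to overriding one field on a shared config object for the duration of the spec. A minimal sketch of that pattern follows, assuming a plain stand-in object for aiConfig; `withEngine` is a hypothetical helper, not part of Neo. It restores the previous value afterwards, which matters when specs in the same worker share the object.

```javascript
// Hypothetical stand-in for the shared AI config object the spec mutates.
const aiConfig = { engine: 'neo' };

// Runs `fn` with config.engine temporarily set to `engine` and restores the
// previous value afterwards, even if `fn` throws or rejects.
async function withEngine(config, engine, fn) {
    const previous = config.engine;
    config.engine = engine;
    try {
        return await fn(config);
    } finally {
        config.engine = previous;
    }
}
```

In a Playwright spec the same save/restore would more typically be split across test.beforeAll and test.afterAll hooks.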

Empirical post-fix state: the test flaked in 5/5 runs, failing its first attempt and passing on retry every time. It is NOT a hard fail: Playwright counts it as flaky-but-passed, so the CI exit code stays 0. It does, however, add noise and retry cost to every CI run.
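Why a first-attempt failure keeps CI green can be shown with a simplified model of how a retrying runner classifies tests. This sketch mirrors Playwright's expected / flaky / unexpected outcomes under its default settings, but it is an illustration only: it ignores skips, timeouts, and expected-failure annotations, and `classify`, `summarize`, and `exitCode` are hypothetical names.

```javascript
// Classify one test from its ordered attempt results ('passed' | 'failed').
// Retries stop at the first pass, so only the last attempt can be 'passed'.
function classify(attempts) {
    const last = attempts[attempts.length - 1];
    if (last !== 'passed') return 'unexpected'; // never passed: hard failure
    return attempts.length === 1 ? 'expected' : 'flaky'; // passed on retry: flaky
}

// Tally outcomes across a run, one attempts-array per test.
function summarize(tests) {
    const counts = { expected: 0, flaky: 0, unexpected: 0 };
    for (const attempts of tests) counts[classify(attempts)]++;
    return counts;
}

// In this model only 'unexpected' tests make the run exit non-zero.
function exitCode(tests) {
    return tests.some(attempts => classify(attempts) === 'unexpected') ? 1 : 0;
}
```

Under this model the golden-path test contributes ['failed', 'passed'] on every run: counted as flaky, with exit code 0.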

The Problem

test/playwright/unit/ai/daemons/DreamServiceGoldenPath.spec.mjs:77, the "synthesizeGoldenPath executes without crashing" test, failed its first attempt in all 5 post-fix iterations. The first-attempt failure mode still needs investigation; it likely traces to a race in the DreamService.ingestIssueStates() → synthesizeGoldenPath() → getContextFrontier() sequence at lines 80-93, where the frontier-node setup has not settled before the topology read.

The failure is NOT a singleton-pollution / close+null pattern (those were addressed by PR #10940). That PR's fix was scoped to the engine config and helper migration, not to the synchronization semantics inside DreamService's golden-path synthesis pipeline.

The Architectural Reality

  • test/playwright/unit/ai/daemons/DreamServiceGoldenPath.spec.mjs:77-94 — the flaky test (single test in the describe)
  • ai/daemons/DreamService.mjs#synthesizeGoldenPath — the producer of the topology being asserted
  • ai/mcp/server/memory-core/services/GraphService.mjs#getContextFrontier — the consumer that reads the topology
  • Likely race surface: the frontier node is upserted during synthesizeGoldenPath but the read happens before the SQLite-side persistence settles (similar shape to the WAL-snapshot lag pattern referenced elsewhere in GraphService.spec at the linkNodes cache-warm test)
  • The await new Promise(resolve => setTimeout(resolve, 50)) pattern used elsewhere in GraphService.spec for SQLite settling is NOT present in this test
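The suspected race can be modeled without DreamService at all. The sketch below is a hypothetical model of the hypothesis, not the real code: `LaggyStore` stands in for the persistence layer, with writes that become visible to readers only after an async flush, and `synthesizeFireAndForget` stands in for a producer that starts a write without awaiting it. In the model the stale read is deterministic; in the real suite it depends on timing.

```javascript
// Hypothetical store: upsert() schedules the write, and readers only see it
// after `flushDelayMs`. The returned promise resolves once it is visible.
class LaggyStore {
    constructor(flushDelayMs) {
        this.flushDelayMs = flushDelayMs;
        this.visible = new Map(); // what a getContextFrontier-style reader sees
    }

    upsert(id, node) {
        return new Promise(resolve => {
            setTimeout(() => {
                this.visible.set(id, node);
                resolve();
            }, this.flushDelayMs);
        });
    }

    read(id) {
        return this.visible.get(id);
    }
}

// Producer that starts the frontier write but does not await it, so its own
// promise resolves before the node is visible.
async function synthesizeFireAndForget(store) {
    store.upsert('frontier', { type: 'frontier' }); // not awaited
    return 'done';
}

async function demo() {
    const store = new LaggyStore(10);
    await synthesizeFireAndForget(store);
    const immediate = store.read('frontier'); // undefined: the read beat the flush
    await new Promise(resolve => setTimeout(resolve, 50)); // sibling-spec settle
    const settled = store.read('frontier'); // visible once the flush has run
    return { immediate, settled };
}
```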

The Fix (Investigation-Shaped)

Two candidate paths:

  1. Spec-level: add explicit await settling between synthesizeGoldenPath() and the topology read (the 50ms-settle pattern used in sibling specs)
  2. Producer-level: ensure DreamService.synthesizeGoldenPath() returns a promise that only resolves after the frontier-node + GUIDES edges are observable in GraphService.getContextFrontier() — i.e., propagate the SQLite-write barrier into the synthesizeGoldenPath contract
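The two candidate paths can be compared on the same kind of hypothetical laggy store. The frontier node and its GUIDES edge are modeled as two map entries with made-up keys. Option 1 sleeps a fixed window after an unbarriered producer. Option 2 has the producer await every write before resolving. None of these functions exist in DreamService; they only illustrate the contract difference.

```javascript
// Hypothetical store: writes become visible only after an async flush, and
// upsert() resolves once its write is visible.
class LaggyStore {
    constructor(flushDelayMs) {
        this.flushDelayMs = flushDelayMs;
        this.visible = new Map();
    }
    upsert(id, value) {
        return new Promise(resolve => setTimeout(() => {
            this.visible.set(id, value);
            resolve();
        }, this.flushDelayMs));
    }
    read(id) {
        return this.visible.get(id);
    }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Suspected current shape: writes are started but not awaited.
async function synthesizeUnbarriered(store) {
    store.upsert('frontier', { type: 'frontier' });
    store.upsert('guides:frontier', { type: 'GUIDES' });
}

// Option 2: producer-level barrier. Resolves only after the frontier node
// and its GUIDES edge are both visible to readers.
async function synthesizeWithBarrier(store) {
    await Promise.all([
        store.upsert('frontier', { type: 'frontier' }),
        store.upsert('guides:frontier', { type: 'GUIDES' })
    ]);
}

// Option 1: spec-level settle. Only correct while the flush fits the window.
async function option1(store, settleMs = 50) {
    await synthesizeUnbarriered(store);
    await sleep(settleMs);
    return store.read('frontier');
}

// Option 2: no sleep needed; the producer contract guarantees visibility.
async function option2(store) {
    await synthesizeWithBarrier(store);
    return store.read('frontier');
}
```

With a 10 ms flush both options see the frontier. With an 80 ms flush, Option 1's 50 ms settle misses it while Option 2 still succeeds. This is why the settle hides the race rather than removing it.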

Option 2 is structurally cleaner (the test doesn't paper over a producer-side race) but larger in scope. Option 1 is the immediate alternative to a Phase 3 (#10939) skip-guard if the investigation is deferred.

Acceptance Criteria

  • (AC1) Empirically reproduce flake locally with CI=true npm run test-unit × 5 iterations on origin/dev
  • (AC2) Identify the race surface — is it synthesizeGoldenPath not awaiting all ingest writes, or getContextFrontier not seeing settled SQLite state?
  • (AC3) Choose fix path (spec-settle vs producer-barrier) with rationale documented
  • (AC4) Verify DreamServiceGoldenPath.spec:77 runs deterministically (no flake, no retry needed) across 5 consecutive CI=true npm run test-unit invocations
  • (AC5) Remove any skip-guard added in the Phase 3 PR (#10939) once the fix lands
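AC1 and AC4 both call for repeated CI=true npm run test-unit invocations. A minimal tally harness for that loop might look like the sketch below. Two assumptions apply. First, Playwright's default summary prints a line such as "1 flaky" when a test only passed on retry; adjust the pattern for other reporters. Second, `runIterations`, `classifyRun`, and `meetsAc4` are hypothetical helper names. The runner is injectable so the tally logic can be checked without a real checkout.

```javascript
// Default runner: one CI-mode invocation of the unit suite from the repo root.
// Lazily requires child_process so the helpers load without invoking it.
function npmRunner() {
    const { spawnSync } = require('child_process');
    const result = spawnSync('npm', ['run', 'test-unit'], {
        env: { ...process.env, CI: 'true' },
        encoding: 'utf8',
        shell: process.platform === 'win32' // npm is a .cmd shim on Windows
    });
    return { status: result.status, output: (result.stdout || '') + (result.stderr || '') };
}

// Non-zero exit is a hard failure; a zero exit whose summary reports
// "<n> flaky" still needed a retry; anything else is a clean run.
function classifyRun({ status, output }) {
    if (status !== 0) return 'failed';
    return /\b\d+ flaky\b/.test(output) ? 'flaky' : 'clean';
}

function runIterations(n, runOnce = npmRunner) {
    const tally = { clean: 0, flaky: 0, failed: 0 };
    for (let i = 0; i < n; i++) {
        tally[classifyRun(runOnce())]++;
    }
    return tally;
}

// AC4 holds only if every iteration was clean: no failure and no retry.
function meetsAc4(tally, n) {
    return tally.clean === n;
}
```

Calling runIterations(5) on origin/dev covers AC1 when the tally shows flaky runs, and covers AC4 when meetsAc4 returns true after the fix.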

Out of Scope

  • Re-investigation of singleton-pollution patterns (covered by #10941 + #10936 + #10937)
  • Migration of DreamService to per-test instances (architectural change beyond AC scope)
  • Fixing other unmasked-by-#10940 surfaces (one ticket per surface per substrate-discipline)

Avoided Traps

  • Treating as duplicate of #10941: rejected. The consumer surface differs (DreamService synthesizeGoldenPath vs GraphService.getNeighbors), and so does the root-cause class (synthesis-write race vs singleton-pollution). PR #10940 unmasked both, but they need fixes at different layers.
  • Skip-guard restoration: PR #10940 deliberately removed the bucket-G3 skip-guard because the engine-config fix resolves the hard fail. Restoring the skip-guard would lose the empirical signal that the engine fix worked. If Phase 3 needs a skip-guard, it should reference THIS ticket, not bucket-G3.
  • Increasing Playwright retries: doesn't address the root cause; it only papers over the flake.

Related

  • Surfacing: post-#10940 verification at a771afdb0 (5-iteration empirical evidence; documented in Phase 3 PR #10939 body when re-filed)
  • Substrate fix that caused the unmasking: PR #10940 (TestLifecycleHelper substrate primitive + DreamServiceGoldenPath engine fix)
  • Sibling residual: #10941 (GraphService.spec:107 — different surface, different hypothesis class)
  • Phase 3 PR: #10939 (will reference this ticket if a skip-guard is needed on first attempt)
  • Bucket G epic: #10924

Origin Session ID: 005b6edf-85d8-4980-9e17-486b6b8bed3f

Retrieval Hint: query_raw_memories(query="DreamServiceGoldenPath synthesizeGoldenPath flake post-PR-10940 engine hybrid frontier topology race")

tobiu referenced in commit 98897fc - "feat(ci): re-add unit suite to matrix post-Bucket-G substrate (#10939) (#10953)" on May 8, 2026, 2:43 PM
tobiu referenced in commit bcf5dc2 - "fix(test): restore singleton state in DreamService.spec afterAll + getContextFrontier patch (#10946) (#11173)" on May 11, 2026, 6:19 AM
tobiu closed this issue on May 11, 2026, 6:19 AM