Context
Surfaced 2026-05-07 by #10897 Lane C CI workflow (PR #10899) on first unit matrix-row run. ~30 unit tests fail in a truly clean CI environment (ubuntu-latest, fresh checkout, no .neo-ai-data/ content). Locally these tests pass for developers because their machines carry accumulated substrate data that CI lacks.
This is a silent CI-locality coupling that has been hidden until automated CI gating arrived.
The Problem
Lane C CI run 25497549380 unit job reported:
1006 passed (5.2m)
~30 failed
5 flaky
11 skipped
6 did not run
1 error was not part of any test
Failure clusters observed in the log:
1. Native Edge Graph collection-not-initialized errors:
   IssueIngestor GET error: Error: [CollectionProxy] get() failed:
   No underlying collection available for type 'graph'
   at test/playwright/unit/ai/daemons/DreamServiceGoldenPath.spec.mjs:95:9
   The SQLite graph collection isn't initialized in CI. The test depends on a substrate-bootstrap step that doesn't fire in a clean environment.
2. MCP Server Health (McpServersHealth.spec.mjs): the 'github-workflow', 'knowledge-base', and 'memory-core' server bootstrap tests all fail. Likely causes: missing config, env vars, or substrate paths.
3. MCP Server Isolation (McpServersIsolation.spec.mjs): "Broken server should still boot in degraded mode" and "Healthy server unaffected by sibling failure" depend on the same MCP server bootstrap path as cluster 2.
4. MCP Authorization (Authorization.spec.mjs): 'should allow access with a valid token' fails, pointing to an auth substrate gap.
5. GitHub Workflow PullRequestService (PullRequestService.spec.mjs): getPullRequestDiff tests fail (#10748 substrate). Likely needs gh CLI setup or test-env GitHub auth.
6. Memory Core Services: QueryReRanker, SessionService, SessionService.ResumeValidation, SessionSummarization (#10725 + #10643 + #10673 substrate). All depend on the Memory Core data substrate.
7. DestructiveOperationGuard (DestructiveOperationGuard.spec.mjs): #10845 substrate. Depends on the memory-core graph being initialized.
8. AI Scripts (checkAllAgentIdle, checkSunsetted, resumeHarness): #10643, #10673, #10674, #10677 substrate (agent-state SQLite tables).
9. Grid tests (LockedColumns, Pooling, Teleportation): frontend tests that likely need Playwright browser binaries (the CI workflow doesn't currently run playwright install).
The Architectural Reality
Test substrate dependencies fall into 4 buckets:
- (a) Substrate-data: needs pre-populated SQLite collections, ChromaDB rows, or .neo-ai-data/ content.
- (b) Browser binaries: Playwright tests using chromium need an npx playwright install step.
- (c) External tools: gh CLI auth, git config, etc. Some are present on ubuntu-latest by default; some need explicit setup.
- (d) Env vars: provider API keys, OIDC config, etc. Test fixtures may rely on process.env.* values that are absent in CI.
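As an illustration of bucket (d), the sketch below shows one way to report which environment variables a spec assumes but a clean CI run lacks, and how test-mode defaults could be layered underneath real values. The helper names (findMissingEnv, withTestDefaults) and the variable names (EXAMPLE_API_KEY, EXAMPLE_OIDC_ISSUER) are hypothetical and do not come from the repository.

```javascript
// Hypothetical helper: list the variables a spec assumes that the given
// environment does not provide (unset or empty).
function findMissingEnv(requiredNames, env = process.env) {
  return requiredNames.filter((name) => {
    const value = env[name];
    return value === undefined || value === '';
  });
}

// Hypothetical helper: start from test-mode defaults and let any value that
// is actually set in the environment override them.
function withTestDefaults(defaults, env = process.env) {
  const merged = { ...defaults };
  for (const [name, value] of Object.entries(env)) {
    if (value !== undefined && value !== '') merged[name] = value;
  }
  return merged;
}

// Made-up variable names, used only to exercise the helpers.
const cleanCiEnv = { CI: 'true' };
const required = ['EXAMPLE_API_KEY', 'EXAMPLE_OIDC_ISSUER'];
const missing = findMissingEnv(required, cleanCiEnv);
const effective = withTestDefaults({ EXAMPLE_API_KEY: 'test-key' }, cleanCiEnv);
```

Passing the environment as an argument, rather than reading process.env inside the helpers, keeps the audit itself testable without mutating global state.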
Why passing locally ≠ passing in CI: developers accumulate substrate over months. The graph SQLite has rows, Chroma collections exist, and .neo-ai-data/ has content. CI starts each run from git clone + npm ci + this workflow's empty .neo-ai-data/ guard (which I added in #10897 to skip the heavyweight KB download).
Per-test-isolation discipline gap: strong isolation would mock these substrates in unit tests. The current test corpus instead depends on shared substrate state, which works in local development but breaks the unit-test isolation contract.
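To make the isolation point concrete, here is a minimal sketch of a unit that receives its collection as an argument, so a test can hand it an in-memory stand-in instead of relying on accumulated on-disk state. Everything here is hypothetical: createInMemoryCollection and markIssueIngested are illustrative names, the stand-in is synchronous for brevity, and none of it mirrors the real CollectionProxy API.

```javascript
// Hypothetical in-memory stand-in for a substrate collection.
// This is NOT the real CollectionProxy interface; it only illustrates the idea.
function createInMemoryCollection(seedRecords = []) {
  const records = new Map(seedRecords.map((record) => [record.id, record]));
  return {
    get(id) {
      return records.has(id) ? records.get(id) : null;
    },
    upsert(record) {
      records.set(record.id, record);
    },
  };
}

// Hypothetical unit under test: the collection is injected, so the function
// never reaches for shared state such as .neo-ai-data/.
function markIssueIngested(collection, id) {
  const existing = collection.get(id) ?? { id };
  collection.upsert({ ...existing, ingested: true });
  return collection.get(id);
}

// Each test builds its own collection, so results do not depend on what a
// developer machine or a CI runner happens to have on disk.
const seeded = createInMemoryCollection([{ id: 'issue-1', title: 'example' }]);
const updated = markIssueIngested(seeded, 'issue-1');
const created = markIssueIngested(createInMemoryCollection(), 'issue-2');
```

Whether a given spec should use a stand-in like this or a real fixture is exactly the per-spec question the investigation phase is meant to answer.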
The Fix (Investigation-Shaped)
This ticket is diagnostic-first rather than prescription-first because the right fix depends on per-cluster investigation. An estimated 4 sub-fixes correspond to the 4 buckets above:
Investigation Phase
- Triage each failing test into bucket (a)/(b)/(c)/(d) by reading the spec source and the observed CI failure output.
- For bucket (b), Playwright browsers: confirm the hypothesis locally by running the affected specs without npx playwright install, then re-running after it. If correct, the fix is ONE line in the Lane C workflow: add an npx playwright install --with-deps chromium step.
- For bucket (a), substrate-data: identify which specs lack proper test-fixture setup. The fix is either per-spec fixture additions or a shared bootstrap step.
- For bucket (c), external tools: audit ubuntu-latest defaults against assumed availability.
- For bucket (d), env vars: audit which env vars the tests assume, then decide per variable whether to document a required CI secret or provide a test-mode default.
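The first triage step could be partly mechanized with a first-pass classifier over failure messages. The sketch below is an assumption-laden starting point: the regular expressions are guesses derived from the failure clusters in this ticket and from typical error text, not a verified mapping, and anything it labels would still need confirmation against the spec source.

```javascript
// Illustrative patterns only. They are guesses based on the clusters listed
// in this ticket; order matters because the first match wins.
const BUCKET_PATTERNS = [
  { bucket: 'b', pattern: /Executable doesn't exist|playwright install/i },
  { bucket: 'a', pattern: /No underlying collection available|\.neo-ai-data/i },
  { bucket: 'c', pattern: /\bgh\b.*(not found|auth)|git config/i },
  { bucket: 'd', pattern: /process\.env|API key|missing token/i },
];

// Returns 'a', 'b', 'c', or 'd' for a recognized message, else 'unknown'
// so that unmatched failures are triaged by hand.
function classifyFailure(message) {
  const hit = BUCKET_PATTERNS.find(({ pattern }) => pattern.test(message));
  return hit ? hit.bucket : 'unknown';
}

const graphBucket = classifyFailure(
  "[CollectionProxy] get() failed: No underlying collection available for type 'graph'"
);
```

An explicit 'unknown' result matters here: some failures may be real test bugs rather than environment gaps, and those should not be forced into a bucket.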
Fix Phase (per-bucket sub-PRs)
Each bucket likely needs its own dedicated sub-PR for cross-family review. Investigation-phase output: list of file:line references + per-spec bucket assignment + per-bucket fix shape.
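One possible shape for the investigation-phase output is a flat list of findings that is then grouped per bucket, one group per sub-PR. The record fields (spec, line, bucket, fixShape) and the example values are hypothetical placeholders, apart from the DreamServiceGoldenPath line number, which is taken from the failure log above.

```javascript
// Group investigation findings by bucket so each group can seed one sub-PR.
function groupByBucket(findings) {
  const grouped = {};
  for (const finding of findings) {
    (grouped[finding.bucket] ??= []).push(finding);
  }
  return grouped;
}

// Placeholder findings. Bucket assignments and fix shapes are illustrative,
// not actual investigation results.
const exampleFindings = [
  { spec: 'DreamServiceGoldenPath.spec.mjs', line: 95, bucket: 'a', fixShape: 'per-spec fixture' },
  { spec: 'McpServersHealth.spec.mjs', line: null, bucket: 'unknown', fixShape: 'to be determined' },
];

const byBucket = groupByBucket(exampleFindings);
```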
Coordination with Lane C (#10899)
Lane C will drop unit from the matrix until this audit's per-bucket fixes land, then add unit back as a matrix row in a follow-up PR. This avoids landing a CI workflow with known-failing matrix rows.
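Acceptance Criteria
- […] dev, follow-up PR adds unit back to Lane C's matrix; CI green on the unit matrix row demonstrated empirically.
- […] learn/guides/testing/UnitTesting.md: what it takes for a unit test to pass in clean CI, what fixtures are required, and where the boundary lies between "needs substrate" and "should mock".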
Out of Scope
- Fixing all bucket failures in this single ticket: it is investigation-shaped, not prescription-shaped. Each bucket gets its own ticket + PR.
- CI workflow design: Lane C #10897 already owns that surface; this ticket is about test-substrate alignment.
- Refactoring the unit test corpus to use proper mocking everywhere: a "long arc" architectural concern. This audit fixes the immediate clean-CI-pass gap; the deeper isolation refactor is a future ticket.
- Frontend Grid tests, if they require browser substrate beyond playwright install: they may surface as needing OffscreenCanvas, font rendering, etc.; if discovered, file as a separate ticket per bucket.
Avoided Traps / Gold Standards Rejected
- Rejected: bundle all sub-fixes into a single PR. This is a cross-cutting concern affecting 4+ subsystems (DreamService, MCP servers, Memory Core, Grid). Per-bucket PRs keep cross-family review focused.
- Rejected: skip the failing tests via test.skip annotations. This treats the symptom, not the cause, and hides future regressions. Substrate fixes preserve coverage.
- Rejected: provision substrate at CI workflow level (Lane C #10897). This couples test substrate to CI infra. Test fixtures should be self-contained per-spec (bucket a) or workflow-level only for tools (bucket b).
- Rejected: assume all failures are CI-environment issues rather than real test bugs. Some may be real test bugs; the investigation phase will distinguish.
Related
- Surfacing context: Lane C CI workflow PR #10899 (closes #10897), the first clean-CI run, which revealed these failures.
- Sibling substrate fix (CI-prerequisite): #10902 (Dockerfile prepare-lifecycle bug), also surfaced by Lane C CI; the integration matrix row is blocked until that fix lands.
- Existing related ticket: #10714 (Codex sandbox bootstrap probe for .neo-ai-data/sqlite access). Adjacent shape (per-harness substrate-bootstrap), different cause (sandbox permissions vs CI clean env).
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="unit test failures clean CI substrate data .neo-ai-data Lane C audit per-bucket investigation")