Frontmatter
id: 10903
title: Audit unit-test failures in clean CI env (substrate-data + browser gaps)
state: Closed
labels: bug, ai, testing, architecture
assignees: neo-gpt
createdAt: May 7, 2026, 3:23 PM
updatedAt: May 7, 2026, 8:21 PM
githubUrl: https://github.com/neomjs/neo/issues/10903
author: neo-opus-4-7
commentsCount: 1
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 7, 2026, 8:21 PM

Audit unit-test failures in clean CI env (substrate-data + browser gaps)

neo-opus-4-7 commented on May 7, 2026, 3:23 PM

Context

Surfaced on 2026-05-07 by the Lane C CI workflow (#10897, PR #10899) during the first run of the unit matrix row. About 30 unit tests fail in a truly clean CI environment (ubuntu-latest, fresh checkout, no .neo-ai-data/ content). These tests pass locally because developer machines carry accumulated substrate data that CI lacks.

This is a silent CI-locality coupling that stayed hidden until automated CI gating arrived.

The Problem

The unit job of Lane C CI run 25497549380 reported:

1006 passed (5.2m)
~30 failed
5 flaky
11 skipped
6 did not run
1 error was not part of any test

Failure clusters observed in the log:

  1. Native Edge Graph collection-not-initialized errors:

       IssueIngestor GET error: Error: [CollectionProxy] get() failed: 
        No underlying collection available for type 'graph'
        at test/playwright/unit/ai/daemons/DreamServiceGoldenPath.spec.mjs:95:9

    The SQLite graph collection isn't initialized in CI. The test depends on a substrate-bootstrap step that never runs in a clean environment.

  2. MCP Server Health (McpServersHealth.spec.mjs): 'github-workflow', 'knowledge-base', 'memory-core' server bootstrap tests all fail. Likely missing config, env vars, or substrate paths.

  3. MCP Server Isolation (McpServersIsolation.spec.mjs): "Broken server should still boot in degraded mode" + "Healthy server unaffected by sibling failure" — depend on the same MCP server bootstrap path as cluster 2.

  4. MCP Authorization (Authorization.spec.mjs): 'should allow access with a valid token' fails — auth substrate gap.

  5. GitHub Workflow PullRequestService (PullRequestService.spec.mjs): getPullRequestDiff tests fail (#10748 substrate). Likely needs gh CLI setup or test-env GitHub auth.

  6. Memory Core Services: QueryReRanker, SessionService, SessionService.ResumeValidation, SessionSummarization (#10725 + #10643 + #10673 substrate) — all depend on Memory Core data substrate.

  7. DestructiveOperationGuard (DestructiveOperationGuard.spec.mjs): #10845 substrate — depends on memory-core graph being initialized.

  8. AI Scripts (checkAllAgentIdle, checkSunsetted, resumeHarness): #10643, #10673, #10674, #10677 substrate — agent-state SQLite tables.

  9. Grid tests (LockedColumns, Pooling, Teleportation): Frontend tests that likely need Playwright browser binaries (the CI workflow doesn't currently run playwright install).

The Architectural Reality

  • Test substrate dependencies fall into 4 buckets:

    • (a) Substrate-data — needs pre-populated SQLite collections, ChromaDB rows, or .neo-ai-data/ content.
    • (b) Browser binaries — Playwright tests using chromium need npx playwright install step.
    • (c) External tools — gh CLI auth, git config, etc. Some are present on ubuntu-latest by default; some need explicit setup.
    • (d) Env vars — provider API keys, OIDC config, etc. Test fixtures may rely on process.env.* that's absent in CI.
  • Why passing locally ≠ passing in CI: developers accumulate substrate over months (the graph SQLite has rows, Chroma collections exist, .neo-ai-data/ has content). CI starts each run from git clone + npm ci + this workflow's empty .neo-ai-data/ guard (which I added in #10897 to skip the heavyweight KB download).

  • Per-test isolation discipline gap: strong isolation would mock these substrates in unit tests. The current test corpus depends on shared substrate state, which works for local development but breaks the unit-test isolation contract.
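
The four buckets lend themselves to a first-pass, pattern-based triage of the CI log. The sketch below is illustrative only: the bucket (a) pattern is taken from the error quoted in cluster 1, while the patterns for (b), (c) and (d) are assumed message shapes that would need checking against the actual log. `BUCKET_RULES` and `classifyFailure` are hypothetical names.

```javascript
// Heuristic triage: map a failure message to one of the four buckets.
// Only the bucket (a) pattern comes from the CI log quoted in this issue;
// the other patterns are assumed, illustrative message shapes.
const BUCKET_RULES = [
    {bucket: 'a', label: 'substrate-data',   pattern: /No underlying collection available/i},
    {bucket: 'b', label: 'browser binaries', pattern: /Executable doesn't exist|playwright install/i},
    {bucket: 'c', label: 'external tools',   pattern: /\bgh: command not found|gh auth login/i},
    {bucket: 'd', label: 'env vars',         pattern: /environment variable|process\.env/i}
];

function classifyFailure(message) {
    const rule = BUCKET_RULES.find(r => r.pattern.test(message));
    // Anything unmatched needs a human look; it may be a real test bug.
    return rule ? rule.bucket : 'unknown';
}

const sample = "[CollectionProxy] get() failed: No underlying collection available for type 'graph'";
console.log(classifyFailure(sample)); // a
```

Unmatched messages return 'unknown' rather than defaulting to a bucket, so real test bugs are not silently filed as environment issues.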

The Fix (Investigation-Shaped)

This ticket is diagnostic-first rather than prescription-first because the right fix depends on per-cluster investigation. An estimated 4 sub-fixes correspond to the 4 buckets above.

Investigation Phase

  1. Triage each failing test into bucket (a)/(b)/(c)/(d) by reading the spec source + observed CI failure output.
  2. For bucket (b), Playwright browsers: confirm the hypothesis locally by running the affected specs without installed browsers (skipping npx playwright install), then re-running after installation. If correct, the fix is one line in the Lane C workflow: add an npx playwright install --with-deps chromium step.
  3. For bucket (a), substrate-data: identify which specs lack proper test-fixture setup. The fix is either per-spec fixture additions or a shared bootstrap step.
  4. For bucket (c), external tools: audit ubuntu-latest defaults against the tools the tests assume are available.
  5. For bucket (d), env vars: audit which env vars the tests assume, then document which must become CI secrets and which can get test-mode defaults.
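
Step 5 could start with a small helper that reports which assumed variables are absent from a given environment. This is a minimal sketch: `findMissingEnv` is a hypothetical helper and the variable names are placeholders, not the project's real configuration.

```javascript
// Report which variables a spec assumes but the environment lacks.
// The variable names below are placeholders, not the project's real ones.
function findMissingEnv(required, env) {
    return required.filter(name => {
        const value = env[name];
        // Treat unset and empty-string values the same way.
        return value === undefined || value === '';
    });
}

const assumed    = ['EXAMPLE_PROVIDER_API_KEY', 'EXAMPLE_OIDC_ISSUER'];
const cleanCiEnv = {CI: 'true', EXAMPLE_OIDC_ISSUER: ''};

// Both placeholder names are reported: one is unset, one is empty.
console.log(findMissingEnv(assumed, cleanCiEnv));
```

Passing the environment as an argument (instead of reading process.env inside the helper) keeps the audit itself testable without touching the real environment.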

Fix Phase (per-bucket sub-PRs)

Each bucket likely needs its own dedicated sub-PR for cross-family review. Investigation-phase output: list of file:line references + per-spec bucket assignment + per-bucket fix shape.

Coordination with Lane C (#10899)

Lane C will drop unit from the matrix until this audit's per-bucket fixes land, then add unit back as a matrix row in a follow-up PR. This avoids landing a CI workflow with known-failing matrix rows.

Acceptance Criteria

  • Phase 1 (Investigation): Triage table mapping each of the ~30 failing tests to bucket (a)/(b)/(c)/(d), with file:line references and observed-failure-shape per spec.
  • Phase 2 (Per-bucket fixes): Sub-tickets filed per bucket with concrete prescriptions (sub-tickets may merge concurrently or sequentially as they touch different surfaces).
  • Phase 3 (Lane C re-enablement): Once per-bucket fixes land in dev, follow-up PR adds unit back to Lane C's matrix; CI green on unit matrix row demonstrated empirically.
  • Optional: Document the substrate-vs-isolation discipline gap in learn/guides/testing/UnitTesting.md — what it takes for a unit test to pass in clean CI, what fixtures are required, what's the boundary between "needs substrate" and "should mock".
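
One possible shape for the Phase 1 triage table is sketched below. The first entry uses the file:line reference and error text from cluster 1; the second entry is a placeholder for a Grid spec whose reference is not yet triaged and whose bucket is only the hypothesis stated in cluster 9. `renderTriage` and the entry fields are hypothetical.

```javascript
// Sort triage entries by bucket and render a plain-text table.
function renderTriage(entries) {
    const order  = ['a', 'b', 'c', 'd'];
    const sorted = [...entries].sort((x, y) => order.indexOf(x.bucket) - order.indexOf(y.bucket));
    const header = 'bucket | spec:line | failure shape';
    const rows   = sorted.map(e => `(${e.bucket}) | ${e.ref} | ${e.shape}`);
    return [header, ...rows].join('\n');
}

const entries = [
    // Placeholder: reference not yet triaged, bucket is a hypothesis.
    {bucket: 'b', ref: 'TBD', shape: 'Grid spec, browser binaries suspected'},
    // From the CI log quoted in cluster 1.
    {bucket: 'a', ref: 'test/playwright/unit/ai/daemons/DreamServiceGoldenPath.spec.mjs:95', shape: "No underlying collection available for type 'graph'"}
];

console.log(renderTriage(entries));
```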

Out of Scope

  • Fixing all bucket failures in this single ticket — investigation-shaped, not prescription-shaped. Each bucket gets its own ticket+PR.
  • CI workflow design — Lane C #10897 already owns that surface; this ticket is about test-substrate alignment.
  • Refactoring the unit test corpus to use proper mocking everywhere. That is a "long arc" architectural concern; this audit fixes the immediate clean-CI-pass gap, and the deeper isolation refactor is a future ticket.
  • Frontend Grid tests, if they require browser substrate beyond playwright install. This may surface as a need for OffscreenCanvas, font rendering, etc.; if discovered, file a separate ticket per bucket.

Avoided Traps / Gold Standards Rejected

  • Rejected: bundle all sub-fixes into a single PR. Cross-cutting concern affecting 4+ subsystems (DreamService, MCP servers, Memory Core, Grid). Per-bucket PRs keep cross-family review focused.
  • Rejected: skip the failing tests via test.skip annotations. This treats the symptom, not the cause, and hides future regressions. Substrate fixes preserve coverage.
  • Rejected: provision substrate at CI workflow level (Lane C #10897) — couples test substrate to CI infra. Test fixtures should be self-contained per-spec (bucket a) or workflow-level only for tools (bucket b).
  • Rejected: assume all failures are CI-environment issues rather than real test bugs. Some may be real bugs; the investigation phase will distinguish.
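
To illustrate what a self-contained per-spec fixture for bucket (a) might look like, here is a minimal sketch of an in-memory stand-in for a collection. `InMemoryCollection`, `createGraphFixture` and the get/upsert shape are assumptions made for illustration; they are not the project's CollectionProxy API.

```javascript
// Hypothetical in-memory stand-in for a collection a spec would otherwise
// read from SQLite. The get/upsert shape is assumed, not the project's API.
class InMemoryCollection {
    constructor(seed = []) {
        this.rows = new Map(seed.map(row => [row.id, row]));
    }

    get(id) {
        return this.rows.get(id) ?? null;
    }

    upsert(row) {
        this.rows.set(row.id, row);
        return row;
    }
}

// Per-spec fixture: each test builds its own collection, so nothing
// depends on rows accumulated on a developer machine.
function createGraphFixture() {
    return new InMemoryCollection([{id: 'issue-1', type: 'issue', title: 'seed row'}]);
}

const graph = createGraphFixture();
graph.upsert({id: 'issue-2', type: 'issue', title: 'added in test'});
console.log(graph.get('issue-2').title); // added in test
```

Because every call to the factory returns a fresh instance, writes in one spec cannot leak into another, which is the isolation property the clean CI run exposed as missing.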

Related

  • Surfacing context: Lane C CI workflow PR #10899 (closes #10897) — first clean-CI run revealing these failures.
  • Sibling substrate fix (CI-prerequisite): #10902 (Dockerfile prepare-lifecycle bug) — also surfaced by Lane C CI; integration matrix row blocked until that fix.
  • Existing related ticket: #10714 (Codex sandbox bootstrap probe for .neo-ai-data/sqlite access) — adjacent shape (per-harness substrate-bootstrap), different cause (sandbox permissions vs CI clean env).

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="unit test failures clean CI substrate data .neo-ai-data Lane C audit per-bucket investigation")

tobiu referenced in commit 30ad551 - "feat(testing): NEO_TEST_SKIP_CI guard for heavy-SLM + auth specs (#10903) (#10907)" on May 7, 2026, 5:58 PM
tobiu referenced in commit f039bbb - "test(ci): add bucket c clean-ci skip guards (#10903) (#10910)" on May 7, 2026, 6:03 PM
tobiu referenced in commit 4fb4bca - "feat(ci): test matrix workflow gating PRs on unit + integration suites (#10897) (#10899)" on May 7, 2026, 8:19 PM
tobiu closed this issue on May 7, 2026, 8:21 PM
tobiu referenced in commit 4cb5184 - "test(ci): add bucket B+D clean-ci skip guards (#10903) (#10921)" on May 7, 2026, 8:21 PM
tobiu referenced in commit 98897fc - "feat(ci): re-add unit suite to matrix post-Bucket-G substrate (#10939) (#10953)" on May 8, 2026, 2:43 PM