Context
#10805 closed 2026-05-07 with the test-integration harness merged via #10893. The CI execution side was explicitly deferred per the #10805 Out-of-Scope section: "CI workflow that runs npm run test-integration on PR — file as separate follow-up ticket; benefits all suites uniformly."
Empirical state of .github/workflows/:
close-inactive-issues.yml
codeql-analysis.yml
data-sync-pipeline.yml
npm-publish.yml
prevent-reopen.yml
Zero workflows run any npm test* script on PR. Every test type — unit, components, integration, e2e — is local-only today. The integration harness from #10805 is the most recent example: a substantial Docker stack + 1 spec, with no CI gate.
This is the follow-up #10805 named, scoped as a single test.yml workflow with a matrix dimension that future test types plug into symmetrically, rather than a one-off test-integration.yml that locks in the asymmetry.
The Problem
- Regressions land in dev undetected. Lane A's spec (#10893) is one example: PRs that touch MCP server code can break it without anyone noticing until the next manual npm run test-integration. The same risk applies to all 200+ unit specs.
- Asymmetric workflow files. A standalone test-integration.yml would lock in a per-test-type proliferation pattern: every future addition (unit, components, e2e, whitebox-e2e) would mean a new workflow file. The matrix shape resolves this once.
- No shared setup pattern. Each test type currently has its own playwright.config.*.mjs; per-workflow setup would duplicate the Node version, npm install, and bundle-parse5 across files. A matrix with shared setup steps avoids that duplication.
- Docker availability handling. Integration tests skip with a warning when Docker is unavailable, per the composeWebServer.mjs readiness gate. The CI runner (ubuntu-latest) has Docker pre-installed, so a matrix job for integration runs the real stack. The local opt-out is preserved.
The Architectural Reality
- GitHub Actions runner availability: ubuntu-latest has Docker + Docker Compose pre-installed; supports the ai/deploy/docker-compose.test.yml stack out-of-box.
- Node.js setup via actions/setup-node@v4, with the npm cache keyed on package-lock.json.
- Concurrency controls available via concurrency.group to cancel superseded PR runs.
- Existing test scripts (package.json):
  - test-unit: playwright test -c test/playwright/playwright.config.unit.mjs
  - test-components: playwright test -c test/playwright/playwright.config.component.mjs
  - test-integration: playwright test -c test/playwright/playwright.config.integration.mjs
  - test-e2e: playwright test -c test/playwright/playwright.config.e2e.mjs
- Existing CI patterns to mirror:
  - data-sync-pipeline.yml: uses Node 24, npm install, has a clear concurrency block.
  - codeql-analysis.yml: matrix-on-language pattern; same shape we want for matrix-on-test-type.
- Worktree bootstrap concern: CI clones fresh, so .neo-ai-data/ and the gitignored MCP config.mjs files do NOT exist. Locally this is handled by node ai/scripts/bootstrapWorktree.mjs --link-data --install per the worktree-bootstrap memory anchor; CI is not a worktree, so the --link-data step is a no-op there and an equivalent CI-friendly init is needed instead. Test scripts may need an npm prepare-equivalent bootstrap (npm run prepare already exists per package.json).
- Empirical install times (memory anchor adjacent to ChromaDB defrag perf, per the worktree-bootstrap notes): npm install takes ~17s on a populated cache and minutes on a cold one. CI is cold-cache by default, so use the actions/setup-node npm cache to bring this down.
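As a sketch of the concurrency control listed above: with a group keyed on github.ref, runs for the same PR or branch share one group, and a newer run cancels the older one. The variant below additionally limits cancellation to pull_request events so that each post-merge run on dev is allowed to finish. That restriction is an assumption for illustration, not something this ticket specifies.

```yaml
# Sketch only: the PR-only cancellation expression is an assumption, not part of this ticket.
concurrency:
  # One group per workflow + ref, e.g. refs/pull/<number>/merge for PRs or refs/heads/dev for pushes
  group: test-${{ github.workflow }}-${{ github.ref }}
  # Cancel superseded runs for PRs; let push-triggered runs complete
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```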
The Fix
Single new workflow file: .github/workflows/test.yml
name: Tests

on:
  pull_request:
    branches: [dev]
  push:
    branches: [dev]

concurrency:
  group: test-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        suite: [unit, integration]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 24
          cache: npm
      - run: npm ci
      - run: npm run prepare # bundle-parse5 + setup steps
      - name: Run ${{ matrix.suite }} tests
        run: npm run test-${{ matrix.suite }}
        env:
          # Integration suite needs a longer stack timeout; harmless for unit
          NEO_INTEGRATION_STACK_TIMEOUT_MS: 240000
      - name: Upload test artifacts on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ matrix.suite }}-${{ github.run_id }}
          path: test/playwright/test-results/
Why this shape:
- Matrix-on-suite: adding components, e2e, or a future whitebox-e2e to the matrix is a one-line change. Zero per-suite workflow proliferation.
- fail-fast: false, so that one failing suite doesn't hide the results of the others. Each suite reports independently.
- concurrency group with cancel-in-progress: true, so that superseded PR pushes cancel the prior run and save CI minutes.
- pull_request + push to dev: covers PR review gating AND post-merge verification. Doesn't run on main PRs (per data-sync-pipeline.yml precedent, only dev is the active branch).
- Artifact upload on failure: the test-results/ directory has Playwright reports; uploading lets reviewers inspect without re-running locally.
- NEO_INTEGRATION_STACK_TIMEOUT_MS: 240000, because a cold Docker build in CI is slower than local and the existing default of 210000 may be tight on a cold runner. The override could be scoped to the integration row with if: matrix.suite == 'integration' for precision; setting it for every row keeps the workflow simple at small cost.
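For comparison, a minimal sketch of the row-scoped alternative mentioned in the last bullet. It splits the run step in two with step-level if conditions, so that only the integration row receives the timeout override. The step names are illustrative assumptions; the script names and the timeout value are the ones already used in this ticket.

```yaml
# Sketch of the more precise variant; step names are illustrative.
steps:
  - name: Run integration tests
    if: matrix.suite == 'integration'
    run: npm run test-integration
    env:
      NEO_INTEGRATION_STACK_TIMEOUT_MS: 240000
  - name: Run ${{ matrix.suite }} tests
    if: matrix.suite != 'integration'
    run: npm run test-${{ matrix.suite }}
```

This costs an extra step and a second condition to keep in sync, which is the complexity the single shared step avoids.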
Sequencing note re: sibling lanes
This workflow gates on npm run test-integration passing. It does not have to wait for the Lane A (#10895) and Lane B (#10896) specs: the CI workflow can land first, gating on the existing single Lane A spec, and the new specs from Lanes A and B then flow through the same CI on their respective PRs.
Branch-protection follow-up (out-of-scope here, file separately)
Once this workflow is green on a few PRs, configure branch protection on dev to require the test (unit) + test (integration) checks. That's a repo-admin operation, not a code change, so it is filed as a separate follow-up.
Contract Ledger (T3)
| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| .github/workflows/test.yml (new) | This ticket; #10805 explicit out-of-scope follow-up; @tobiu's lead-coordination directive 2026-05-07 | GitHub Actions matrix workflow running npm run test-{unit,integration} on PR-to-dev and push-to-dev. Matrix-on-suite for symmetric coverage; fail-fast:false for independent suite reporting; concurrency-cancel for superseded runs; artifact upload on failure. | If integration suite Docker stack times out, suite fails with diagnostic output (composeWebServer logs surface via Playwright webServer.stdout: 'pipe'). Unit suite remains independent. Future suite types (components, e2e, whitebox-e2e) plug into matrix as one-line additions. | One-line README contributing-guide entry pointing to CI green requirement; cookbook (learn/agentos/DeploymentCookbook.md) Section 8 cross-link. | L1: workflow file exists and parses (GitHub Actions linter on push). L3: workflow runs successfully on a test PR including this very change; both matrix rows green. |
| Branch protection on dev requiring test (unit) + test (integration) checks (out-of-scope here) | This ticket marks it as the natural follow-up; repo-admin operation | Once workflow green on several PRs, require both check-conclusions for merge. Closes the gate. | Admin override available per GitHub native control. | Cookbook + contributing-guide reference. | Out-of-scope; separate follow-up ticket. |
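Acceptance Criteria
- .github/workflows/test.yml exists per Ledger row 1 with matrix-on-suite shape (unit, integration).
- Runs npm ci + npm run prepare + npm run test-${suite} per matrix row.
- concurrency.group + cancel-in-progress: true configured.
- actions/setup-node@v4 with cache: npm configured.
- actions/upload-artifact@v4 configured for failure path.
- Triggers on PR-to-dev AND push-to-dev per existing data-sync-pipeline.yml precedent.
- No changes to package.json test scripts, playwright.config.*.mjs files, or composeWebServer.mjs (the workflow consumes existing surfaces).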
Out of Scope
- Branch protection configuration on dev: repo-admin operation, not a code change. Filed as separate follow-up once workflow is green on multiple PRs.
- Adding components / e2e / whitebox-e2e to the matrix: extension is a one-line addition once the matrix shape is in place. File as separate follow-ups per test-type as their stability matures.
- Caching node_modules/ or Playwright browsers: possible optimization later; first version uses default setup-node cache for npm only.
- Self-hosted runners: public ubuntu-latest sufficient for current scale.
- Test result aggregation / dashboards: Playwright artifact upload covers reviewer-side inspection; aggregation is a future ticket.
- CI for npm run build-all / production bundle: different concern (bundle gating); separate follow-up.
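A sketch of what the deferred matrix extension could look like once a further suite is considered stable. It assumes the test-components script from package.json runs on ubuntu-latest without extra setup, which this ticket has not verified.

```yaml
# Sketch only: assumes test-components needs no additional CI setup.
strategy:
  fail-fast: false
  matrix:
    suite: [unit, integration, components] # one-line change: append the new suite name
```

Because the run step resolves to npm run test-${{ matrix.suite }}, the new entry picks up the existing test-components script without any other change to the workflow.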
Avoided Traps / Gold Standards Rejected
- Rejected: standalone .github/workflows/test-integration.yml. Locks in per-test-type workflow proliferation. Every future test type would mean a new workflow file. Matrix-on-suite resolves this once.
- Rejected: bundle CI workflow into Lane A or Lane B PR. Cross-cutting concern; benefits all test types uniformly. Filing separately keeps each lane's PR focused on the substrate it owns.
- Rejected: fail-fast: true on the matrix. Would cancel running suites if one fails; reviewers lose visibility. fail-fast: false is correct for parallel diagnostic info.
- Rejected: npm install (instead of npm ci). npm ci is reproducible-from-lockfile; npm install is mutating. Reproducibility is the CI virtue.
- Rejected: skip Docker availability probe in CI. composeWebServer.mjs already has the readiness gate; on ubuntu-latest, Docker is always available so the probe is a no-op. No special-casing needed.
- Rejected: per-matrix-row if conditions for env vars. Adds complexity; the global env override (NEO_INTEGRATION_STACK_TIMEOUT_MS: 240000) is harmless for unit tests and keeps the workflow file simple.
- Rejected: trigger on push: main. Per data-sync-pipeline.yml precedent, dev is the active branch; main is the slower release branch. Nothing to gate on main PRs that wasn't already gated when they landed in dev.
Related
- Closes scope from: #10805 (explicit out-of-scope follow-up — "CI workflow that runs npm run test-integration on PR").
- Sibling lanes (filed concurrently): #10895 Lane A (tenant isolation + auth rejection), #10896 Lane B (sustained-liveness heartbeat).
- Substrate dependencies: PR #10880 (Docker artifacts), PR #10893 (Lane A vertical slice + test-integration script).
- CI pattern precedent: .github/workflows/data-sync-pipeline.yml (Node 24 + npm + concurrency), .github/workflows/codeql-analysis.yml (matrix shape).
- Cookbook cross-link target: learn/agentos/DeploymentCookbook.md Section 8.
- Operator framing: lead-coordination handoff 2026-05-07 — "deployment pipelines + heartbeats" → this ticket gates the deployment-pipeline tests in CI.
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="CI workflow matrix unit integration test runner github actions follow-up #10805")