Context
Surfaced 2026-05-07 by #10897 Lane C CI workflow (PR #10899) on first integration-job run. The Dockerfile in ai/deploy/Dockerfile (shipped via #10880) fails to build in any clean CI environment.
The Problem
The Dockerfile's COPY/RUN ordering runs the install step before the files it depends on have been copied:
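WORKDIR /app
# Step 4: only package.json + package-lock.json
COPY package*.json ./
# Step 5: triggers the prepare lifecycle
RUN npm ci --omit=dev
# Step 7: brings in buildScripts/, but too late
COPY . .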
The prepare script in package.json:57 is:
"prepare": "node ./buildScripts/ai/initServerConfigs.mjs && node ./buildScripts/ai/downloadKnowledgeBase.mjs"
npm ci automatically invokes prepare after install. At that step in the Dockerfile, only package*.json exists in /app — buildScripts/ hasn't been COPY'd yet. The script fails with:
Error: Cannot find module '/app/buildScripts/ai/initServerConfigs.mjs'
at Function._resolveFilename (node:internal/modules/cjs/loader:1383:15)
...
npm error code 1
target mc-server: failed to solve: process "/bin/sh -c npm ci --omit=dev" did not complete successfully: exit code: 1
Empirical anchor: Lane C CI run 25497549380 — both kb-server and mc-server Docker builds fail at the same step.
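The failure can be reproduced without Docker by modelling which paths exist in /app at the moment npm ci fires prepare. The following is a minimal sketch, not part of the repo: the helper names scriptsReferencedBy and missingAtStep are hypothetical, and the file list is simplified from the Dockerfile excerpt above.

```javascript
// Hypothetical helper: extract the script paths from an npm lifecycle command
// of the form "node ./a.mjs && node ./b.mjs".
function scriptsReferencedBy(command) {
    return command
        .split('&&')
        .map(part => part.trim().split(/\s+/))
        .filter(tokens => tokens[0] === 'node' && tokens.length > 1)
        .map(tokens => tokens[1].replace(/^\.\//, ''));
}

// Hypothetical helper: which referenced scripts are absent from the image so far?
function missingAtStep(command, filesInImage) {
    const present = new Set(filesInImage);
    return scriptsReferencedBy(command).filter(path => !present.has(path));
}

// The prepare script as quoted from package.json.
const prepare = 'node ./buildScripts/ai/initServerConfigs.mjs && node ./buildScripts/ai/downloadKnowledgeBase.mjs';

// State of /app when RUN npm ci --omit=dev executes: only the manifests were copied.
const afterCopyManifests = ['package.json', 'package-lock.json'];

// Both scripts are reported missing, matching the "Cannot find module" error.
console.log(missingAtStep(prepare, afterCopyManifests));
```

Passing a file list that includes both buildScripts/ai/ paths returns an empty array, which corresponds to the state a full local checkout would present.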
Why this wasn't caught earlier
#10880 (Docker artifacts) and #10893 (Lane A integration test using these artifacts) both passed local verification because:
- Local Docker had cached layers. When the developer ran docker compose -f ai/deploy/docker-compose.test.yml up --build after a prior successful build, the RUN npm ci layer was reused from a state in which buildScripts/ had been available (e.g., during local dev with the full repo).
- Local .neo-ai-data/ exists. The second prepare step (downloadKnowledgeBase) short-circuits via an existing-directory guard, so even if the first step were to succeed locally, the failure mode is masked.
- No CI gate existed. Until #10899, no automated environment built from a truly clean state. A local build from scratch would have surfaced this immediately.
This is a classic "works on my machine" pattern hidden by Docker layer caching. The fix prevents future regressions and makes the Docker artifacts genuinely reproducible across all environments.
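The masking effect can be illustrated with a toy model of layer caching. This is a deliberate simplification under stated assumptions, not Docker's implementation: COPY layers are keyed on a digest of the copied files, RUN layers on the command string, and every key includes its parent's key. The digests and the helper names layerKeys, build, and stepsFor are hypothetical, and the model simply assumes an earlier local build populated the cache, as the first bullet above suggests.

```javascript
// Toy cache-key model: each layer's key chains onto its parent's key.
function layerKeys(steps) {
    let parent = 'base';
    return steps.map(step => {
        const input = step.kind === 'COPY' ? step.digest : step.command;
        parent = `${parent}|${step.kind}:${input}`;
        return parent;
    });
}

// Build against a cache (a Set of keys) and report which steps actually execute.
function build(steps, cache) {
    return layerKeys(steps).map(key => {
        const hit = cache.has(key);
        cache.add(key);
        return hit ? 'CACHED' : 'EXECUTED';
    });
}

// Hypothetical digests: manifests unchanged, source tree edited between builds.
const stepsFor = sourceDigest => [
    {kind: 'COPY', digest: 'manifests-v1'},       // COPY package*.json ./
    {kind: 'RUN', command: 'npm ci --omit=dev'},  // would trigger prepare if executed
    {kind: 'COPY', digest: sourceDigest}          // COPY . .
];

const developerCache = new Set();
build(stepsFor('src-v1'), developerCache);              // earlier build populates the cache
console.log(build(stepsFor('src-v2'), developerCache)); // CACHED, CACHED, EXECUTED
console.log(build(stepsFor('src-v2'), new Set()));      // EXECUTED, EXECUTED, EXECUTED
```

In the warm-cache build the npm ci step is never re-executed, so prepare never runs; in the cold-cache build, which is what CI sees, every step executes and the missing scripts surface.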
The Architectural Reality
- ai/deploy/Dockerfile: 21 lines, multi-stage build (builder → runtime).
- The Docker images are for production deployment of MCP servers (KB / MC). They do NOT need:
  - initServerConfigs.mjs, which copies template configs to the gitignored config.mjs. Production deployment provides config via env vars plus the template defaults; the gitignored copy is for local-dev override only.
  - downloadKnowledgeBase.mjs, which downloads the release-artifact KB zip into .neo-ai-data/. The production KB MCP server provides the KB via a mounted Chroma volume plus sync APIs, not bundled artifacts.
- Both prepare-lifecycle scripts exist for local-dev convenience, not production deployment. They should not run inside the Docker build.
The Fix
Single-line change to ai/deploy/Dockerfile:5:
- RUN npm ci --omit=dev
+ RUN npm ci --omit=dev --ignore-scripts
--ignore-scripts instructs npm to skip lifecycle hooks (prepare, postinstall, etc.) during install. This aligns with Docker production semantics: the runtime image carries already-installed deps and has no need to run dev-time bootstrap.
Why not "move COPY . . before RUN npm ci"? That alternative works but defeats Docker layer-caching: every source change invalidates the npm-install layer. The --ignore-scripts fix preserves layer caching while skipping the dev-only hooks.
Why not "split prepare into prepare:dev + prepare:prod"? Larger surgery; introduces npm-script proliferation. The --ignore-scripts flag is the standard idiomatic move in Dockerfile contexts.
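A regression guard for the flag could be a small check over the Dockerfile text. The sketch below is an assumption about how such a check might look, not an existing script in the repo; the helper name npmCiWithoutIgnoreScripts is hypothetical, and it only inspects single-line RUN instructions (no backslash continuations).

```javascript
// Hypothetical lint helper: return the RUN lines that call `npm ci`
// without the --ignore-scripts flag.
function npmCiWithoutIgnoreScripts(dockerfileText) {
    return dockerfileText
        .split('\n')
        .map(line => line.trim())
        .filter(line => /^RUN\b/i.test(line) && /\bnpm ci\b/.test(line))
        .filter(line => !/(^|\s)--ignore-scripts(\s|$)/.test(line));
}

// Excerpts mirroring the before/after state described in this ticket.
const before = [
    'WORKDIR /app',
    'COPY package*.json ./',
    'RUN npm ci --omit=dev',
    'COPY . .'
].join('\n');

const after = before.replace('npm ci --omit=dev', 'npm ci --omit=dev --ignore-scripts');

console.log(npmCiWithoutIgnoreScripts(before)); // one offending line
console.log(npmCiWithoutIgnoreScripts(after));  // empty array
```

Reading the real file and failing the job when the returned array is non-empty would be left to whatever CI step wraps the helper.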
Contract Ledger (T3)
| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| ai/deploy/Dockerfile line 5 | This ticket; surfaced by #10897 Lane C CI run | Add --ignore-scripts flag to npm ci --omit=dev. Both kb-server and mc-server builds inherit the fix (same Dockerfile, parameterized via TARGET_SERVER build-arg). | Future Dockerfile evolution that genuinely needs a lifecycle script in production: re-introduce an explicit step (e.g., RUN node ./buildScripts/specific-prod-step.mjs) with COPY of the needed files first. The --ignore-scripts flag is the production-ergonomic default; explicit scripting is the escape hatch. | Inline comment in Dockerfile linking to this ticket; possibly cookbook (learn/agentos/DeploymentCookbook.md) Section 3 (Container Setup) reference. | L3: Lane C CI (#10899) integration job runs green on the very PR fixing this; documented in PR body with run URL. |
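Acceptance Criteria
- ai/deploy/Dockerfile:5 contains RUN npm ci --omit=dev --ignore-scripts.
- The integration matrix row passes after this fix lands (verified by re-running CI).
- docker compose -f ai/deploy/docker-compose.test.yml build --no-cache kb-server mc-server completes successfully on a clean checkout (no cached layers).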
Out of Scope
- Multi-stage refactor of the Dockerfile: the current 2-stage shape (builder → runtime) is fine; only the npm ci step needs the flag.
- Removing the prepare lifecycle from package.json: the script remains valuable for the local-dev workflow; only the Docker context needs to opt out.
- Audit of other Docker contexts: only ai/deploy/Dockerfile exists in the repo today, so a cross-project Docker context audit isn't needed.
- Splitting prepare into prepare:dev + prepare:prod: see Avoided Traps.
Avoided Traps / Gold Standards Rejected
- Rejected: split prepare into prepare:dev + prepare:prod scripts. Larger surgery; introduces npm-script proliferation and downstream version-management debt. The --ignore-scripts flag is the idiomatic Dockerfile pattern for the same goal.
- Rejected: move COPY . . before RUN npm ci. Defeats Docker layer caching: every source-file change invalidates the (slow) npm install layer. Only acceptable if cache-bust is desired (it's not here).
- Rejected: explicit COPY buildScripts/ai/initServerConfigs.mjs ./buildScripts/ai/initServerConfigs.mjs before RUN npm ci. Adds path-coupling fragility (file moves break the Dockerfile silently) for no production-side benefit, since the script itself doesn't need to run in production.
- Rejected: defer to a follow-up PR after Lane C lands with a unit-only matrix. Lane C's whole point is gating PRs on test failures; landing the workflow without the ability to gate integration undermines it. The 1-line fix is the elegant unblock.
Related
- Surfacing context: Lane C CI workflow PR #10899 (closes #10897) — first PR to run a clean-environment Docker build.
- Original substrate: #10801 → PR #10880 (Docker artifacts, KB+MC compose, ai/deploy/ namespace).
- Downstream consumer: #10805 → PR #10893 (Lane A integration test using the broken Dockerfile — passed locally via cached layers).
- Sibling lane impact: #10896 (Lane B sustained-liveness PR #10898) is also blocked from L2/L3 verification by this same Dockerfile bug.
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="Dockerfile npm ci prepare lifecycle initServerConfigs Lane C CI substrate fix")