Frontmatter
id: 10902
title: Dockerfile npm ci fails: prepare lifecycle needs buildScripts before COPY
state: Closed
labels: bug, ai, architecture, build
assignees: neo-opus-4-7
createdAt: May 7, 2026, 3:22 PM
updatedAt: May 9, 2026, 11:16 PM
githubUrl: https://github.com/neomjs/neo/issues/10902
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 7, 2026, 3:43 PM

Dockerfile npm ci fails: prepare lifecycle needs buildScripts before COPY

neo-opus-4-7 commented on May 7, 2026, 3:22 PM

Context

Surfaced 2026-05-07 by #10897 Lane C CI workflow (PR #10899) on its first integration-job run. The Dockerfile at ai/deploy/Dockerfile (shipped via #10880) fails to build in any clean CI environment.

The Problem

The Dockerfile's COPY/RUN ordering makes npm ci depend on files that are not copied until a later step:

WORKDIR /app
COPY package*.json ./       # Step 4: only package.json + package-lock.json
RUN npm ci --omit=dev       # Step 5: triggers `prepare` lifecycle
COPY . .                    # Step 7: brings buildScripts/ but TOO LATE

The prepare script in package.json:57 is:

"prepare": "node ./buildScripts/ai/initServerConfigs.mjs && node ./buildScripts/ai/downloadKnowledgeBase.mjs"

npm ci automatically invokes prepare after install. At that step in the Dockerfile, only package*.json exists in /app; buildScripts/ hasn't been COPY'd yet. The script fails with:

Error: Cannot find module '/app/buildScripts/ai/initServerConfigs.mjs'
    at Function._resolveFilename (node:internal/modules/cjs/loader:1383:15)
    ...
npm error code 1
target mc-server: failed to solve: process "/bin/sh -c npm ci --omit=dev" did not complete successfully: exit code: 1

Empirical anchor: Lane C CI run 25497549380 — both kb-server and mc-server Docker builds fail at the same step.
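
The mismatch behind this error can be sketched in a few lines. This is a minimal illustration, assuming the prepare script only launches files via `node <path>`; missingScriptTargets is a hypothetical helper and not part of the repo. It compares the files the script wants to run against what has been copied into /app at the moment npm ci executes.

```javascript
// Hypothetical helper (not part of the repo): list the files a lifecycle
// script tries to run with `node` that are absent from the image so far.
function missingScriptTargets(script, copiedPaths) {
    const targets = [];
    // Match each `node <path>` invocation in an `a && b` style script.
    const pattern = /\bnode\s+(\S+)/g;
    let match;
    while ((match = pattern.exec(script)) !== null) {
        // Normalize "./buildScripts/x.mjs" to "buildScripts/x.mjs".
        targets.push(match[1].replace(/^\.\//, ''));
    }
    // A target is available if it was copied itself or lives under a copied directory.
    return targets.filter(target =>
        !copiedPaths.some(copied => target === copied || target.startsWith(copied + '/'))
    );
}

const prepare = 'node ./buildScripts/ai/initServerConfigs.mjs && node ./buildScripts/ai/downloadKnowledgeBase.mjs';

// State of /app when `RUN npm ci` executes: only the manifests were copied.
const atNpmCi = ['package.json', 'package-lock.json'];

// Both script paths are reported as missing.
console.log(missingScriptTargets(prepare, atNpmCi));

// Nothing is missing once buildScripts/ is present.
console.log(missingScriptTargets(prepare, [...atNpmCi, 'buildScripts']));
```

The first call mirrors the failing build: the first missing path is the one Node reports in the error above.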

Why this wasn't caught earlier

#10880 (Docker artifacts) and #10893 (Lane A integration test using these artifacts) both passed local verification because:

  1. Local Docker has cached layers — when the developer ran docker compose -f ai/deploy/docker-compose.test.yml up --build after a prior successful build, the RUN npm ci layer was cached from a state when buildScripts/ had been previously available (e.g., during local dev with full repo).
  2. Local .neo-ai-data/ exists — the second prepare step (downloadKnowledgeBase) short-circuits via existing-dir guard, so even if the first step were to succeed locally, the failure mode is masked.
  3. No CI gate existed — until #10899, no automated environment was building from a truly clean state. Local builds from scratch would have surfaced this immediately.

This is a classic "works on my machine" pattern hidden by Docker layer caching. The fix prevents future regressions and makes the Docker artifacts genuinely reproducible across all environments.

The Architectural Reality

  • ai/deploy/Dockerfile — 21 lines, multi-stage build (builder → runtime).
  • The Docker images are for production deployment of MCP servers (KB / MC). They do NOT need:
    • initServerConfigs.mjs — copies template configs to gitignored config.mjs. Production deployment provides config via env vars + the template defaults; the gitignored copy is for local-dev override only.
    • downloadKnowledgeBase.mjs — downloads release-artifact KB zip into .neo-ai-data/. Production KB MCP server provides KB via mounted Chroma volume + sync APIs, not bundled artifacts.
  • Both prepare-lifecycle scripts exist for local-dev convenience, not production deployment. They should not run inside the Docker build.

The Fix

Single-line change to ai/deploy/Dockerfile:5:

- RUN npm ci --omit=dev
+ RUN npm ci --omit=dev --ignore-scripts

--ignore-scripts instructs npm to skip lifecycle scripts (prepare, postinstall, etc.) during install, for the project and for its dependencies alike. This matches Docker production semantics: the runtime image carries already-installed deps and has no need to run dev-time bootstrap.

Why not "move COPY . . before RUN npm ci"? That alternative works but defeats Docker layer-caching: every source change invalidates the npm-install layer. The --ignore-scripts fix preserves layer caching while skipping the dev-only hooks.
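
The caching argument can be made concrete with a toy model. This is a simplified sketch, not Docker's real implementation: it assumes a layer is reused only when its parent layer was reused and its own inputs are unchanged, where COPY inputs are the matched file contents and RUN inputs are just the command text. The file name src/app.mjs and the helper names are placeholders chosen for illustration.

```javascript
// Simplified model of Docker's build cache (illustration only).
// Each layer key chains the parent key with this step's inputs.
function layerKeys(steps, files) {
    let parent = 'base';
    return steps.map(step => {
        let inputs = step.run || '';
        if (step.copy) {
            // Fold the content of every file matched by this COPY into the key.
            inputs = Object.keys(files)
                .filter(step.copy)
                .sort()
                .map(name => `${name}=${files[name]}`)
                .join(';');
        }
        parent = `${parent}|${inputs}`;
        return parent;
    });
}

// Count how many leading layers two builds share, i.e. the cache hits.
function cacheHits(before, after) {
    let hits = 0;
    while (hits < before.length && before[hits] === after[hits]) hits++;
    return hits;
}

const isManifest = name => name.startsWith('package');
const install = 'npm ci --omit=dev --ignore-scripts';

// Ordering used by the fix: manifests, install, then the rest of the repo.
const manifestsFirst = [{copy: isManifest}, {run: install}, {copy: () => true}];
// Rejected alternative: copy everything before installing.
const sourceFirst = [{copy: () => true}, {run: install}];

const v1 = {'package.json': 'a', 'package-lock.json': 'b', 'src/app.mjs': 'v1'};
const v2 = {...v1, 'src/app.mjs': 'v2'}; // only application source changed

console.log(cacheHits(layerKeys(manifestsFirst, v1), layerKeys(manifestsFirst, v2))); // → 2
console.log(cacheHits(layerKeys(sourceFirst, v1), layerKeys(sourceFirst, v2)));       // → 0
```

With manifests copied first, a source-only change still reuses the manifest and install layers. With COPY . . first, the very first layer misses, so the install layer is rebuilt on every source change.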

Why not "split prepare into prepare:dev + prepare:prod"? Larger surgery; introduces npm-script proliferation. The --ignore-scripts flag is the standard idiomatic move in Dockerfile contexts.

Contract Ledger (T3)

  • Target Surface: ai/deploy/Dockerfile line 5
  • Source of Authority: This ticket; surfaced by #10897 Lane C CI run
  • Proposed Behavior: Add --ignore-scripts flag to npm ci --omit=dev. Both kb-server and mc-server builds inherit the fix (same Dockerfile, parameterized via TARGET_SERVER build-arg).
  • Fallback / Edge Case: Future Dockerfile evolution that genuinely needs a lifecycle script in production: re-introduce explicit step (e.g., RUN node ./buildScripts/specific-prod-step.mjs) with COPY of the needed files first. The --ignore-scripts is the production-ergonomic default; explicit scripting is the escape hatch.
  • Docs: Inline comment in Dockerfile linking to this ticket; possibly cookbook (learn/agentos/DeploymentCookbook.md) Section 3 (Container Setup) reference.
  • Evidence: L3. Lane C CI (#10899) integration job runs green on the very PR fixing this; documented in PR body with run URL.

Acceptance Criteria

  • ai/deploy/Dockerfile:5 contains RUN npm ci --omit=dev --ignore-scripts.
  • Inline Dockerfile comment cross-links this ticket as the rationale.
  • Lane C CI (#10899) integration matrix row passes after this fix lands (verified by re-running CI).
  • Local Docker build smoke verification: docker compose -f ai/deploy/docker-compose.test.yml build --no-cache kb-server mc-server completes successfully on a clean checkout (no cached layers).
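
The first criterion could also be checked mechanically. The following is a rough sketch, assuming each npm ci call sits on its own single-line RUN instruction (no line continuations or chained commands); npmCiWithoutIgnoreScripts is a hypothetical lint helper and not part of the repo.

```javascript
// Hypothetical lint helper (not part of the repo): return the 1-based line
// numbers of `RUN npm ci` instructions that do not pass --ignore-scripts.
function npmCiWithoutIgnoreScripts(dockerfileText) {
    const offenders = [];
    dockerfileText.split('\n').forEach((line, index) => {
        // Ignore whole-line comments and strip trailing "# ..." notes.
        const instruction = line.replace(/(^|\s)#.*$/, '').trim();
        const isNpmCi = /^RUN\s+npm\s+ci(\s|$)/i.test(instruction);
        const hasFlag = /(^|\s)--ignore-scripts(\s|$)/.test(instruction);
        if (isNpmCi && !hasFlag) offenders.push(index + 1);
    });
    return offenders;
}

// The excerpt from this ticket, before and after the fix.
const before = [
    'WORKDIR /app',
    'COPY package*.json ./',
    'RUN npm ci --omit=dev',
    'COPY . .'
].join('\n');
const after = before.replace('npm ci --omit=dev', 'npm ci --omit=dev --ignore-scripts');

console.log(npmCiWithoutIgnoreScripts(before)); // → [ 3 ]
console.log(npmCiWithoutIgnoreScripts(after));  // → []
```

The line numbers refer to the four-line excerpt, not to the real 21-line Dockerfile.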

Out of Scope

  • Multi-stage refactor of the Dockerfile — current 2-stage shape (builder → runtime) is fine; only the npm ci step needs the flag.
  • Removing prepare lifecycle from package.json — script remains valuable for local-dev workflow; only Docker context needs to opt out.
  • Audit of other Docker contexts — only ai/deploy/Dockerfile exists in the repo today; cross-project Docker context audit isn't needed.
  • Splitting prepare into prepare:dev + prepare:prod — see Avoided Traps.

Avoided Traps / Gold Standards Rejected

  • Rejected: split prepare into prepare:dev + prepare:prod scripts. Larger surgery; introduces npm-script proliferation and downstream version-management debt. The --ignore-scripts flag is the idiomatic Dockerfile pattern for the same goal.
  • Rejected: move COPY . . before RUN npm ci. Defeats Docker layer caching — every source-file change invalidates the (slow) npm install layer. Only acceptable if cache-bust is desired (it's not here).
  • Rejected: explicit COPY buildScripts/ai/initServerConfigs.mjs ./buildScripts/ai/initServerConfigs.mjs before RUN npm ci. Adds path-coupling fragility (file moves break the Dockerfile silently) for no production-side benefit, since the script itself doesn't need to run in production.
  • Rejected: defer to a follow-up PR after Lane C lands with unit-only matrix. Lane C's whole point is gating PRs on test failures; landing the workflow without ability to gate integration undermines it. The 1-line fix is the elegant unblock.

Related

  • Surfacing context: Lane C CI workflow PR #10899 (closes #10897) — first PR to run a clean-environment Docker build.
  • Original substrate: #10801 → PR #10880 (Docker artifacts, KB+MC compose, ai/deploy/ namespace).
  • Downstream consumer: #10805 → PR #10893 (Lane A integration test using the broken Dockerfile — passed locally via cached layers).
  • Sibling lane impact: #10896 (Lane B sustained-liveness PR #10898) is also blocked from L2/L3 verification by this same Dockerfile bug.

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="Dockerfile npm ci prepare lifecycle initServerConfigs Lane C CI substrate fix")

tobiu referenced in commit 2721678 - "fix(deploy): add --ignore-scripts to npm ci in Dockerfile (#10902) (#10904)" on May 7, 2026, 3:43 PM
tobiu closed this issue on May 7, 2026, 3:43 PM
tobiu referenced in commit 3f94362 - "fix(deploy): use python urllib for Chroma healthcheck (curl missing) (#10908) (#10909)" on May 7, 2026, 5:59 PM