Frontmatter
id: 10964
title: Harden clean-checkout AI deploy bootstrap
state: Closed
labels: bug, ai, testing, agent-task:review, build
assignees: neo-gpt
createdAt: May 8, 2026, 4:35 PM
updatedAt: May 12, 2026, 4:09 AM
githubUrl: https://github.com/neomjs/neo/issues/10964
author: neo-gpt
commentsCount: 0
parentIssue: 10945
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 8, 2026, 6:38 PM

Harden clean-checkout AI deploy bootstrap

neo-gpt commented on May 8, 2026, 4:35 PM

Context

Created from an external team-shared deployment trial on 2026-05-08. The deployment pipeline is no longer just an internal CI fixture: a clean operator environment tried the ai/deploy stack and hit setup friction that our local shared checkout masks.

Observed surfaces in the current repo:

  • ai/deploy/Dockerfile uses node:22-bookworm-slim for builder and runtime and runs npm ci --omit=dev --ignore-scripts.
  • ai/deploy/docker-compose.test.yml also uses node:22-bookworm-slim for the mock embedding server.
  • package.json currently has zero dependencies; runtime packages used by the MCP servers are in devDependencies (@modelcontextprotocol/sdk, chromadb, commander, dotenv, fs-extra, gray-matter, js-yaml, etc.).
  • The runtime server modules import gitignored config.mjs files directly (ai/mcp/server/*/config.mjs). A clean checkout only has templates.
  • The repo has five tracked templates relevant to the confusion: ai/config.template.mjs plus four per-server ai/mcp/server/*/config.template.mjs files. buildScripts/ai/initServerConfigs.mjs currently clones only the per-server templates to gitignored config.mjs during npm prepare.
  • ai/deploy/.dockerignore exists, but the compose files build with the repo root as context (../..), so Docker will use a root .dockerignore if present. There is no tracked root .dockerignore; local node_modules and gitignored config files can leak into local builds and hide clean-checkout defects.
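
The dependency observation above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named findOmittedRuntimePackages and placeholder version ranges; it is not an existing build script in the repo. It reports packages that runtime code imports but that npm ci --omit=dev would skip because they are declared only under devDependencies.

```javascript
// Hypothetical helper (not part of the repo): list packages that runtime code
// imports but that `npm ci --omit=dev` would skip, because they are declared
// only under devDependencies.
function findOmittedRuntimePackages(packageJson, runtimeImports) {
    const dependencies    = packageJson.dependencies    || {};
    const devDependencies = packageJson.devDependencies || {};

    return runtimeImports.filter(name =>
        !(name in dependencies) && name in devDependencies
    );
}

// Illustrative data only: the version ranges are placeholders.
const examplePackageJson = {
    dependencies   : {},
    devDependencies: {
        '@modelcontextprotocol/sdk': '*',
        chromadb                   : '*',
        commander                  : '*',
        dotenv                     : '*'
    }
};

// 'node:fs' is a Node built-in, so it is not reported.
const omittedPackages = findOmittedRuntimePackages(
    examplePackageJson,
    ['chromadb', 'dotenv', 'node:fs']
);

console.log(omittedPackages); // ['chromadb', 'dotenv']
```

An empty result would suggest that --omit=dev is safe for the listed imports; a non-empty one matches the failure mode described in this ticket.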

Node baseline correction, as of 2026-05-08:

  • Node's official release page lists v26 as Current, and both v24 and v22 as LTS: https://nodejs.org/en/about/previous-releases
  • Node's release policy says production applications should use Active LTS or Maintenance LTS releases. The modern production baseline should therefore be v24 LTS, not v26 Current. A Current-line compatibility lane can be separate.
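
As a small illustration of the baseline rule, the sketch below picks the newest LTS line and ignores Current. The release data simply restates what this ticket records for 2026-05-08, and pickProductionBaseline is a hypothetical name used for illustration only.

```javascript
// Release statuses as recorded in this ticket on 2026-05-08.
const releaseLines = [
    {major: 26, status: 'Current'},
    {major: 24, status: 'LTS'},
    {major: 22, status: 'LTS'}
];

// Hypothetical helper: production should track the newest LTS line,
// never the Current line.
function pickProductionBaseline(lines) {
    const ltsLines = lines.filter(line => line.status === 'LTS');

    if (ltsLines.length === 0) {
        throw new Error('No LTS line available');
    }

    return ltsLines.reduce(
        (newest, line) => line.major > newest.major ? line : newest
    ).major;
}

console.log(pickProductionBaseline(releaseLines)); // 24
```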

The Problem

The current deploy artifacts can work on maintainer machines for the wrong reasons:

  1. Local npm prepare creates gitignored config.mjs files before Docker is involved.
  2. Local bind mounts in docker-compose.dev.yml and the missing root .dockerignore can expose local config files and node_modules to containers.
  3. Docker layer/cache history can hide whether npm ci --omit=dev --ignore-scripts actually produced a runnable image.

In a clean external checkout, those masks disappear. Operators then have to manually copy template configs, and the Docker image may be missing runtime dependencies because Neo currently keeps MCP runtime packages under devDependencies.

#10902/#10904 fixed the first clean-CI failure by adding --ignore-scripts so npm ci no longer tries to run prepare before buildScripts/ is copied. That was correct for the CI blocker, but it also left the deploy image without any explicit config bootstrap path. The next hardening pass needs to make clean deployment intentional rather than relying on local checkout residue.

The Architectural Reality

Relevant files and boundaries:

  • ai/deploy/Dockerfile owns the deploy image dependency install and runtime base image.
  • ai/deploy/docker-compose.test.yml is the freshest deployed-stack fixture and should remain the reference for CI integration behavior.
  • ai/deploy/docker-compose.yml and ai/deploy/docker-compose.dev.yml are older and drift from the test stack.
  • buildScripts/ai/initServerConfigs.mjs is the local-dev config bootstrap script. It copies per-server config.template.mjs to gitignored config.mjs only when npm prepare runs.
  • ai/mcp/server/*/mcp-server.mjs, loggers, services, and managers import ./config.mjs or ../config.mjs directly. That makes active config files load-bearing at runtime.
  • ai/mcp/server/shared/BaseConfig.mjs already applies environment bindings after merging loaded config. A deploy fix should preserve env precedence.
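
The precedence described above can be sketched as follows. This is a hedged illustration, not the BaseConfig.mjs API: resolveConfig, its parameter names, and the EXAMPLE_CHROMA_HOST variable are assumptions made for the example, and the merge is deliberately shallow.

```javascript
// Hypothetical sketch of the layering order (not the BaseConfig.mjs API):
// template defaults < optional local config.mjs < environment bindings.
function resolveConfig({templateDefaults, localConfig = {}, envBindings = {}, env = {}}) {
    // Shallow merge: values from the optional local config override defaults.
    const merged = {...templateDefaults, ...localConfig};

    // Environment bindings are applied last so that they keep precedence.
    for (const [key, envName] of Object.entries(envBindings)) {
        if (env[envName] !== undefined) {
            merged[key] = env[envName];
        }
    }

    return merged;
}

const resolvedConfig = resolveConfig({
    templateDefaults: {host: 'localhost', port: '8000'},
    localConfig     : {port: '9000'},
    envBindings     : {host: 'EXAMPLE_CHROMA_HOST'}, // hypothetical variable name
    env             : {EXAMPLE_CHROMA_HOST: 'chroma'}
});

console.log(resolvedConfig); // { host: 'chroma', port: '9000' }
```

In a real entrypoint the env argument would presumably be process.env. Whichever bootstrap option is chosen, the point is that a missing local config.mjs should degrade to template defaults rather than fail the import.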

This ticket belongs under #10945 because it turns the deployment-pipeline integration lane from "CI stack can start on our machines" into "a clean external operator can build and run it without hidden local state."

The Fix

Harden the deployment bootstrap as one clean-checkout contract:

  1. Node image baseline

    • Move deploy images to a modern LTS baseline, preferably node:24-alpine or an explicit v24 Alpine tag.
    • Keep v26/current testing as an optional compatibility lane, not the production default.
  2. Dependency install strategy

    • Either move MCP runtime packages from devDependencies to dependencies and keep npm ci --omit=dev, or remove --omit=dev until the dependency split is done.
    • The chosen path must be validated in a clean Docker build with no local node_modules leakage.
  3. Config bootstrap

    • Remove the requirement that external operators manually copy templates before first boot.
    • Choose one explicit deployment behavior:
      • Docker/entrypoint creates required per-server config.mjs files from templates after source copy, then env bindings override them; or
      • runtime imports are changed to load template defaults directly and layer optional config.mjs when present.
    • Clarify the role of ai/config.template.mjs vs the four per-server templates so operators no longer infer five required manual copies unless five active configs are truly required.
  4. Docker context hygiene

    • Add a root .dockerignore or change build context so local node_modules, .neo-ai-data, .git, and gitignored ai/mcp/server/*/config.mjs cannot mask build defects.
  5. Compose drift cleanup

    • Refresh ai/deploy/docker-compose.yml and ai/deploy/docker-compose.dev.yml against the current test stack, or clearly mark them legacy with follow-up tickets.
    • Keep known Chroma healthcheck constraints visible: the chromadb/chroma:1.5.9 image does not provide arbitrary helper binaries like curl; the test stack's no-curl healthcheck should remain canonical.
  6. Clean-checkout deployment smoke

    • Add or document a smoke path that builds from a clean context, starts KB + MC + Chroma, and verifies health without any pre-existing gitignored config files or local node_modules.
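
For the first config bootstrap option in step 3, the copy plan could look roughly like the sketch below. It assumes a hypothetical planConfigBootstrap helper and placeholder server directory names (example-a, example-b). It only computes which copies are needed, so an existing config.mjs is never overwritten; the actual file copy and the env override step are left to the entrypoint.

```javascript
const TEMPLATE_NAME = 'config.template.mjs';
const ACTIVE_NAME   = 'config.mjs';

// Hypothetical helper: given the files present in the image, return the
// template -> config.mjs copies still needed under the per-server root.
// Existing config.mjs files are left untouched.
function planConfigBootstrap(files, serverRoot = 'ai/mcp/server/') {
    const present = new Set(files);

    return files
        .filter(file => file.startsWith(serverRoot) && file.endsWith('/' + TEMPLATE_NAME))
        .map(template => ({
            from: template,
            to  : template.slice(0, -TEMPLATE_NAME.length) + ACTIVE_NAME
        }))
        .filter(step => !present.has(step.to));
}

// Placeholder directory names; the real per-server folders are not listed here.
const bootstrapPlan = planConfigBootstrap([
    'ai/config.template.mjs',
    'ai/mcp/server/example-a/config.template.mjs',
    'ai/mcp/server/example-b/config.template.mjs',
    'ai/mcp/server/example-b/config.mjs'
]);

console.log(bootstrapPlan);
// [ { from: 'ai/mcp/server/example-a/config.template.mjs',
//     to:   'ai/mcp/server/example-a/config.mjs' } ]
```

The root ai/config.template.mjs is skipped here on purpose, mirroring the current behavior of initServerConfigs.mjs; whether it should be bootstrapped at all is the open question in step 3.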

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
|---|---|---|---|---|---|
| Deploy Node runtime | Node official release policy; this ticket | Production deploy image uses v24 LTS Alpine or explicit LTS pin | Current-line v26 only in separate compatibility lane | Dockerfile comment or deploy doc | Image build logs show selected tag |
| Docker dependency install | package.json + MCP runtime imports | Clean image contains all runtime packages needed by KB/MC | Temporarily install dev deps until runtime deps are split | PR body explains chosen strategy | Clean docker compose build --no-cache plus container start |
| Config bootstrap | initServerConfigs.mjs, direct config.mjs imports | First boot works from tracked templates and env vars without manual copy | Optional local config.mjs remains supported for overrides | DeploymentCookbook / SharedDeployment note | Clean checkout has no config files, stack still starts |
| Docker context hygiene | Docker build context semantics | Local ignored files cannot enter deploy image accidentally | If context must stay repo root, root .dockerignore carries exclusions | Inline .dockerignore comments if needed | Build from dirty maintainer checkout does not copy config/node_modules |
| Deploy compose files | ai/deploy/docker-compose*.yml | Prod/dev/test compose surfaces agree on current env names and healthcheck constraints | Legacy compose files are explicitly labeled and ticketed | Deploy docs | docker compose config and integration smoke |

Acceptance Criteria

  • ai/deploy/Dockerfile uses a modern LTS Node baseline (v24 LTS) for production images, or documents a deliberate LTS pin with rationale.
  • ai/deploy/docker-compose.test.yml mock embedding server no longer hardcodes the older node:22-bookworm-slim baseline without rationale.
  • Docker dependency installation is corrected: either runtime packages move to dependencies and --omit=dev remains valid, or --omit=dev is removed until the split lands.
  • A clean Docker build with no local node_modules succeeds for KB and MC images.
  • A clean first boot does not require manual copying of per-server config.template.mjs files into config.mjs.
  • The root-vs-per-server config template contract is documented, including whether ai/config.template.mjs is active deployment substrate or shared Tier-1 reference only.
  • Docker context hygiene prevents local node_modules, .neo-ai-data, .git, and gitignored ai/mcp/server/*/config.mjs from being copied into deploy images.
  • ai/deploy/docker-compose.yml and ai/deploy/docker-compose.dev.yml are refreshed to match current env/healthcheck behavior or explicitly ticketed as legacy cleanup.
  • PR evidence includes a clean-checkout smoke command and output summary proving KB + MC start and healthcheck without pre-existing gitignored config files.
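
The Docker context hygiene criterion lends itself to a cheap pre-build check. The sketch below assumes a hypothetical findMissingExclusions helper and does a literal line comparison only. It does not implement Docker's real pattern matching, so globs that cover a path indirectly and later "!" re-include lines are not taken into account.

```javascript
// Exclusions named in the acceptance criteria of this ticket.
const REQUIRED_EXCLUSIONS = [
    'node_modules',
    '.neo-ai-data',
    '.git',
    'ai/mcp/server/*/config.mjs'
];

// Hypothetical helper: return the required patterns that do not appear as a
// literal line in the given .dockerignore text. Comments and blank lines are
// skipped, and a trailing slash is ignored ("node_modules/" counts).
function findMissingExclusions(dockerignoreText, required = REQUIRED_EXCLUSIONS) {
    const patterns = new Set(
        dockerignoreText
            .split(/\r?\n/)
            .map(line => line.trim())
            .filter(line => line !== '' && !line.startsWith('#'))
            .map(line => line.replace(/\/+$/, ''))
    );

    return required.filter(pattern => !patterns.has(pattern));
}

const sampleDockerignore = ['# local state', 'node_modules/', '.git', ''].join('\n');

console.log(findMissingExclusions(sampleDockerignore));
// [ '.neo-ai-data', 'ai/mcp/server/*/config.mjs' ]
```

A check like this would only complement the clean-context build and smoke evidence required above, not replace it.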

Out of Scope

  • Real OIDC/GitHub auth integration tests; that remains a separate #10945 sub-lane.
  • Primary/secondary lifecycle integration; that remains a separate #10945 sub-lane linked to #10813.
  • Backup/restore/wipe-detection integration; that remains a separate #10945 sub-lane.
  • Full production infrastructure provisioning for a specific external partner.
  • Making v26 Current the production default. Current-line compatibility is useful, but production should stay on Active/Maintenance LTS unless maintainers explicitly decide otherwise.

Avoided Traps / Gold Standards Rejected

  • Rejected: blame operators for not copying templates. The repo currently makes gitignored config files load-bearing while Docker skips the bootstrap that creates them. Clean operators are exposing our hidden dependency.
  • Rejected: keep --omit=dev because it looks production-grade. With an empty dependencies block, that flag currently removes the packages the MCP servers need at runtime.
  • Rejected: switch production straight to v26 because it is latest. v26 is Current. The Node project says production applications should use LTS releases; v24 is the modern LTS line.
  • Rejected: rely on local Docker cache/bind mounts as evidence. The acceptance criteria require clean-context proof.
  • Rejected: bury stale compose files as "less important." They can be lower priority than Dockerfile/config bootstrap, but leaving them ambiguous will keep confusing external deployers.

Related

  • Parent epic: #10945 — Expand deployment-pipeline integration coverage for Memory Core.
  • Sibling: #10952 — Audit Memory Core integration CI skip gates.
  • Predecessor: #10902 / PR #10904 — Dockerfile prepare-lifecycle bug; fixed by --ignore-scripts but did not solve clean deploy bootstrap.
  • Related deploy CI fixes: #10913 and closed follow-ups in the Chroma healthcheck lineage.
  • Prior shared-deployment completeness epic: #10721.
  • Original deploy artifacts: #10801 / PR #10880.

Origin Session ID: c02fbf4e-870c-44c0-ba7e-e9ffacce094b

Retrieval Hint: query_raw_memories(query="clean checkout AI deploy Dockerfile config bootstrap npm ci omit dev node 24 alpine config.template manual copy")

tobiu closed this issue on May 8, 2026, 6:38 PM
tobiu referenced in commit 9366e14 - "fix(deploy): harden clean checkout bootstrap (#10964) (#10968)" on May 8, 2026, 6:38 PM