Frontmatter
id: 10815
title: initServerConfigs: detect template drift, refresh stale config.mjs
state: Closed
labels: enhancement, ai, architecture, build, model-experience
assignees: neo-opus-4-7
createdAt: May 6, 2026, 1:40 PM
updatedAt: May 7, 2026, 2:19 PM
githubUrl: https://github.com/neomjs/neo/issues/10815
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 7, 2026, 2:19 PM

initServerConfigs: detect template drift, refresh stale config.mjs

neo-opus-4-7 commented on May 6, 2026, 1:40 PM

Context

Each time ai/mcp/server/{knowledge-base,memory-core}/config.template.mjs evolves structurally (new imports, new exports, renamed/removed fields), every Swarm peer must manually re-sync their gitignored config.mjs. The MCP servers boot from config.mjs (deep-merged onto defaultConfig); when the gitignored file's structure lags the template, calls into resolver-backed fields (resolveMcpHttpPort, resolveChromaHost, resolveEmbeddingProvider, …) miss entirely or fall through to legacy code paths, producing silent functional drift between operator clones.

Empirical anchors gathered today (2026-05-06) across PR #10812 (env-var ergonomics) → PR #10814 (publicUrl + reverse-proxy resolver):

  • Claude clone (@neo-opus-4-7): KB config.mjs was pre-#10812 shape; MC was post-#10810 but pre-#10812; both required manual cp template config sweep.
  • Gemini clone (@neo-gemini-3-1-pro): KB+MC both significantly behind (still using ssePort, missing resolveChromaHost); manual cp + node --check cycle.
  • GPT clone (@neo-gpt): KB+MC also pre-#10812; full re-migration with import/export/resolver wiring update.

Three peers, two PRs of substrate-evolution, one session — and the bootstrap script (buildScripts/ai/initServerConfigs.mjs) silently no-op'd on every clone because each config.mjs already existed. Operator subsequently had to broadcast a migration recipe twice in the same session ("BEFORE we merged both PRs… all 3 of you need to update the repo config files once more"). This is the load-bearing definition of friction-into-gold per AGENTS.md §13.

The Problem

buildScripts/ai/initServerConfigs.mjs (52 lines, the prepare-hook bootstrap) is a strict first-time-only clone:

if (fs.existsSync(templatePath)) {
    // Copy only when config.mjs is absent; an existing config.mjs is never touched.
    if (!fs.existsSync(activePath)) {
        await fs.copy(templatePath, activePath);
    }
}

It has no awareness of template-vs-config structural drift after the initial bootstrap. The agent or operator only discovers staleness when the MCP server boots with missing fields and either:

  • crashes on aiConfig.publicUrl is undefined interpolation (recent #10814 anchor),
  • falls back to legacy env-var paths emitting deprecation warnings (#10808 anchor), or
  • silently exposes a stale runtime shape (the #10779 "no way to observe deployed state" friction).
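To make the first failure mode concrete, here is a minimal sketch of how a stale shape leaks `undefined` into an interpolated string. The `deepMerge` helper, the object shapes, and the URL are hypothetical stand-ins for illustration; they are not the repo's actual config-loading code.

```javascript
// Hypothetical shapes for illustration only; not the repo's real config objects.
const staleConfig   = {mcpHttpPort: 3000};                                     // pre-evolution shape
const currentConfig = {mcpHttpPort: 3000, publicUrl: 'http://localhost:3000'}; // post-evolution shape

// Hypothetical minimal deep merge: plain objects merge recursively, everything else is replaced.
const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

const deepMerge = (base, override) => {
    const out = {...base};
    for (const [key, value] of Object.entries(override)) {
        out[key] = isPlainObject(value) && isPlainObject(out[key]) ? deepMerge(out[key], value) : value;
    }
    return out;
};

const runtimeOverrides = {mcpHttpPort: 4000};

const stale   = deepMerge(staleConfig, runtimeOverrides);
const current = deepMerge(currentConfig, runtimeOverrides);

// Merging cannot invent a key that neither side defines, so the stale shape leaks `undefined`.
const staleEndpoint   = `${stale.publicUrl}/mcp`;
const currentEndpoint = `${current.publicUrl}/mcp`;

console.log(staleEndpoint);   // "undefined/mcp"
console.log(currentEndpoint); // "http://localhost:3000/mcp"
```

The point of the sketch is that a deep merge only combines keys that exist on one side or the other; a field introduced by a newer template never appears unless the gitignored file is refreshed.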

This is exactly the kind of substrate friction that the MX (Model Experience) sister sub-epic targets: manual, repetitive migration whenever the template substrate evolves, with no detection or warning layer between template author and config consumer.

The Architectural Reality

  • buildScripts/ai/initServerConfigs.mjs runs as the npm prepare lifecycle script (per the package.json "prepare" hook), which fires on the first npm install after cloning and on every subsequent local npm install.
  • config.mjs is gitignored (per .gitignore entry); config.template.mjs is canonical and version-controlled.
  • Templates carry structural evolution (import statements, export re-exports, new defaultConfig keys, helper-resolver wiring) and semantic evolution (default value changes, env-var renames). Structural drift is detectable at the lexical layer; semantic drift requires deep-merge-with-policy.
  • Today's three migrations touched only structural drift: imports of DeploymentConfig.mjs, top-level re-exports, and resolver-backed field shapes (mcpHttpPort: resolveMcpHttpPort({...})). All three peers' resolution was identical: full template overwrite, since no peer had load-bearing operator overrides in their gitignored config.mjs.
  • Healthcheck observability ticket #10779 ("features.dream block") covers the read-side: exposing the actual deployed autoDream/autoGoldenPath values via runtime probe. This ticket covers the write-side: keeping config.mjs in sync with template structure at bootstrap. Complementary, not overlapping.

The Fix

Extend buildScripts/ai/initServerConfigs.mjs with a structural-drift detector + opt-in migration flag.

Detection (always on, warn-only): when config.mjs already exists, read both files' top-level lexical shape (imports, exports, top-level defaultConfig keys at depth 1) via cheap regex projection. If the template carries imports/exports/keys absent from config.mjs, emit a per-server warning to stderr listing the drift items.

Migration (opt-in via --migrate-config argv flag): when the flag is set, overwrite stale config.mjs files with the current template (matches all three peers' manual workflow today). Idempotent; safe on already-current files.

Concrete prescription, anchored to the current 52-line file:

// buildScripts/ai/initServerConfigs.mjs (post-fix, structural projection helper)

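const projectShape = async (filePath) => {
    const src = await fs.readFile(filePath, 'utf-8');
    return {
        // Module specifiers of single-line `import ... from '...'` statements at line start.
        imports: [...src.matchAll(/^import\s+.*?from\s+['"]([^'"]+)['"]/gm)].map(m => m[1]).sort(),
        // Names inside top-level `export {a, b}` lists; filter(Boolean) drops the empty
        // entry that a trailing comma would otherwise produce.
        exports: [...src.matchAll(/^export\s+\{([^}]+)\}/gm)]
            .flatMap(m => m[1].split(',').map(s => s.trim())).filter(Boolean).sort()
    };
};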

const detectDrift = (templateShape, configShape) => {
    const missingImports = templateShape.imports.filter(i => !configShape.imports.includes(i));
    const missingExports = templateShape.exports.filter(e => !configShape.exports.includes(e));
    return {missingImports, missingExports, hasDrift: missingImports.length + missingExports.length > 0};
};

// inside initConfigs() loop, replace the existing if-not-exists block:
if (!fs.existsSync(activePath)) {
    console.log(`[Neo AI] Config missing for '${serverName}'. Cloning from template...`);
    await fs.copy(templatePath, activePath);
} else {
    const drift = detectDrift(await projectShape(templatePath), await projectShape(activePath));
    if (drift.hasDrift) {
        if (process.argv.includes('--migrate-config')) {
            console.log(`[Neo AI] Migrating stale config for '${serverName}' (drift detected, --migrate-config set)...`);
            await fs.copy(templatePath, activePath);
        } else {
            console.warn(`[Neo AI] Stale config.mjs for '${serverName}' — template has evolved:`);
            drift.missingImports.forEach(i => console.warn(`  + import: ${i}`));
            drift.missingExports.forEach(e => console.warn(`  + export: ${e}`));
            console.warn(`  Run \`npm run prepare -- --migrate-config\` to refresh (gitignored; safe).`);
        }
    }
}
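As a worked example of the projection, the following sketch applies the same regexes to in-memory strings instead of files, so the result can be inspected without touching disk. The sample sources and the names `./Deployment.mjs` and `resolvePort` are placeholders chosen for illustration, not the real template contents.

```javascript
// String-based variant of the projection helper, so the logic runs without disk I/O.
const projectShapeFromSource = (src) => ({
    imports: [...src.matchAll(/^import\s+.*?from\s+['"]([^'"]+)['"]/gm)].map(m => m[1]).sort(),
    exports: [...src.matchAll(/^export\s+\{([^}]+)\}/gm)]
        .flatMap(m => m[1].split(',').map(s => s.trim()))
        .filter(Boolean) // drop the empty entry a trailing comma would produce
        .sort()
});

const detectDrift = (templateShape, configShape) => {
    const missingImports = templateShape.imports.filter(i => !configShape.imports.includes(i));
    const missingExports = templateShape.exports.filter(e => !configShape.exports.includes(e));
    return {missingImports, missingExports, hasDrift: missingImports.length + missingExports.length > 0};
};

// Hypothetical file contents; module paths and names are placeholders.
const templateSrc = [
    "import {resolvePort} from './Deployment.mjs';",
    "import path from 'path';",
    "export {resolvePort};",
    "const defaultConfig = {port: resolvePort({})};",
    "export default defaultConfig;"
].join('\n');

const staleSrc = [
    "import path from 'path';",
    "const defaultConfig = {port: 3000};",
    "export default defaultConfig;"
].join('\n');

const drift = detectDrift(projectShapeFromSource(templateSrc), projectShapeFromSource(staleSrc));

console.log(drift);
// {missingImports: ['./Deployment.mjs'], missingExports: ['resolvePort'], hasDrift: true}
```

Two limits of this lexical approach are worth keeping in mind: an import statement spread over several lines is not matched because `.` does not cross newlines, and `export const` or `export default` declarations are invisible to the `export {...}` pattern.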

Argv plumbing: npm run prepare -- --migrate-config reaches the script via process.argv. No new package.json script needed; existing prepare hook already invokes node ./buildScripts/ai/initServerConfigs.mjs.
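A small sketch of the flag check, assuming the prepare script is a single node invocation as described above, so that npm appends everything after `--` to that command. The `hasFlag` helper and the sample paths are hypothetical.

```javascript
// process.argv[0] is the node binary, [1] the script path; user arguments start at index 2.
const hasFlag = (argv, flag) => argv.slice(2).includes(flag);

// Hypothetical argv arrays standing in for process.argv.
const viaInstall = ['/usr/bin/node', '/repo/buildScripts/ai/initServerConfigs.mjs'];
const viaRunFlag = [...viaInstall, '--migrate-config'];

console.log(hasFlag(viaInstall, '--migrate-config')); // false
console.log(hasFlag(viaRunFlag, '--migrate-config')); // true
```

A plain npm install triggers the hook without extra arguments, so under this sketch it always takes the warn-only path; the destructive path requires the explicit `npm run prepare -- --migrate-config` form.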

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
|---|---|---|---|---|---|
| npm prepare (default) | buildScripts/ai/initServerConfigs.mjs | Detect structural drift between config.template.mjs and existing config.mjs; emit per-server warning listing missing imports/exports; do NOT overwrite | If config.mjs is missing entirely → clone (preserves current behavior); if no drift → silent no-op (preserves current quiet path) | learn/agentos/DeploymentCookbook.md §1 (Bootstrap): add note about drift detection | Today's 3-peer manual-migration anchor (this session) |
| npm run prepare -- --migrate-config | Same script, argv flag | Force-overwrite stale config.mjs files with current template; idempotent | If template missing → skip (matches current behavior); if argv flag absent → warn-only path | Same; document flag inline | Three peers' workflow today: full cp template config |
| Stderr warning format | Same script | One header line per drifting server + indented + import: / + export: lines + final prompt to use the flag | If both files identical → no output | Inline log format; no external doc surface | Stderr-warning style matches existing console.warn calls in the repo |
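The warning format in the last row can be sketched as a pure function that returns the lines rather than printing them, which makes the format easy to assert. `formatDriftWarning` is a hypothetical helper, the header wording only approximates the ticket's console.warn calls, and the drift entries are placeholders.

```javascript
// Hypothetical pure formatter: one header line, one line per drift item, one recovery line.
const formatDriftWarning = (serverName, drift) => [
    `[Neo AI] Stale config.mjs for '${serverName}' - template has evolved:`,
    ...drift.missingImports.map(i => `  + import: ${i}`),
    ...drift.missingExports.map(e => `  + export: ${e}`),
    '  Run `npm run prepare -- --migrate-config` to refresh (gitignored; safe).'
];

const lines = formatDriftWarning('memory-core', {
    missingImports: ['./Deployment.mjs'], // placeholder module path
    missingExports: ['resolvePort']       // placeholder export name
});

// console.warn writes to stderr in Node.js, matching the "stderr warning" contract.
lines.forEach(line => console.warn(line));
```

Keeping formatting separate from output is a design choice of this sketch, not something the ticket prescribes; it lets a test check for server name, drift list, and recovery command without capturing stderr.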

Acceptance Criteria

  • buildScripts/ai/initServerConfigs.mjs detects structural drift (top-level imports + named exports) between config.template.mjs and existing config.mjs for every server under ai/mcp/server/*.
  • Default invocation (npm run prepare) emits a per-server stderr warning listing each missing import/export when drift is detected; does NOT overwrite.
  • --migrate-config argv flag (e.g., npm run prepare -- --migrate-config) overwrites stale config.mjs files with the current template; idempotent across repeated runs.
  • On clones where config.mjs is identical to the template, the script is silent (no false-positive warnings).
  • When config.mjs does not exist, behavior is unchanged from today (clone from template).
  • When the template does not exist, behavior is unchanged from today (skip server with existing console.warn).
  • Unit test under test/playwright/unit/buildScripts/initServerConfigs.spec.mjs covers the four states: (1) missing config → cloned, (2) drifting config + no flag → warned, not overwritten, (3) drifting config + flag → overwritten, (4) current config → silent.
  • Cookbook (learn/agentos/DeploymentCookbook.md) §1 references the drift-detection behavior with one line ("Re-run npm run prepare after pulling template changes; pass -- --migrate-config to refresh stale gitignored configs").
  • Operator-facing message in stderr is unambiguous: server name + drift list + exact recovery command.
  • Post-merge: each peer can verify by running npm run prepare against an intentionally stale config.mjs and confirming the warning shape; this is informally validated rather than gated on CI.
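The four states named in the unit-test criterion can be exercised without a filesystem if the decision is separated from the I/O. The sketch below is one possible shape for that, not the shipped implementation: `planAction`, `shapeOf`, and `hasDrift` are hypothetical names, `null` stands for "file does not exist", and the sample sources are placeholders.

```javascript
// Lexical projection: import specifiers and names from `export {...}` lists.
const shapeOf = (src) => ({
    imports: [...src.matchAll(/^import\s+.*?from\s+['"]([^'"]+)['"]/gm)].map(m => m[1]),
    exports: [...src.matchAll(/^export\s+\{([^}]+)\}/gm)]
        .flatMap(m => m[1].split(',').map(s => s.trim())).filter(Boolean)
});

// Drift means the template has an import or export that the config lacks.
const hasDrift = (template, config) =>
    template.imports.some(i => !config.imports.includes(i)) ||
    template.exports.some(e => !config.exports.includes(e));

// Hypothetical pure decision function; the caller performs the actual copy or warning.
const planAction = ({templateSrc, configSrc, migrate}) => {
    if (templateSrc === null) return 'skip';  // template missing: behavior unchanged
    if (configSrc === null)   return 'clone'; // state 1: first-time bootstrap
    if (!hasDrift(shapeOf(templateSrc), shapeOf(configSrc))) return 'noop'; // state 4
    return migrate ? 'migrate' : 'warn';      // state 3 with the flag, state 2 without
};

const template = "import {resolvePort} from './Deployment.mjs';\nexport {resolvePort};\n";
const stale    = "export default {};\n";

const results = {
    missing : planAction({templateSrc: template, configSrc: null,     migrate: false}),
    warned  : planAction({templateSrc: template, configSrc: stale,    migrate: false}),
    migrated: planAction({templateSrc: template, configSrc: stale,    migrate: true}),
    current : planAction({templateSrc: template, configSrc: template, migrate: false})
};

console.log(results);
// {missing: 'clone', warned: 'warn', migrated: 'migrate', current: 'noop'}
```

A spec file could then assert on the returned action for each state and cover the file copy separately with a temporary directory.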

Out of Scope

  • Value-level (semantic) drift detection: e.g., defaultConfig.batchDelay: 500 vs template's 10000. Different concern; requires per-deployment policy decisions about whether operator overrides are intentional. Structural-only drift is the conservative MVP and matches all three of today's empirical anchors.
  • Deep-merge migration with operator overrides preserved: a future-MX surface that requires conflict-resolution UI; outside the bootstrap-script substrate. If it ships, it ships as a separate ticket.
  • Healthcheck-side runtime exposure of stale config: covered by #10779 (DreamMode features-block observability). Read-side; this ticket is write-side.
  • Cross-server schema canonicalization (e.g., enforcing both KB+MC use the same auth block shape): a different normative concern; out of bootstrap scope.

Avoided Traps / Gold Standards Rejected

  • "Just always overwrite at npm prepare" — Rejected. Destroys operator overrides silently; turns gitignored config from a delta-store into a write-only mirror. The --migrate-config flag explicitly scopes the destructive path.
  • Detect via line-count or file-checksum — Rejected. Fragile to whitespace/comment changes that aren't structural drift. Imports + exports are the substrate boundary; everything else is value-level concern.
  • Use AST library (recast / @babel/parser) — Rejected. Adds a dependency for a 52-line bootstrap script; shape projection at the regex layer (top-level import / export lines) is sufficient for the failure modes we have empirically observed. Re-evaluate only if structural drift starts hiding inside conditional/dynamic imports, which has not occurred in any of today's three anchors.
  • Embed shape detection inside Config.load() at runtime — Rejected per service-boundary discipline. Bootstrap-time detection is the right substrate; runtime would shift the friction window from "before the server starts" to "while the server is starting," which is strictly worse for operator UX.
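The checksum objection above can be illustrated with a short sketch. A checksum differs whenever the bytes differ (ignoring collisions), so the example uses direct string inequality as a stand-in for a checksum comparison; the file contents are hypothetical, with the batchDelay values borrowed from the Out of Scope section.

```javascript
// Structural projection limited to import specifiers, for brevity.
const importsOf = (src) =>
    [...src.matchAll(/^import\s+.*?from\s+['"]([^'"]+)['"]/gm)].map(m => m[1]).sort();

// Hypothetical contents: same imports, but the operator added a comment and tuned a value.
const templateSrc = "import path from 'path';\nexport default {batchDelay: 10000};\n";
const operatorSrc = "// tuned locally\nimport path from 'path';\nexport default {batchDelay: 500};\n";

// Byte inequality is what a checksum comparison would report.
const bytesDiffer = templateSrc !== operatorSrc;

// The structural projection only compares the import specifiers.
const shapeDiffers = JSON.stringify(importsOf(templateSrc)) !== JSON.stringify(importsOf(operatorSrc));

console.log({bytesDiffer, shapeDiffers}); // {bytesDiffer: true, shapeDiffers: false}
```

A byte-level check would flag this operator-tuned file as stale on every run, while the structural projection stays quiet because no import was added or removed.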

Related

  • Refs PR #10812 — env-var ergonomics; first of two same-session template-evolution anchors that drove this ticket.
  • Refs PR #10814 — publicUrl resolver; second template-evolution anchor (added resolvePublicUrl import + publicUrl field).
  • Complements #10779 — healthcheck observability for stale autoDream/autoGoldenPath (read-side; this ticket is write-side).
  • Adjacent to #10103 — SDK-layer config file (service-boundary discipline for keys; orthogonal but same substrate file).
  • Substrate context: Epic #9999 Cloud-Native Memory Core sub-epic #10015 — operator deployment ergonomics.

Origin Session ID

8b31fd62-6a53-40b5-aae2-c5288f8ced09

Handoff Retrieval Hints

  • query_raw_memories(query="bootstrapWorktree migrate-config staleness detection initServerConfigs") — surfaces the same-session migration anchors (Claude/Gemini/GPT all manually migrated 2026-05-06).
  • query_raw_memories(query="config.template.mjs evolution peer migration broadcast") — surfaces operator's two same-session migration broadcasts post-#10812 and post-#10814.
  • Git commit-range anchor: 1b07e561d..HEAD covers the #10812 + #10814 substrate-evolution window that drove the friction.
  • Commit SHA 491613bdc76 (Gemini's #10814 cycle-3 conflict-marker fix) is the in-session evidence that template evolution, peer migration, and post-merge cleanup compound; the cost grows with each evolution if not automated.

Filed via ticket-create skill per AGENTS.md §13 (Self-Evolving Systems / friction-into-gold). Three same-session empirical anchors qualify per the substrate-investment evidence threshold.

Co-authored-by: Claude Opus 4.7 <neo-opus-4-7@neomjs.com>

tobiu referenced in commit 8f7300b - "feat(ai/buildScripts): template drift detection in initServerConfigs (#10815) (#10892)" on May 7, 2026, 2:19 PM
tobiu closed this issue on May 7, 2026, 2:19 PM