LearnNewsExamplesServices
Frontmatter
id10822
titleConfig substrate cleanup: KISS hard cuts + three-tier model
stateClosed
labels
epicairefactoringarchitecturemodel-experience
assignees[]
createdAtMay 6, 2026, 6:00 PM
updatedAtMay 11, 2026, 11:33 PM
githubUrlhttps://github.com/neomjs/neo/issues/10822
authorneo-opus-4-7
commentsCount5
parentIssue9999
subIssues
10823 Config Cleanup: Delete legacy env-var aliases (Phase 1)
10825 Audit dead config fields for substrate cleanup
10824 Audit MCP env vars for config substrate cleanup
10826 Codify clean-slate sunset rule in pull-request-workflow §1.1
10827 Create top-level ai/config.template.mjs shared globals (Tier 1 NEW)
10842 Substrate Config Refactor: applyEnv Priority Inversion Fix
10862 Prefix Neo-specific env vars with NEO_ across MC, KB, Tier 1 configs
10944 Remove stale sqlite-vec artifacts from Memory Core healthcheck
10956 Migrate Memory Core summary sweeps to daemon
11011 Retire chromaUnified flag + federated Chroma topology
subIssuesCompleted10
subIssuesTotal10
blockedBy[]
blocking[]
closedAtMay 11, 2026, 11:33 PM

Config substrate cleanup: KISS hard cuts + three-tier model

Closedepicairefactoringarchitecturemodel-experience
neo-opus-4-7
neo-opus-4-7 commented on May 6, 2026, 6:00 PM

Authored by @neo-opus-4-7 (Claude Opus 4.7, Claude Code). Session 8b31fd62-6a53-40b5-aae2-c5288f8ced09. Graduated from Discussion #10819 per ideation-sandbox-workflow.md §5.

Context

The Agent OS config substrate has accumulated framework-shaped backwards-compat over the recent merge sequence (#10808 / #10810 / #10814 / #10817), and the original operator-extensibility surface (config.json / custom config.mjs delta-merge) is architecturally bypassed by env-var-first resolvers. Operator surfaced this directly: "so many layers of backwards compatibility which no one will ever use. the entire extensibility got lost. this is NOT elegant."

Cumulative present-state audit (verified via Bash grep at Discussion-time):

Surface Count Empirical present-cost
Resolver functions with multi-layer fallback 5 resolveEmbeddingProvider, resolveMcpHttpPort, resolveChromaHost, resolveChromaPort, resolvePublicUrl
Legacy/deprecated env-var aliases referenced in templates+helpers 41 lines SSE_PORT, NEO_CHROMA_EMBEDDING_PROVIDER, NEO_KB_CHROMA_HOST/PORT, plus deprecated config keys
console.warn deprecation-class calls in helpers 18 Fire on every test boot; visible in current logs
memory-core/config.template.mjs line count 469 Significant fraction is JSDoc explaining backwards-compat rationale
chromaUnified / engines.kb.chroma mirror-block branching ~30 lines in MC + ~12 lines in KB Topology dimension that splits healthcheck, resolver, doc surfaces
Config delta-merge override callsites 2 (MC + KB Server.mjs) Exists but functionally bypassed by env-first resolution

The Problem

Framework-shaped substrate dressed in engine-shaped reality. Per the engine-vs-framework category recognition (feedback_neo_is_engine_not_framework memory anchor), Neo.mjs is application engine kin to Unreal/Unity/Godot — not a framework with thousands of consumers. The realistic operator population is the swarm + selected partners. Framework-class deprecation chains, env-var aliases, and silent-fallback patterns are wrong-category for engine-class deployments.

The substrate-accretion-defense violation. AGENTS.md §13 mandates that every substrate-mutation PR must net-reduce loaded bytes OR cite concrete sunset triggers. Three recent PRs (#10808, #10810, #10814) each added one resolver/legacy-fallback layer; none net-reduced; none cited concrete sunset triggers. The defense exists in writing; no PR was rejected for failing it.

The dev-only-history clarification. Per operator direction relayed via @neo-gpt: the legacy env-var support targeted for deletion (SSE_PORT, NEO_CHROMA_EMBEDDING_PROVIDER, NEO_KB_CHROMA_HOST/PORT, deprecated config keys) only ever existed on the dev branch — never in a released npm package version. Treating them as compatibility-contract surfaces is wrong-shape: there are no released users to protect.

The lost extensibility (calibration 2026-05-06 16:15Z). Env vars retain priority — load-bearing for Playwright unit-testing isolation, sub-agent runtime overrides, operator one-off testing, and container-bind injection. The "lost" piece is specifically that Config.load(filePath) exists in both MC + KB Server.mjs but is currently undocumented as a contract surface for non-env-overridable concerns. The flattened resolver shape (env || configDefault) preserves env-var-first priority while making config.mjs defaults the explicit fallback when env is unset. The template/gitignored split serves a real purpose (preserves config experimentation from leaking into PRs across Neo forks + npx neo-app workspaces + swarm tuning); the goal is to document the secondary extensibility surface, NOT to invert the env-var precedence.

The Architectural Reality

  • ai/services.mjs — SDK Bouncer Pattern (per learn/benefits/ArchitectureOverview.md §The SDK Bouncer Pattern). Loads OpenAPI specs from each MCP server and wraps with Zod validators. Currently imports server-internal config.mjs files — this couples SDK to server-internal substrate (separate substrate concern surfaced 2026-05-06; address in Phase 1.5 sub-issue #6 by migrating shared values to top-level ai/config.template.mjs).
  • ai/mcp/server/{knowledge-base,memory-core}/config.template.mjs + gitignored config.mjs. KB at 248 lines, MC at 469 lines.
  • ai/mcp/server/shared/helpers/DeploymentConfig.mjs — 4 resolver helpers (resolveMcpHttpPort, resolveChromaHost, resolveChromaPort, resolvePublicUrl) introduced via #10808 + #10814. Multi-layer fallback chains.
  • ai/mcp/server/memory-core/helpers/EmbeddingProviderConfig.mjsresolveEmbeddingProvider introduced via #10810. 4-layer fallback.
  • buildScripts/initServerConfigs.mjs + bootstrapWorktree.mjs — first-time-only template→config copy. Doesn't detect template-evolution drift (separately tracked as #10815, folds into Phase 3 sub-issue #13).
  • learn/agentos/MemoryCore.md, SharedDeployment.md, DeploymentCookbook.md — all carry sections explaining chromaUnified toggle, deprecation-window env-vars, federated-vs-unified topology mode.

The Fix

Three principles drive the reshape:

  1. Env vars are the universal override surface (Playwright unit-testing isolation, container-bind injection, secrets, operator one-off overrides). Hard rule: ONE canonical env-var name per concept. No aliases.
  2. No deprecation chains. Renames are hard cuts in one PR. Future env-var rename PRs that introduce deprecation chains get rejected at review.
  3. Document config.mjs delta-merge as the secondary extensibility surface for non-env-driven concerns. Env vars retain priority. Flatten resolvers to single-line env || configDefault (env wins, config.mjs default is the fallback). Config.load(filePath) already exists in both MC + KB Server.mjs but is currently undocumented as a contract surface — this principle restores it as a documented surface without changing the env-var-first precedence (load-bearing for Playwright + sub-agents + operator overrides).

Three-tier config model:

Tier File Purpose
1. Shared globals (NEW) ai/config.template.mjs + ai/config.mjs Cross-MC values: embeddingProvider, vectorDimension, modelProvider, modelName, provider blocks, auth block
2. Per-MC server knobs (slimmed) ai/mcp/server/<name>/config.template.mjs + config.mjs Per-server-only knobs; clones/spreads Tier 1 (Tier 1 is immutable plain-data at import time)
3. .env (slimmed hard via one-name-per-concept) .env Universal override surface — ONE canonical name per concept, no aliases

Drop the federated/non-unified Chroma topology entirely. KB owns Chroma, MC connects as downstream client. Resolves #10015 by virtue of the topology decision.

Restore config.mjs delta-merge as documented primary extensibility for non-env-overridable concerns. Template/gitignored split preserved (load-bearing for forks + workspaces + experimentation).

Contract Ledger Matrix

Target Surface Source of Authority Proposed Behavior Fallback / Edge Case Docs Evidence
Top-level ai/config.template.mjs shared globals (NEW) This Epic, Discussion #10819 OQ-1 resolution Plain JS module exporting an immutable defaults object containing embeddingProvider, vectorDimension, modelProvider, modelName, ollama/openAiCompatible/auth blocks, neoRootDir. Per-MC configs import + spread, never mutate. If shared module fails to load → MC + KB fail loudly at boot (boot-time validator surfaces missing-required-field) learn/agentos/MemoryCore.md + new top-level config primer L2 unit-test: import shape verification, immutability check
Per-MC config.template.mjs slimming This Epic + sub-issue #7 Each per-MC template imports Tier 1 + adds per-server-only knobs (ports, paths, collection names, server-specific tuning). No shared values duplicated. If per-MC config drifts from template structure → drift detector (sub-issue #13) flags + prints exact migration block per harness Updated per-server JSDoc; cookbook updates L2 unit-test: per-MC config shape validates against tier-split contract
.env keep-list (one-name-per-concept hard rule) Operator + cross-family review 5 substrate-role categories: secrets / runtime-binding / identity-binding / single-writer-process-role / multi-tenant-isolation / operator-one-shot-toggles. ONE canonical name per concept. Boot-time validator (sub-issue #5) errors loudly on removed legacy aliases + missing required replacement fields. Distinguishes "config invalid" from "sandbox boundary symptom". Updated DeploymentCookbook.md env-var inventory + SharedDeployment.md env-var ergonomics section Audit table (sub-issue #1) classifies every env-var into a category
Resolver flattening (legacyEnvVar parameter removal across DeploymentConfig.mjs + EmbeddingProviderConfig.mjs) This Epic + sub-issue #11 Each resolver collapses to single-line `env configDefaultshape. No multi-layer fallback chains. Noconsole.warn` deprecation calls. None — hard cut. Boot-time validator surfaces operator-misconfig.
Federated Chroma topology drop This Epic + sub-issues #9 + #10 (Resolves #10015) Drop chromaUnified flag, engines.kb.chroma mirror block, coordinates.resolvedVia branching in HealthService. Single unified Chroma per topology, KB-owned. Operator-side data migration prerequisite: backup-first (npm run ai:backup) + post-migration healthcheck evidence as merge-gate AC. One-shot script deleted in same Epic close-out after 3-harness ack. DeploymentCookbook.md + SharedDeployment.md topology sections rewritten Backup evidence + post-migration healthcheck attached to PR; 3-harness ack before deletion
config.mjs delta-merge restoration This Epic + sub-issue #8 Config.load(filePath) becomes documented primary extensibility surface. Operator drops config.json or custom config.mjs for non-env-overridable concerns. Template/gitignored split preserved (forks + workspaces + experimentation use case); doctor (sub-issue #13) drift-detects without dropping the split. DeploymentCookbook.md lead-with-config-edit (env vars secondary, only for secrets/container-bind/test-isolation) L2 unit-test: Config.load delta-merge shape verification

Acceptance Criteria

Phase 1 — clean cut, no operator-data dependency

  • AC1: Audit + classify every env-var read across ai/mcp/server/** into 4-column markdown table (env var | current readers | target tier | deletion/keep rationale)
  • AC2: All legacy env-var aliases deleted in one PR. SSE_PORT, NEO_CHROMA_EMBEDDING_PROVIDER, NEO_KB_CHROMA_HOST/PORT, plus deprecated config keys (modelProvider, neoEmbeddingProvider, chromaEmbeddingProvider)
  • AC3: pull-request-workflow.md §1.1 codifies clean-slate sunset rule — future env-var rename PRs that introduce deprecation chains get rejected at review
  • AC4: Dead config fields audited + removed (KB embeddingModel SearchService.mjs:63 dead instantiation; stale ai:migrate-memory package.json entry; etc.)
  • AC5: Boot-time validator errors loudly on removed aliases + missing required replacement fields. Distinguishes "config invalid" from "sandbox boundary symptom"

Phase 1.5 — three-tier substrate

  • AC6: Top-level ai/config.template.mjs created with shared globals. Tier 1 immutable plain-data at import time
  • AC7: Per-MC config templates slimmed to per-server-only knobs; clone/spread shared globals from Tier 1
  • AC8: config.mjs delta-merge documented in DeploymentCookbook.md as secondary extensibility surface for non-env-overridable concerns. Env vars retain priority (Playwright + sub-agents + operator overrides). Documentation MUST explicitly state the precedence (env first, config.mjs second) to prevent misframing.
  • AC9 (substrate hygiene): ai/services.mjs SDK Bouncer Pattern decoupled from server-internal config.mjs imports — references Tier 1 instead

Phase 2 — non-unified drop, gated on operator data migration

  • AC10: Operator data migration: federated → unified Chroma. Backup-first via npm run ai:backup. Backup evidence + post-migration healthcheck evidence attached to PR as merge-gate
  • AC11: chromaUnified flag + engines.kb.chroma mirror-block dropped. Closes #10015
  • AC12: Resolvers flattened — legacyEnvVar parameters dropped; single-line env || configDefault shape across all 5 resolvers
  • AC13: Topology-mode branching dropped in HealthService + coordinates.resolvedVia collapsed
  • AC14 (sequencing constraint): .env dependencies NOT removed before new config.mjs resolution is fully active. Boot-critical flags remain readable until per-MC slim configs land

Phase 3 — parallelizable with Phase 2

  • AC15: Canonical-clone-aware config doctor + drift detector ships. Walks git rev-parse --git-common-dir from worktree to find canonical root. Prints exact migration block per harness (Claude worktrees + Gemini clone + Codex .codex/config.template.toml + canonical checkout). Output distinguishes "config invalid" from "sandbox boundary symptom". Folds in #10815 substrate

Cross-Phase

  • AC16: Harness migration checklist documented covering Claude worktrees + Gemini clone + Codex .codex/config.template.toml + canonical checkout
  • AC17: One-shot migration script (buildScripts/ai/migrateFederatedToUnified.mjs) deleted in same Epic close-out after all three harness families acknowledge migrated setup
  • AC18 (post-merge, operator-side): Restart all clones; healthcheck verifies new substrate shape across Claude + Gemini + Codex

Rough scope: ~4-5 days of focused work, much of it deletion. Net-reduces ~400-600 lines once executed.

Out of Scope

  • Migrating Chroma to SQLite-vec / native SQLite vector storage. Empirically validated dead-end per Memory Core sessions 72141e68 (2026-04-12) + 46f8f6d0 (2026-04-08) + 7e216b50 (2026-04-04). sqlite-vec brute-force O(N) scan, no HNSW/skip-list/IVF — unworkable at our scale. Chroma stays for vectors; better-sqlite stays for graph (Native Edge Graph layer).
  • Multi-tenant identity & data privacy substrate. Tracked under #10016; orthogonal sub-epic under #9999.
  • Partner-trial deployment ergonomics. Tracked under #10691; orthogonal sub-epic under #9999.
  • External MCP framework standards alignment. Engine-category-specific design.
  • Released-version v12.x compat preservation. Legacy vars never shipped in a released version; nothing to preserve.

Avoided Traps / Paths Not Taken

  • Framework-class deprecation chains for users who don't exist. Multi-window deprecation patterns are correct for libraries with thousands of released-version users; engine-class with ~4 deployments AND legacy vars that only existed on dev (never released) doesn't need them.
  • Source-controlling local config.mjs (initially considered as OQ-5). Wrong-shape per cross-family review + operator clarification: the gitignored split preserves config experimentation from leaking into PRs across Neo forks + npx neo-app workspaces + swarm tuning. Local config legitimately contains machine-local paths, trust overrides, env-specific MCP settings, operator-private model configs. Phase 3 sub-issue #13 builds canonical-clone-aware doctor instead of dropping the split.
  • Permanent migration script accumulation (migrate-v1-to-v2.mjs, migrate-v2-to-v3.mjs...). One-shot delete-on-completion is the discipline.
  • DSL for config policy (referenced from sibling Epic #10291 P6b discussion). Structured JSON/JS at boot is the correct choice over OPA/Rego-style DSL. Keep config legible.
  • Treating dev-branch-only legacy vars as released-version compat contracts. Operator clarification: no released users to protect. KISS-aggressive deletion is the correct path.
  • Filing all 13 sub-issues upfront. Per operator's quality-over-velocity signal + KISS: sub-issues materialize as Phase 1/1.5/2/3 work activates. Avoids issue spam + accommodates substrate refinement during implementation.

Related

  • Parent: #9999 Cloud-Native Knowledge & Multi-Tenant Memory Core (sub-epic alongside #10013-closed, #10014-closed, #10015-open, #10016-open, #10691-open)
  • Resolves (when complete): #10015 Dynamic Topology — Unified vs. Federated Routing (Epic-labeled — Related keyword used per pr-review-guide.md §5.2 Epic Close-Target Ban; close-completed will be triggered manually via epic-resolution skill once this Epic ships)
  • Folds in: #10815 initServerConfigs --migrate-config drift detection (becomes Phase 3 sub-issue #13)
  • Originating Discussion: #10819 Config Substrate Cleanup (graduated)
  • Triggering PRs (substrate-accretion violations): #10808, #10810, #10814, #10817
  • v13 release vehicle: this Epic + #10016 + #10691 close-completes #9999, which triggers v13. Operator framing: breaking changes via v13, not v12.x minor versions.

Origin Session ID

8b31fd62-6a53-40b5-aae2-c5288f8ced09

Handoff Retrieval Hints

  • query_raw_memories(query="config substrate cleanup KISS three-tier model engine-class") — surfaces this session's Discussion #10819 graduation context + cross-family review absorption + operator's KISS/v13/dev-only-history framing inputs
  • query_raw_memories(query="SQLite vec-1 chroma replacement experiment dead end hierarchical NSW") — surfaces the empirical anchors for the Avoided Traps SQLite-vec dead-end (sessions 72141e68 / 46f8f6d0 / 7e216b50)
  • query_raw_memories(query="resolveEmbeddingProvider legacy env var deprecation chain framework-shaped") — surfaces the substrate-accretion-violation framing
  • Discussion archaeological source: #10819 — closed at graduation; remains as substrate reasoning trail
  • Cross-family review anchors: GPT comment DC_kwDODSospM4BAMdu on #10819; Gemini A2A 2026-05-06 14:22Z; operator green-light 2026-05-06 15:15Z
  • Empirical-anchor commit range: 5082575b7 (#10816 merge, dev tip at Discussion-time) baseline; substrate audit Bash-command outputs documented in Discussion §1 audit table
  • v13 trigger context: #9999 closure path = this Epic + #10016 + #10691; reach v13 when all three close

This Epic graduates Discussion #10819. The 13 sub-issues outlined in the Discussion's Phase 1 / 1.5 / 2 / 3 structure will materialize incrementally as work activates per phase (KISS — no upfront issue spam). Sub-issues will be filed by whichever swarm member picks up each piece, parent-linked back to this Epic via update_issue_relationship.

Co-authored-by: Claude Opus 4.7 neo-opus-4-7@neomjs.com

tobiu referenced in commit 7af53f8 - "feat(skills): codify clean-slate sunset rule for env-var renames (#10826) (#10828) on May 6, 2026, 7:19 PM