Frontmatter
id: 10800
title: Author cloud deployment cookbook for shared KB+MC topology
state: Closed
labels: documentation, enhancement, ai, architecture
assignees: neo-gemini-3-1-pro
createdAt: May 6, 2026, 10:29 AM
updatedAt: May 6, 2026, 11:32 AM
githubUrl: https://github.com/neomjs/neo/issues/10800
author: neo-opus-4-7
commentsCount: 0
parentIssue: 9999
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 6, 2026, 11:32 AM

Author cloud deployment cookbook for shared KB+MC topology

Closed. Labels: documentation, enhancement, ai, architecture
neo-opus-4-7 commented on May 6, 2026, 10:29 AM

Context

#10721 (Shared deployment MVP completeness gaps) moves from 9/10 to 10/10 complete with the three PRs landing today (#10797 / #10798 / #10799). #10691 shipped the substrate primitives. learn/agentos/SharedDeployment.md exists as the operator-facing reference doc: it documents what each piece does, how to verify topology, the threat models, and the healthcheck shapes.

What's missing is a walkthrough cookbook — a concrete, ordered, end-to-end deployment guide for an external operator standing up KB+MC against a shared cloud-hosted Chroma + reverse proxy + identity-aware deployment. Reference docs answer "what is this?"; cookbooks answer "how do I deploy this from zero?".

This is filed at the operator's directive to (a) provide a starting point for external onboarding, and (b) deliberately surface gaps between substrate-shipped state and external-deployable state. The act of writing the cookbook IS the gap audit.

The Problem

Substrate-complete vs deploy-walkable are different states. Today an external operator cloning dev faces these unresolved questions with no canonical answer in repo:

  1. Container packaging. One image with both server processes (KB + MC), or two images (one per server) deployed as sidecars? Each shape has trade-offs (process supervision vs cloud-native isolation). No documented guidance.
  2. Server-knows-its-URL. OAuth redirect_uri registration, public-endpoint discovery, SSE callback URLs — does each server need to know its public-facing URL at boot? Currently the SSE port is configurable (SSE_PORT) but the canonical public URL is not surfaced to the server itself.
  3. Cross-server tool calls. KB and MC are independent processes that share Chroma but otherwise don't talk to each other directly. Is that the intended deployment model long-term, or is there an evolved shape that needs cross-server RPC?
  4. Two-MCP-server URL strategy. Operator must expose two distinct HTTPS endpoints (one per MCP server). Reverse-proxy routing pattern, port allocation, hostname-vs-pathname routing — no canonical recipe.
  5. Two embedding providers. ai/mcp/server/memory-core/config.template.mjs exposes both chromaEmbeddingProvider (read by KB + MC) and neoEmbeddingProvider (read in exactly one callsite: ai/daemons/services/GoldenPathSynthesizer.mjs:81). The "two providers" model surfaced explicitly in #10773 via aligned: true/false. The cookbook-writing pass should challenge whether two-provider is justified or residual config-history; consolidation may become a follow-up ticket.
  6. Config-surface inventory. Both memory-core/config.template.mjs and knowledge-base/config.template.mjs carry many env vars (AUTO_*, AUTH_*, NEO_*). No consolidated env-var matrix exists for deployment-time provisioning.
  7. Integration testing infrastructure. Today our test surface is unit-tested silos (HealthService.spec.mjs, Auth.spec.mjs, etc.). We have no staged-stack integration test exercising KB+MC+Chroma+reverse-proxy+OAuth as a deployed unit. The cookbook should articulate what testing-against-deployed-shape would look like — likely surfaces a separate infrastructure ticket.
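
To make gap 5 concrete, here is a minimal sketch of how an aligned: true/false flag could be derived from the two provider settings. The helper name describeEmbeddingProviders and the provider values are hypothetical, and the actual logic shipped in #10773 may differ; only the two config keys come from the ticket.

```javascript
// Hypothetical helper: the real logic behind `aligned` in #10773 may differ.
function describeEmbeddingProviders(config) {
    const chroma = config.chromaEmbeddingProvider;
    const neo    = config.neoEmbeddingProvider;

    return {
        chromaEmbeddingProvider: chroma,
        neoEmbeddingProvider   : neo,
        // Treat a missing value as "not aligned" so a misconfiguration stays visible.
        aligned: Boolean(chroma) && chroma === neo
    };
}

// Placeholder provider names, for illustration only.
const sameProvider = describeEmbeddingProviders({
    chromaEmbeddingProvider: 'provider-a',
    neoEmbeddingProvider   : 'provider-a'
});
const splitProvider = describeEmbeddingProviders({
    chromaEmbeddingProvider: 'provider-a',
    neoEmbeddingProvider   : 'provider-b'
});

console.log(sameProvider.aligned, splitProvider.aligned);
```

If the cookbook pass concludes that a single provider is enough, a check like this collapses to a constant, which is one way to argue for consolidation.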

The cookbook's output is a deployment artifact for external operators. The cookbook's byproduct is each gap above becoming a discrete, actionable follow-up ticket.

The Architectural Reality

  • Existing reference doc: learn/agentos/SharedDeployment.md — covers topology modes, healthcheck verification, authentication threat model, async session summarization, migration path. Reference-format, not walkthrough-format.
  • Sibling reference docs to compose: MemoryCore.md (canonical healthcheck contract), KnowledgeBase.md, MemoryCoreMcpAuth.md (OIDC + stdio dual-path).
  • Config templates: ai/mcp/server/memory-core/config.template.mjs, ai/mcp/server/knowledge-base/config.template.mjs — env-var inventory source.
  • learn/tree.json: the canonical taxonomy file. New cookbook entry slots under AgentOS parent (sibling to existing agentos/SharedDeployment entry).
  • Currently no Docker artifacts in repo (docker-compose* and Dockerfile* absent). Cookbook may reference example containerization, but the repo intentionally ships zero infra-as-code today.

The Fix

Author a new guide at learn/agentos/DeploymentCookbook.md registered in learn/tree.json under AgentOS parent, with the following walkthrough structure:

Section 1 — Prerequisites & Architecture Picture

  • Topology diagram: agent harnesses → reverse proxy → KB MCP server + MC MCP server → shared Chroma + per-server SQLite.
  • Identity flow: external IdP → reverse proxy → MCP server (OIDC verify or proxy-header-trusted).
  • What ships in repo vs what the operator provisions (Chroma instance, OAuth provider, reverse proxy, container runtime).

Section 2 — Container Packaging

  • One-image-two-processes vs two-images sidecar trade-offs.
  • Recommended MVP shape (likely: two images, two containers, one shared Chroma — but the cookbook-writing pass should verify by walking through the reasoning).
  • Worked example Dockerfile skeleton(s) + docker-compose.yml reference if appropriate (deferred to a follow-up infrastructure ticket if the cookbook concludes infra-as-code belongs there).

Section 3 — Reverse Proxy Configuration

  • oauth2-proxy (or equivalent) topology: terminate OIDC, inject X-PREFERRED-USERNAME, strip client-set values of those headers.
  • Two-MCP-server URL strategy: hostname-based vs pathname-based routing.
  • Health endpoint exposure decisions.
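
The two proxy concerns in this section can be sketched in plain JavaScript. This is an illustration of the intended behavior, not oauth2-proxy configuration: the header name comes from the ticket, while the function names, upstream URLs, hostnames, and path prefixes are placeholders.

```javascript
// Header name taken from the ticket; everything else is an illustrative assumption.
const IDENTITY_HEADER = 'x-preferred-username';

// Drop any client-supplied identity header, then set it from the identity the
// proxy verified via OIDC. Header names are compared case-insensitively.
function sanitizeIdentityHeaders(clientHeaders, verifiedUsername) {
    const forwarded = {};

    for (const [name, value] of Object.entries(clientHeaders)) {
        if (name.toLowerCase() !== IDENTITY_HEADER) {
            forwarded[name.toLowerCase()] = value;
        }
    }

    if (verifiedUsername) {
        forwarded[IDENTITY_HEADER] = verifiedUsername;
    }

    return forwarded;
}

// Hypothetical upstreams; service names and ports are placeholders.
const UPSTREAMS = new Map([
    ['kb', 'http://kb:3001'],
    ['mc', 'http://mc:3002']
]);

// Hostname-based routing: kb.example.com and mc.example.com.
function routeByHostname(host) {
    return UPSTREAMS.get(host.split('.')[0]) ?? null;
}

// Pathname-based routing: example.com/kb/... and example.com/mc/...
function routeByPathname(pathname) {
    return UPSTREAMS.get(pathname.split('/')[1]) ?? null;
}

// A client tries to spoof the identity header; the verified identity wins.
const forwardedHeaders = sanitizeIdentityHeaders(
    {'X-Preferred-Username': 'mallory', Accept: 'text/event-stream'},
    'alice'
);
console.log(forwardedHeaders);
```

Hostname routing keeps each MCP server at the root of its own origin, while pathname routing needs only one certificate and DNS name but requires both servers to tolerate a path prefix. Which trade-off fits is the decision this section should record.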

Section 4 — Identity Provider Setup

  • Pointer to the existing OIDC threat model in SharedDeployment.md.
  • OAuth client registration steps, redirect URI strategy, clientSecret secrets-management guidance.
  • The two paths: server-side OIDC verification vs proxy-header trust mode (auth.trustProxyIdentity).

Section 5 — Shared Chroma Topology

  • NEO_CHROMA_UNIFIED=true + engines.kb.chroma.{host,port} mechanics.
  • Provisioning a managed Chroma vs self-hosted; collection-naming contract; operator obligations re: multi-tenant isolation.
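
One possible reading of the unified-topology switch, sketched under stated assumptions: when NEO_CHROMA_UNIFIED is "true", every engine resolves its Chroma endpoint from engines.kb.chroma. The function name, the engines.mc entry, and all host and port values are hypothetical; the real resolution logic lives in the servers and should be checked there.

```javascript
// Assumed semantics: with NEO_CHROMA_UNIFIED=true, all engines share the
// endpoint configured under engines.kb.chroma. This is an interpretation.
function resolveChromaEndpoint(engineName, config, env) {
    const unified = env.NEO_CHROMA_UNIFIED === 'true';
    const source  = unified ? config.engines.kb : config.engines[engineName];

    if (!source?.chroma?.host || !source?.chroma?.port) {
        throw new Error(`No Chroma host/port configured for engine "${engineName}"`);
    }

    return {host: source.chroma.host, port: Number(source.chroma.port), unified};
}

// Placeholder config; hosts, ports, and the `mc` entry are invented.
const exampleConfig = {
    engines: {
        kb: {chroma: {host: 'chroma.internal', port: 8000}},
        mc: {chroma: {host: 'localhost',       port: 8001}}
    }
};

const unifiedMc  = resolveChromaEndpoint('mc', exampleConfig, {NEO_CHROMA_UNIFIED: 'true'});
const separateMc = resolveChromaEndpoint('mc', exampleConfig, {});

console.log(unifiedMc, separateMc);
```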

Section 6 — Environment Variable Inventory

  • Consolidated matrix of every NEO_* / AUTH_* / AUTO_* env var across both servers, with deployment-mode-specific defaults.
  • Gap-surfacing question: should the two config.template.mjs files be consolidated for deployment? Either reach a verdict or file a follow-up ticket.
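
A consolidated matrix could be generated rather than hand-maintained. The sketch below merges per-server inventories into one row per variable. SSE_PORT and NEO_CHROMA_UNIFIED appear in this ticket; AUTH_EXAMPLE_FLAG and every default value are placeholders, not the real contents of either config.template.mjs, and buildEnvMatrix is a hypothetical helper.

```javascript
// Build one row per variable, recording which servers read it and with what default.
function buildEnvMatrix(inventories) {
    const rows = new Map();

    for (const [server, variables] of Object.entries(inventories)) {
        for (const [name, defaultValue] of Object.entries(variables)) {
            if (!rows.has(name)) {
                rows.set(name, {name, servers: [], defaults: {}});
            }
            const row = rows.get(name);
            row.servers.push(server);
            row.defaults[server] = defaultValue;
        }
    }

    // Sort by name so the matrix is stable across runs.
    return [...rows.values()].sort((a, b) => a.name.localeCompare(b.name));
}

// Placeholder inventories for illustration only.
const matrix = buildEnvMatrix({
    'knowledge-base': {SSE_PORT: '3001', NEO_CHROMA_UNIFIED: 'false'},
    'memory-core'   : {SSE_PORT: '3002', NEO_CHROMA_UNIFIED: 'false', AUTH_EXAMPLE_FLAG: 'off'}
});

console.table(matrix.map(({name, servers}) => ({name, servers: servers.join(', ')})));
```

Rows read by only one server are natural candidates for the consolidation question in this section.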

Section 7 — Healthcheck Verification

  • Pointer to MemoryCore.md §Healthcheck Response Shape (canonical contract).
  • Walkthrough of expected providers.embedding, providers.summary, providers.auth, database.topology, identity blocks for the cloud-deployed shape.
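
A deployed stack can be checked for the blocks listed above with a small structural probe. This sketch only verifies that each block is present; the block paths come from the ticket, while the helper names and the sample payload values are invented. The authoritative shape is the contract in MemoryCore.md.

```javascript
// Block paths named in the ticket for the cloud-deployed shape.
const REQUIRED_BLOCKS = [
    'providers.embedding',
    'providers.summary',
    'providers.auth',
    'database.topology',
    'identity'
];

// Walk a dotted path; returns undefined if any segment is missing.
function getPath(object, path) {
    return path.split('.').reduce(
        (node, key) => (node == null ? undefined : node[key]),
        object
    );
}

// List every required block that the healthcheck payload does not contain.
function missingHealthBlocks(payload) {
    return REQUIRED_BLOCKS.filter(path => getPath(payload, path) === undefined);
}

// Placeholder payload: block names are from the ticket, values are invented.
const samplePayload = {
    providers: {embedding: {}, summary: {}},
    database : {topology: 'placeholder'},
    identity : {}
};

console.log(missingHealthBlocks(samplePayload)); // [ 'providers.auth' ]
```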

Section 8 — First-Connection Smoke Test

  • Agent harness configuration to point at the deployed URLs.
  • Cross-tenant isolation verification (alice/bob query test).
  • Auth-rejection probe (request without OIDC token vs with OIDC token vs with proxy header).
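
The three auth probes can be expressed as request descriptors that an operator feeds to any HTTP client. This is a sketch under assumptions: buildAuthProbes is a hypothetical helper, the URL is a placeholder, and sending the OIDC token as a Bearer Authorization header is assumed rather than confirmed by this ticket.

```javascript
// Build the three probes named in this section as plain request descriptors.
function buildAuthProbes(url, {token, username}) {
    return [
        {name: 'no-credentials', url, headers: {}},
        // Assumption: the OIDC token travels as a standard Bearer token.
        {name: 'oidc-token',     url, headers: {Authorization: `Bearer ${token}`}},
        // Header name from Section 3; sent directly by the client here.
        {name: 'proxy-header',   url, headers: {'X-PREFERRED-USERNAME': username}}
    ];
}

// Placeholder URL and credentials.
const probes = buildAuthProbes('https://mc.example.com/healthcheck', {
    token   : 'example-token',
    username: 'alice'
});

for (const probe of probes) {
    console.log(probe.name, Object.keys(probe.headers));
}
```

Each descriptor can be passed to fetch(probe.url, {headers: probe.headers}). Which probes should succeed depends on whether auth.trustProxyIdentity is enabled. In either mode, a proxy-header probe sent from outside the proxy should not grant identity, because the proxy is expected to strip client-set values.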

Section 9 — Known Gaps & Follow-Up Tickets

  • This section IS the meta-documentation surface. Each gap surfaced during writing gets a sub-bullet pointing to a filed follow-up ticket (provider consolidation, server-knows-its-URL, infra-as-code, integration testing harness, etc.).

Acceptance Criteria

  • AC1: learn/agentos/DeploymentCookbook.md authored with all 9 sections above (or a documented justification for omitting / restructuring any section).
  • AC2: learn/tree.json updated with new entry under AgentOS parent, with appropriate ID (e.g., agentos/DeploymentCookbook).
  • AC3: Cross-link from existing SharedDeployment.md ("see Cookbook for walkthrough") and from MemoryCore.md where relevant.
  • AC4: Section 9 enumerates each gap surfaced during writing, each with a filed follow-up ticket number. The expected count is 3-6 follow-up tickets; if zero gaps surface, the cookbook-writing pass was insufficient.
  • AC5: Walkthrough is concrete enough that an external operator can follow it from git clone → first verified healthcheck against a deployed stack (modulo their own infra provisioning, which is operator-territory).
  • AC6: No mention of specific external customers / partners by name (per repo discipline). Generic framing only ("external operator", "team-shared deployment", "partner-trial topology").

Out of Scope

  • Building Docker / docker-compose.yml infra-as-code. If the cookbook concludes containerization recipes belong in repo, that's a follow-up ticket. The cookbook itself is documentation.
  • Implementing provider consolidation if the cookbook surfaces it — separate ticket.
  • Building integration test infrastructure (staged URLs, OAuth fixtures, end-to-end stack tests) — separate ticket.
  • Multi-tenant graph isolation (tracked under #10011).
  • Production hardening (rate limiting, secrets rotation, observability beyond healthcheck) — beyond MVP scope.

Avoided Traps

  • Rejected: Extend SharedDeployment.md instead of creating a sibling cookbook. Reference and walkthrough are different documentation modes. SharedDeployment.md is a reference; mixing walkthrough prose into it would dilute both. Sibling guide preserves separation of concerns.
  • Rejected: Make this an epic ticket with the gap-surfaced subs as children. The cookbook is the deliverable; the surfaced gaps become sibling tickets, not children. Epic-shaping would impose unnecessary process overhead (epic-review skill, etc.) for a guide-write.
  • Rejected: Defer cookbook authoring until all surfaceable gaps are pre-known. Counter-productive — the gap-discovery is the value. Writing forces structured reasoning that lookup-from-memory does not.
  • Rejected: Bundle a partner-specific deployment recipe. Per repo discipline, no customer-specific naming. The cookbook serves any external operator deploying the shared topology.

Related

  • Parent epic: #9999 — Cloud-Native Knowledge & Multi-Tenant Memory Core.
  • Predecessor sub-epic (closing today): #10721 — Shared deployment MVP completeness gaps. The cookbook depends on its work landing.
  • Predecessor substrate sub-epic: #10691 — Shared KB/MC Team Deployment MVP (substrate primitives).
  • Reference docs to compose: SharedDeployment.md, MemoryCore.md, KnowledgeBase.md, MemoryCoreMcpAuth.md.
  • Sibling adjacency: #10291 — Organism self-defense substrate for cloud-phase #9999 (operator-facing concern with overlap on cloud topology).

Origin Session ID: 34c8f800-1855-43ff-aea6-d5e6b9410978

Retrieval Hint: query_raw_memories(query="cloud deployment cookbook walkthrough KB MC docker container reverse proxy oauth2-proxy oidc external operator partner-trial topology gap surfacing")

tobiu referenced in commit d7c119c - "docs(agentos): author cloud deployment cookbook (#10800) (#10806)" on May 6, 2026, 11:32 AM
tobiu closed this issue on May 6, 2026, 11:32 AM
tobiu referenced in commit 5e320f6 - "docs(agentos): polish cookbook Section 7 healthcheck JSON sample (#10800) (#10811)" on May 6, 2026, 11:48 AM