Context
#10721 (Shared deployment MVP completeness gaps) closes 9/10 → 10/10 with the three PRs landing today (#10797 / #10798 / #10799). #10691 shipped the substrate primitives. learn/agentos/SharedDeployment.md exists as the operator-facing reference doc: it documents what each piece does, how to verify topology, the threat models, and the healthcheck response shapes.
What's missing is a walkthrough cookbook: a concrete, ordered, end-to-end deployment guide for an external operator standing up KB + MC (the Knowledge Base and Memory Core MCP servers) against a shared cloud-hosted Chroma, behind a reverse proxy, with identity-aware access. Reference docs answer "what is this?"; cookbooks answer "how do I deploy this from zero?".
This is filed at the operator's directive to (a) provide a starting point for external onboarding, and (b) deliberately surface gaps between substrate-shipped state and external-deployable state. The act of writing the cookbook IS the gap audit.
The Problem
Substrate-complete and deploy-walkable are different states. Today an external operator cloning the dev branch faces these unresolved questions with no canonical answer in the repo:
- Container packaging. One image with both server processes (KB + MC), or two images (one per server) deployed as sidecars? Each shape has trade-offs (process supervision vs cloud-native isolation). No documented guidance.
- Server-knows-its-URL. OAuth redirect_uri registration, public-endpoint discovery, SSE callback URLs: does each server need to know its public-facing URL at boot? Currently the SSE port is configurable (SSE_PORT), but the canonical public URL is not surfaced to the server itself.
- Cross-server tool calls. KB and MC are independent processes that share Chroma but otherwise don't talk to each other directly. Is that the intended long-term deployment model, or is there an evolved shape that needs cross-server RPC?
- Two-MCP-server URL strategy. The operator must expose two distinct HTTPS endpoints (one per MCP server). Reverse-proxy routing pattern, port allocation, hostname-vs-pathname routing: none of these has a canonical recipe.
- Two embedding providers. ai/mcp/server/memory-core/config.template.mjs exposes both chromaEmbeddingProvider (read by KB + MC) and neoEmbeddingProvider (read in exactly one callsite, ai/daemons/services/GoldenPathSynthesizer.mjs:81). The "two providers" model surfaced explicitly in #10773 via aligned: true/false. The cookbook-writing pass should challenge whether the two-provider split is justified or a residue of config history; consolidation may become a follow-up ticket.
- Config-surface inventory. Both memory-core/config.template.mjs and knowledge-base/config.template.mjs carry many env vars (AUTO_*, AUTH_*, NEO_*). No consolidated env-var matrix exists for deployment-time provisioning.
- Integration testing infrastructure. Today our test surface is unit-tested silos (HealthService.spec.mjs, Auth.spec.mjs, etc.). We have no staged-stack integration test exercising KB + MC + Chroma + reverse proxy + OAuth as a deployed unit. The cookbook should articulate what testing against the deployed shape would look like, which will likely surface a separate infrastructure ticket.
The cookbook's output is a deployment artifact for external operators. The cookbook's byproduct is each gap above becoming a discrete, actionable follow-up ticket.
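The server-knows-its-URL gap can be made concrete with a minimal sketch. It assumes a hypothetical PUBLIC_BASE_URL variable (neither config template reads such a variable today) and a hypothetical /oauth/callback path. Without an explicit public URL, the only address a server can derive from SSE_PORT is a localhost one, which is wrong for every redirect_uri and SSE callback issued behind a reverse proxy.

```javascript
// Sketch only. PUBLIC_BASE_URL and the "oauth/callback" path are hypothetical;
// neither is read by the current memory-core or knowledge-base config templates.
function resolvePublicUrls(env) {
  const port = env.SSE_PORT || "3000"; // fallback port is an arbitrary placeholder
  const explicit = env.PUBLIC_BASE_URL;
  // Behind a reverse proxy only the operator knows the public origin; the
  // localhost fallback is correct for local development and nothing else.
  const base = new URL(explicit || `http://localhost:${port}`);
  if (explicit && base.protocol !== "https:") {
    throw new Error("PUBLIC_BASE_URL must be https in a shared deployment");
  }
  // Keep any path prefix (pathname-based proxy routing) by resolving
  // relative URLs against a base that ends in "/".
  const root = base.href.endsWith("/") ? base.href : `${base.href}/`;
  return {
    publicBaseUrl: root,
    oauthRedirectUri: new URL("oauth/callback", root).href,
    isPublic: Boolean(explicit),
  };
}
```

The path-prefix case shows why this gap is coupled to the URL-strategy question: with pathname routing, a server that only knows its port mints callback URLs that are missing the prefix.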
The Architectural Reality
- Existing reference doc: learn/agentos/SharedDeployment.md. It covers topology modes, healthcheck verification, the authentication threat model, async session summarization, and the migration path. Reference format, not walkthrough format.
- Sibling reference docs to compose: MemoryCore.md (canonical healthcheck contract), KnowledgeBase.md, MemoryCoreMcpAuth.md (OIDC + stdio dual-path).
- Config templates: ai/mcp/server/memory-core/config.template.mjs and ai/mcp/server/knowledge-base/config.template.mjs (env-var inventory source).
- learn/tree.json: the canonical taxonomy file. The new cookbook entry slots under the AgentOS parent (sibling to the existing agentos/SharedDeployment entry).
- Currently no Docker artifacts in repo (docker-compose* and Dockerfile* absent). The cookbook may reference example containerization, but the repo intentionally ships zero infra-as-code today.
The Fix
Author a new guide at learn/agentos/DeploymentCookbook.md registered in learn/tree.json under AgentOS parent, with the following walkthrough structure:
Section 1 — Prerequisites & Architecture Picture
- Topology diagram: agent harnesses → reverse proxy → KB MCP server + MC MCP server → shared Chroma + per-server SQLite.
- Identity flow: external IdP → reverse proxy → MCP server (OIDC verify or proxy-header-trusted).
- What ships in repo vs what the operator provisions (Chroma instance, OAuth provider, reverse proxy, container runtime).
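The repo-vs-operator split in the last bullet can double as a preflight checklist. This is a minimal sketch; the descriptor keys (chromaUrl, oidcIssuer, proxyUrl, containerRuntime) are hypothetical names for the four operator-provisioned pieces, not fields any repo config reads.

```javascript
// Hypothetical descriptor keys for the pieces the repo does NOT ship.
const OPERATOR_PROVISIONED = {
  chromaUrl: "Chroma instance (managed or self-hosted)",
  oidcIssuer: "OAuth / OIDC provider",
  proxyUrl: "Reverse proxy (terminates OIDC)",
  containerRuntime: "Container runtime for the KB and MC servers",
};

// Returns descriptions of every operator-provisioned piece that is missing
// or blank in the given descriptor.
function missingProvisioning(descriptor) {
  return Object.entries(OPERATOR_PROVISIONED)
    .filter(([key]) => {
      const value = descriptor[key];
      return typeof value !== "string" || value.trim() === "";
    })
    .map(([, description]) => description);
}
```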
Section 2 — Container Packaging
- One-image-two-processes vs two-images sidecar trade-offs.
- Recommended MVP shape (likely: two images, two containers, one shared Chroma — but the cookbook-writing pass should verify by walking through the reasoning).
- Worked example Dockerfile skeleton(s) + docker-compose.yml reference if appropriate (deferred to a follow-up infrastructure ticket if the cookbook concludes infra-as-code belongs in the repo).
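To make the process-supervision side of the trade-off concrete, here is a minimal sketch of what the one-image-two-processes shape pushes into the image: a supervisor that starts both servers with distinct SSE_PORT values and restarts a crashed one a bounded number of times. The script paths and port numbers are hypothetical placeholders. spawnFn would be spawn from node:child_process in practice; it is injected here so the restart logic can be exercised without real servers.

```javascript
// Sketch for the one-image-two-processes shape. ASSUMPTIONS: script paths and
// ports are hypothetical placeholders; spawnFn is expected to be
// child_process.spawn (injected so the restart logic can be tested).
function superviseServers(spawnFn, { maxRestarts = 3, onGiveUp = () => process.exit(1) } = {}) {
  const servers = [
    { name: "knowledge-base", script: "/app/kb-server.mjs", ssePort: "3001" },
    { name: "memory-core", script: "/app/mc-server.mjs", ssePort: "3002" },
  ];
  const restarts = Object.fromEntries(servers.map((s) => [s.name, 0]));

  function start(server) {
    const child = spawnFn("node", [server.script], {
      // Both processes share one network namespace, so their ports must differ.
      env: { ...process.env, SSE_PORT: server.ssePort },
      stdio: "inherit",
    });
    child.on("exit", (code) => {
      if (code === 0) return; // clean shutdown: leave it stopped
      if (restarts[server.name] >= maxRestarts) {
        // Crash-looping: fail the whole container so the runtime can react.
        onGiveUp(server.name);
        return;
      }
      restarts[server.name] += 1;
      start(server);
    });
  }

  servers.forEach(start);
  return restarts;
}
```

In the two-images shape this block disappears entirely: each container runs one server, and the orchestrator's restart policy does what start() does here.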
Section 3 — Reverse Proxy Configuration
- oauth2-proxy (or equivalent) topology: terminate OIDC, inject X-PREFERRED-USERNAME, and strip any client-supplied value of that header before forwarding.
- Two-MCP-server URL strategy: hostname-based vs pathname-based routing.
- Health endpoint exposure decisions.
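A minimal sketch of the two per-request decisions the proxy makes, using hypothetical hostnames and path prefixes: choose an upstream by hostname or by path prefix, and rebuild the identity header from the proxy's authenticated session rather than trusting the client. oauth2-proxy or an equivalent is expected to do this through its own configuration, not application code; the sketch only pins down the expected behavior.

```javascript
// Sketch. ASSUMPTIONS: hostnames and path prefixes are hypothetical examples;
// X-PREFERRED-USERNAME is the header named in this ticket.
const ROUTES = {
  hostname: { "kb.example.com": "knowledge-base", "mc.example.com": "memory-core" },
  pathname: { "/kb/": "knowledge-base", "/mc/": "memory-core" },
};

// Picks the upstream MCP server for a request, or null if nothing matches.
function routeRequest(mode, host, path) {
  if (mode === "hostname") {
    const upstream = ROUTES.hostname[host.toLowerCase()];
    return upstream ? { upstream, path } : null;
  }
  for (const [prefix, upstream] of Object.entries(ROUTES.pathname)) {
    if (path.startsWith(prefix)) {
      // Strip the prefix so each server still sees root-relative paths.
      return { upstream, path: "/" + path.slice(prefix.length) };
    }
  }
  return null;
}

// Copies client headers except the identity header, then sets that header
// only from the proxy's own authenticated session.
function buildUpstreamHeaders(clientHeaders, authenticatedUser) {
  const headers = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    if (name.toLowerCase() === "x-preferred-username") continue;
    headers[name] = value;
  }
  if (authenticatedUser) headers["X-PREFERRED-USERNAME"] = authenticatedUser;
  return headers;
}
```

Hostname routing leaves each server at its own root. Pathname routing needs the prefix stripping shown in routeRequest, which means any URL a server mints for clients (OAuth redirects, SSE callbacks) must put the prefix back.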
Section 4 — Identity Provider Setup
- Pointer to the existing OIDC threat model in SharedDeployment.md.
- OAuth client registration steps, redirect URI strategy, clientSecret secrets-management guidance.
- The two paths: server-side OIDC verification vs proxy-header trust mode (auth.trustProxyIdentity).
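The two identity paths reduce to one decision per request, keyed on auth.trustProxyIdentity. A minimal sketch, assuming verifyToken is a hypothetical injected function (for instance a wrapper around a JWT/OIDC verification library) that resolves to the token's claims or throws, and assuming the user name comes from the standard OIDC preferred_username claim. The servers' actual claim mapping is documented in MemoryCoreMcpAuth.md and may differ.

```javascript
// Sketch. ASSUMPTIONS: verifyToken is hypothetical and injected; the claim
// name (preferred_username) follows common OIDC usage, not confirmed server code.
async function resolveIdentity(authConfig, headers, verifyToken) {
  const lower = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v]),
  );
  if (authConfig.trustProxyIdentity) {
    // Proxy-header trust mode: only safe if the proxy strips client-set copies
    // of this header and the server is unreachable except through the proxy.
    const user = lower["x-preferred-username"];
    if (!user) throw new Error("401: missing proxy identity header");
    return { user, via: "proxy-header" };
  }
  // Server-side OIDC verification: require and verify a bearer token.
  const match = /^Bearer\s+(.+)$/i.exec(lower["authorization"] ?? "");
  if (!match) throw new Error("401: missing bearer token");
  const claims = await verifyToken(match[1]);
  if (!claims.preferred_username) throw new Error("401: token lacks preferred_username");
  return { user: claims.preferred_username, via: "oidc" };
}
```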
Section 5 — Shared Chroma Topology
- NEO_CHROMA_UNIFIED=true + engines.kb.chroma.{host,port} mechanics.
- Provisioning a managed Chroma vs self-hosted; collection-naming contract; operator obligations re: multi-tenant isolation.
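An operator-side check for the shared Chroma can be sketched as follows, assuming the operator has already collected the Chroma host and port each server resolved. It assumes NEO_CHROMA_UNIFIED is compared as the string "true" and that in unified mode both servers must land on the same endpoint. It does not model how the servers themselves resolve engines.kb.chroma.{host,port}. The heartbeat path is /api/v2/heartbeat on recent Chroma releases (older servers expose /api/v1/heartbeat).

```javascript
// Sketch. ASSUMPTIONS: kbChroma / mcChroma are {host, port} pairs the operator
// collected from each server; the flag semantics are inferred from its name.
function checkSharedChroma(env, kbChroma, mcChroma, scheme = "http") {
  const problems = [];
  const unified = env.NEO_CHROMA_UNIFIED === "true";
  const sameEndpoint =
    kbChroma.host === mcChroma.host && Number(kbChroma.port) === Number(mcChroma.port);
  if (unified && !sameEndpoint) {
    problems.push(`unified mode, but KB -> ${kbChroma.host}:${kbChroma.port}, MC -> ${mcChroma.host}:${mcChroma.port}`);
  }
  for (const host of new Set([kbChroma.host, mcChroma.host])) {
    if (host === "localhost" || host === "127.0.0.1") {
      // Only reaches a shared Chroma if it runs in the same network namespace.
      problems.push(`${host}: verify Chroma really shares this container's network namespace`);
    }
  }
  return {
    problems,
    // Probe this from the servers' network to confirm reachability.
    heartbeatUrl: `${scheme}://${kbChroma.host}:${kbChroma.port}/api/v2/heartbeat`,
  };
}
```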
Section 6 — Environment Variable Inventory
- Consolidated matrix of every NEO_* / AUTH_* / AUTO_* env var across both servers, with deployment-mode-specific defaults.
- Gap-surfacing question: should the two config.template.mjs files be consolidated for deployment? Either reach a verdict or file a follow-up ticket.
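A first cut of the matrix can be generated rather than hand-collected. A minimal sketch, assuming the config templates read variables as process.env.NAME or process.env["NAME"]; anything read another way (destructuring, a helper function) will be missed, and deployment-mode defaults still have to be filled in by hand.

```javascript
// Sketch. ASSUMPTION: variables are read as process.env.NAME or
// process.env["NAME"] / process.env['NAME'].
const ENV_REF = /process\.env(?:\.([A-Z][A-Z0-9_]*)|\[\s*["']([A-Z][A-Z0-9_]*)["']\s*\])/g;

// sources: { [fileLabel]: fileText }. Returns rows sorted by variable name,
// each with its prefix group and the sorted list of files that read it.
function envVarMatrix(sources) {
  const readers = new Map();
  for (const [file, text] of Object.entries(sources)) {
    for (const match of text.matchAll(ENV_REF)) {
      const name = match[1] || match[2];
      if (!readers.has(name)) readers.set(name, new Set());
      readers.get(name).add(file);
    }
  }
  return [...readers.keys()].sort().map((name) => {
    const group = ["NEO_", "AUTH_", "AUTO_"].find((p) => name.startsWith(p)) || "other";
    return { name, group, files: [...readers.get(name)].sort() };
  });
}
```

Rows whose files list names both templates are the variables both servers read, which is the evidence the consolidation question needs.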
Section 7 — Healthcheck Verification
- Pointer to MemoryCore.md §Healthcheck Response Shape (canonical contract).
- Walkthrough of the expected providers.embedding, providers.summary, providers.auth, database.topology, and identity blocks for the cloud-deployed shape.
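Before walking the block contents, a quick presence check catches a server that is answering but not reporting the cloud shape at all. A minimal sketch: it only checks that the five blocks named above exist and are non-null in a parsed healthcheck body; what each block should contain is defined by the canonical contract in MemoryCore.md, not here.

```javascript
// Sketch. ASSUMPTION: presence only; block contents follow
// MemoryCore.md §Healthcheck Response Shape.
const EXPECTED_BLOCKS = [
  "providers.embedding",
  "providers.summary",
  "providers.auth",
  "database.topology",
  "identity",
];

// Returns the dotted paths from EXPECTED_BLOCKS that are absent (or null)
// in a parsed healthcheck body.
function missingHealthBlocks(body) {
  return EXPECTED_BLOCKS.filter((path) => {
    let node = body;
    for (const key of path.split(".")) {
      if (node === null || typeof node !== "object" || node[key] == null) return true;
      node = node[key];
    }
    return false;
  });
}
```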
Section 8 — First-Connection Smoke Test
- Agent harness configuration to point at the deployed URLs.
- Cross-tenant isolation verification (alice/bob query test).
- Auth-rejection probe (request without OIDC token vs with OIDC token vs with proxy header).
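The auth-rejection probe can be scripted as three requests against one endpoint. A minimal sketch, assuming url is some authenticated endpoint on a deployed MCP server (which path to use is for the cookbook to decide), validToken is an OIDC access token the deployment accepts, and fetchFn is the global fetch of Node 18+. If the proxy only accepts browser sessions, the valid-token case needs a session cookie instead of a bearer token.

```javascript
// Sketch. ASSUMPTIONS: url, validToken and the expected outcomes are
// illustrative; fetchFn is the global fetch (injected for testing).
async function runAuthProbes(url, validToken, fetchFn) {
  const probes = [
    { name: "no token", headers: {}, expectAccepted: false },
    { name: "valid OIDC token", headers: { Authorization: `Bearer ${validToken}` }, expectAccepted: true },
    // A forged identity header must still be rejected: the proxy should strip
    // it, and without a token nothing else vouches for the caller.
    { name: "forged proxy header", headers: { "X-PREFERRED-USERNAME": "alice" }, expectAccepted: false },
  ];
  const results = [];
  for (const probe of probes) {
    // "manual" keeps an IdP login redirect from being followed and mistaken for success.
    const res = await fetchFn(url, { headers: probe.headers, redirect: "manual" });
    const accepted = res.status >= 200 && res.status < 300;
    results.push({ name: probe.name, status: res.status, pass: accepted === probe.expectAccepted });
  }
  return results;
}
```

The same harness extends naturally to the alice/bob isolation test: run it once per user token and compare what each user can read.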
Section 9 — Known Gaps & Follow-Up Tickets
- This section IS the meta-documentation surface. Each gap surfaced during writing gets a sub-bullet pointing to a filed follow-up ticket (provider consolidation, server-knows-its-URL, infra-as-code, integration testing harness, etc.).
Acceptance Criteria
- learn/agentos/DeploymentCookbook.md authored with all 9 sections above (or a documented justification for omitting / restructuring any section).
- learn/tree.json updated with a new entry under the AgentOS parent, with an appropriate ID (e.g., agentos/DeploymentCookbook).
- Cross-links added from SharedDeployment.md ("see Cookbook for walkthrough") and from MemoryCore.md where relevant.
- An external operator can follow the cookbook from git clone → first verified healthcheck against a deployed stack (modulo their own infra provisioning, which is operator territory).
Out of Scope
- Building Docker / docker-compose.yml infra-as-code. If the cookbook concludes containerization recipes belong in the repo, that's a follow-up ticket. The cookbook itself is documentation.
- Implementing provider consolidation if the cookbook surfaces it: separate ticket.
- Building integration test infrastructure (staged URLs, OAuth fixtures, end-to-end stack tests): separate ticket.
- Multi-tenant graph isolation (tracked under #10011).
- Production hardening (rate limiting, secrets rotation, observability beyond healthcheck): beyond MVP scope.
Avoided Traps
- Rejected: Extend SharedDeployment.md instead of creating a sibling cookbook. Reference and walkthrough are different documentation modes. SharedDeployment.md is a reference; mixing walkthrough prose into it would dilute both. A sibling guide preserves separation of concerns.
- Rejected: Make this an epic ticket with the gap-surfaced subs as children. The cookbook is the deliverable; the surfaced gaps become sibling tickets, not children. Epic-shaping would impose unnecessary process overhead (epic-review skill, etc.) for a guide-write.
- Rejected: Defer cookbook authoring until all surfaceable gaps are known in advance. Counter-productive: the gap discovery is the value. Writing forces structured reasoning that lookup-from-memory does not.
- Rejected: Bundle a partner-specific deployment recipe. Per repo discipline, no customer-specific naming. The cookbook serves any external operator deploying the shared topology.
Related
- Parent epic: #9999 (Cloud-Native Knowledge & Multi-Tenant Memory Core).
- Predecessor sub-epic (closing today): #10721 (Shared deployment MVP completeness gaps). The cookbook depends on its work landing.
- Predecessor substrate sub-epic: #10691 (Shared KB/MC Team Deployment MVP, substrate primitives).
- Reference docs to compose: SharedDeployment.md, MemoryCore.md, KnowledgeBase.md, MemoryCoreMcpAuth.md.
- Sibling adjacency: #10291 (Organism self-defense substrate for cloud-phase #9999; operator-facing concern with overlap on cloud topology).
Origin Session ID: 34c8f800-1855-43ff-aea6-d5e6b9410978
Retrieval Hint: query_raw_memories(query="cloud deployment cookbook walkthrough KB MC docker container reverse proxy oauth2-proxy oidc external operator partner-trial topology gap surfacing")