Frontmatter
id: 11720
title: Cloud Agent OS Deployment Readiness
state: Closed
labels: epic, ai, architecture
assignees: []
createdAt: May 21, 2026, 7:08 PM
updatedAt: Jun 7, 2026, 7:14 PM
githubUrl: https://github.com/neomjs/neo/issues/11720
author: neo-opus-ada
commentsCount: 12
parentIssue: 9999
subIssues:
  • 11721 Cloud deployment topology + scheduler-task-taxonomy ADR
  • 11722 Top-level ai deployment + maintenance-policy config
  • 11723 Production container topology: multi-container + resource limits
  • 11724 Production reference deployment profile + compose
  • 11725 Deployed healthcheck + production-shaped journey proof
  • 11726 Tenant-repo ingestion operational model for cloud deployment
  • 11727 DeploymentCookbook realignment as accurate deployment authority
  • 11728 Day-0 executable cloud-deployment tutorial
  • 11743 Specify repo-push receiver and auth flow for KB ingestion envelopes
subIssuesCompleted: 9
subIssuesTotal: 9
blockedBy: []
blocking: [x] 11733 Downstream external deployment-pipeline wiring (post-MVP cloud deployment)
closedAt: May 22, 2026, 3:02 PM

Cloud Agent OS Deployment Readiness

Closed · v13.0.0/archive-v13-0-0-chunk-12 · epic, ai, architecture
neo-opus-ada commented on May 21, 2026, 7:08 PM

Context

Graduated from Discussion #11718 (Cloud Agent OS Deployment Readiness) on 2026-05-21 after cross-family convergence: @neo-gpt [SCOPING_APPROVED] @ DC_kwDODSospM4BA4Np. @neo-gemini-pro is unavailable (~1 month); per an explicit operator §6.5 liveness disposition recorded on #11718 (DC_kwDODSospM4BA4Qb), graduation proceeds now on Claude + GPT + operator convergence — Gemini's no-signal is archived as a liveness gap (Gemini may re-open the risk on return), not treated as implicit consent. Waiting ~1 month is rejected: it would miss the external-stakeholder MVP deployment window.

This Epic is the #11718 graduation artifact; #11718 is marked GRADUATED + closed referencing this Epic as the immediate next step of the same ideation-sandbox-workflow.md §6.7 graduation sequence (satisfies the ticket-create §1c cross-check — the Discussion is converged, not prematurely mined). Downstream amendments may be needed if Gemini re-opens a risk on return.

Mission: an external dev team can deploy Neo's Agent OS (KB + MC MCP servers + orchestrator + supporting infra) into a containerized cloud environment and use it against their own repositories — and a future agent/operator can do this without tacit maintainer knowledge.

The Problem

Not a missing capability — substrate drift. After the orchestrator daemon landed, the deployment docs / tests / profiles under ai/deploy/ were never realigned with the actual Agent OS runtime shape. Three gaps, shared root = future-session operability: (1) no ADR documents the deployment topology; (2) no production-shaped reference deployment profile (docker-compose.yml runs only a stale KB/MC/Chroma 3-service baseline); (3) DeploymentCookbook.md is stale as deployment authority + there is no day-0 executable tutorial. Full archaeology: Discussion #11718 §3 + its Evidence Appendix.

The Architectural Reality

Audited at dev (per #11718):

  • KB/MC/Chroma are each containerized (ai/deploy/docker-compose.yml, 3 services).
  • The Orchestrator is a mixed-responsibility local Agent OS supervisor (cloud-relevant daemon-fleet lanes plus local-only maintainer lanes) and cannot be containerized as-is.
  • No production model-provider profile/container.
  • No per-container resource limits.
  • Reverse-proxy refs are unwired (port mismatch: proxy 3001/3002 vs compose 3000/3001).
  • Backups are not externalized for redeploy-survival.
  • KB/MC compose services define no Docker healthcheck: blocks.
  • No ADR covers deployment topology / provider isolation / persistence. ADRs 0003 (unified Chroma) and 0009 (cross-daemon lease) exist.
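Several of the audited gaps are mechanically checkable. The following is a minimal sketch, not the project's tooling: it assumes the compose file has already been parsed into a plain dict, that ports use the short `HOST:CONTAINER` syntax, and that resource limits live under the Compose `deploy.resources.limits` key. The service names, the `compose` contents, and the `proxy_upstreams` map are hypothetical stand-ins, not the real ai/deploy/docker-compose.yml.

```python
from typing import Any

# Hypothetical stand-in for a parsed compose file (illustrative names and ports only).
compose: dict[str, Any] = {
    "services": {
        "kb": {"ports": ["3000:3000"]},
        "mc": {"ports": ["3001:3001"]},
        "chroma": {"ports": ["8000:8000"], "healthcheck": {"test": ["CMD", "true"]}},
    }
}

# Hypothetical reverse-proxy upstream ports, mirroring the 3001/3002 vs 3000/3001 mismatch.
proxy_upstreams: dict[str, int] = {"kb": 3001, "mc": 3002}


def host_ports(service: dict[str, Any]) -> set[int]:
    """Published host ports, assuming the short 'HOST:CONTAINER' port syntax."""
    return {int(str(mapping).split(":")[0]) for mapping in service.get("ports", [])}


def audit(compose: dict[str, Any], proxy_upstreams: dict[str, int]) -> list[str]:
    """Report missing healthchecks, missing resource limits, and proxy port mismatches."""
    findings: list[str] = []
    for name, service in sorted(compose["services"].items()):
        if "healthcheck" not in service:
            findings.append(f"{name}: no healthcheck block")
        limits = service.get("deploy", {}).get("resources", {}).get("limits")
        if not limits:
            findings.append(f"{name}: no resource limits")
        upstream = proxy_upstreams.get(name)
        if upstream is not None and upstream not in host_ports(service):
            published = sorted(host_ports(service))
            findings.append(f"{name}: proxy targets port {upstream}, compose publishes {published}")
    return findings


for finding in audit(compose, proxy_upstreams):
    print(finding)
```

A check of this shape could run in CI against the reference profile, so that the drift described above surfaces as a failing test instead of as tacit maintainer knowledge.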

The Fix — Decomposition

Decision workstreams D0–D5 (each may produce an ADR once it reaches a durable decision — an ADR is a decision record of the chosen outcome + rejected options, per ADR 0005/0006, not an open A/B/C/D workspace) + implementation sub-tickets Sub A–F2:

| Sub / workstream | Scope | MVP class (V-B-A) |
| --- | --- | --- |
| Sub A | Top-level deployment/maintenance config (reshape #11075) | MVP-critical |
| D0 | Deployment-topology + scheduler-task-taxonomy decision: classify every orchestrator/daemon lane as cloud-deployable / local-only / shared primitive → first ADR | MVP-critical (gates B/C/D) |
| Sub B | Production container topology (multi-container; profile variants) | MVP-critical |
| Sub C | Reference deployment profile + compose (orchestrator service or ADR-justified exclusion; proxy/TLS wiring; redeploy-safe persistence) | MVP-critical |
| Sub D | Deployed healthcheck + production-shaped journey proof | MVP-critical |
| Sub E | Tenant-repo ingestion operational model (push-based default) | MVP-critical |
| Sub F1 | DeploymentCookbook.md realignment (deployment-authority repair) | deferrable-vs-critical (V-B-A) |
| Sub F2 | Day-0 step-by-step tutorial (separate new markdown artifact) | MVP-critical (reproducibility proof) |

In the cloud profile, the orchestrator's local wake-delivery lane defaults to disabled / no-op / tenant-bound; remote graph-backed agent messages are cloud-relevant.
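The D0 lane taxonomy can be pictured as a small classification table that a deployment profile filters on. This is a sketch under stated assumptions: the lane names, the `Lane` type, and the `enabled_lanes` helper are hypothetical, and only the two classifications mentioned in this Epic (local wake-delivery as local-only, remote agent messages as cloud-relevant) come from the text. The third lane is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class LaneClass(Enum):
    CLOUD_DEPLOYABLE = "cloud-deployable"
    LOCAL_ONLY = "local-only"
    SHARED_PRIMITIVE = "shared primitive"


@dataclass(frozen=True)
class Lane:
    name: str
    lane_class: LaneClass


# Illustrative lanes only; the real classification is the output of D0.
LANES = [
    Lane("remote-agent-messages", LaneClass.CLOUD_DEPLOYABLE),
    Lane("local-wake-delivery", LaneClass.LOCAL_ONLY),
    Lane("example-shared-primitive", LaneClass.SHARED_PRIMITIVE),  # placeholder name
]


def enabled_lanes(lanes: list[Lane], profile: str) -> list[str]:
    """Return lane names enabled for a profile; the cloud profile drops local-only lanes."""
    if profile == "cloud":
        return [lane.name for lane in lanes if lane.lane_class is not LaneClass.LOCAL_ONLY]
    return [lane.name for lane in lanes]


print(enabled_lanes(LANES, "cloud"))
```

Keeping the class on the lane, not on the profile, means a new lane cannot reach the cloud profile without first receiving an explicit classification.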

Sprint Budget — ceiling, not target

  • Ceiling: 60–80 merged PRs maximum (~4 days at current velocity). Hard cap.
  • Sizing hypothesis: aim for a ~25-PR critical path if V-B-A supports it (operator gut-feel — explicitly a V-B-A item, not a commitment).
  • Verification method: once the sub-tickets exist, map each sub → its minimum PR chain → classify MVP-critical / deferrable / later-v13. That map sets the actual sprint budget.
  • Review implication: epic-review challenges PR-count bloat if the decomposition expands beyond the mission proof path. 60–80 is the ceiling, not the expected plan.
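The verification method above amounts to simple arithmetic over the sub→PR-chain map. The sketch below assumes a hypothetical `SubPlan` record and uses placeholder PR counts that are not estimates from this Epic. It treats 80 (the upper end of the 60–80 range) as the hard cap and 25 as the critical-path hypothesis.

```python
from dataclasses import dataclass

PR_CEILING = 80                 # assumption: upper end of the 60-80 hard cap
CRITICAL_PATH_HYPOTHESIS = 25   # the ~25-PR sizing hypothesis


@dataclass(frozen=True)
class SubPlan:
    sub: str
    min_pr_chain: int   # minimum number of PRs needed to land this sub
    mvp_class: str      # "MVP-critical" | "deferrable" | "later-v13"


def sprint_budget(plans: list[SubPlan]) -> dict[str, object]:
    """Sum the PR chains and compare them against the ceiling and the hypothesis."""
    critical = sum(p.min_pr_chain for p in plans if p.mvp_class == "MVP-critical")
    total = sum(p.min_pr_chain for p in plans)
    return {
        "critical_path": critical,
        "total": total,
        "within_ceiling": total <= PR_CEILING,
        "within_hypothesis": critical <= CRITICAL_PATH_HYPOTHESIS,
    }


# Placeholder numbers for illustration only.
plans = [
    SubPlan("Sub A", 2, "MVP-critical"),
    SubPlan("D0", 1, "MVP-critical"),
    SubPlan("Sub B", 3, "MVP-critical"),
    SubPlan("Sub F1", 2, "deferrable"),
]
print(sprint_budget(plans))
```

Separating `critical_path` from `total` keeps the two questions distinct: whether the MVP proof path fits the hypothesis, and whether the whole decomposition stays under the cap.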

Discussion Criteria Mapping

Per ideation-sandbox-workflow.md §6.6 — Discussion #11718's resolved criteria → this Epic's ACs:

| #11718 criterion | Epic AC |
| --- | --- |
| OQ1: gap classification (proof / exploration / decision) | AC: each sub carries its gap class |
| OQ2: in-repo proof = incremental adoption ladder | AC: the adoption-ladder proof below |
| OQ3: #11075 → Sub A | AC: Sub A reshapes #11075 |
| D1 Double Diamond (container topology) | carried as the D1-workstream ADR's selected-decision rationale |

Acceptance Criteria

  • All sub-tickets (Sub A, D0, B, C, D, E, F1, F2) filed, each cross-referenced to this Epic + classified MVP-critical / deferrable / later-v13.
  • D0 decision made + first ADR authored (deployment topology + scheduler-task-taxonomy) under learn/agentos/decisions/.
  • A production-shaped reference deployment profile exists in-repo (ai/deploy/) running the decided topology.
  • Proof — incremental adoption ladder (each milestone independently verifiable): (0) runnable Docker remote-MCP SSE healthcheck demo · (1) MC connection · (2) KB connection over the Neo-shared corpus · (3) tenant KB ingestion · (4) a client-side custom-file-type parser · (5) bulk/backfill path · (6) (optional) server-side source/clone · (7) backup → redeploy-survival → handoff.
  • Cloud-profile negative-behavior AC: the cloud profile asserts the absence of local-only behavior (no git pull origin/dev, no local worktree discovery, no .sync-metadata.json reset, no local-checkout KB-sync cascade).
  • KB/MC container readiness semantics defined; deployed healthcheck covers the MCP servers + orchestrator.
  • DeploymentCookbook.md realigned as deployment authority; a day-0 executable tutorial exists.
  • ADR set governed by an owner-map, not a separate ADR sub-ticket: D0 #11721 emits the topology + scheduler-task-taxonomy ADR plus the stale-ADR (0003/0009) reconciliation sweep; Sub B #11723 emits the D1 container-topology ADR (with D2 provider-isolation as a section unless D0/D1 falsifies the fold); Sub C #11724 emits the D4 backup/redeploy-persistence ADR; Sub E #11726 emits the D3 ingestion-ownership ADR. Each owner emits its ADR in the decision PR, or records explicit no-ADR-required rationale. A dedicated ADR sub is re-filed only if D0 proves D2 / stale-ADR governance is too cross-cutting for the owner-map.
  • Sprint stayed within the 60–80-PR ceiling; the sub→PR-chain map is documented.
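Two of the criteria above lend themselves to a small executable check: the adoption ladder (which required milestones are still unverified) and the cloud-profile negative-behavior AC (no trace of local-only behavior). The sketch below is illustrative only. The `Milestone` type, both helper functions, and the idea of scanning log lines are assumptions. Only two of the four forbidden behaviors have a literal marker string in this Epic, so the other two (local worktree discovery, local-checkout KB-sync cascade) would need real signals that are not invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milestone:
    index: int
    name: str
    optional: bool = False


LADDER = [
    Milestone(0, "Docker remote-MCP SSE healthcheck demo"),
    Milestone(1, "MC connection"),
    Milestone(2, "KB connection over the Neo-shared corpus"),
    Milestone(3, "tenant KB ingestion"),
    Milestone(4, "client-side custom-file-type parser"),
    Milestone(5, "bulk/backfill path"),
    Milestone(6, "server-side source/clone", optional=True),
    Milestone(7, "backup, redeploy-survival, handoff"),
]

# Literal markers taken from the negative-behavior AC; matching on log text is an assumption.
FORBIDDEN_LOCAL_ONLY = ("git pull origin/dev", ".sync-metadata.json")


def missing_required(verified: set[int]) -> list[int]:
    """Indices of non-optional milestones that have not been verified yet."""
    return [m.index for m in LADDER if not m.optional and m.index not in verified]


def local_only_violations(log_lines: list[str]) -> list[str]:
    """Log lines that show local-only behavior inside the cloud profile."""
    return [line for line in log_lines
            if any(marker in line for marker in FORBIDDEN_LOCAL_ONLY)]


print(missing_required({0, 1, 2}))
print(local_only_violations(["kb server started", "exec: git pull origin/dev"]))
```

Because each milestone is independently verifiable, the check takes a set of verified indices and does not require them to be contiguous; optional milestone 6 never blocks completion.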

Out of Scope

  • Server-side repo cloning (a D3 high-blast exploration — push-based ingestion is the MVP default).
  • SQLite → networked-SQL graph-store migration (D5 — deferred v13 follow-up).
  • External-repo deployment-pipeline wiring (a later phase, contingent on this Epic's in-repo proof).
  • Broad backlog archaeology / ticket-supersession sweeps.

Avoided Traps

| Trap | Why rejected |
| --- | --- |
| Mono-container deployment | Defeats per-service resource isolation, the devops concern that motivates the mission |
| Containerize the current Orchestrator as-is | It is mixed-responsibility (cloud + local-only lanes); D0 task-taxonomy must split first |
| ADR as an open A/B/C/D options workspace | An ADR is a decision record (chosen outcome + rejected options), per ADR 0005/0006 |
| Encoding 60–80 PRs as the sprint target | 60–80 is the ceiling; ~25 is the hypothesis; the sub→PR-chain map sets the real budget |

Related

  • Origin Discussion: #11718 (archaeological source — body Updates a–h + the 19-comment thread; GPT SCOPING_APPROVED @ DC_kwDODSospM4BA4Np; operator §6.5 disposition @ DC_kwDODSospM4BA4Qb)
  • Parent: #9999 (v13 Cloud-Native Knowledge & Multi-Tenant umbrella)
  • Folds in: #11075 (top-level config → Sub A), #11649 (backup-config tracked-input)
  • Related: #11719 (DeploymentCookbook §6 docs defect), #10801 (reference Docker/compose artifacts), #11003 (Dockerized remote MCP transport proof)
  • ADRs: 0003 (unified Chroma), 0005 (ADR-at-graduation), 0006 (ADRs as graph-queryable entities), 0009 (cross-daemon lease)

Signal Ledger

  • @neo-opus-ada: author / lead
  • @neo-gpt: [SCOPING_APPROVED] @ Discussion #11718 DC_kwDODSospM4BA4Np
  • @neo-gemini-pro: no signal — unavailable ~1 month

Unresolved Liveness

@neo-gemini-pro — no signal. Per the explicit operator §6.5 liveness disposition (DC_kwDODSospM4BA4Qb), this Epic graduates on Claude + GPT + operator convergence under MVP-deployment time pressure. Gemini's signal remains a recorded liveness gap — Gemini may re-open any risk on return.

Origin Session ID

8e1dc8ca-b5a5-4479-b3cf-31918eb4a5b2

Handoff Retrieval Hints

  • query_raw_memories({query: 'cloud agent os deployment readiness #11718 graduation'})
  • Discussion #11718 is the full archaeological source (body Updates a–h + the 19-comment cross-family thread).
  • Cross-family signal anchors: GPT DC_kwDODSospM4BA4Np; operator disposition DC_kwDODSospM4BA4Qb.
tobiu referenced in commit ed81417 - "feat(agentos): cloud deployment topology + scheduler-task-taxonomy ADR (#11721) (#11738)" on May 21, 2026, 11:03 PM
tobiu referenced in commit a0926cf - "feat(deploy): multi-container topology + per-service resource limits (#11723) (#11741)" on May 22, 2026, 4:00 AM
tobiu referenced in commit 350aa33 - "test(deploy): add adoption-ladder journey proof, milestones 0-2 (#11725) (#11757)" on May 22, 2026, 1:34 PM