Frontmatter
id: 10945
title: Expand deployment-pipeline integration coverage for Memory Core
state: Closed
labels: epic, ai, testing, architecture
assignees: []
createdAt: May 8, 2026, 12:24 PM
updatedAt: May 22, 2026, 9:08 PM
githubUrl: https://github.com/neomjs/neo/issues/10945
author: neo-gpt
commentsCount: 5
parentIssue: 9999
subIssues:
  • 10947 Add real OIDC auth integration fixture for Memory Core
  • 10948 Verify primary Memory Core lifecycle in deployment fixture
  • 10949 Prove backup restore and wipe detection for deployed Memory Core
  • 10950 Reconcile topology matrix coverage for Memory Core deployment
  • 10951 Add team-private retrieval integration tests for Memory Core
  • 10952 Audit Memory Core integration CI skip gates
  • 10964 Harden clean-checkout AI deploy bootstrap
  • 10969 Document deploy compose env-var toggles
  • 11003 Implement Dockerized remote MCP transport proof (v13 Release Gate)
subIssuesCompleted: 9
subIssuesTotal: 9
blockedBy: []
blocking: []
closedAt: May 22, 2026, 9:08 PM

Expand deployment-pipeline integration coverage for Memory Core

neo-gpt commented on May 8, 2026, 12:24 PM

Context

Created from the 2026-05-08 three-track coordination correction. @tobiu clarified that the next high-value test work is integration tests for the deployment pipeline for multi-user Memory Core, not another all-hands unit-test substrate push:

we have 3 frontier models on our team, we can split work, 3 focussing on unit tests feels very inefficient to me.

The team split is now:

  • @neo-opus-4-7 continues the already-contextualized Bucket G unit lane (#10939 / #10924 closeout).
  • @neo-gpt owns shaping the deployment-pipeline integration backlog.
  • @neo-gemini-3-1-pro supports Docker/runtime integration surfaces or continues wake/daemon incident work, depending on live priority.

Peer confirmations:

  • @neo-opus-4-7: MESSAGE:349ca5c0-dae6-4ac8-b072-899f09b9cb8c — confirmed the lane split and provided historical anchors for the "LOT more" list.
  • @neo-gemini-3-1-pro: MESSAGE:e570df94-2e39-4388-bd7a-f109c6f47dd7 — confirmed the lane split and the same integration-gap list.

Duplicate / Adjacency Sweep

No equivalent open parent ticket was found.

Shipped or adjacent surfaces:

  • #10807 — closed duplicate/original Docker Compose integration fixture for shared KB+MC deployment.
  • #10895 — closed deployed-shape identity fixture + cross-tenant isolation + auth-rejection specs.
  • #10896 — closed sustained-liveness helper + heartbeat propagation integration spec.
  • #10897 — closed CI workflow matrix runner for test suites.
  • #10915/#10917/#10918 — closed follow-up fixes from the first integration CI runs.
  • #10008/#10009 — open topology-coverage tickets, currently split by unified/federated mode and with #10009 demoted to non-default diagnostic priority.
  • #10010/#10011 — open team/private retrieval and Native Edge Graph tenant-isolation semantics.
  • #10813 — open primary/summarization/sunset-event substrate work.
  • #10844/#10854 — closed backup and wipe-detection primitives without an obvious deployed-stack backup-restore-wipe integration proof.

The Problem

The current integration suite is a useful MVP, but it is not yet the multi-user Memory Core deployment pipeline we ultimately need to trust.

Current coverage proves a narrow deployed slice:

  • KB and MC healthcheck shape against Dockerized KB/MC/Chroma/mock-embedding stack.
  • Proxy-header auth rejection and identity-bearing client success.
  • Alice/bob Chroma tenant isolation for add_memory and query_raw_memories.
  • Short sustained-health composability checks.
  • CI invokes the integration row.

The remaining product-path gaps are larger than heartbeat checks:

  1. Real OIDC / GitHub-account auth — current tests use trustProxyIdentity header injection as the MVP shortcut; they do not exercise the full deployed OAuth/OIDC path.
  2. Primary vs secondary Memory Core behavior — current stack sets NEO_MC_PRIMARY=false and disables auto summarization/dream/golden-path paths. It does not prove single-writer behavior, sunset-event summarization, or non-primary suppression.
  3. Backup / restore / wipe-detection in the deployed stack — primitives exist, but no integration proof appears to run Chroma + better-sqlite graph backup, destructive wipe simulation, restore, and integrity checks through the deployment fixture.
  4. Unified/federated topology routing — current Docker stack runs unified Chroma. #10008/#10009 exist, but the deployment-product validation needs a coherent topology matrix or an explicit diagnostic-vs-product split.
  5. Team/private retrieval semantics: #10010/#10011 are the policy/RLS substrate, but the deployed-shape proof needs to show that real MCP requests obey team/private read rules across Chroma memories, session summaries, and Native Edge Graph data.
  6. Deliberate-regression proof discipline — for security-sensitive deployment tests, at least one test lane should document a throwaway regression that the integration suite catches before merge.
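
The deliberate-regression discipline in item 6 can be illustrated with a minimal sketch. It uses an in-memory stand-in rather than the real Chroma-backed stack, and the record shape, the function names, and the `enforceTenantFilter` switch are all hypothetical. The point is only that an isolation check should pass on the healthy path and fail once the tenant filter is removed on purpose.

```javascript
// Hypothetical in-memory stand-in for a tenant-tagged memory store.
// The record shape and function names are assumptions for illustration only.
const memories = [
    {userId: 'alice', text: 'alice deployment note'},
    {userId: 'bob',   text: 'bob rollback note'}
];

// Reads memories for one caller. `enforceTenantFilter` exists only so the
// sketch can simulate a throwaway regression that drops the filter.
function queryMemories(store, callerId, {enforceTenantFilter = true} = {}) {
    return store.filter(m => !enforceTenantFilter || m.userId === callerId);
}

// The isolation check: every returned record must belong to the caller.
function findLeaks(results, callerId) {
    return results.filter(m => m.userId !== callerId);
}

const healthyLeaks   = findLeaks(queryMemories(memories, 'alice'), 'alice');
const regressedLeaks = findLeaks(
    queryMemories(memories, 'alice', {enforceTenantFilter: false}),
    'alice'
);

console.log(healthyLeaks.length);   // 0
console.log(regressedLeaks.length); // 1
```

In a real lane the regression would be a temporary code change in a throwaway branch, with the failing integration run recorded in the PR body.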

There is also a current CI hygiene signal: .github/workflows/test.yml sets NEO_TEST_SKIP_CI=true, and test/playwright/integration/HeartbeatPropagation.integration.spec.mjs still carries a CI skip referencing #10918 even though #10918 is closed. That should be audited as part of making integration coverage meaningful.
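
A skip-guard audit of this kind could be sketched as a small text scan. The helper below is hypothetical, and so is the skip-guard syntax in the sample input, which is not copied from the repository. It flags any line that mentions a skip and references an issue number known to be closed.

```javascript
// Hypothetical skip-guard audit. The guard syntax matched here is an assumption;
// adapt the pattern to whatever form the real specs use.
function findStaleSkips(specSource, closedIssues) {
    const stale = [];
    specSource.split('\n').forEach((line, index) => {
        if (!/skip/i.test(line)) {
            return; // only lines that mention a skip are interesting
        }
        // Collect every "#12345"-style issue reference on the line.
        for (const match of line.matchAll(/#(\d+)/g)) {
            const issue = Number(match[1]);
            if (closedIssues.has(issue)) {
                stale.push({line: index + 1, issue});
            }
        }
    });
    return stale;
}

// Illustrative input only; not copied from the repository.
const exampleSpec = [
    "test.skip(!!process.env.NEO_TEST_SKIP_CI, 'flaky in CI, see #10918');",
    "test('heartbeat propagates', async () => {});"
].join('\n');

const staleSkips = findStaleSkips(exampleSpec, new Set([10918]));
console.log(staleSkips); // [ { line: 1, issue: 10918 } ]
```

The set of closed issues would have to come from the issue tracker; the sketch takes it as an input so the scan itself stays offline and deterministic.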

The Architectural Reality

Current repo surfaces:

  • ai/deploy/docker-compose.test.yml runs Chroma, deterministic OpenAI-compatible embedding server, KB server, and MC server.
  • test/playwright/playwright.config.integration.mjs runs the integration suite through a Docker-backed Playwright webServer fixture.
  • test/playwright/integration/fixtures/composeWebServer.mjs owns stack startup/readiness.
  • test/playwright/integration/fixtures/mcpClient.mjs owns Streamable HTTP MCP client creation, identity headers, tool-call assertions, and healthcheck calls.
  • test/playwright/integration/healthcheck.spec.mjs verifies current KB/MC healthcheck shape and a short sustained-health helper path.
  • test/playwright/integration/CrossTenantIsolation.integration.spec.mjs verifies tenant isolation for raw memories through Chroma metadata filtering.
  • test/playwright/integration/AuthRejection.integration.spec.mjs verifies proxy-header auth rejection and positive identity-bearing calls.
  • test/playwright/integration/HeartbeatPropagation.integration.spec.mjs is the longer sustained-liveness test, but is currently CI-skipped under NEO_TEST_SKIP_CI.

Owning architecture:

  • #9999 — Cloud-Native Knowledge & Multi-Tenant Memory Core umbrella.
  • #10015 — Dynamic topology: unified vs federated routing.
  • #10016 — Multi-Tenant Identity & Data Privacy.
  • #10813 — restore session summaries via primary flag + sunset-event trigger.
  • #10822/#10944 — config/healthcheck cleanup work that should prevent stale substrate artifacts from confusing the integration assertions.

The Fix

Use this epic as the parent coordination artifact for deployment-pipeline integration coverage. File focused sub-issues from this matrix rather than sending all three frontier models into the same unit-test queue.

Proposed sub-lanes:

  1. Real OIDC / GitHub auth integration fixture

    • Add a higher-fidelity auth fixture beyond trustProxyIdentity header injection.
    • Validate missing/invalid token rejection, valid user acceptance, and user identity propagation into Memory Core request context.
    • Decide whether to use a local OIDC mock, GitHub-device/token fixture, or staged-cloud operator fixture.
  2. Primary / secondary Memory Core lifecycle integration

    • Prove NEO_MC_PRIMARY=true owns summarization/sunset-event work.
    • Prove secondary instances do not perform single-writer duties.
    • Cross-link with #10813 and #10186.
  3. Backup / restore / wipe-detection deployed-stack proof

    • Seed Chroma memories, session summaries, and Native Edge Graph nodes.
    • Run backup snapshot.
    • Simulate destructive wipe or collection-count collapse inside the Docker fixture.
    • Restore and verify semantic queries + graph topology + counts recover.
  4. Topology matrix coverage

    • Re-scope #10008/#10009 into a practical product/diagnostic matrix.
    • Unified mode should remain the product-path default.
    • Federated mode should either be diagnostic-only with explicit tests or retired from deployment-product claims.
  5. Team/private retrieval deployed-shape proof

    • Exercise #10010/#10011 semantics through MCP calls, not mocks.
    • Verify write-side user tagging is unconditional.
    • Verify read-side team/private/legacy behavior and Native Edge Graph RLS clauses under real request context.
  6. Integration assertion hardening / CI skip cleanup

    • Audit current NEO_TEST_SKIP_CI effects on integration specs.
    • Remove stale skip guards where the referenced issue is closed.
    • Add failure artifacts/log capture for Docker stack regressions so agents do not need to rerun blind.
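
As a rough illustration of sub-lane 2, the single-writer expectation can be modeled without Docker. Only the `NEO_MC_PRIMARY` name comes from this issue. The string parsing rule, the instance model, and the `onSessionSunset` hook are assumptions made for the sketch, not the Memory Core implementation.

```javascript
// Assumption: the flag is read as the literal string 'true'; the real parsing
// rules in Memory Core may differ.
function isPrimary(env) {
    return env.NEO_MC_PRIMARY === 'true';
}

// Hypothetical instance model: only a primary instance records a summary when a
// session sunsets; every instance still answers queries.
function createInstance(name, env) {
    const summaries = [];
    return {
        name,
        summaries,
        onSessionSunset(sessionId) {
            if (isPrimary(env)) {
                summaries.push(sessionId);
            }
        },
        query() {
            return 'ok';
        }
    };
}

const primary   = createInstance('mc-primary',   {NEO_MC_PRIMARY: 'true'});
const secondary = createInstance('mc-secondary', {NEO_MC_PRIMARY: 'false'});

// Deliver the same sunset event to both instances, as a shared stack would.
for (const instance of [primary, secondary]) {
    instance.onSessionSunset('session-1');
}

const writers = [primary, secondary]
    .filter(instance => instance.summaries.length > 0)
    .map(instance => instance.name);

console.log(writers);           // [ 'mc-primary' ]
console.log(secondary.query()); // ok
```

The deployed-shape version would make the same two assertions against a multi-instance Docker fixture: exactly one writer, and a secondary that still serves queries.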

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Deployment-pipeline integration epic | @tobiu 2026-05-08 coordination correction; peer A2A confirmations | Central parent for multi-user Memory Core deployed-shape integration coverage, separate from Bucket G unit-substrate work | If a lane already has an owning open ticket, link it instead of duplicating | This issue + #9999/#10015/#10016 links | Native sub-issues and peer A2A message IDs |
| Integration test fixture surfaces | Existing test/playwright/integration/** + ai/deploy/docker-compose.test.yml | Grow from MVP smoke/auth/isolation checks into product-path deployment proofs | Keep trustProxyIdentity as fast MVP path; add real-OIDC path separately | learn/agentos/SharedDeployment.md, learn/agentos/DeploymentCookbook.md | npm run test-integration locally and CI integration row |
| Multi-user privacy proof | #10016, #10010, #10011 | Verify Chroma metadata filters and graph RLS through real MCP calls for team/private/legacy modes | Preserve legacy migration behavior until default flips are ready | MultiTenant migration docs + SharedDeployment auth docs | Deliberate cross-tenant regression is caught by integration test |
| Primary/single-writer proof | #10813, #10186, Memory Core config | Verify only primary MC instance runs single-writer lifecycle duties | Secondary must stay query-capable but non-writing for summarization work | MemoryCore.md health/startup sections | Multi-instance Docker fixture test |
| Backup/restore/wipe proof | #10844, #10854, backup tooling | Verify Chroma + better-sqlite graph data can be backed up, wiped, restored, and queried | Test fixture must use isolated tmp data and not touch developer data | DeploymentCookbook recovery section | Seed → backup → wipe → restore → query/graph/count checks |
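
The seed → backup → wipe → restore → check sequence in the last row can be sketched against an in-memory fixture. Everything here is hypothetical: the store shape, the helper names, and the wipe-detection rule (a non-empty baseline collapsing to zero) stand in for the real backup tooling, which this issue does not describe in detail.

```javascript
// Hypothetical in-memory fixture standing in for Chroma memories and graph nodes.
function createStore() {
    return {memories: [], graphNodes: []};
}

// Deep copy so later mutations of the live store cannot alter the snapshot.
function backup(store) {
    return JSON.parse(JSON.stringify(store));
}

function wipe(store) {
    store.memories.length = 0;
    store.graphNodes.length = 0;
}

function restore(store, snapshot) {
    const copy = JSON.parse(JSON.stringify(snapshot));
    store.memories   = copy.memories;
    store.graphNodes = copy.graphNodes;
}

// Assumed detection rule: a non-empty baseline collapsing to zero is a wipe.
function detectWipe(baselineCount, currentCount) {
    return baselineCount > 0 && currentCount === 0;
}

// Seed
const store = createStore();
store.memories.push({id: 'm1', userId: 'alice', text: 'seeded memory'});
store.graphNodes.push({id: 'n1', edges: ['m1']});

// Backup
const baseline = store.memories.length;
const snapshot = backup(store);

// Wipe, then confirm the collapse is noticed
wipe(store);
const wipeDetected = detectWipe(baseline, store.memories.length);

// Restore, then check counts and graph topology
restore(store, snapshot);
const recovered = store.memories.length === baseline
    && store.graphNodes[0].edges[0] === 'm1';

console.log(wipeDetected, recovered); // true true
```

Keeping the snapshot as a deep copy mirrors the table's fallback rule: the fixture works on isolated data, so a wipe in the test can never reach the backup or a developer's real data.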

Acceptance Criteria

  • Native parent/child issue relationships created for each resulting sub-lane; do not rely on Markdown checkboxes as tracking.
  • Existing integration specs and CI skip guards audited; stale NEO_TEST_SKIP_CI integration skips either removed or backed by an open ticket.
  • Real-OIDC/GitHub-auth deployed-shape lane filed or explicitly rejected with rationale.
  • Primary/secondary MC lifecycle integration lane filed, linked to #10813/#10186.
  • Backup/restore/wipe-detection deployed-stack lane filed, linked to #10844/#10854.
  • Topology matrix lane reconciles #10008/#10009 with current unified-product vs federated-diagnostic posture.
  • Team/private retrieval integration lane links to #10010/#10011 and verifies policy through MCP requests.
  • At least one security-sensitive integration lane documents deliberate-regression proof in its PR body.
  • Final closeout runs epic-resolution and states which deployment-product gaps remain intentionally deferred.

Out of Scope

  • Reassigning or blocking @neo-opus-4-7's Bucket G unit-test lane (#10939/#10924).
  • Implementing every sub-lane in one PR.
  • Making real cloud staging infrastructure an agent-only requirement; if operator-provisioned resources are needed, file the ticket with a clear InputRequired/blocked state.
  • Changing production Memory Core semantics without a focused child ticket.
  • Retiring federated topology without explicit maintainer approval.

Avoided Traps / Gold Standards Rejected

  • Rejected: all three frontier models pile into unit tests. One model can own Bucket G; the others should advance orthogonal deployment-product surfaces.
  • Rejected: call the current integration suite sufficient because it has heartbeat checks. Heartbeat is necessary but not enough for multi-user Memory Core deployment confidence.
  • Rejected: bundle real OIDC, backup/restore, primary lifecycle, topology, and team/private policy into one PR. This epic coordinates those lanes; each implementation must be narrow.
  • Rejected: rely on trustProxyIdentity-only forever. It remains a fast fixture path, not the full deployed auth proof.
  • Rejected: silently keep CI skips after the backing ticket closes. Skip guards must be traceable to open work or removed.
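
To contrast the header-injection shortcut with the token-bearing path the real-OIDC lane asks for, here is a minimal sketch of the three cases that lane names: missing token, invalid token, and valid token with identity propagation. The `authenticate` function and `fakeVerifier` are hypothetical, and the verifier stands in for whichever option the lane settles on (local OIDC mock, GitHub token fixture, or staged-cloud fixture). The only borrowed detail is `sub`, the standard OIDC subject claim.

```javascript
// Hypothetical auth gate. `verifyToken` is injected so the sketch does not
// commit to any particular OIDC provider or library.
function authenticate(headers, verifyToken) {
    const header = headers.authorization || '';
    const match  = /^Bearer (.+)$/.exec(header);
    if (!match) {
        return {status: 401, reason: 'missing token'};
    }
    const claims = verifyToken(match[1]);
    if (!claims) {
        return {status: 401, reason: 'invalid token'};
    }
    // Identity comes from verified claims, never from a caller-supplied header.
    return {status: 200, userId: claims.sub};
}

// Fake verifier for the sketch: accepts exactly one known token.
const fakeVerifier = token => (token === 'valid-token' ? {sub: 'alice'} : null);

const missing = authenticate({}, fakeVerifier);
const invalid = authenticate({authorization: 'Bearer forged'}, fakeVerifier);
const valid   = authenticate({authorization: 'Bearer valid-token'}, fakeVerifier);

console.log(missing.status, invalid.status, valid.status, valid.userId);
// 401 401 200 alice
```

An integration spec would issue the same three requests through the MCP client and additionally assert that the propagated `userId` is the one Memory Core uses for tenant tagging.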

Related

  • #9999 — Cloud-Native Knowledge & Multi-Tenant Memory Core.
  • #10015 — Dynamic Topology — Unified vs Federated Routing.
  • #10016 — Multi-Tenant Identity & Data Privacy.
  • #10008/#10009 — Playwright topology coverage tracks.
  • #10010/#10011 — team/private retrieval and Native Edge Graph tenant isolation.
  • #10805/#10807/#10895/#10896/#10897 — current deployed integration-test lineage.
  • #10813 — primary flag + sunset-event summarization trigger.
  • #10844/#10854 — backup and wipe-detection primitives.
  • #10944 — Memory Core healthcheck sqlite-vec artifact cleanup.
  • A2A: MESSAGE:349ca5c0-dae6-4ac8-b072-899f09b9cb8c, MESSAGE:e570df94-2e39-4388-bd7a-f109c6f47dd7.

Origin Session ID: c02fbf4e-870c-44c0-ba7e-e9ffacce094b

Retrieval Hint: query_raw_memories(query="deployment pipeline integration multi-user Memory Core real OIDC primary secondary backup restore team private")

tobiu referenced in commit a77adf0 - "docs(agentos): v13 architectural path strategy document (#10957) (#10958)" on May 8, 2026, 2:18 PM