Context
Created from the 2026-05-08 three-track coordination correction. @tobiu clarified that the next high-value test work is integration tests for the deployment pipeline for multi-user Memory Core, not another all-hands unit-test substrate push:
"we have 3 frontier models on our team, we can split work, 3 focussing on unit tests feels very inefficient to me."
The team split is now:
- @neo-opus-4-7 continues the already-contextualized Bucket G unit lane (#10939 / #10924 closeout).
- @neo-gpt owns shaping the deployment-pipeline integration backlog.
- @neo-gemini-3-1-pro supports Docker/runtime integration surfaces or continues wake/daemon incident work, depending on live priority.
Peer confirmations:
- @neo-opus-4-7 (MESSAGE:349ca5c0-dae6-4ac8-b072-899f09b9cb8c): confirmed the lane split and provided historical anchors for the "LOT more" list.
- @neo-gemini-3-1-pro (MESSAGE:e570df94-2e39-4388-bd7a-f109c6f47dd7): confirmed the lane split and the same integration-gap list.
Duplicate / Adjacency Sweep
No equivalent open parent ticket was found.
Shipped or adjacent surfaces:
- #10807 — closed duplicate/original Docker Compose integration fixture for shared KB+MC deployment.
- #10895 — closed deployed-shape identity fixture + cross-tenant isolation + auth-rejection specs.
- #10896 — closed sustained-liveness helper + heartbeat propagation integration spec.
- #10897 — closed CI workflow matrix runner for test suites.
- #10915/#10917/#10918 — closed follow-up fixes from the first integration CI runs.
- #10008/#10009 — open topology-coverage tickets, currently split by unified/federated mode and with #10009 demoted to non-default diagnostic priority.
- #10010/#10011 — open team/private retrieval and Native Edge Graph tenant-isolation semantics.
- #10813 — open primary/summarization/sunset-event substrate work.
- #10844/#10854 — closed backup and wipe-detection primitives without an obvious deployed-stack backup-restore-wipe integration proof.
The Problem
The current integration suite is a useful MVP, but it is not yet the multi-user Memory Core deployment pipeline we ultimately need to trust.
Current coverage proves a narrow deployed slice:
- KB and MC healthcheck shape against Dockerized KB/MC/Chroma/mock-embedding stack.
- Proxy-header auth rejection and identity-bearing client success.
- Alice/bob Chroma tenant isolation for add_memory and query_raw_memories.
- Short sustained-health composability checks.
- CI invokes the integration row.
The remaining product-path gaps are larger than heartbeat checks:
- Real OIDC / GitHub-account auth: current tests use trustProxyIdentity header injection as the MVP shortcut; they do not exercise the full deployed OAuth/OIDC path.
- Primary vs secondary Memory Core behavior: the current stack sets NEO_MC_PRIMARY=false and disables auto summarization/dream/golden-path paths. It does not prove single-writer behavior, sunset-event summarization, or non-primary suppression.
- Backup / restore / wipe-detection in the deployed stack: primitives exist, but no integration proof appears to run Chroma + better-sqlite graph backup, destructive wipe simulation, restore, and integrity checks through the deployment fixture.
- Unified/federated topology routing: the current Docker stack runs unified Chroma. #10008/#10009 exist, but the deployment-product validation needs a coherent topology matrix or an explicit diagnostic-vs-product split.
- Team/private retrieval semantics: #10010/#10011 are the policy/RLS substrate, but the deployed-shape proof needs to show that real MCP requests obey team/private read rules across Chroma memories, session summaries, and Native Edge Graph data.
- Deliberate-regression proof discipline: for security-sensitive deployment tests, at least one test lane should document a throwaway regression that the integration suite catches before merge.
There is also a current CI hygiene signal: .github/workflows/test.yml sets NEO_TEST_SKIP_CI=true, and test/playwright/integration/HeartbeatPropagation.integration.spec.mjs still carries a CI skip referencing #10918 even though #10918 is closed. That should be audited as part of making integration coverage meaningful.
The Architectural Reality
Current repo surfaces:
- ai/deploy/docker-compose.test.yml runs Chroma, a deterministic OpenAI-compatible embedding server, the KB server, and the MC server.
- test/playwright/playwright.config.integration.mjs runs the integration suite through a Docker-backed Playwright webServer fixture.
- test/playwright/integration/fixtures/composeWebServer.mjs owns stack startup/readiness.
- test/playwright/integration/fixtures/mcpClient.mjs owns Streamable HTTP MCP client creation, identity headers, tool-call assertions, and healthcheck calls.
- test/playwright/integration/healthcheck.spec.mjs verifies current KB/MC healthcheck shape and a short sustained-health helper path.
- test/playwright/integration/CrossTenantIsolation.integration.spec.mjs verifies tenant isolation for raw memories through Chroma metadata filtering.
- test/playwright/integration/AuthRejection.integration.spec.mjs verifies proxy-header auth rejection and positive identity-bearing calls.
- test/playwright/integration/HeartbeatPropagation.integration.spec.mjs is the longer sustained-liveness test, but is currently CI-skipped under NEO_TEST_SKIP_CI.
Owning architecture:
- #9999 — Cloud-Native Knowledge & Multi-Tenant Memory Core umbrella.
- #10015 — Dynamic topology: unified vs federated routing.
- #10016 — Multi-Tenant Identity & Data Privacy.
- #10813 — restore session summaries via primary flag + sunset-event trigger.
- #10822/#10944 — config/healthcheck cleanup work that should prevent stale substrate artifacts from confusing the integration assertions.
The Fix
Use this epic as the parent coordination artifact for deployment-pipeline integration coverage. File focused sub-issues from this matrix rather than sending all three frontier models into the same unit-test queue.
Proposed sub-lanes:
Real OIDC / GitHub auth integration fixture
- Add a higher-fidelity auth fixture beyond trustProxyIdentity header injection.
- Validate missing/invalid token rejection, valid user acceptance, and user identity propagation into Memory Core request context.
- Decide whether to use a local OIDC mock, GitHub-device/token fixture, or staged-cloud operator fixture.
Primary / secondary Memory Core lifecycle integration
- Prove NEO_MC_PRIMARY=true owns summarization/sunset-event work.
- Prove secondary instances do not perform single-writer duties.
- Cross-link with #10813 and #10186.
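For the primary/secondary lane, the behavior under test can be pictured as a small gate that every single-writer duty consults before it runs. This is a minimal sketch under stated assumptions, not the Memory Core implementation: only the NEO_MC_PRIMARY flag comes from this issue, while isPrimary, allowedDuties, and the duty names are hypothetical.

```javascript
// Hypothetical helper: reads the NEO_MC_PRIMARY flag from an env-like object.
// Only a case-insensitive "true" counts as primary, so unset or malformed
// values fail closed and the instance behaves as a secondary.
function isPrimary(env) {
    return String(env.NEO_MC_PRIMARY ?? '').trim().toLowerCase() === 'true';
}

// Hypothetical duty names standing in for the single-writer work named above.
const SINGLE_WRITER_DUTIES = ['summarization', 'sunsetEvent'];

// Returns what an instance may do: queries are always allowed,
// single-writer duties only on the primary.
function allowedDuties(env) {
    return {
        query       : true,
        singleWriter: isPrimary(env) ? [...SINGLE_WRITER_DUTIES] : []
    };
}

console.log(allowedDuties({NEO_MC_PRIMARY: 'true'}));
console.log(allowedDuties({NEO_MC_PRIMARY: 'false'}));
```

A multi-instance fixture test could then assert the observable side of this gate: after a sunset event, only the instance started with NEO_MC_PRIMARY=true has produced a summary, while the secondary still answers queries.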
Backup / restore / wipe-detection deployed-stack proof
- Seed Chroma memories, session summaries, and Native Edge Graph nodes.
- Run backup snapshot.
- Simulate destructive wipe or collection-count collapse inside the Docker fixture.
- Restore and verify semantic queries + graph topology + counts recover.
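The count-based part of the seed → backup → wipe → restore flow can be sketched as two pure checks over per-collection counts. The collection names, the 0.5 collapse ratio, and both function names are illustrative assumptions; the real wipe-detection primitives from #10844/#10854 may use different signals.

```javascript
// Flags collections whose count fell below `minRatio` of the seeded baseline.
// The default ratio is an assumption chosen for illustration only.
function findCollapsedCollections(baseline, current, minRatio = 0.5) {
    return Object.entries(baseline)
        .filter(([name, before]) => before > 0 && (current[name] ?? 0) < before * minRatio)
        .map(([name]) => name);
}

// After a restore, every seeded collection should be back at or above its baseline.
function isFullyRestored(baseline, restored) {
    return Object.entries(baseline).every(([name, before]) => (restored[name] ?? 0) >= before);
}

// Hypothetical counts captured at each stage of the fixture run.
const seeded   = {memories: 40, summaries: 8, graphNodes: 25};
const wiped    = {memories: 0, summaries: 8, graphNodes: 3};
const restored = {memories: 40, summaries: 8, graphNodes: 25};

console.log(findCollapsedCollections(seeded, wiped));    // memories and graphNodes collapsed
console.log(findCollapsedCollections(seeded, restored)); // nothing collapsed after restore
console.log(isFullyRestored(seeded, restored));
```

Counts alone do not prove recovery, so a spec built on this would still need the semantic-query and graph-topology checks listed above.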
Topology matrix coverage
- Re-scope #10008/#10009 into a practical product/diagnostic matrix.
- Unified mode should remain the product-path default.
- Federated mode should either be diagnostic-only with explicit tests or retired from deployment-product claims.
Team/private retrieval deployed-shape proof
- Exercise #10010/#10011 semantics through MCP calls, not mocks.
- Verify write-side user tagging is unconditional.
- Verify read-side team/private/legacy behavior and Native Edge Graph RLS clauses under real request context.
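To make the read-side rules concrete, here is a sketch of a Chroma-style `where` clause built from the request identity. The `$eq`, `$and`, and `$or` operators are Chroma's metadata-filter operators; the metadata field names (userId, teamId, visibility), the mode names, and buildReadFilter itself are assumptions, and legacy mode is left out because its semantics are not specified in this issue.

```javascript
// Builds a metadata filter for a read request (hypothetical helper).
// Field and mode names are assumptions, not the shipped schema.
function buildReadFilter({userId, teamId, mode}) {
    if (!userId) {
        throw new Error('userId is required: reads must never run without an identity');
    }

    const own = {userId: {$eq: userId}};

    if (mode === 'private') {
        return own;
    }

    if (mode === 'team') {
        if (!teamId) {
            throw new Error('teamId is required for team reads');
        }
        // Own records, plus records in the same team that are marked team-visible.
        return {
            $or: [
                own,
                {$and: [{teamId: {$eq: teamId}}, {visibility: {$eq: 'team'}}]}
            ]
        };
    }

    throw new Error(`Unsupported read mode: ${mode}`);
}

console.log(JSON.stringify(buildReadFilter({userId: 'alice', mode: 'private'})));
console.log(JSON.stringify(buildReadFilter({userId: 'alice', teamId: 'core', mode: 'team'})));
```

The deployed-shape proof would not test a builder like this directly; it would issue real MCP calls as alice and bob and assert that the returned records match what such a filter permits.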
Integration assertion hardening / CI skip cleanup
- Audit current NEO_TEST_SKIP_CI effects on integration specs.
- Remove stale skip guards where the referenced issue is closed.
- Add failure artifacts/log capture for Docker stack regressions so agents do not need to rerun blind.
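The stale-skip audit could start from a simple text scan like the one below. It is a heuristic sketch: matching a `skip` call and a `#12345` reference on the same line is an assumption about how guards are written, the sample lines are invented, and the closed-issue list would have to come from the tracker rather than being hard-coded.

```javascript
// Reports skip guards whose referenced issues are all closed, plus guards
// that reference no issue at all (untraceable). Hypothetical helper.
function findStaleSkips(source, closedIssues) {
    const closed = new Set(closedIssues);
    const stale  = [];

    source.split('\n').forEach((text, index) => {
        // `\bskip\b` matches `test.skip(` but not the SKIP inside NEO_TEST_SKIP_CI.
        if (!/\bskip\b/i.test(text)) {
            return;
        }

        const issues = [...text.matchAll(/#(\d+)/g)].map(match => Number(match[1]));

        if (issues.length === 0 || issues.every(issue => closed.has(issue))) {
            stale.push({line: index + 1, issues, text: text.trim()});
        }
    });

    return stale;
}

// Invented sample input; the real spec files may phrase their guards differently.
const sample = [
    "test.skip(!!process.env.NEO_TEST_SKIP_CI, 'skipped in CI, see #10918');",
    "test.skip(!!process.env.NEO_TEST_SKIP_CI, 'blocked on #10813');"
].join('\n');

console.log(findStaleSkips(sample, [10918])); // reports only the first line
```

Run against test/playwright/integration/**, a scan like this would surface the #10918 guard described above while leaving guards tied to open work alone.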
Contract Ledger Matrix
| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Deployment-pipeline integration epic | @tobiu 2026-05-08 coordination correction; peer A2A confirmations | Central parent for multi-user Memory Core deployed-shape integration coverage, separate from Bucket G unit-substrate work | If a lane already has an owning open ticket, link it instead of duplicating | This issue + #9999/#10015/#10016 links | Native sub-issues and peer A2A message IDs |
| Integration test fixture surfaces | Existing test/playwright/integration/** + ai/deploy/docker-compose.test.yml | Grow from MVP smoke/auth/isolation checks into product-path deployment proofs | Keep trustProxyIdentity as fast MVP path; add real-OIDC path separately | learn/agentos/SharedDeployment.md, learn/agentos/DeploymentCookbook.md | npm run test-integration locally and CI integration row |
| Multi-user privacy proof | #10016, #10010, #10011 | Verify Chroma metadata filters and graph RLS through real MCP calls for team/private/legacy modes | Preserve legacy migration behavior until default flips are ready | MultiTenant migration docs + SharedDeployment auth docs | Deliberate cross-tenant regression is caught by integration test |
| Primary/single-writer proof | #10813, #10186, Memory Core config | Verify only primary MC instance runs single-writer lifecycle duties | Secondary must stay query-capable but non-writing for summarization work | MemoryCore.md health/startup sections | Multi-instance Docker fixture test |
| Backup/restore/wipe proof | #10844, #10854, backup tooling | Verify Chroma + better-sqlite graph data can be backed up, wiped, restored, and queried | Test fixture must use isolated tmp data and not touch developer data | DeploymentCookbook recovery section | Seed → backup → wipe → restore → query/graph/count checks |
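Acceptance Criteria
- … NEO_TEST_SKIP_CI integration skips either removed or backed by an open ticket.
- … epic-resolution and states which deployment-product gaps remain intentionally deferred.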
Out of Scope
- Reassigning or blocking @neo-opus-4-7's Bucket G unit-test lane (#10939/#10924).
- Implementing every sub-lane in one PR.
- Making real cloud staging infrastructure an agent-only requirement; if operator-provisioned resources are needed, file the ticket with a clear InputRequired/blocked state.
- Changing production Memory Core semantics without a focused child ticket.
- Retiring federated topology without explicit maintainer approval.
Avoided Traps / Gold Standards Rejected
- Rejected: all three frontier models pile into unit tests. One model can own Bucket G; the others should advance orthogonal deployment-product surfaces.
- Rejected: call the current integration suite sufficient because it has heartbeat checks. Heartbeat is necessary but not enough for multi-user Memory Core deployment confidence.
- Rejected: bundle real OIDC, backup/restore, primary lifecycle, topology, and team/private policy into one PR. This epic coordinates those lanes; each implementation must be narrow.
- Rejected: rely on trustProxyIdentity-only forever. It remains a fast fixture path, not the full deployed auth proof.
- Rejected: silently keep CI skips after the backing ticket closes. Skip guards must be traceable to open work or removed.
Related
- #9999 — Cloud-Native Knowledge & Multi-Tenant Memory Core.
- #10015 — Dynamic Topology — Unified vs Federated Routing.
- #10016 — Multi-Tenant Identity & Data Privacy.
- #10008/#10009 — Playwright topology coverage tracks.
- #10010/#10011 — team/private retrieval and Native Edge Graph tenant isolation.
- #10805/#10807/#10895/#10896/#10897 — current deployed integration-test lineage.
- #10813 — primary flag + sunset-event summarization trigger.
- #10844/#10854 — backup and wipe-detection primitives.
- #10944 — Memory Core healthcheck sqlite-vec artifact cleanup.
- A2A: MESSAGE:349ca5c0-dae6-4ac8-b072-899f09b9cb8c, MESSAGE:e570df94-2e39-4388-bd7a-f109c6f47dd7.
Origin Session ID: c02fbf4e-870c-44c0-ba7e-e9ffacce094b
Retrieval Hint: query_raw_memories(query="deployment pipeline integration multi-user Memory Core real OIDC primary secondary backup restore team private")