Context
Live empirical observation in session 1f30c9d8-4a36-4be0-98a5-bd5b89289227 (2026-05-01): Memory Core's stdio-mode tenant filter renders ~9700 of 9787 raw memories and 812 of 812 session summaries invisible to all stdio agents. Direct chromadb-js client probe confirms the data is intact and retrievable; only the MCP read path filters it out. This is the same class of issue that #10017 addressed for the SQLite Native Edge Graph — but #10017 closed without addressing the chromadb metadata layer.
@tobiu surfaced the symptom as "missing session summaries... we do have backups." Empirical investigation showed backups are not needed: the data is present, just untagged. Gemini and GPT independently converged on the same diagnosis and fix shape via 3-way A2A coordination this session.
The Problem
Multi-tenant identity rollout (#10145, #10000) added where: {userId} filters to all reads in SummaryService and MemoryService. Per chromadb's documented semantics (Metadata Filtering):
"Where filters only search embeddings where the key exists. If you search with a filter like {'version': {'$ne': 1}}, metadata that does not have the key version will not be returned."
Pre-#10145 records (812 summaries + ~9700 memories) lack the userId metadata key entirely. No native chromadb operator ($exists is unsupported; $ne skips missing keys) can return them. Reads filter them out silently.
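The missing-key rule can be modeled in a few lines. The sketch below illustrates the documented behavior and is not chromadb's implementation: matchesWhere, select, and the sample records are hypothetical, and only plain equality, $ne, and $or are modeled. It shows why neither {userId} nor a $ne clause can reach a record that has no userId key at all.

```javascript
// Illustrative model of chromadb's documented where-filter semantics:
// a clause on a key only matches metadata that HAS that key.
// Hypothetical helper; supports plain equality, {$ne}, and {$or} only.
function matchesWhere(metadata, where) {
    if (where.$or) return where.$or.some((clause) => matchesWhere(metadata, clause));
    return Object.entries(where).every(([key, cond]) => {
        if (!(key in metadata)) return false; // missing key never matches, even for $ne
        if (cond !== null && typeof cond === 'object' && '$ne' in cond) {
            return metadata[key] !== cond.$ne;
        }
        return metadata[key] === cond;
    });
}

// Sample records: one tagged tenant record, one pre-#10145 record without userId.
const records = [
    {id: 'tagged', metadata: {userId: 'neo-opus-4-7'}},
    {id: 'legacy', metadata: {sessionId: 'abc'}},
];

const select = (where) =>
    records.filter((r) => matchesWhere(r.metadata, where)).map((r) => r.id);

console.log(select({userId: 'neo-opus-4-7'}));        // [ 'tagged' ]
console.log(select({userId: {$ne: 'someone-else'}})); // [ 'tagged' ]  (legacy skipped)
```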
Empirical reproducer (this session):
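// chromadb-js v3.3.1, against live neo-agent-memory + neo-agent-sessions:
import {ChromaClient} from 'chromadb';
const client = new ChromaClient(); // default local chromadb endpoint (assumption)
// No-op embedder: count()/get() never embed, so the vectors it would produce are irrelevant.
const dummyFn = {generate: async (texts) => texts.map(() => [0])};
const c = await client.getOrCreateCollection({name: 'neo-agent-sessions', embeddingFunction: dummyFn});
await c.count(); // 812
await c.get({limit: 2000, include: ['metadatas']}); // 812 ids returned
await c.get({limit: 2000, where: {userId: '@neo-opus-4-7'}}); // 0 ids returned
await c.get({limit: 5, include: ['metadatas']}).then(r =>
    r.metadatas.filter(m => m && 'userId' in m).length); // 0 / 5: no userId key on legacy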
Healthcheck observability supports the diagnosis: migration: {memory: 0, session: 0, total: 0, available: true}. The graph-side migration tracker shows zero migrated nodes; the chromadb side has no equivalent observability surface today.
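A chromadb-side counterpart to the graph tracker could count records that still lack a userId key. This is a hedged sketch, not the HealthService implementation: countUntagged, untaggedCount, PAGE_SIZE, and fakeCollection are hypothetical names, and the fake only mimics the get({limit, offset, include}) call and the {ids, metadatas} result shape of a chromadb collection. Paging matters because a single capped get() (like the limit: 2000 probe in the reproducer) cannot see all ~9787 memories.

```javascript
// Hypothetical chromadb-side untaggedCount probe (sketch, not HealthService code).
const PAGE_SIZE = 500; // hypothetical page size

// Count records whose metadata has no userId key, paging with limit/offset.
async function countUntagged(collection) {
    let untagged = 0;
    for (let offset = 0; ; offset += PAGE_SIZE) {
        const page = await collection.get({limit: PAGE_SIZE, offset, include: ['metadatas']});
        for (const meta of page.metadatas) {
            if (!meta || !('userId' in meta)) untagged++; // null metadata also counts as untagged
        }
        if (page.ids.length < PAGE_SIZE) return untagged; // short page: no more records
    }
}

// Aggregate into the {memory, session, total} shape proposed for the healthcheck.
async function untaggedCount(memoryCollection, sessionCollection) {
    const memory = await countUntagged(memoryCollection);
    const session = await countUntagged(sessionCollection);
    return {memory, session, total: memory + session};
}

// In-memory stand-in returning chromadb's {ids, metadatas} result shape (not chromadb itself).
function fakeCollection(metadatas) {
    return {
        async get({limit, offset}) {
            const slice = metadatas.slice(offset, offset + limit);
            return {ids: slice.map((_, i) => `id-${offset + i}`), metadatas: slice};
        },
    };
}

const memories = fakeCollection([
    ...Array.from({length: 700}, () => ({sessionId: 's1'})),        // legacy: no userId
    ...Array.from({length: 300}, () => ({userId: 'neo-opus-4-7'})), // tagged
]);
const sessions = fakeCollection(Array.from({length: 12}, () => ({summary: 'pre-#10145'})));

untaggedCount(memories, sessions).then((counts) => console.log(counts));
// { memory: 700, session: 12, total: 712 }
```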
Downstream effects observed:
- get_all_summaries / query_summaries return 0 across all tenants
- query_raw_memories finds nothing for legacy sessions
- runSandman / Golden Path produce no sandman_handoff.md (depends on summaries)
- Pre-#10145 architectural reasoning is invisible to memory-mining workflows
The Architectural Reality
Two storage layers, separately migrated:
| Layer | #10017 status | This ticket |
| --- | --- | --- |
| SQLite Native Edge Graph (neo-native-graph collection) | Closed: gradual migration via natural query patterns + observability surface | Out of scope |
| ChromaDB metadata (neo-agent-memory, neo-agent-sessions) | Not addressed | In scope |
#10017 used 'legacy' as the sentinel value for graph nodes. This ticket aligns with that vocabulary for chromadb consistency (rather than introducing 'shared' as a divergent term). See HealthService.mjs:294 for the existing comment establishing the 'legacy' precedent.
Identity normalization gap surfaced during empirical probing: nodeId is stored as '@neo-opus-4-7' (with @ prefix) but userId is stored as 'neo-opus-4-7' (no prefix). Two parallel namespaces create a latent self-filter trap if either side ever uses the wrong form. Closing this gap is in scope as adjacent unification.
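The self-filter trap can be reproduced in miniature. In this sketch (hypothetical names; strict equality stands in for a chromadb where: {userId} clause), filtering with the @-prefixed nodeId form against bare stored userIds silently returns nothing, while canonicalizing at the boundary restores the match.

```javascript
// Stored forms observed in the probe: graph nodeId carries '@', chromadb userId does not.
const storedUserIds = ['neo-opus-4-7', 'neo-opus-4-7', 'other-agent'];
const nodeId = '@neo-opus-4-7';

// Hypothetical boundary canonicalizer: strip one leading '@' (bare form is canonical).
const canonical = (id) => (id.startsWith('@') ? id.slice(1) : id);

// Strict equality stands in for a chromadb where: {userId: ...} clause.
const countMatches = (wanted) => storedUserIds.filter((u) => u === wanted).length;

console.log(countMatches(nodeId));            // 0  (trap: wrong form, silent empty result)
console.log(countMatches(canonical(nodeId))); // 2  (both sides in canonical form)
```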
Files in scope:
- ai/mcp/server/shared/services/RequestContextService.mjs: export the LEGACY_USER_ID constant and the normalizeUserId() boundary helper
- ai/mcp/server/memory-core/services/SummaryService.mjs:103-224 (listSummaries, querySummaries): read filter becomes $or: [{userId}, {userId: LEGACY_USER_ID}]
- ai/mcp/server/memory-core/services/MemoryService.mjs (listMemories, queryMemories): same filter shape
- ai/scripts/backfillChromaLegacyUserId.mjs (new): one-shot migration runner; idempotent
- ai/mcp/server/memory-core/services/HealthService.mjs:290+: extend the migration block with chromadb-side untaggedCount observability, symmetric to the existing graph-side surface
- Test files (canonical path per feedback_mcp_test_location.md):
  - test/playwright/unit/ai/mcp/server/memory-core/services/SummaryService.LegacyTenant.spec.mjs (new)
  - test/playwright/unit/ai/mcp/server/memory-core/services/MemoryService.LegacyTenant.spec.mjs (new)
  - test/playwright/unit/ai/mcp/server/shared/services/RequestContextService.normalizeUserId.spec.mjs (new)
The Fix
Single concrete prescription. Three coordinated changes; one PR.
Constant + normalization helper in RequestContextService.mjs:
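export const LEGACY_USER_ID = 'legacy';
// Canonical tenant form is the bare id ('neo-opus-4-7'); strip one leading '@'.
export function normalizeUserId(input) {
    if (input == null) return undefined; // null-safe: unresolved identity stays undefined
    const id = String(input);            // coerce once so .slice() always runs on a string
    return id.startsWith('@') ? id.slice(1) : id;
}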
Read filter shape in SummaryService.listSummaries (and, in parallel, SummaryService.querySummaries and the MemoryService list/query methods):
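// RequestContextService, normalizeUserId and LEGACY_USER_ID are imported from RequestContextService.mjs;
// getArgs is the options object listSummaries passes to collection.get().
const userId = normalizeUserId(RequestContextService.getUserId());
// Caller's own records plus backfilled legacy records; no filter when identity is unresolved.
const where = userId
    ? {$or: [{userId}, {userId: LEGACY_USER_ID}]}
    : undefined;
if (where) getArgs.where = where; // never hand chromadb an explicit where: undefined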
The if (where) guard preserves the current pattern of NOT passing undefined to chromadb (per @tobiu's Verify-Before-Assert reminder this session).
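Pulled out into a pure function, the read-filter shape above is easy to check in isolation. buildTenantWhere and matches are hypothetical names, and matches only models the documented where semantics (equality requires the key to exist); this is a sketch, not the SummaryService code. Once legacy records carry the sentinel, the $or clause returns the caller's records plus legacy records, but not another tenant's.

```javascript
const LEGACY_USER_ID = 'legacy';

// Hypothetical pure helper mirroring the read-filter shape.
// Returns undefined when no tenant is resolved so callers can skip setting `where`.
function buildTenantWhere(userId) {
    return userId ? {$or: [{userId}, {userId: LEGACY_USER_ID}]} : undefined;
}

// Minimal model of the documented semantics: equality on a key requires the key to exist.
function matches(metadata, where) {
    if (where.$or) return where.$or.some((clause) => matches(metadata, clause));
    return Object.entries(where).every(([k, v]) => k in metadata && metadata[k] === v);
}

const rows = [
    {id: 'mine', metadata: {userId: 'neo-opus-4-7'}},
    {id: 'backfilled', metadata: {userId: LEGACY_USER_ID}},
    {id: 'other-tenant', metadata: {userId: 'other-agent'}},
];

const getArgs = {limit: 100};
const where = buildTenantWhere('neo-opus-4-7');
if (where) getArgs.where = where; // same guard as the service code

console.log(rows.filter((r) => matches(r.metadata, getArgs.where)).map((r) => r.id));
// [ 'mine', 'backfilled' ]
console.log(buildTenantWhere(undefined)); // undefined: no where key gets set
```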
Migration runner at ai/scripts/backfillChromaLegacyUserId.mjs, modeled on the buildScripts/ai/migrateMemoryCore.mjs precedent. Idempotent (checks for userId absence before tagging). Operates on both the neo-agent-memory and neo-agent-sessions collections. Updates metadata only; it does NOT re-embed.
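A minimal sketch of the idempotent runner, assuming the chromadb-js collection methods get({limit, offset, include}) and update({ids, metadatas}). backfillLegacyUserId, PAGE_SIZE, and fakeCollection are hypothetical; the fake mimics only those two calls. The runner skips any record that already has userId (so a second run tags nothing), passes no embeddings or documents to update(), and sends the full merged metadata object so the outcome does not depend on whether update() merges or replaces metadata.

```javascript
// Hypothetical backfill sketch; mirrors the runner's described behavior, not its actual code.
const LEGACY_USER_ID = 'legacy';
const PAGE_SIZE = 500; // hypothetical page size

// Tag every record lacking userId with the legacy sentinel. Metadata only: no re-embedding.
// Offset paging assumes metadata-only updates do not add, remove, or reorder records.
async function backfillLegacyUserId(collection) {
    let tagged = 0;
    for (let offset = 0; ; offset += PAGE_SIZE) {
        const page = await collection.get({limit: PAGE_SIZE, offset, include: ['metadatas']});
        const ids = [];
        const metadatas = [];
        page.ids.forEach((id, i) => {
            const meta = page.metadatas[i] ?? {};
            if ('userId' in meta) return; // idempotent: already-tagged records are skipped
            ids.push(id);
            metadatas.push({...meta, userId: LEGACY_USER_ID}); // full merged metadata object
        });
        if (ids.length > 0) await collection.update({ids, metadatas}); // no embeddings passed
        tagged += ids.length;
        if (page.ids.length < PAGE_SIZE) return tagged;
    }
}

// In-memory stand-in for the two collection calls used above (not chromadb itself).
function fakeCollection(records) {
    return {
        records,
        async get({limit, offset}) {
            const slice = records.slice(offset, offset + limit);
            return {ids: slice.map((r) => r.id), metadatas: slice.map((r) => r.metadata)};
        },
        async update({ids, metadatas}) {
            ids.forEach((id, i) => {
                records.find((r) => r.id === id).metadata = metadatas[i];
            });
        },
    };
}

const sessions = fakeCollection([
    {id: 'a', metadata: {summary: 'pre-#10145'}},
    {id: 'b', metadata: {summary: 'new', userId: 'neo-opus-4-7'}},
    {id: 'c', metadata: null},
]);

(async () => {
    console.log(await backfillLegacyUserId(sessions)); // 2
    console.log(await backfillLegacyUserId(sessions)); // 0 (second run is a no-op)
})();
```

Running it once per collection (neo-agent-memory, then neo-agent-sessions) covers the two-collection scope described above.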
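Acceptance Criteria
- LEGACY_USER_ID constant exported from RequestContextService.mjs; all tenant-related sentinel references in services use the constant (not a string literal)
- normalizeUserId() helper exported from the same module; covers @-prefix stripping; null-safe
- normalizeUserId('@x') === normalizeUserId('x') (canonical-form invariant)
- SummaryService.listSummaries, SummaryService.querySummaries, MemoryService.listMemories, and MemoryService.queryMemories use the $or filter with LEGACY_USER_ID when userId is resolved
- where: undefined path preserved when userId is unresolved (if (where) getArgs.where = where)
- ai/scripts/backfillChromaLegacyUserId.mjs runs idempotently against both collections; tags only records lacking userId; no embedding regeneration
- get_all_summaries({limit:5}) returns ≥5 records; query_summaries({query:'antigravity'}) returns non-empty
- migration.chromadb.untaggedCount.{memory, session, total} symmetric to existing graph-side counts
- … $or, (b) read returns nothing when no records match, (c) write tags new records with normalized userId (no @ prefix)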
Out of Scope
- SQLite graph-side migration (already covered by #10017 with a different, gradual-conversion approach)
- Boot-time auto-summarization restoration (startup.summarizationStatus: not_attempted): independent symptom; will file separately per coordination consensus
- StorageRouter "strategicNeighbors is not iterable" guard: separate sibling bug; will file separately
- KB embedding-provider unification: covered by #10003 (GPT/Gemini ownership per swarm coordination)
- 150+ orphaned test-session-* / test-memory-* chromadb collections: separate hygiene ticket (Gemini ownership per coordination consensus)
- Re-embedding existing records: the migrateMemoryCore.mjs precedent already covers that path; this ticket only adds metadata and never touches vectors
Avoided Traps
- Sentinel 'shared' (rejected): initial swarm convergence preferred 'shared' for explicit visibility semantics. Reading the existing HealthService.mjs:294 comment surfaced 'legacy' as the established precedent for the graph-side migration. Vocabulary alignment across both storage layers wins over semantic precision; one term means the same thing in both places. Per feedback_verify_written_claims_against_precedent.md.
- Sentinel 'default' (rejected): #10017's original AC text proposed 'default', but the implementation chose 'legacy'. This ticket aligns with the actual implementation, not the spec.
- Re-embedding all legacy records: tempting because the migration script precedent (migrateMemoryCore.mjs) does that. But re-embedding is expensive (Gemini API rate limits, ~50/10s) and unnecessary; tagging metadata is sufficient.
- In-memory filter fallback (fetching all records, then filtering in code): doesn't scale (9787 memories today, more tomorrow); the migration is the chromadb-native path.
- Per-record userId inference from participatingAgents / models metadata: considered, but ambiguous in multi-agent sessions and brittle for sessions where attribution metadata is missing. A sentinel plus a uniform tag is cleaner.
- Read-path-only fix without backfill: chromadb has no $exists operator (confirmed via empirical probe + 2026 docs), and where clauses never match records that lack the key, so a pure read-path fix is not achievable; backfill is required.
- Bundling the boot-summarization fix into this ticket: independent root cause; bundling violates AGENTS.md Gate 1 scoping discipline.
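For scale, a back-of-envelope estimate of the re-embedding trap above, assuming one embedding request per record (no batching) and a steady ~50 requests per 10 s: roughly 10,500 legacy records would spend about 35 minutes on rate limiting alone, before any retries or quota errors. The metadata-only backfill makes no embedding calls at all. reembedSeconds is a hypothetical helper for the arithmetic.

```javascript
// Back-of-envelope: wall-clock time spent waiting on a request-rate limit.
// Assumes one embedding request per record (no batching) and a steady ~50 req / 10 s.
function reembedSeconds(records, requestsPerWindow, windowSeconds) {
    const requestsPerSecond = requestsPerWindow / windowSeconds;
    return records / requestsPerSecond;
}

const legacyRecords = 9700 + 812; // ~9700 legacy memories + 812 summaries
const seconds = reembedSeconds(legacyRecords, 50, 10);
console.log(seconds, (seconds / 60).toFixed(1)); // 2102.4 35.0
```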
Related
- Predecessor: #10017 (Migration & Backward Compatibility for Multi-Tenant Schema) — closed; covered graph-side observability + gradual migration. This ticket completes the chromadb-side gap.
- Underlying multi-tenant epic: #9999 (Hardened Identity Ingestion & Tenant Isolation)
- Originating tenant filter: #10000 (Hardened Identity Ingestion & Tenant Isolation, write-side)
- Stdio identity resolution that activated the filter: #10145 (OAuth2 authentication layer for Memory Core MCP connections)
- Adjacent (will be filed separately): boot-summarization restoration; StorageRouter strategicNeighbors Array.isArray guard; orphaned test-* chromadb collections cleanup
- Adjacent (parallel swarm work): #10003 (KB ↔ MC embedding unification — GPT/Gemini lane per swarm coordination)
Origin Session ID: 1f30c9d8-4a36-4be0-98a5-bd5b89289227
Retrieval Hint: "Memory Core tenant-isolation migration gap chromadb legacy userId backfill"