Context
PR #11684 implements the #11683 Knowledge Base re-embed `staleStrategy: 'shadow-swap'` path. Claude-family review cycle 1 approved the PR as merge-eligible but surfaced three non-blocking follow-ups that should be tracked before operators or scripts activate the new strategy broadly.
Evidence checked before filing:
- Live PR state for #11684: `reviewDecision: APPROVED`, CI green, state open.
- Duplicate sweep: live GitHub searches for `shadow-swap`, `staleStrategy`, and `getOrCreateCollection shadow collection` found only parent #11683 and no dedicated hardening issue.
- Local content sweep across `resources/content/issues` / `resources/content/pulls` found no active shadow-swap follow-up beyond #11683.
- KB ticket semantic sweep for `shadow-swap staleStrategy getOrCreateCollection follow-up ticket no canonical window orphan shadow collection` did not surface an equivalent ticket.
The Problem
The shadow-swap path is currently opt-in and not wired into the normal KB sync entry point, so #11684 can merge without blocking. Before activation, the promote path needs hardening around two correctness hazards and one explicit activation gap:
- Two-rename promote window: `embedViaShadowSwap()` renames live canonical -> parking, then shadow -> canonical. During the interval where no collection has the canonical name, another process or a cold-cache read can call `getKnowledgeBaseCollection()`.
- `getOrCreateCollection` collision risk: `ChromaManager.getKnowledgeBaseCollection()` uses Chroma `getOrCreateCollection()`. If called during the no-canonical-name window, it can create an empty canonical collection. The later shadow rename can then collide, and rollback (parking -> canonical) can also collide with the empty canonical.
- Pre-promote leak hygiene: if `embedChunks()` fails after the shadow collection is created but before any rename, the current path can leave an orphaned shadow collection.
- Activation gap: #11684 introduces the strategy, but `DatabaseService.embedKnowledgeBase()` still calls `VectorService.embed(aiConfig.dataPath, {viaMcp})` without `staleStrategy`, so the original #11677 friction is not operationally closed until an entry point opts in.
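To make the name-gap hazard concrete, here is a minimal reproduction against a hypothetical in-memory stand-in for a Chroma client. `FakeChromaClient`, `reproduceNameGap()`, and the `knowledge-base` / `-parking` / `-shadow` names are illustrative assumptions, not repo or Chroma APIs; the fake only borrows method names. It assumes, as described above, that `getOrCreateCollection()` creates a missing collection and that renaming onto a taken name fails.

```javascript
// Hypothetical in-memory stand-in for a Chroma client. It mimics only the
// method names used here; the behavior it models is that collection names are
// unique, so renaming onto a taken name fails.
class FakeChromaClient {
  constructor() {
    this.collections = new Map(); // name -> collection object
  }

  async createCollection({name}) {
    if (this.collections.has(name)) throw new Error(`collection ${name} already exists`);
    const client = this;
    const collection = {
      name,
      docs: [],
      async modify({name: nextName}) {
        if (client.collections.has(nextName)) {
          throw new Error(`rename collision: ${nextName} already exists`);
        }
        client.collections.delete(collection.name);
        collection.name = nextName;
        client.collections.set(nextName, collection);
      },
    };
    this.collections.set(name, collection);
    return collection;
  }

  async getOrCreateCollection({name}) {
    // Create-when-missing semantics, as attributed to Chroma in the issue.
    return this.collections.get(name) ?? this.createCollection({name});
  }
}

async function reproduceNameGap() {
  const CANONICAL = 'knowledge-base'; // placeholder for aiConfig.collectionName
  const client = new FakeChromaClient();
  const live = await client.createCollection({name: CANONICAL});
  live.docs.push('old chunk');
  const shadow = await client.createCollection({name: `${CANONICAL}-shadow`});
  shadow.docs.push('new chunk');

  // Rename 1: live canonical -> parking. No collection owns CANONICAL now.
  await live.modify({name: `${CANONICAL}-parking`});

  // A cold-cache canonical resolve lands in the gap and creates an empty canonical.
  const coldRead = await client.getOrCreateCollection({name: CANONICAL});

  // Rename 2 (shadow -> canonical) and the rollback (parking -> canonical) both collide.
  const promoteError = await shadow.modify({name: CANONICAL}).catch((err) => err.message);
  const rollbackError = await live.modify({name: CANONICAL}).catch((err) => err.message);

  return {
    coldReadDocs: coldRead.docs.length,
    promoteError,
    rollbackError,
    names: [...client.collections.keys()].sort(),
  };
}
```

Running `reproduceNameGap()` leaves an empty `knowledge-base` next to the parking and shadow collections, with both the promote rename and the rollback rename rejected. Retrying either rename fails the same way, because the empty canonical persists.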
The Architectural Reality
- `ai/services/knowledge-base/VectorService.mjs:343` defines `embedViaShadowSwap({liveCollection, knowledgeBase, idsToDeleteCount})`.
- `ai/services/knowledge-base/VectorService.mjs:349-362` creates the shadow collection, embeds the full corpus, then promotes via live -> parking and shadow -> canonical renames.
- `ai/services/knowledge-base/VectorService.mjs:380-389` invalidates the cache and attempts rollback only after `liveParked` is true.
- `ai/services/knowledge-base/ChromaManager.mjs:128-135` resolves the canonical KB collection with `client.getOrCreateCollection({name: aiConfig.collectionName, ...})`.
- `ai/services/knowledge-base/DatabaseService.mjs:519-520` keeps the normal KB embed entry point on the default strategy because it does not pass `staleStrategy`.
This is a KB service-layer hardening ticket. It should not reopen the broad Memory Core/Chroma contention design in Discussion #11676, and it should not mutate the MCP server tool shape.
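The promote shape described by the `VectorService.mjs:349-389` references can be approximated as below. This is a simplified sketch, not the actual `embedViaShadowSwap()`: the `-shadow` / `-parking` names, the `embedChunks` callback, and the name-only fake client are assumptions, and cache invalidation is omitted. It shows why the leak-hygiene follow-up exists: recovery only runs once `liveParked` is true, so a failure inside `embedChunks()` leaves the shadow collection behind.

```javascript
// Hypothetical fake that tracks collection names only.
function makeNameOnlyClient() {
  const names = new Set();
  return {
    names,
    async createCollection({name}) {
      if (names.has(name)) throw new Error(`collection ${name} already exists`);
      names.add(name);
      const collection = {
        name,
        async modify({name: nextName}) {
          if (names.has(nextName)) throw new Error(`rename collision: ${nextName}`);
          names.delete(collection.name);
          names.add(nextName);
          collection.name = nextName;
        },
      };
      return collection;
    },
  };
}

// Simplified reading of the promote shape; names and the embedChunks
// callback are placeholders, not the real VectorService signature.
async function shadowSwapSketch({client, liveCollection, canonical, embedChunks}) {
  const shadow = await client.createCollection({name: `${canonical}-shadow`});
  let liveParked = false;
  try {
    await embedChunks(shadow); // full-corpus embed into the shadow
    await liveCollection.modify({name: `${canonical}-parking`}); // rename 1
    liveParked = true;
    await shadow.modify({name: canonical}); // rename 2
  } catch (err) {
    // Rollback only exists once the live collection is parked, so a failure
    // inside embedChunks() skips it and the shadow collection is left behind.
    if (liveParked) await liveCollection.modify({name: canonical});
    throw err;
  }
}

async function demoOrphanShadow() {
  const client = makeNameOnlyClient();
  const live = await client.createCollection({name: 'knowledge-base'});
  await shadowSwapSketch({
    client,
    liveCollection: live,
    canonical: 'knowledge-base',
    embedChunks: async () => { throw new Error('embedding provider timeout'); },
  }).catch(() => {}); // the caller sees the error; nothing removes the shadow
  return [...client.names].sort();
}
```

`demoOrphanShadow()` resolves to `['knowledge-base', 'knowledge-base-shadow']`: the live collection is untouched, but nothing removes or records the shadow.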
The Fix
Harden shadow-swap before activation:
- Add deterministic protection for the no-canonical-name promote window. Acceptable shapes include a scoped promotion marker/lock, an internal promote-aware canonical resolver, avoiding `getOrCreateCollection()` during known promote windows, or another repo-consistent service-layer guard. The fix must prove that an empty canonical collection cannot strand the KB during promote.
- Add tested cleanup or explicit parking semantics for a shadow collection created before `embedChunks()` fails.
- Add an explicit activation path only after the hardening tests pass. The likely owner is the KB bulk re-embed/sync path that reaches `DatabaseService.embedKnowledgeBase()` / `VectorService.embed()`.
- Preserve operator-gated destructive behavior. If cleanup requires deletion, tests must show the guard is constrained to safe/test-owned shadow artifacts, not arbitrary canonical data.
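One of the acceptable shapes listed above, an internal promote-aware canonical resolver, could look roughly like this. `PromotionGuard`, `runPromotion()`, and `resolveCanonical()` are hypothetical names, and the in-memory client is again a stand-in. Two limits matter: the guard is process-local, so the cross-process case still needs a shared marker or lock that this sketch does not model; and after a failed promote the resolver switches to a non-creating lookup so the read fails loudly instead of producing a silent empty canonical.

```javascript
// Hypothetical in-memory client; collection names are unique, as in Chroma.
class FakeChromaClient {
  constructor() {
    this.collections = new Map();
  }
  async createCollection({name}) {
    if (this.collections.has(name)) throw new Error(`collection ${name} already exists`);
    const client = this;
    const collection = {
      name,
      docs: [],
      async modify({name: nextName}) {
        if (client.collections.has(nextName)) throw new Error(`rename collision: ${nextName}`);
        client.collections.delete(collection.name);
        collection.name = nextName;
        client.collections.set(nextName, collection);
      },
    };
    this.collections.set(name, collection);
    return collection;
  }
  async getCollection({name}) {
    const found = this.collections.get(name);
    if (!found) throw new Error(`collection ${name} does not exist`);
    return found;
  }
  async getOrCreateCollection({name}) {
    return this.collections.get(name) ?? this.createCollection({name});
  }
}

// Hypothetical promote-aware resolver. Process-local only: another process
// would need a shared marker or lock, which this sketch does not model.
class PromotionGuard {
  #gate = null; // pending while a promotion is in flight
  #lastFailure = null; // set when the last promotion failed

  async runPromotion(promote) {
    if (this.#gate) throw new Error('KB promotion already in progress');
    let release;
    this.#gate = new Promise((resolve) => { release = resolve; });
    try {
      const result = await promote();
      this.#lastFailure = null;
      return result;
    } catch (err) {
      this.#lastFailure = err;
      throw err;
    } finally {
      this.#gate = null;
      release(); // wake any resolver waiting out the name gap
    }
  }

  async resolveCanonical(client, name) {
    // Never reach getOrCreateCollection() while the canonical name is unowned.
    while (this.#gate) await this.#gate;
    // After a failed promote, use a non-creating lookup so the read fails loudly.
    if (this.#lastFailure) return client.getCollection({name});
    return client.getOrCreateCollection({name});
  }
}

async function demoGuardedPromote() {
  const client = new FakeChromaClient();
  const guard = new PromotionGuard();
  const live = await client.createCollection({name: 'knowledge-base'});
  live.docs.push('old chunk');
  const shadow = await client.createCollection({name: 'knowledge-base-shadow'});
  shadow.docs.push('new chunk');
  let coldRead;
  await guard.runPromotion(async () => {
    await live.modify({name: 'knowledge-base-parking'});
    coldRead = guard.resolveCanonical(client, 'knowledge-base'); // concurrent read mid-gap
    await shadow.modify({name: 'knowledge-base'});
  });
  return (await coldRead).docs;
}

async function demoFailedPromote() {
  const client = new FakeChromaClient();
  const guard = new PromotionGuard();
  const live = await client.createCollection({name: 'knowledge-base'});
  await guard.runPromotion(async () => {
    await live.modify({name: 'knowledge-base-parking'});
    throw new Error('shadow rename failed');
  }).catch(() => {});
  const outcome = await guard.resolveCanonical(client, 'knowledge-base')
    .then(() => 'created', (err) => err.message);
  return {outcome, names: [...client.collections.keys()]};
}
```

`demoGuardedPromote()` hands the read that arrived mid-gap the promoted shadow's documents (`['new chunk']`), and `demoFailedPromote()` reports a missing-collection error while leaving only `knowledge-base-parking` instead of creating an empty `knowledge-base`. Code inside `promote` must not await `resolveCanonical()` itself, since that would wait on its own gate; it should use the collection handles it already holds.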
Contract Ledger Matrix
| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| `VectorService.embedViaShadowSwap()` promote transaction | #11683, PR #11684 review follow-up | Promotion cannot create or collide with an empty canonical collection during the live -> parking / shadow -> canonical interval | Fail loudly with recoverable parking/shadow names and no silent empty canonical | JSDoc near promote path | Unit or integration test forcing a cold-cache canonical resolve between renames |
| Shadow collection lifecycle | #11684 review follow-up | A failed pre-promote embed does not leave an untracked orphan shadow collection | Explicitly parked/deleted test-owned shadow artifact with logged recovery handle | JSDoc/comment only if behavior is non-obvious | Unit test with `embedChunks()` failure before rename |
| KB sync activation path | #11677 -> #11683 lineage | A caller can intentionally opt into `staleStrategy: 'shadow-swap'` only after hardening is present | Default remains non-shadow-swap until the activation caller is explicit | PR body / service JSDoc for the opted-in entry point | Targeted unit + dockerized integration evidence |
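The KB sync activation path row can be read as a rule about what the entry point passes to `VectorService.embed()`. As a sketch only: `buildKbEmbedOptions()` and its `shadowSwap` / `hardeningReady` flags are hypothetical, and `hardeningReady` stands in for whatever gate the repo actually uses (it may simply be merge ordering rather than a runtime flag). The default branch reproduces today's `{viaMcp}` options unchanged.

```javascript
// Hypothetical helper for the KB sync entry point (e.g. DatabaseService.embedKnowledgeBase()).
// It builds the options object passed to VectorService.embed(aiConfig.dataPath, options).
function buildKbEmbedOptions({viaMcp, shadowSwap = false, hardeningReady = false}) {
  const options = {viaMcp};
  // Default path: no staleStrategy key at all, identical to the current call.
  if (!shadowSwap) return options;
  // Opting in is explicit and refused until the promote-window and leak-hygiene work is in.
  if (!hardeningReady) {
    throw new Error("staleStrategy 'shadow-swap' requested before hardening is in place");
  }
  return {...options, staleStrategy: 'shadow-swap'};
}
```

Keeping the default branch byte-for-byte equivalent to the current call means the existing sync path cannot drift onto `shadow-swap` by accident; only a caller that names the strategy changes behavior.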
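Acceptance Criteria
- A test forces a cold-cache `getKnowledgeBaseCollection()` call between `liveCollection.modify({name: parkingName})` and `shadowCollection.modify({name: aiConfig.collectionName})` and proves no empty canonical collection strands the KB.
- A failure in `embedChunks()` before promotion cleans up or explicitly parks the created shadow collection with tested, constrained semantics.
- The default KB sync path does not use `shadow-swap` until the promote-window and leak-hygiene hardening are green.
- An explicit KB sync entry point opts into `staleStrategy: 'shadow-swap'` after hardening, with behavior documented at the owning service boundary.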
Out of Scope
- Reopening the broader Memory Core lightweight-operation resilience work from Discussion #11676.
- Changing MCP tool schemas or OpenAPI/YAML surfaces.
- Introducing a new Chroma topology.
- Making `shadow-swap` the default strategy before the hardening criteria above pass.
Avoided Traps
- Do not paper over this with retry-only behavior. The failure mode is a semantic collision created by `getOrCreateCollection()` during a name gap, not ordinary transient Chroma flakiness.
- Do not solve it by deleting arbitrary canonical/parking collections. Deletion must remain constrained and auditable.
- Do not treat #11684 approval as proof of broad activation readiness. The review explicitly approved the implementation with follow-up, not default rollout.
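For the constrained-deletion trap and the pre-promote leak, a minimal sketch of run-scoped shadow ownership follows. It assumes a hypothetical naming scheme (`<canonical>__shadow__<runId>`) and hypothetical helpers `shadowNameFor()`, `assertOwnedShadow()`, and `embedIntoShadow()`; the real shadow naming in `VectorService.mjs` may differ. Deletion goes through one guard that accepts only the exact shadow this run created, and if deletion itself fails the shadow is kept and a recovery handle is logged.

```javascript
// Hypothetical fake: tracks names and supports create/delete only.
function makeTrackingClient() {
  const names = new Set();
  return {
    names,
    async createCollection({name}) {
      if (names.has(name)) throw new Error(`collection ${name} already exists`);
      names.add(name);
      return {name};
    },
    async deleteCollection({name}) {
      if (!names.delete(name)) throw new Error(`collection ${name} does not exist`);
    },
  };
}

// Hypothetical naming scheme: a per-run id lets cleanup prove it owns the shadow.
function shadowNameFor(canonical, runId) {
  return `${canonical}__shadow__${runId}`;
}

// Single deletion guard: anything other than this run's shadow is refused,
// which covers the canonical and parking collections.
function assertOwnedShadow({name, canonical, runId}) {
  if (name !== shadowNameFor(canonical, runId)) {
    throw new Error(`refusing to delete ${name}: not the shadow owned by run ${runId}`);
  }
}

async function embedIntoShadow({client, canonical, runId, embedChunks, log = console.warn}) {
  const name = shadowNameFor(canonical, runId);
  const shadow = await client.createCollection({name});
  try {
    await embedChunks(shadow);
    return shadow; // handed to the separately guarded promote step
  } catch (err) {
    assertOwnedShadow({name, canonical, runId});
    await client.deleteCollection({name}).catch(() => {
      // Deletion failed: keep the shadow and log a recovery handle instead.
      log(`orphan shadow ${name} left after failed embed (run ${runId})`);
    });
    throw err; // the original embed failure still reaches the caller
  }
}

async function demoCleanup() {
  const client = makeTrackingClient();
  await client.createCollection({name: 'knowledge-base'});
  const error = await embedIntoShadow({
    client,
    canonical: 'knowledge-base',
    runId: 'run-1',
    embedChunks: async () => { throw new Error('embedding provider timeout'); },
  }).then(() => null, (err) => err.message);
  return {error, names: [...client.names]};
}
```

On an `embedChunks()` failure, `demoCleanup()` surfaces the original error and leaves only `knowledge-base`; passing the canonical or a parking name to `assertOwnedShadow()` throws rather than deleting.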
Related
- Parent implementation ticket: #11683
- Implementation PR: #11684
- Source ideation lineage: Discussion #11677
- Sibling broader resilience discussion: Discussion #11676
- Review handoff: PR #11684 Claude-family review cycle 1 (`PRR_kwDODSospM8AAAABAhDLAA`)
Origin Session ID: 019e44ba-d309-7e91-a819-36911fbf4e10
Handoff Retrieval Hints:
- `query_raw_memories("PR 11684 shadow-swap approved follow-up no canonical window orphan shadow activation")`
- `query_raw_memories("Harden KB shadow-swap before activation")`
- GitHub search: `shadow-swap staleStrategy getOrCreateCollection`