Context
Phase 4 sub-ticket of #11628 (Operations + Observability for Cloud-Native KB Deployments) — addresses the per-substrate retention-policy framing flagged in #11628's body as "follow-up ticket scope (deferred to Phase 4 implementation; let implementer-hot-context shape)." That implementer-hot-context shaping happens here.
Surfaced 2026-05-20 during nightshift-mode operator delegation: operator directive to keep parallel #11624 lanes moving while @neo-gpt works on Phase 0/1C-α #11631 (write-side tenant stamping). This ticket is the cleanest pre-Phase-2 Phase-4-actionable slice — touches buildScripts/ai/backup.mjs + config templates, zero overlap with GPT's VectorService.mjs lane.
The Problem
buildScripts/ai/backup.mjs:337-401 cleanOldBackups() has hardcoded retention constants:
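    const K      = 3;  // keep the newest 3 bundles unconditionally
    const N_DAYS = 30; // delete older bundles only if they are more than 30 days old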
These are atomic-bundle-level constants — they apply uniformly across KB + MC + graph + concepts + trajectories + mailbox. For zero-config Neo deployments this is fine, but:
- Cloud-native deployments with tighter disk budgets need lower K / N_DAYS to bound bundle accumulation.
- High-tempo deployments doing hourly backups (vs the default daily cadence) need higher K so 24h of bundles aren't churn-deleted.
- Recovery-critical deployments (e.g., the 2026-05-17 MC wipe recovery anchor) need higher K and/or N_DAYS so a sufficient bundle history is retained for post-incident forensics.
Hardcoded constants force operator forks of backup.mjs for any deviation. Config-driven retention closes this gap cleanly without breaking the atomic-bundle architecture.
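To make the disk-budget trade-off concrete, here is a rough back-of-envelope sketch. It assumes the sweep keeps the newest keepMinimum bundles unconditionally and deletes older bundles only once they exceed maxDays; the real cleanOldBackups() body is not reproduced in this ticket. The helper name estimateRetainedBundles and the backupsPerDay parameter are hypothetical, and the result is an approximation, not an exact count.

```javascript
// Approximate steady-state number of bundles left on disk after each sweep.
// Assumption: the newest `keepMinimum` bundles are always kept, and anything
// beyond that floor is deleted only once it is older than `maxDays`.
// `estimateRetainedBundles` and `backupsPerDay` are illustrative names only.
function estimateRetainedBundles({keepMinimum = 3, maxDays = 30} = {}, backupsPerDay = 1) {
    // Bundles still inside the age window are never eligible for deletion.
    const withinAgeWindow = Math.ceil(backupsPerDay * maxDays);

    // The count floor only matters when it exceeds the age window.
    return Math.max(keepMinimum, withinAgeWindow);
}

console.log(estimateRetainedBundles({keepMinimum: 3,  maxDays: 30}, 1));  // daily, defaults
console.log(estimateRetainedBundles({keepMinimum: 3,  maxDays: 30}, 24)); // hourly, defaults
console.log(estimateRetainedBundles({keepMinimum: 24, maxDays: 2},  24)); // hourly, tuned
```

Under these assumptions the default policy holds about 30 bundles at a daily cadence but about 720 at an hourly cadence, which is why cadence and retention need to be tuned together.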
The Architectural Reality
Atomic-bundle architecture preserved. Per #10129 Phase 3 (peer-script-not-delegate), each backup is a single timestamped directory containing kb/ + mc/ + graph/ + concepts/ + trajectories/ + mailbox/. The bundle is the atomic unit; retention sweeps the bundle list, not per-substrate subdirs.
Per-substrate retention asymmetry framing from #11628 deferred to a follow-up. #11628 body discusses KB-as-cache vs MC-as-store and argues KB could tolerate lighter retention (weekly cadence; backup is cost-optimization). But the atomic-bundle shape doesn't support per-substrate retention without either:
- Splitting backups into per-substrate bundles (breaks #10129 atomicity)
- Selective per-substrate-subdir deletion inside older bundles (complex; partial-state risk during operator recovery)
Both are substantive architectural changes. This ticket scopes narrowly to bundle-level retention parameterization — uniform K + N_DAYS across the bundle, but configurable. The per-substrate asymmetry can be a follow-up once Phase 2 ingestion ships and the operational shape is clearer.
The Fix
1. Config additions
Add backupRetention config to BOTH KB + MC config templates (the backup script bundles both, so either config could carry the retention policy — adding to both keeps the contract discoverable from either MCP server's config surface):
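    // In aiConfig.knowledgeBase + aiConfig.memoryCore
    backupRetention: {
        keepMinimum: 3, // keep newest N bundles unconditionally; deletion only applies after this floor
        maxDays    : 30 // bundles older than this many days are eligible for deletion (subject to keepMinimum)
    }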
The two-axis policy (count floor + age threshold) preserves the current behavior exactly when defaults are used.
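The two-axis policy can be sketched as a pure function over bundle metadata. This is an illustration of the described semantics (count floor first, then age threshold), not the existing cleanOldBackups() body; selectBundlesToDelete and the {name, mtimeMs} record shape are hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical pure helper: returns the names of bundles a sweep would delete.
// `bundles` is an array of {name, mtimeMs}; nothing touches the filesystem here.
function selectBundlesToDelete(bundles, {keepMinimum = 3, maxDays = 30} = {}, nowMs = Date.now()) {
    // Sort a copy so the caller's array is left untouched.
    const newestFirst = [...bundles].sort((a, b) => b.mtimeMs - a.mtimeMs);
    const cutoffMs    = nowMs - maxDays * MS_PER_DAY;

    return newestFirst
        .slice(keepMinimum)                          // axis 1: count floor, newest K always survive
        .filter(bundle => bundle.mtimeMs < cutoffMs) // axis 2: age threshold, only old bundles go
        .map(bundle => bundle.name);
}
```

Keeping the decision separate from the deletion makes the policy testable without a filesystem, and the byte-equivalence check for the defaults reduces to comparing two lists of names.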
2. buildScripts/ai/backup.mjs refactor
    async function cleanOldBackups(backupRoot, logger, {keepMinimum = 3, maxDays = 30} = {}) {
        // existing body, but K + N_DAYS read from arguments
        const K      = keepMinimum;
        const N_DAYS = maxDays;
        // ... rest unchanged
    }
Caller resolves the config at invoke time:
    import kbConfig from '../../ai/mcp/server/knowledge-base/config.mjs';
    import mcConfig from '../../ai/mcp/server/memory-core/config.mjs';

    // KB config is the primary source (it's the orchestrating server); MC config overrides it when set.
    const retention = mcConfig.backupRetention ?? kbConfig.backupRetention ?? {keepMinimum: 3, maxDays: 30};
    await cleanOldBackups(backupRoot, logger, retention);
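One edge case in a whole-object ?? chain is worth noting: a partial mcConfig.backupRetention (say, only keepMinimum) shadows kbConfig.backupRetention entirely, and an explicit null value is not replaced by a destructuring default. If per-key fallback is wanted, a possible variation is sketched below. The helper resolveRetention and its validation rule are hypothetical and not part of the ticket's proposed change.

```javascript
const DEFAULT_RETENTION = Object.freeze({keepMinimum: 3, maxDays: 30});

// Hypothetical helper: resolves each retention key independently.
// Configs are passed highest-priority first, e.g. resolveRetention(mcConfig, kbConfig).
function resolveRetention(...configs) {
    const resolved = {...DEFAULT_RETENTION};

    // Apply lowest priority first so higher-priority configs overwrite later.
    for (const config of [...configs].reverse()) {
        const candidate = config?.backupRetention ?? {};

        for (const key of Object.keys(DEFAULT_RETENTION)) {
            const value = candidate[key];

            // Assumed validation rule: accept only finite, non-negative numbers.
            if (Number.isFinite(value) && value >= 0) {
                resolved[key] = value;
            }
        }
    }

    return resolved;
}
```

With this shape, resolveRetention({backupRetention: {keepMinimum: 5}}, {backupRetention: {maxDays: 7}}) would combine both configs rather than dropping the KB value. Whether that is desirable is a design choice; the simpler ?? chain keeps each config's policy self-contained.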
3. Test coverage
New spec under test/playwright/unit/ai/buildScripts/backup-retention.spec.mjs:
- Default config (K=3, N_DAYS=30) matches pre-#NEW-TICKET behavior exactly (byte-equivalence anchor)
- Tighter config (K=1, N_DAYS=7) deletes more aggressively
- Higher-cadence config (K=24, N_DAYS=2) preserves rolling-24h history
- Missing config keys fall through to defaults (preserves zero-config deployments)
- Tests mock the filesystem via a tmp fixture (synthetic backup-* dirs with controllable mtimes)
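For the tmp fixture, one option is to generate the bundle descriptors up front and materialise them in the spec afterwards. The sketch below only builds the descriptors. The helper name describeSyntheticBundles is hypothetical, and the timestamp suffix format is an assumption: the ticket specifies only the backup- prefix.

```javascript
const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical spec helper: describes synthetic bundles without creating them.
// Returns newest-first entries spaced `intervalHours` apart, ending at `nowMs`.
function describeSyntheticBundles({count, intervalHours, nowMs}) {
    return Array.from({length: count}, (_, index) => {
        const mtimeMs = nowMs - index * intervalHours * MS_PER_HOUR;

        // Assumed naming scheme: ISO timestamp with ':' and '.' made filesystem-safe.
        const stamp = new Date(mtimeMs).toISOString().replace(/[:.]/g, '-');

        return {name: `backup-${stamp}`, mtimeMs};
    });
}

// Example: 48 hourly bundles for the K=24, N_DAYS=2 scenario.
const hourlyBundles = describeSyntheticBundles({count: 48, intervalHours: 1, nowMs: Date.UTC(2026, 4, 20)});
console.log(hourlyBundles.length, hourlyBundles[0].name);
```

In the spec, each descriptor could then be created under a temporary root with fs.mkdir and given its controlled mtime via fs.utimes before cleanOldBackups() runs against that root.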
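Acceptance Criteria
- backupRetention: {keepMinimum, maxDays} added to ai/mcp/server/knowledge-base/config.{mjs,template.mjs} and ai/mcp/server/memory-core/config.{mjs,template.mjs} with default {keepMinimum: 3, maxDays: 30}.
- cleanOldBackups() in buildScripts/ai/backup.mjs accepts an options object with keepMinimum + maxDays and reads them in lieu of the hardcoded constants.
- The caller resolves retention via mcConfig.backupRetention ?? kbConfig.backupRetention ?? <defaults>.
- npm run ai:backup behavior under default config is byte-equivalent to pre-ticket behavior (manual smoke OR spec assertion).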
Out of Scope
- Per-substrate retention asymmetry (KB lighter vs MC stricter) — deferred to a follow-up once Phase 2 ingestion ships and the cloud-deployment operational shape is clearer. The architectural change required (either per-substrate bundles or selective subdir deletion) is substantial and not blocking the V1 cloud deployment story.
- Defrag snapshot retention (dist/chromadb-backups/<target>/ per-collection snapshots from defragChromaDB.mjs): separate substrate, not covered here.
- Backup cadence control (daily vs hourly vs weekly) — separate config concern; this ticket only addresses how long bundles are KEPT, not how often they're CREATED.
- Retention-policy-driven alerting ("oldest available backup is N days old") — Phase 4D alerting sub-ticket scope, not retention-policy core.
Avoided Traps
| Trap | Why rejected |
|---|---|
| Adding retention config to only ONE service's config (KB OR MC, not both) | Discoverability: future operator reading either config should find the retention surface. Adding to both with a fallback chain is the substrate-correct shape. |
| Renaming K and N_DAYS to longer names directly | Local-variable readability; the option names (keepMinimum, maxDays) are the public surface. Local destructure to K + N_DAYS preserves the existing 60-line cleanOldBackups internal style. |
| Per-substrate retention in this ticket | Breaks atomic-bundle architecture or introduces partial-state risk. Defer until Phase 2 ships and the operational shape is concrete. |
| New retention strategy enum (LRU / FIFO / etc.) | Premature abstraction. The K + N_DAYS shape is empirically sufficient for the known cloud-deployment scenarios. |
Related
- Parent Phase Epic: #11628 (Phase 4: Operations + Observability)
- Parent meta-Epic: #11624
- Bundle architecture origin: #10129 (atomic-bundle Phase 3)
- Backup orchestrator: buildScripts/ai/backup.mjs (the file modified)
- Sibling parallel lane (@neo-gpt): #11631 Phase 0/1C-α write-side tenant stamping; different files (ai/services/knowledge-base/VectorService.mjs), zero merge-collision risk
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- query_raw_memories({query: 'backup retention policy keepMinimum maxDays cleanOldBackups Phase 4'})
- ask_knowledge_base({query: 'buildScripts ai backup retention K N_DAYS', type: 'src'})
- Empirical anchor: pre-ticket hardcoded constants K=3, N_DAYS=30 at buildScripts/ai/backup.mjs:365-366
- 2026-05-17 MC wipe recovery anchor: K=3 + N_DAYS=30 retained the 5/16 bundle that became the recovery source — this ticket preserves that protective floor as the default while allowing operator tuning