Context
Sub of Phase 4 Epic #11628 (Cloud-Native KB Operations + Observability, meta-Epic #11624). Filed 2026-05-19 post-operator-V-B-A on backup substrate symmetry framing (see #11628 "KB-as-Cache vs MC-as-Store" section).
Position in Phase 4 sub-tree: foundation/config substrate that the daemons consume. Phase 4B reconciliation (#11640), Phase 4C GC (#11641), Phase 4D alerting (#11642) all interact with retention thresholds. This sub formalizes the per-substrate configuration shape they read from.
The Problem
Today's backup retention is UNIFORM across substrates:
- buildScripts/ai/backup.mjs top-level: keep last 3 bundles unconditionally; delete bundles older than 30 days (default K=3, N=30)
- buildScripts/ai/defragChromaDB.cleanOldBackups: keep last 3, delete others older than 7 days
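For orientation, the "keep last K, delete the rest once older than N days" rule described above can be sketched as follows. This is a minimal illustration, not the code in backup.mjs: the function name selectExpiredBundles and the {name, createdAt} bundle shape are assumptions made for this example.

```javascript
// Hypothetical helper; name and bundle shape are assumptions, not the backup.mjs API.
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * @param {{name: string, createdAt: number}[]} bundles createdAt in epoch ms
 * @param {{keepLast: number, maxAgeDays: number}} policy
 * @param {number} [now] epoch ms
 * @returns {string[]} names of bundles eligible for deletion
 */
function selectExpiredBundles(bundles, policy, now = Date.now()) {
    // Sort a copy so the caller's array is left untouched
    const newestFirst = [...bundles].sort((a, b) => b.createdAt - a.createdAt);

    return newestFirst
        .slice(policy.keepLast)                                       // the newest K always survive
        .filter(b => now - b.createdAt > policy.maxAgeDays * DAY_MS)  // only the rest can age out
        .map(b => b.name);
}
```

With K=3 and N=30, bundles aged 1, 5, 10, 20, 40 and 50 days would lose only the 40- and 50-day bundles: the three newest are protected by the count, and the 20-day bundle is still inside the age window.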
But per the Phase 4 Epic #11628 "KB-as-Cache vs MC-as-Store" framing:
| | KB | MC |
|---|---|---|
| Structural role | Cache+index over external sources | Primary store |
| Backup role | Cost-optimization: avoid re-embed; reduce wall-clock recovery cost vs orchestrating N tenant re-syncs | Data-loss prevention: amnesia-window minimization |
| Wipe consequence | Recoverable via npm run ai:sync-kb (Neo content) + tenant re-pushes (Phase 2 cross-tenant content) | Permanent loss of conversations + agent-thoughts in last-backup-to-wipe window |
Symmetric retention treats data-loss-prevention and cost-optimization as equivalent. They aren't. MC retention should be DAILY at HIGHER counts (amnesia minimization is mission-critical). KB JSONL bundle retention can be LIGHTER (weekly; the savings vs re-sync wall-clock are the bounded prize). Defrag pre-nuke snapshots remain symmetric (mid-flight safety net is substrate-agnostic).
The Architectural Reality
This sub touches:
| File | Change |
|---|---|
| buildScripts/ai/backup.mjs | Top-level retention config becomes per-substrate-aware. Read aiConfig.{knowledgeBase,memoryCore}.backupRetention.jsonlBundle.{keepLast,maxAgeDays,cadenceHint} |
| buildScripts/ai/defragChromaDB.mjs cleanOldBackups | Same per-substrate-aware config read for defrag pre-nuke snapshots |
| ai/mcp/server/knowledge-base/config.template.mjs | Add backupRetention block with KB-tuned defaults |
| ai/mcp/server/memory-core/config.template.mjs | Add backupRetention block with MC-tuned defaults |
| test/playwright/unit/ai/buildScripts/backup.spec.mjs | Extend with per-substrate retention case coverage |
| test/playwright/unit/ai/buildScripts/restore-filters.spec.mjs | Verify retention-driven cleanup doesn't break restore |
| learn/agentos/cloud-deployment/Security.md (Phase 3 #11627) | Document the per-substrate retention defaults + tunable + rationale |
The Fix
1. Per-substrate retention config schema
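// aiConfig.knowledgeBase.backupRetention (in KB config template)
backupRetention: {
    jsonlBundle: {
        keepLast    : 2,        // weekly cadence; lighter than MC
        maxAgeDays  : 14,       // shorter than MC default
        cadenceHint : 'weekly'  // documentation-only; backup.mjs doesn't schedule
    },
    defragSnapshot: {
        keepLast   : 3,         // mid-flight safety; unchanged from current
        maxAgeDays : 7
    }
}

// aiConfig.memoryCore.backupRetention (in MC config template); heavier defaults
backupRetention: {
    jsonlBundle: {
        keepLast    : 7,        // amnesia minimization
        maxAgeDays  : 30,       // current default; status quo
        cadenceHint : 'daily'
    },
    defragSnapshot: {
        keepLast   : 3,
        maxAgeDays : 7
    }
}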
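To get a feel for what these defaults imply, the arithmetic below estimates how many JSONL bundles each substrate would hold at steady state. It is a rough sketch under two assumptions that the ticket does not guarantee: the sweep keeps the newest keepLast bundles and deletes the rest once they are older than maxAgeDays (today's rule), and bundles are actually produced at the hinted cadence. The function name and the cadence-to-days mapping are hypothetical.

```javascript
// Hypothetical mapping; cadenceHint is documentation-only in the proposed schema.
const CADENCE_INTERVAL_DAYS = {daily: 1, weekly: 7};

/**
 * Estimates the number of bundles on disk once the schedule has run long enough.
 * @param {{keepLast: number, maxAgeDays: number, cadenceHint: string}} policy
 * @returns {number}
 */
function steadyStateBundleCount({keepLast, maxAgeDays, cadenceHint}) {
    const intervalDays = CADENCE_INTERVAL_DAYS[cadenceHint];

    // Bundles with ages 0, interval, 2 * interval, ... that are still younger than maxAgeDays
    const withinAgeWindow = Math.ceil(maxAgeDays / intervalDays);

    // keepLast is a floor: it only matters when fewer bundles fit inside the age window
    return Math.max(keepLast, withinAgeWindow);
}

steadyStateBundleCount({keepLast: 2, maxAgeDays: 14, cadenceHint: 'weekly'}); // 2
steadyStateBundleCount({keepLast: 7, maxAgeDays: 30, cadenceHint: 'daily'});  // 30
```

Under these assumptions the MC age window, not keepLast, determines the steady-state count; keepLast mainly protects the newest bundles if backups stall for longer than maxAgeDays.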
2. backup.mjs per-substrate retention enforcement
Retention sweep currently runs at the bundle-level (one timestamp = one decision). Refactor to consult per-substrate config when deciding which subdirs within a bundle survive — OR keep bundle-level retention but pick the MAX of any substrate's retention as the bundle-level threshold (simpler; small storage cost).
Lean: bundle-level retention takes the MAX across substrates for each field: maxAgeDays = MAX(KB.maxAgeDays, MC.maxAgeDays) and keepLast = MAX(KB.keepLast, MC.keepLast). Bundles are atomic units; partial-keep is operationally weird. The asymmetry is in the cadence (when bundles are produced) more than the retention (how long bundles stick around). Worth pressure-testing with peers during implementation.
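A minimal sketch of the bundle-level MAX option, assuming the config shape proposed in section 1. The function name bundleLevelRetention is hypothetical, and the sketch takes the maximum of each field independently across KB and MC.

```javascript
/**
 * Derives one bundle-level threshold from the per-substrate jsonlBundle policies.
 * Hypothetical helper; assumes both substrates define backupRetention.jsonlBundle.
 * @param {Object} aiConfig
 * @returns {{keepLast: number, maxAgeDays: number}}
 */
function bundleLevelRetention(aiConfig) {
    const kb = aiConfig.knowledgeBase.backupRetention.jsonlBundle;
    const mc = aiConfig.memoryCore.backupRetention.jsonlBundle;

    return {
        keepLast   : Math.max(kb.keepLast, mc.keepLast),     // most protective count wins
        maxAgeDays : Math.max(kb.maxAgeDays, mc.maxAgeDays)  // longest age window wins
    };
}

// With the proposed defaults the MC values dominate both fields
const threshold = bundleLevelRetention({
    knowledgeBase : {backupRetention: {jsonlBundle: {keepLast: 2, maxAgeDays: 14, cadenceHint: 'weekly'}}},
    memoryCore    : {backupRetention: {jsonlBundle: {keepLast: 7, maxAgeDays: 30, cadenceHint: 'daily'}}}
});
// threshold is {keepLast: 7, maxAgeDays: 30}
```

The trade-off is the one named above: KB subdirs ride along for the MC window, which costs some storage but keeps every bundle restorable as a whole.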
3. Documentation
- Inline JSDoc on the new config blocks
- Phase 3 learn/agentos/cloud-deployment/Security.md cross-reference (per #11627 ACs)
- Migration path: existing deployments get backward-compatible defaults that match current behavior; new deployments get the per-substrate-tuned defaults
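One way the backward-compatible migration path could look, sketched under assumptions: a deployment whose config lacks a backupRetention block (or only sets some fields) falls back to today's values (3 bundles / 30 days for JSONL bundles, 3 / 7 days for defrag snapshots). LEGACY_DEFAULTS and resolveBackupRetention are hypothetical names, not existing exports.

```javascript
// Status-quo values taken from the current backup.mjs and cleanOldBackups behavior
const LEGACY_DEFAULTS = Object.freeze({
    jsonlBundle    : Object.freeze({keepLast: 3, maxAgeDays: 30}),
    defragSnapshot : Object.freeze({keepLast: 3, maxAgeDays: 7})
});

/**
 * Resolves the effective retention policy for one substrate config.
 * Hypothetical helper; missing blocks or fields inherit the legacy defaults.
 * @param {Object} [substrateConfig] e.g. aiConfig.knowledgeBase or aiConfig.memoryCore
 * @returns {{jsonlBundle: Object, defragSnapshot: Object}}
 */
function resolveBackupRetention(substrateConfig = {}) {
    const configured = substrateConfig.backupRetention ?? {};

    // Object spread ignores undefined, so absent sub-blocks simply contribute nothing
    return {
        jsonlBundle    : {...LEGACY_DEFAULTS.jsonlBundle,    ...configured.jsonlBundle},
        defragSnapshot : {...LEGACY_DEFAULTS.defragSnapshot, ...configured.defragSnapshot}
    };
}
```

Merging per field rather than per block means an operator can override only keepLast, for example, without having to restate maxAgeDays.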
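Acceptance Criteria
- aiConfig.knowledgeBase.backupRetention config shape defined + documented inline
- aiConfig.memoryCore.backupRetention config shape defined + documented inline
- backup.mjs retention sweep reads per-substrate config (bundle-level decision = MAX or per-subdir, decided during implementation)
- defragChromaDB.cleanOldBackups reads per-substrate config
- Deployments without backupRetention config inherit current defaults (status quo)
- Security.md (Phase 3 #11627) cross-references the per-substrate retention defaults + rationale + tunable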
Out of Scope
- Cadence scheduling (when daemons run backups) — config only documents the cadence hint; actual daily/weekly scheduling is per-deployment cron/daemon orchestration
- Per-tenant retention overrides (Phase 4C #11641 GC daemon scope; this sub is per-SUBSTRATE not per-TENANT)
- Restore-side retention enforcement (restore is operator-triggered; not policy-driven cleanup)
- Bundle-meta.json schema changes (existing topology compatibility check unchanged)
Avoided Traps
| Trap | Why rejected |
|---|---|
| Symmetric retention (status quo) | Treats KB cost-optimization and MC data-loss-prevention as equivalent; under-protects MC OR over-stores KB. Empirically wrong per #11628 KB-as-Cache vs MC-as-Store framing. |
| Per-subdir bundle partial-keep | Bundles are atomic units; partial-keep makes restore.mjs semantics complex. Bundle-level MAX is simpler. |
| Per-tenant retention in this sub | Different concern (Phase 4C #11641 scope); cross-cutting if folded in here. |
| Making defrag pre-nuke retention asymmetric | Defrag pre-nuke is mid-flight safety; substrate-agnostic. Symmetry there is correct. |
Related
- Parent: #11628 Phase 4 Epic (KB-as-Cache vs MC-as-Store framing section is the rationale source)
- Blocked-by (config consumers): #11640 Phase 4B reconciliation (consumes retention for tombstone-grace), #11641 Phase 4C GC (consumes retention for orphan cleanup), #11642 Phase 4D alerting (consumes retention for threshold breach severity)
- Substrate to modify: buildScripts/ai/{backup,defragChromaDB}.mjs; KB + MC config templates
- Cross-reference for documentation: #11627 Phase 3 Security.md (KB-as-Cache vs MC-as-Store doc lands here)
- Substrate precedent: #10129 atomic-bundle architecture; #11141 graph preserve-live; #11144 Chroma preserve-live parity
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- query_raw_memories({query: 'KB-as-cache vs MC-as-store backup retention per-substrate'})
- Operator V-B-A framing (2026-05-19): "KB is still always recoverable. even external repo additions => worst case is a full re-sync from them too." + "daily daemon based backups already ensure that there are no big gaps" — the substrate-correct framing this ticket implements
- Memory anchor: feedback_substrate_audit_consumer_sweep.md Category 4 (logical-extension scope-drift)
- Existing retention substrate: backup.mjs §retention policy JSDoc + defragChromaDB.cleanOldBackups