Frontmatter
id: 11649
title: Per-substrate retention policy configuration for KB/MC backup mechanisms
state: Closed
labels: enhancement, ai, architecture, needs-re-triage
assignees: []
createdAt: May 19, 2026, 4:44 PM
updatedAt: May 23, 2026, 5:29 AM
githubUrl: https://github.com/neomjs/neo/issues/11649
author: neo-opus-4-7
commentsCount: 7
parentIssue: 11628
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 23, 2026, 5:29 AM

Per-substrate retention policy configuration for KB/MC backup mechanisms

neo-opus-4-7 commented on May 19, 2026, 4:44 PM

Context

Sub of Phase 4 Epic #11628 (Cloud-Native KB Operations + Observability, meta-Epic #11624). Filed 2026-05-19 post-operator-V-B-A on backup substrate symmetry framing (see #11628 "KB-as-Cache vs MC-as-Store" section).

Position in Phase 4 sub-tree: foundation/config substrate that the daemons consume. Phase 4B reconciliation (#11640), Phase 4C GC (#11641), Phase 4D alerting (#11642) all interact with retention thresholds. This sub formalizes the per-substrate configuration shape they read from.

The Problem

Today's backup retention is UNIFORM across substrates:

  • buildScripts/ai/backup.mjs top-level: keep last 3 bundles unconditionally; delete bundles older than 30 days (default K=3, N=30)
  • buildScripts/ai/defragChromaDB.cleanOldBackups: keep last 3, delete others older than 7 days
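
Both rules above share one shape: the newest K artifacts always survive, and anything else older than N days is deleted. A minimal sketch of that rule, assuming a hypothetical `selectExpiredBundles` helper and a `{name, createdAt}` bundle shape (neither is taken from the actual `backup.mjs` or `defragChromaDB.mjs` code):

```javascript
// Hypothetical helper: the function name and bundle shape are assumptions
// for illustration, not the real backup.mjs internals.
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Returns the bundles a sweep would delete: everything outside the newest
 * `keepLast` bundles that is also older than `maxAgeDays`.
 */
function selectExpiredBundles(bundles, {keepLast, maxAgeDays}, now = Date.now()) {
    const cutoff = now - maxAgeDays * DAY_MS;

    return [...bundles]
        .sort((a, b) => b.createdAt - a.createdAt) // newest first
        .slice(keepLast)                           // the newest K always survive
        .filter(bundle => bundle.createdAt < cutoff);
}

const now     = Date.UTC(2026, 4, 19);
const bundles = [1, 2, 3, 10, 40].map(days => ({
    name     : `bundle-${days}d`,
    createdAt: now - days * DAY_MS
}));

// Top-level policy (K=3, N=30): only the 40-day-old bundle is removed.
console.log(selectExpiredBundles(bundles, {keepLast: 3, maxAgeDays: 30}, now).map(b => b.name));

// Defrag policy (K=3, N=7): the 10-day-old and 40-day-old bundles are removed.
console.log(selectExpiredBundles(bundles, {keepLast: 3, maxAgeDays: 7}, now).map(b => b.name));
```

The same function serves both call sites; only the `{keepLast, maxAgeDays}` pair differs, which is the part this ticket proposes to make configurable per substrate.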

But per the Phase 4 Epic #11628 "KB-as-Cache vs MC-as-Store" framing:

| | KB | MC |
|---|---|---|
| Structural role | Cache+index over external sources | Primary store |
| Backup role | Cost-optimization: avoid re-embed; reduce wall-clock recovery cost vs orchestrating N tenant re-syncs | Data-loss prevention: amnesia-window minimization |
| Wipe consequence | Recoverable via npm run ai:sync-kb (Neo content) + tenant re-pushes (Phase 2 cross-tenant content) | Permanent loss of conversations + agent-thoughts in last-backup-to-wipe window |

Symmetric retention treats data-loss prevention and cost optimization as equivalent. They aren't. MC retention should be DAILY at HIGHER counts (amnesia minimization is mission-critical). KB JSONL bundle retention can be LIGHTER (weekly): all a KB bundle saves is the wall-clock cost of a re-sync, so its value is bounded. Defrag pre-nuke snapshots remain symmetric (the mid-flight safety net is substrate-agnostic).

The Architectural Reality

This sub touches:

| File | Change |
|---|---|
| buildScripts/ai/backup.mjs | Top-level retention config becomes per-substrate-aware. Read aiConfig.{knowledgeBase,memoryCore}.backupRetention.jsonlBundle.{keepLast,maxAgeDays,cadenceHint} |
| buildScripts/ai/defragChromaDB.mjs cleanOldBackups | Same per-substrate-aware config read for defrag pre-nuke snapshots |
| ai/mcp/server/knowledge-base/config.template.mjs | Add backupRetention block with KB-tuned defaults |
| ai/mcp/server/memory-core/config.template.mjs | Add backupRetention block with MC-tuned defaults |
| test/playwright/unit/ai/buildScripts/backup.spec.mjs | Extend with per-substrate retention case coverage |
| test/playwright/unit/ai/buildScripts/restore-filters.spec.mjs | Verify retention-driven cleanup doesn't break restore |
| learn/agentos/cloud-deployment/Security.md (Phase 3 #11627) | Document the per-substrate retention defaults + tunable + rationale |

The Fix

1. Per-substrate retention config schema

// aiConfig.knowledgeBase.backupRetention (in KB config template)
backupRetention: {
    jsonlBundle: {
        keepLast      : 2,        // weekly cadence; lighter than MC
        maxAgeDays    : 14,       // shorter than MC default
        cadenceHint   : 'weekly'  // documentation-only; backup.mjs doesn't schedule
    },
    defragSnapshot: {
        keepLast      : 3,        // mid-flight safety; unchanged from current
        maxAgeDays    : 7
    }
}

// aiConfig.memoryCore.backupRetention (in MC config template) — heavier defaults
backupRetention: {
    jsonlBundle: {
        keepLast      : 7,        // amnesia minimization
        maxAgeDays    : 30,       // current default; status quo
        cadenceHint   : 'daily'
    },
    defragSnapshot: {
        keepLast      : 3,
        maxAgeDays    : 7
    }
}
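
One way a consumer could read these blocks while honoring the backward-compatibility requirement stated later in this ticket is sketched here. The `resolveRetention` name and the `LEGACY_DEFAULTS` constant are hypothetical. The legacy values are the current behavior described under "The Problem".

```javascript
// Current behavior per this ticket: top-level keeps 3 / 30 days,
// defrag cleanup keeps 3 / 7 days.
const LEGACY_DEFAULTS = {
    jsonlBundle   : {keepLast: 3, maxAgeDays: 30},
    defragSnapshot: {keepLast: 3, maxAgeDays: 7}
};

/**
 * Hypothetical lookup helper (an assumption, not existing API).
 * @param {Object} aiConfig  Merged AI config, possibly without backupRetention
 * @param {String} substrate 'knowledgeBase' or 'memoryCore'
 * @param {String} kind      'jsonlBundle' or 'defragSnapshot'
 */
function resolveRetention(aiConfig, substrate, kind) {
    const configured = aiConfig?.[substrate]?.backupRetention?.[kind];

    // Shallow merge: any key missing from the configured block falls back
    // to the legacy value, so a partial block stays valid.
    return {...LEGACY_DEFAULTS[kind], ...configured};
}

const aiConfig = {
    memoryCore: {
        backupRetention: {
            jsonlBundle: {keepLast: 7, maxAgeDays: 30, cadenceHint: 'daily'}
        }
    },
    knowledgeBase: {} // no backupRetention block: a pre-existing deployment
};

console.log(resolveRetention(aiConfig, 'memoryCore',    'jsonlBundle'));
console.log(resolveRetention(aiConfig, 'knowledgeBase', 'jsonlBundle'));
console.log(resolveRetention(aiConfig, 'memoryCore',    'defragSnapshot'));
```

With this shape, the tuned defaults live only in the config templates, and a deployment that never adopts them keeps the status quo.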

2. backup.mjs per-substrate retention enforcement

Retention sweep currently runs at the bundle-level (one timestamp = one decision). Refactor to consult per-substrate config when deciding which subdirs within a bundle survive — OR keep bundle-level retention but pick the MAX of any substrate's retention as the bundle-level threshold (simpler; small storage cost).

Lean: keep retention at the bundle level and take the most permissive value across substrates: age threshold = MAX(KB.maxAgeDays, MC.maxAgeDays), keep-last count = MAX(KB.keepLast, MC.keepLast). Bundles are atomic units; partial-keep is operationally weird. The asymmetry is in the cadence (when bundles are produced) more than the retention (how long bundles stick around). Worth peer review during implementation.
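
The bundle-level MAX option can be sketched as follows, assuming a hypothetical `bundleLevelRetention` helper that collapses the per-substrate rules into the single rule the sweep applies:

```javascript
// Hypothetical helper (not existing backup.mjs API): a bundle survives as long
// as ANY substrate still wants it, so each threshold is the maximum across rules.
function bundleLevelRetention(rules) {
    return {
        keepLast  : Math.max(...rules.map(rule => rule.keepLast)),
        maxAgeDays: Math.max(...rules.map(rule => rule.maxAgeDays))
    };
}

const kb = {keepLast: 2, maxAgeDays: 14}; // KB jsonlBundle defaults proposed above
const mc = {keepLast: 7, maxAgeDays: 30}; // MC jsonlBundle defaults proposed above

console.log(bundleLevelRetention([kb, mc])); // { keepLast: 7, maxAgeDays: 30 }
```

With the proposed defaults the MC rule dominates both thresholds, so KB subdirectories are kept longer than their own rule asks for. That is the small storage cost this option accepts.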

3. Documentation

  • Inline JSDoc on the new config blocks
  • Phase 3 learn/agentos/cloud-deployment/Security.md cross-reference (per #11627 ACs)
  • Migration path: existing deployments get backward-compatible defaults that match current behavior; new deployments get the per-substrate-tuned defaults
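
As an illustration of what the inline JSDoc could look like, here is a sketch of a typedef for one retention rule. The `RetentionRule` name and the `validateRetentionRule` guard are hypothetical additions, not part of this ticket's scope. The guard is included only to show the documented constraints in executable form.

```javascript
/**
 * One retention rule, as used under backupRetention.jsonlBundle and
 * backupRetention.defragSnapshot. (Typedef name is an assumption.)
 * @typedef {Object} RetentionRule
 * @property {Number} keepLast      Newest artifacts that always survive a sweep
 * @property {Number} maxAgeDays    Age in days beyond which other artifacts are deleted
 * @property {String} [cadenceHint] Documentation-only; backup.mjs doesn't schedule
 */

/**
 * Hypothetical guard: returns a list of problems, empty when the rule is usable.
 * @param {RetentionRule} rule
 * @returns {String[]}
 */
function validateRetentionRule(rule) {
    const problems = [];

    if (!Number.isInteger(rule.keepLast) || rule.keepLast < 0) {
        problems.push('keepLast must be a non-negative integer');
    }
    if (!Number.isFinite(rule.maxAgeDays) || rule.maxAgeDays <= 0) {
        problems.push('maxAgeDays must be a positive number');
    }
    if (rule.cadenceHint !== undefined && typeof rule.cadenceHint !== 'string') {
        problems.push('cadenceHint must be a string when present');
    }

    return problems;
}

console.log(validateRetentionRule({keepLast: 7, maxAgeDays: 30, cadenceHint: 'daily'}));
console.log(validateRetentionRule({keepLast: -1, maxAgeDays: 0}));
```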

Acceptance Criteria

  • aiConfig.knowledgeBase.backupRetention config shape defined + documented inline
  • aiConfig.memoryCore.backupRetention config shape defined + documented inline
  • backup.mjs retention sweep reads per-substrate config (bundle-level decision = MAX or per-subdir, decided during implementation)
  • defragChromaDB.cleanOldBackups reads per-substrate config
  • Backward-compat: deployments without explicit backupRetention config inherit current defaults (status quo)
  • Unit tests: per-substrate retention enforcement + backward-compat fallback
  • Security.md (Phase 3 #11627) cross-references the per-substrate retention defaults + rationale + tunable

Out of Scope

  • Cadence scheduling (when daemons run backups) — config only documents the cadence hint; actual daily/weekly scheduling is per-deployment cron/daemon orchestration
  • Per-tenant retention overrides (Phase 4C #11641 GC daemon scope; this sub is per-SUBSTRATE not per-TENANT)
  • Restore-side retention enforcement (restore is operator-triggered; not policy-driven cleanup)
  • Bundle-meta.json schema changes (existing topology compatibility check unchanged)

Avoided Traps

| Trap | Why rejected |
|---|---|
| Symmetric retention (status quo) | Treats KB cost-optimization and MC data-loss-prevention as equivalent; under-protects MC OR over-stores KB. Empirically wrong per #11628 KB-as-Cache vs MC-as-Store framing. |
| Per-subdir bundle partial-keep | Bundles are atomic units; partial-keep makes restore.mjs semantics complex. Bundle-level MAX is simpler. |
| Per-tenant retention in this sub | Different concern (Phase 4C #11641 scope); cross-cutting if folded in here. |
| Making defrag pre-nuke retention asymmetric | Defrag pre-nuke is mid-flight safety; substrate-agnostic. Symmetry there is correct. |

Related

  • Parent: #11628 Phase 4 Epic (KB-as-Cache vs MC-as-Store framing section is the rationale source)
  • Blocked-by (config consumers): #11640 Phase 4B reconciliation (consumes retention for tombstone-grace), #11641 Phase 4C GC (consumes retention for orphan cleanup), #11642 Phase 4D alerting (consumes retention for threshold breach severity)
  • Substrate to modify: buildScripts/ai/{backup,defragChromaDB}.mjs; KB + MC config templates
  • Cross-reference for documentation: #11627 Phase 3 Security.md (KB-as-Cache vs MC-as-Store doc lands here)
  • Substrate precedent: #10129 atomic-bundle architecture; #11141 graph preserve-live; #11144 Chroma preserve-live parity

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

  • query_raw_memories({query: 'KB-as-cache vs MC-as-store backup retention per-substrate'})
  • Operator V-B-A framing (2026-05-19): "KB is still always recoverable. even external repo additions => worst case is a full re-sync from them too." + "daily daemon based backups already ensure that there are no big gaps" — the substrate-correct framing this ticket implements
  • Memory anchor: feedback_substrate_audit_consumer_sweep.md Category 4 (logical-extension scope-drift)
  • Existing retention substrate: backup.mjs §retention policy JSDoc + defragChromaDB.cleanOldBackups