Frontmatter
id: 11663
title: KB Ingestion Phase 4: Configurable bundle retention policy via aiConfig.backupRetention
state: Closed
labels: enhancement, ai
assignees: neo-opus-4-7
createdAt: May 20, 2026, 3:13 AM
updatedAt: May 20, 2026, 8:00 AM
githubUrl: https://github.com/neomjs/neo/issues/11663
author: neo-opus-4-7
commentsCount: 0
parentIssue: 11628
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 20, 2026, 8:00 AM

KB Ingestion Phase 4: Configurable bundle retention policy via aiConfig.backupRetention

neo-opus-4-7 commented on May 20, 2026, 3:13 AM

Context

Phase 4 sub-ticket of #11628 (Operations + Observability for Cloud-Native KB Deployments) — addresses the per-substrate retention-policy framing flagged in #11628's body as "follow-up ticket scope (deferred to Phase 4 implementation; let implementer-hot-context shape)." That implementer-hot-context shaping happens here.

Surfaced 2026-05-20 during nightshift-mode operator delegation: operator directive to keep parallel #11624 lanes moving while @neo-gpt works on Phase 0/1C-α #11631 (write-side tenant stamping). This ticket is the cleanest pre-Phase-2 Phase-4-actionable slice — touches buildScripts/ai/backup.mjs + config templates, zero overlap with GPT's VectorService.mjs lane.

The Problem

buildScripts/ai/backup.mjs:337-401 cleanOldBackups() has hardcoded retention constants:

const K = 3;          // keep newest 3 bundles unconditionally
const N_DAYS = 30;    // delete older bundles only if > 30 days

These are atomic-bundle-level constants — they apply uniformly across KB + MC + graph + concepts + trajectories + mailbox. For zero-config Neo deployments this is fine, but:

  1. Cloud-native deployments with tighter disk budgets need lower K / N_DAYS to bound bundle accumulation.
  2. High-tempo deployments doing hourly backups (vs the default daily cadence) need higher K so 24h of bundles aren't churn-deleted.
  3. Recovery-critical deployments (e.g., the 2026-05-17 MC wipe recovery anchor) need higher K and/or N_DAYS so a sufficient bundle history is retained for post-incident forensics.

Hardcoded constants force operator forks of backup.mjs for any deviation. Config-driven retention closes this gap cleanly without breaking the atomic-bundle architecture.
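To make the disk-budget trade-off concrete, here is a rough back-of-the-envelope sketch of how many bundles stay on disk at steady state. It assumes a perfectly regular backup cadence and the "keep newest K, delete the rest only once older than N_DAYS" rule quoted above. The function name estimateRetainedBundles and the bundlesPerDay parameter are hypothetical and not part of backup.mjs; the result is an approximation that ignores boundary effects at the age cutoff.

```javascript
/**
 * Rough steady-state bundle count under a "count floor + age threshold" policy.
 * Hypothetical helper for illustration only; not part of backup.mjs.
 *
 * @param {number} K             newest bundles kept unconditionally
 * @param {number} N_DAYS        age (in days) after which a bundle becomes deletable
 * @param {number} bundlesPerDay assumed regular backup cadence
 * @returns {number}
 */
function estimateRetainedBundles(K, N_DAYS, bundlesPerDay = 1) {
    // Every bundle younger than N_DAYS survives the sweep
    const withinAgeWindow = Math.ceil(bundlesPerDay * N_DAYS);

    // The count floor only matters when the age window holds fewer than K bundles
    return Math.max(K, withinAgeWindow);
}

const estimates = {
    dailyDefaults : estimateRetainedBundles(3, 30, 1),   // 30
    hourlyDefaults: estimateRetainedBundles(3, 30, 24),  // 720
    hourlyTuned   : estimateRetainedBundles(24, 2, 24),  // 48
    tightDisk     : estimateRetainedBundles(1, 7, 1)     // 7
};

console.log(estimates);
```

Under these assumptions, the age threshold dominates disk usage at any cadence of at least one backup per day, while K acts as a safety floor for deployments that back up rarely or stop backing up altogether.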

The Architectural Reality

Atomic-bundle architecture preserved. Per #10129 Phase 3 (peer-script-not-delegate), each backup is a single timestamped directory containing kb/ + mc/ + graph/ + concepts/ + trajectories/ + mailbox/. The bundle is the atomic unit; retention sweeps the bundle list, not per-substrate subdirs.

Per-substrate retention asymmetry framing from #11628 deferred to a follow-up. #11628 body discusses KB-as-cache vs MC-as-store and argues KB could tolerate lighter retention (weekly cadence; backup is cost-optimization). But the atomic-bundle shape doesn't support per-substrate retention without either:

  • Splitting backups into per-substrate bundles (breaks #10129 atomicity)
  • Selective per-substrate-subdir deletion inside older bundles (complex; partial-state risk during operator recovery)

Both are substantive architectural changes. This ticket scopes narrowly to bundle-level retention parameterization — uniform K + N_DAYS across the bundle, but configurable. The per-substrate asymmetry can be a follow-up once Phase 2 ingestion ships and the operational shape is clearer.

The Fix

1. Config additions

Add backupRetention config to BOTH KB + MC config templates (the backup script bundles both, so either config could carry the retention policy — adding to both keeps the contract discoverable from either MCP server's config surface):

// In aiConfig.knowledgeBase + aiConfig.memoryCore
backupRetention: {
    keepMinimum: 3,   // keep newest N bundles unconditionally; deletion only applies after this floor
    maxDays    : 30   // bundles older than this many days are eligible for deletion (subject to keepMinimum)
}

The two-axis policy (count floor + age threshold) preserves the current behavior exactly when defaults are used.
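The two-axis rule can be sketched as a pure selection function. This is a minimal illustration, not the body of cleanOldBackups(): the name selectBundlesToDelete and the {name, mtimeMs} descriptor shape are hypothetical, and ordering bundles by modification time is an assumption (the real script may order by the timestamp in the directory name).

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Returns the bundles a sweep would delete under a count floor + age threshold.
 * Hypothetical helper; the descriptor shape {name, mtimeMs} is an assumption.
 *
 * @param {{name: string, mtimeMs: number}[]} bundles
 * @param {{keepMinimum?: number, maxDays?: number}} [policy]
 * @param {number} [nowMs]
 * @returns {{name: string, mtimeMs: number}[]}
 */
function selectBundlesToDelete(bundles, {keepMinimum = 3, maxDays = 30} = {}, nowMs = Date.now()) {
    const cutoffMs = nowMs - maxDays * MS_PER_DAY;

    return [...bundles]
        .sort((a, b) => b.mtimeMs - a.mtimeMs)        // newest first
        .slice(keepMinimum)                            // count floor: the newest keepMinimum always survive
        .filter(bundle => bundle.mtimeMs < cutoffMs);  // age threshold: only bundles older than maxDays
}

// Synthetic bundles aged 1, 10, 20, 40 and 50 days relative to a fixed "now"
const now     = Date.UTC(2026, 4, 20);
const bundles = [1, 10, 20, 40, 50].map(ageDays => ({
    name   : `backup-${ageDays}d`,
    mtimeMs: now - ageDays * MS_PER_DAY
}));

// Defaults: the 3 newest are protected, the remaining two exceed 30 days
console.log(selectBundlesToDelete(bundles, undefined, now).map(b => b.name));
// → ['backup-40d', 'backup-50d']

// Tighter policy: only the newest bundle is protected, everything else exceeds 7 days
console.log(selectBundlesToDelete(bundles, {keepMinimum: 1, maxDays: 7}, now).map(b => b.name));
// → ['backup-10d', 'backup-20d', 'backup-40d', 'backup-50d']
```

A bundle is deleted only when both conditions hold, so a deployment that stops producing backups never drops below keepMinimum bundles regardless of their age.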

2. buildScripts/ai/backup.mjs refactor

async function cleanOldBackups(backupRoot, logger, {keepMinimum = 3, maxDays = 30} = {}) {
    // existing body, but K + N_DAYS read from arguments
    const K       = keepMinimum;
    const N_DAYS  = maxDays;
    // ... rest unchanged
}

Caller resolves the config at invoke time:

import kbConfig from '../../ai/mcp/server/knowledge-base/config.mjs';
import mcConfig from '../../ai/mcp/server/memory-core/config.mjs';

// KB config (the orchestrating server) carries the base policy; MC config takes precedence when it defines backupRetention.
const retention = mcConfig.backupRetention ?? kbConfig.backupRetention ?? {keepMinimum: 3, maxDays: 30};
await cleanOldBackups(backupRoot, logger, retention);
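The resolution chain above can be exercised in isolation with plain objects standing in for the two config modules. This is a sketch under assumptions: resolveRetention is a hypothetical helper name, and normalizing missing keys at this level (rather than relying only on the destructuring defaults inside cleanOldBackups()) is one possible design, not necessarily what the script does.

```javascript
/**
 * Resolves the retention policy from two config objects.
 * Hypothetical helper; plain objects stand in for the imported config modules.
 *
 * @param {{backupRetention?: object}} [mcConfig]
 * @param {{backupRetention?: object}} [kbConfig]
 * @returns {{keepMinimum: number, maxDays: number}}
 */
function resolveRetention(mcConfig, kbConfig) {
    // ?? skips only null / undefined, so the first config that defines
    // backupRetention wins as a whole object (no per-key merge across configs)
    const policy = mcConfig?.backupRetention ?? kbConfig?.backupRetention ?? {};

    // Destructuring defaults fill in any key the chosen policy omits
    const {keepMinimum = 3, maxDays = 30} = policy;

    return {keepMinimum, maxDays};
}

console.log(resolveRetention({}, {}));
// → { keepMinimum: 3, maxDays: 30 }

console.log(resolveRetention({}, {backupRetention: {keepMinimum: 1, maxDays: 7}}));
// → { keepMinimum: 1, maxDays: 7 }

console.log(resolveRetention({backupRetention: {keepMinimum: 24}}, {backupRetention: {maxDays: 2}}));
// → { keepMinimum: 24, maxDays: 30 }
```

The last call shows a consequence of the whole-object fallback: a partial MC policy does not inherit missing keys from the KB policy, it falls through to the built-in defaults instead.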

3. Test coverage

New spec under test/playwright/unit/ai/buildScripts/backup-retention.spec.mjs:

  • Default config (K=3, N_DAYS=30) matches pre-ticket behavior exactly (byte-equivalence anchor)
  • Tighter config (K=1, N_DAYS=7) deletes more aggressively
  • Higher-cadence config (K=24, N_DAYS=2) preserves rolling-24h history
  • Missing config keys fall through to defaults (preserves zero-config deployments)
  • Tests mock the filesystem via a tmp fixture (synthetic backup-* dirs with controllable mtimes)

Acceptance Criteria

  • backupRetention: {keepMinimum, maxDays} added to ai/mcp/server/knowledge-base/config.{mjs,template.mjs} and ai/mcp/server/memory-core/config.{mjs,template.mjs} with default {keepMinimum: 3, maxDays: 30}.
  • cleanOldBackups() in buildScripts/ai/backup.mjs accepts an options object with keepMinimum + maxDays and reads them in lieu of the hardcoded constants.
  • Caller (the main backup script body) resolves the policy from mcConfig.backupRetention ?? kbConfig.backupRetention ?? <defaults>.
  • New spec covers default-equivalent / tighter / higher-cadence / missing-config scenarios.
  • Existing npm run ai:backup behavior under default config is byte-equivalent to pre-ticket behavior (manual smoke OR spec assertion).
  • Cross-link to this sub-ticket added to the Phase 4 Epic body (#11628) so future agents see it as the retention-policy slice.

Out of Scope

  • Per-substrate retention asymmetry (KB lighter vs MC stricter) — deferred to a follow-up once Phase 2 ingestion ships and the cloud-deployment operational shape is clearer. The architectural change required (either per-substrate bundles or selective subdir deletion) is substantial and not blocking the V1 cloud deployment story.
  • Defrag snapshot retention (dist/chromadb-backups/<target>/ per-collection snapshots from defragChromaDB.mjs) — separate substrate, not covered here.
  • Backup cadence control (daily vs hourly vs weekly) — separate config concern; this ticket only addresses how long bundles are KEPT, not how often they're CREATED.
  • Retention-policy-driven alerting ("oldest available backup is N days old") — Phase 4D alerting sub-ticket scope, not retention-policy core.

Avoided Traps

| Trap | Why rejected |
| --- | --- |
| Adding retention config to only ONE service's config (KB OR MC, not both) | Discoverability: future operator reading either config should find the retention surface. Adding to both with a fallback chain is the substrate-correct shape. |
| Renaming K and N_DAYS to longer names directly | Local-variable readability; the option names (keepMinimum, maxDays) are the public surface. Local destructure to K + N_DAYS preserves the existing 60-line cleanOldBackups internal style. |
| Per-substrate retention in this ticket | Breaks atomic-bundle architecture or introduces partial-state risk. Defer until Phase 2 ships and the operational shape is concrete. |
| New retention strategy enum (LRU / FIFO / etc.) | Premature abstraction. The K + N_DAYS shape is empirically sufficient for the known cloud-deployment scenarios. |

Related

  • Parent Phase Epic: #11628 (Phase 4: Operations + Observability)
  • Parent meta-Epic: #11624
  • Bundle architecture origin: #10129 (atomic-bundle Phase 3)
  • Backup orchestrator: buildScripts/ai/backup.mjs (the file modified)
  • Sibling parallel lane (@neo-gpt): #11631 Phase 0/1C-α write-side tenant stamping — different files (ai/services/knowledge-base/VectorService.mjs), zero merge collision risk

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

  • query_raw_memories({query: 'backup retention policy keepMinimum maxDays cleanOldBackups Phase 4'})
  • ask_knowledge_base({query: 'buildScripts ai backup retention K N_DAYS', type: 'src'})
  • Empirical anchor: pre-ticket hardcoded constants K=3, N_DAYS=30 at buildScripts/ai/backup.mjs:365-366
  • 2026-05-17 MC wipe recovery anchor: K=3 + N_DAYS=30 retained the 5/16 bundle that became the recovery source — this ticket preserves that protective floor as the default while allowing operator tuning
tobiu referenced in commit 3c47411 - "feat(ai): configurable bundle retention via aiConfig.backupRetention (#11663) (#11664)" on May 20, 2026, 8:00 AM
tobiu closed this issue on May 20, 2026, 8:00 AM