Contract Ledger
Row 1: aiConfig.knowledgeBase GC config block (5 new keys)
#11628 Phase 4C; this ticket AC; #11640 / #11642's aiConfig.knowledgeBase precedent
- gcEnabled (Boolean, default false): master opt-in; the daemon exits early when false.
- gcIntervalMs (Number, default 86400000 = 24h): poll-tick interval.
- gcRetention (Object, default {}): {maxAgeMs?, maxCount?} retention policy.
- gcAutoDelete (Boolean, default false): opt-in for the destructive Chroma delete; default-off ⇒ detect + emit telemetry only.
- gcDefragThreshold (Number, default 0.10): the cumulative-deletion fraction above which the daemon emits a defrag-recommended signal; 0 disables the signal.
A stale gitignored config.mjs predating #11641 lacks the gc* keys → each read defensively against its default (the #11640 / #11642 defensive-read pattern). An empty gcRetention {} ⇒ no chunk is ever retention-expired (conservative, per the ticket's "default conservative" handoff hint).
Yes: ai/config.template.mjs block + JSDoc
Unit: config-defaulting + the daemon opt-in gate
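The defensive-read behaviour in Row 1 could look roughly like the sketch below. The helper name readGcConfig and the type-check fallback rules are assumptions made for illustration; only the five keys and their defaults come from the ledger.

```javascript
// Defaults as listed in the ledger row.
const GC_DEFAULTS = Object.freeze({
    gcEnabled        : false,
    gcIntervalMs     : 86400000, // 24h
    gcRetention      : {},
    gcAutoDelete     : false,
    gcDefragThreshold: 0.10
});

/**
 * Hypothetical helper (not from the ticket): resolves the five gc* keys
 * against their defaults. Assumption: a key that is missing or has the
 * wrong type falls back to its default.
 */
function readGcConfig(knowledgeBase) {
    const kb             = knowledgeBase ?? {};
    const isPlainObject  = v => v !== null && typeof v === 'object' && !Array.isArray(v);
    const isFiniteNumber = v => typeof v === 'number' && Number.isFinite(v);

    return {
        gcEnabled: typeof kb.gcEnabled === 'boolean'
            ? kb.gcEnabled : GC_DEFAULTS.gcEnabled,
        gcIntervalMs: isFiniteNumber(kb.gcIntervalMs) && kb.gcIntervalMs > 0
            ? kb.gcIntervalMs : GC_DEFAULTS.gcIntervalMs,
        gcRetention: isPlainObject(kb.gcRetention)
            ? kb.gcRetention : {...GC_DEFAULTS.gcRetention},
        gcAutoDelete: typeof kb.gcAutoDelete === 'boolean'
            ? kb.gcAutoDelete : GC_DEFAULTS.gcAutoDelete,
        // 0 is a valid value: it disables the defrag-recommended signal.
        gcDefragThreshold: isFiniteNumber(kb.gcDefragThreshold) && kb.gcDefragThreshold >= 0
            ? kb.gcDefragThreshold : GC_DEFAULTS.gcDefragThreshold
    };
}
```

With a stale config that has no gc* keys, readGcConfig({}) resolves to the defaults, so the daemon stays opted out and nothing is retention-expired.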

Row 2: Retention-expiry classification (KbGarbageCollectionEngine pure core)
#11712 ingestedAt chunk stamp; this ticket's retention AC; @neo-gpt #11641 peer review
The pure, dependency-free classifier (mirrors #11640's KbReconciliationEngine). selectExpiredChunks({rows, retention, now}) → a chunk is retention-expired under OR-expiry: expired if time-expired OR count-expired (the union, i.e. the broader set).
- Time-expiry: typeof metadata.ingestedAt === 'number' && now - ingestedAt > maxAgeMs.
- Count-expiry: rows are bucketed by {tenantId, repoSlug}, each bucket sorted ingestedAt desc, then chunk id asc (a deterministic tie-breaker, since batch-ingested chunks share an ingestedAt); a chunk ranked at or beyond maxCount within its bucket is count-expired.
Returns {expiredIds, expiredCount, evaluatedCount}. No I/O, no clock: the caller passes now.
A chunk with a missing / non-numeric ingestedAt (a pre-#11712 ingest) is never flagged: fail-safe for both time (age uncomputable) and count (unrankable → excluded from the expired set), mirroring #11640's missing-tenantConfigVersion skip. An empty / absent retention policy → empty result.
Yes: JSDoc
Unit: time-expiry, count-expiry per {tenantId, repoSlug} bucket, the OR-union, the deterministic tie-break on equal ingestedAt, missing-ingestedAt skip, empty-policy no-op
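A minimal sketch of the OR-expiry classifier described in Row 2. It is not the actual KbGarbageCollectionEngine code. It assumes a row shape of {id, metadata: {tenantId, repoSlug, ingestedAt}}, string chunk ids, and that evaluatedCount means the number of rows with a usable ingestedAt. None of those are stated in the ticket.

```javascript
/**
 * Sketch only. Assumed row shape: {id, metadata: {tenantId, repoSlug, ingestedAt}}.
 * Pure: no I/O and no clock, the caller passes `now` (epoch ms).
 */
function selectExpiredChunks({rows = [], retention = {}, now}) {
    const {maxAgeMs, maxCount} = retention ?? {};
    const hasMaxAge   = typeof maxAgeMs === 'number' && Number.isFinite(maxAgeMs);
    const hasMaxCount = Number.isInteger(maxCount) && maxCount >= 0;

    // Empty / absent policy: nothing is ever expired.
    // Assumption: evaluatedCount is reported as 0 in that case.
    if (!hasMaxAge && !hasMaxCount) {
        return {expiredIds: [], expiredCount: 0, evaluatedCount: 0};
    }

    // Fail-safe: rows without a numeric ingestedAt are never flagged.
    // Number.isFinite is a slightly stricter stand-in for the typeof check
    // (it also rejects NaN, which would break the sort below).
    const datable = rows.filter(row => Number.isFinite(row.metadata?.ingestedAt));
    const expired = new Set();

    if (hasMaxAge) {
        for (const row of datable) {
            if (now - row.metadata.ingestedAt > maxAgeMs) expired.add(row.id);
        }
    }

    if (hasMaxCount) {
        // Bucket by {tenantId, repoSlug}.
        const buckets = new Map();
        for (const row of datable) {
            const key = JSON.stringify([row.metadata.tenantId, row.metadata.repoSlug]);
            if (!buckets.has(key)) buckets.set(key, []);
            buckets.get(key).push(row);
        }
        for (const bucket of buckets.values()) {
            // Newest first; ties broken by chunk id ascending (deterministic).
            bucket.sort((a, b) =>
                b.metadata.ingestedAt - a.metadata.ingestedAt ||
                (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)
            );
            // Rank >= maxCount (0-based) is count-expired.
            for (const row of bucket.slice(maxCount)) expired.add(row.id);
        }
    }

    // The Set makes the time/count union free of duplicates.
    const expiredIds = [...expired];
    return {expiredIds, expiredCount: expiredIds.length, evaluatedCount: datable.length};
}
```

Collecting ids in a Set gives the OR-union directly: a chunk that is both time-expired and count-expired appears once in expiredIds.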

Row 3: The destructive GC delete (knowledge-base Chroma collection delete)
this ticket AC ("GC removes"); the destructive-action conservatism principle
When gcAutoDelete is true, the daemon deletes a tenant's expiredIds via collection.delete({ids}). Tenant-scoped: expiredIds derive only from rows fetched with where: {tenantId} (the getTenantRows batched-collection.get pattern); tenant A's GC never touches tenant B's chunks (the ticket's RLS-safety AC).
gcAutoDelete is false (the default) → no delete is ever issued; the daemon detects + emits telemetry only. A collection.delete throw → logger.error, best-effort; the daemon continues to the next tenant.
Yes: daemon JSDoc
Unit: delete gated by the opt-in flag; tenant-scoped id set; delete-throw tolerance
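The opt-in gate and delete-throw tolerance from Row 3 might be sketched as follows. The function name deleteExpiredForTenant, its parameter object and the log message are hypothetical. The only external call is collection.delete({ids}), and the collection is passed in so a fake can stand in for the Chroma collection in tests.

```javascript
/**
 * Sketch only (hypothetical helper). Returns the number of chunks deleted
 * for this tenant during the current tick.
 * `expiredIds` is assumed to come from rows fetched with where: {tenantId}.
 */
async function deleteExpiredForTenant({collection, tenantId, expiredIds, gcAutoDelete, logger = console}) {
    // Default-off: detect + emit telemetry only, never issue a delete.
    if (!gcAutoDelete || expiredIds.length === 0) {
        return 0;
    }

    try {
        await collection.delete({ids: expiredIds});
        return expiredIds.length;
    } catch (error) {
        // Best-effort: log and let the caller continue with the next tenant.
        logger.error(`[kb-gc] delete failed for tenant ${tenantId}`, error);
        return 0;
    }
}
```

Returning 0 on failure lets the caller keep a running cumulative-deletion total without special-casing failed tenants.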

Row 4: Defrag-recommended signal (physical-reclaim observability)
this ticket AC ("trigger defrag"); @neo-gpt #11641 peer review
When a tick's cumulative deletion (summed across tenants) exceeds gcDefragThreshold of the collection's chunk count, the daemon emits a defrag-recommended signal (a logger.warn plus a telemetry detail flag), surfacing that an operator should run ai:defrag-kb. V1 does not spawn ai:defrag-kb (see Row 6).
gcDefragThreshold is 0 → no signal. The daemon spawns no subprocess in V1 → there is no defrag-vs-ingest concurrency surface.
Yes: daemon JSDoc
Unit: the signal fires when cumulative deletion exceeds the threshold, stays silent below it
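The threshold check in Row 4 reduces to one comparison. The sketch below uses a hypothetical helper name and assumes collectionCount is the collection's chunk count taken before the tick's deletes; the ticket does not say which count is used.

```javascript
/**
 * Sketch only (hypothetical helper).
 * deletedThisTick: chunks deleted in this tick, summed across tenants.
 * collectionCount: assumed to be the pre-delete chunk count of the collection.
 */
function isDefragRecommended({deletedThisTick, collectionCount, gcDefragThreshold}) {
    // A threshold of 0 disables the signal; an empty collection never triggers it.
    if (!(gcDefragThreshold > 0) || !(collectionCount > 0)) {
        return false;
    }

    // "Exceeds" is read as a strict comparison.
    return deletedThisTick / collectionCount > gcDefragThreshold;
}
```

With the default threshold of 0.10, deleting 1500 of 10000 chunks in a tick would raise the signal, while deleting exactly 1000 would not.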

Row 5: Phase 4A telemetry emission (KBRecorderService.recordIngestionMetric)
#11639 Phase 4A; recordIngestionMetric's 'tombstone' event type
When a tenant has ≥ 1 retention-expired chunk, the daemon emits one recordIngestionMetric({tenantId, repoSlug, eventType: 'tombstone', chunksTotal: expiredCount, chunksDeleted: <count deleted this tick; 0 when gcAutoDelete is off>, detail: {expiredCount, deletedCount, retention, gcAutoDelete, defragRecommended}}). A clean tenant (zero expired) emits nothing.
recordIngestionMetric is best-effort (never throws into the caller). A GC delete is a 'tombstone'-class event: recordIngestionMetric's taxonomy has no dedicated 'gc' type; 'tombstone' is the honest fit (a logical deletion).
Yes: JSDoc
Unit: a tombstone metric is emitted for a tenant with expired chunks, suppressed for a clean one
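To make the payload in Row 5 concrete, here is a small sketch of how the argument to recordIngestionMetric could be assembled. The builder name buildTombstoneMetric is hypothetical. The field names follow the ledger row, and returning null for a clean tenant is an illustrative convention for "emit nothing".

```javascript
/**
 * Sketch only (hypothetical helper). Builds the payload for
 * recordIngestionMetric, or returns null when the tenant is clean.
 */
function buildTombstoneMetric({tenantId, repoSlug, expiredCount, deletedCount, retention, gcAutoDelete, defragRecommended}) {
    // A clean tenant (zero expired chunks) emits nothing.
    if (!(expiredCount >= 1)) {
        return null;
    }

    // With gcAutoDelete off, nothing was deleted this tick.
    const chunksDeleted = gcAutoDelete ? deletedCount : 0;

    return {
        tenantId,
        repoSlug,
        eventType  : 'tombstone', // the taxonomy has no dedicated 'gc' type
        chunksTotal: expiredCount,
        chunksDeleted,
        detail     : {
            expiredCount,
            deletedCount: chunksDeleted,
            retention,
            gcAutoDelete,
            defragRecommended
        }
    };
}
```

A caller would pass the result to recordIngestionMetric only when it is non-null.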

Row 6: V1 scope boundary (config-orphan detection · per-tenant retention override · auto-defrag spawn)
this ticket "The Fix" / ACs; the intake scope-refinement; @neo-gpt #11641 peer review
Documented V1 deltas.
(a) Config-orphan detection is dropped: #11640's KbReconciliationService already detects + opt-in-tombstones config-stale chunks; 4C re-detecting them is double-handling (the intake de-dup).
(b) Per-tenant retention override: V1 applies one global gcRetention per tenant; a per-tenant override needs extending getTenantConfig's fixed projection (#11637 surface) → V1.x.
(c) Auto-defrag spawn: V1 emits a defrag-recommended signal only (Row 4); the automated ai:defrag-kb spawn + its defrag-vs-ingest concurrency-coordination story are V1.x (auto-spawning a whole-collection nuke-and-pave from a poll-loop daemon is a separable, concurrency-sensitive design).
V1.x is a separate follow-up ticket; the PR body "Deltas" documents each.
Yes: PR body "Deltas" + a V1.x follow-up
N/A: explicit scope boundary
Context
Sub of Phase 4 Epic #11628 (meta-Epic #11624).
Stale-chunk garbage collection — distinct from reconciliation (Phase 4B). Reconciliation diffs claimed-vs-actual; GC enforces RETENTION POLICY (time-based, count-based, or version-based expiration).
The Problem
Without active GC:
- ai:defrag-kb precedent: 10s wall / 321→189MB / 41% reduction on ~10k chunks), but defrag is operator-triggered, not automatic per-tenant
The Fix
New daemon: ai/scripts/kb-gc-daemon.mjs (sibling to existing daemons).
Per scheduled tick (configurable; default daily):
- aiConfig.knowledgeBase.tenantRetention policy)
- VectorService.delete({ids}))
- ai:defrag-kb if cumulative deletion > threshold (10% collection size)
Acceptance Criteria
- ai/scripts/kb-gc-daemon.mjs exists; follows existing daemon pattern
- aiConfig.knowledgeBase.gcIntervalMs; default 86400000 = 24h)
Out of Scope
- ai:defrag-kb already exists; this daemon TRIGGERS it conditionally)
Prior Art / Defrag-Backup Substrate Cross-References
Substrate-correct V-B-A calibration 2026-05-19: per #10129 Phase 3 peer architecture, defragChromaDB.mjs and backup.mjs are peer scripts with orthogonal responsibilities, NOT delegates. Phase 4C extends/triggers existing defrag substrate:
- buildScripts/ai/defragChromaDB.mjs: 5-step "Nuke and Pave": (1) Pre-Nuke Snapshot via fs.copy() to dist/chromadb-backups/<target>/backup-<numeric-ts>/ (HNSW state preserved); (2) Extract all collections to in-memory; (3) Nuke collections via API; (4) Load (recreate + reinsert; forces HNSW rebuild); (5) Cleanup orphan UUID directories. Existing retention: keep last 3, delete others older than 7 days.
- buildScripts/ai/backup.mjs: JSONL bundle peer. Operators chain ai:defrag-kb && ai:backup for compacted bundles. Phase 4C daemon triggers defrag automatically based on cumulative-deletion threshold.
- where: {tenantId} filter on QueryService ensures cross-tenant safety; the GC daemon enumerates orphans WITHIN a tenant's scope, never cross-tenant.
- test/playwright/unit/ai/buildScripts/backup.spec.mjs (defrag-trigger logic; retention pattern); KBDatabaseService.backup.spec.mjs (export/import lifecycle).
Related
- npm run ai:defrag-kb (memory anchor: 10s wall / 321→189MB / 41% reduction on ~10k chunks); peer script buildScripts/ai/defragChromaDB.mjs
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- ai:defrag-kb script is the existing defrag substrate to integrate with