Context
Phase 4B (#11640, PR #11710) shipped the KB reconciliation daemon's V1: config-invalidation reconciliation — it detects chunks left stale by a tenant KnowledgeBaseTenantConfig change (via the tenantConfigVersion chunk stamp) and, opt-in, tombstones them. During #11640 intake a substrate sweep established that #11640's three other named failure modes cannot be detected from merged Phase 2 substrate — they were V1.x-deferred in the #11640 Contract Ledger (Row 6). This ticket is that V1.x increment.
The Problem
#11640 named four KB-drift failure modes. V1 (PR #11710) delivered failure-mode #3 (config-invalidation). The other three remain:
- Force-push / history rewrite: a tenant rewrites branch history; per-push revision-boundary signaling cannot express it.
- Mid-push hook failure: a client pre-push hook fails partway; some files are pushed, some deletes are lost.
- Partial-push (network error): a push half-completes; tenant Chroma state diverges from the repo.
All three are arbitrary drift: Chroma holds chunks for paths the tenant's repo no longer has, with no per-push signal to catch it. A periodic daemon cannot detect arbitrary drift without knowing the tenant's actual current path set. Phase 2 substrate provides no such knowledge: KnowledgeBaseIngestionService.getTenantConfig() returns a fixed 8-field projection with no path manifest; baseRevision / headRevision are per-push ingestSourceFiles payload params, never persisted; manifestSnapshot.pathsAfterPush is consumed at push-time (applyDeletionSignals) and discarded.
The Architectural Reality
- ai/daemons/KbReconciliationService.mjs (#11640): the poll-loop daemon. pulse() → per-tenant reconcileTenant(). V1 fetchTenantRows already fetches a tenant's full Chroma row set (where: {tenantId}); the V1 engine diffTenantChunks classifies config-staleness. This ticket adds a second diff axis: claimed-paths vs. Chroma-paths.
- ai/services/knowledge-base/KnowledgeBaseIngestionService.mjs: ingestSourceFiles already receives manifestSnapshot.pathsAfterPush and baseRevision / headRevision; applyDeletionSignals uses them transiently. The missing piece is persistence.
- KnowledgeBaseTenantConfig graph node (#11637): the natural home for a persisted per-tenant claimed-state manifest, or a sibling node.
The Fix
Recommended: persist the post-push claimed-state manifest. Extend the ingestion path so each successful push records the tenant's claimed path set (the manifestSnapshot.pathsAfterPush already received) into durable storage — a kb-manifest:<tenantId> graph node, or a field on KnowledgeBaseTenantConfig. The reconciliation daemon then gains a manifest diff pass: a Chroma chunk whose sourcePath is absent from the persisted claimed manifest is a manifest orphan — the same actionability / opt-in-tombstone treatment as a config-stale orphan. Force-push is then a special case (the manifest simply no longer lists the rewritten-away paths) — no separate force-push primitive needed.
Concretely:
- KnowledgeBaseIngestionService: persist pathsAfterPush per (tenant, repoSlug) on each successful push.
- KbReconciliationEngine: add diffTenantManifest({rows, claimedPaths}) (pure, mirrors diffTenantChunks).
- KbReconciliationService.reconcileTenant: run both diff passes; union the orphan sets.
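The diff pass and the union step above might look roughly like the sketch below. Only the name diffTenantManifest({rows, claimedPaths}) and the "no baseline, no classification" fail-safe come from this ticket; the row shape (id, metadata.sourcePath), the return shape, and the unionOrphans helper are assumptions made for illustration, not the merged V1 API.

```javascript
// Hypothetical sketch: row shape and return shape are assumptions, not the V1 engine's API.
function diffTenantManifest({rows, claimedPaths}) {
    // Fail-safe: without a persisted claimed baseline, classify nothing.
    if (claimedPaths == null) {
        return {hasBaseline: false, orphans: []};
    }

    const claimed = new Set(claimedPaths);
    const orphans = rows
        // Rows without a string sourcePath are skipped rather than guessed at.
        .filter(row => typeof row.metadata?.sourcePath === 'string')
        .filter(row => !claimed.has(row.metadata.sourcePath))
        .map(row => ({id: row.id, sourcePath: row.metadata.sourcePath, reason: 'manifest-orphan'}));

    return {hasBaseline: true, orphans};
}

// Hypothetical helper: union orphan lists by chunk id, keeping the first reason seen.
function unionOrphans(...orphanLists) {
    const byId = new Map();
    for (const list of orphanLists) {
        for (const orphan of list) {
            if (!byId.has(orphan.id)) byId.set(orphan.id, orphan);
        }
    }
    return [...byId.values()];
}

// Usage: one chunk is both config-stale and a manifest orphan; it is reported once.
const rows = [
    {id: 'c1', metadata: {sourcePath: 'docs/a.md'}},
    {id: 'c2', metadata: {sourcePath: 'docs/removed.md'}}
];
const manifestResult = diffTenantManifest({rows, claimedPaths: ['docs/a.md']});
const configStale = [{id: 'c2', sourcePath: 'docs/removed.md', reason: 'config-stale'}];
const allOrphans = unionOrphans(configStale, manifestResult.orphans);
console.log(allOrphans.map(orphan => orphan.id));
```

Keeping the diff pure (inputs in, classification out) lets it be unit-tested without Chroma or the graph store, in the same way the ticket describes diffTenantChunks.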
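Acceptance Criteria
- The post-push claimed path set is persisted per (tenant, repoSlug), durable across restarts.
- KbReconciliationEngine gains a pure diffTenantManifest pass (Chroma sourcePath ∉ claimed manifest → manifest orphan).
- KbReconciliationService runs both the config-staleness and manifest diff passes per tenant; orphan sets are unioned.
- Manifest orphans honor the same reconciliationAutoTombstone opt-in + tenant-scoped delete as V1.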
Out of Scope
Two small independent enhancements also surfaced by #11640 (PR #11710 Follow-ups), each trackable as its own narrow ticket when picked up:
- A per-tenant orphanVersionGap override (extends getTenantConfig's fixed projection, a #11637-surface change).
- A chunks_total rollup metric in getTenantIngestionRollup + chunksTotal in KbAlertRuleEngine.KNOWN_METRICS. This enables drift-volume threshold alerting (V1 supports drift-presence/frequency alerting via reconcileEvents).
Per-chunk garbage-collection scheduling → Phase 4C (#11641).
Avoided Traps
- Daemon-side repo walk (the daemon clones / fetches each tenant's repo to compute the actual path set). Rejected: a cloud KB server generally has no repo access or credentials for arbitrary tenant repos, and a periodic full-clone is heavy. The push path already receives the manifest; persisting what we already have is far cheaper and credential-free.
- A dedicated force-push primitive (detecting history rewrite via revision-graph comparison). Rejected: force-push is subsumed by the manifest diff (a rewritten-away path simply drops out of pathsAfterPush). One mechanism covers all three remaining failure modes.
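To make the "force-push is subsumed" argument concrete, here is a small simulation under stated assumptions: in-memory Map and Set objects stand in for the persisted manifest and for Chroma, and the pushWithoutDeletes helper and the acme/docs paths are invented for illustration. No revision-graph comparison is involved.

```javascript
// Hypothetical in-memory stand-ins; real manifest storage and Chroma access are out of scope.
const claimedManifest = new Map(); // repoSlug -> Set of claimed paths
const chromaPaths = new Set();     // sourcePath values that currently have chunks

// Simulates a push that ingests files but whose delete signals are lost,
// which is the shared symptom of the three remaining failure modes.
function pushWithoutDeletes(repoSlug, pathsAfterPush) {
    for (const path of pathsAfterPush) chromaPaths.add(path);
    // The manifest is overwritten with the latest claimed path set.
    claimedManifest.set(repoSlug, new Set(pathsAfterPush));
}

pushWithoutDeletes('acme/docs', ['guide/intro.md', 'guide/old-api.md']);
// Force-push: history is rewritten and old-api.md no longer exists on the branch.
pushWithoutDeletes('acme/docs', ['guide/intro.md', 'guide/new-api.md']);

const claimed = claimedManifest.get('acme/docs');
const manifestOrphans = [...chromaPaths].filter(path => !claimed.has(path));
console.log(manifestOrphans); // [ 'guide/old-api.md' ]
```

The rewritten-away path surfaces as an orphan purely because it is absent from the latest claimed set, regardless of how the history changed.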
Contract Ledger
Provisional — to be finalized at intake once the manifest-storage shape is chosen. The recommended design's surfaces:
| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Persisted claimed-state manifest: kb-manifest:<tenantId> graph node (or a KnowledgeBaseTenantConfig field) | this ticket; #11637 Phase 2E node precedent | Each successful ingestSourceFiles push records the tenant's post-push claimed path set per repoSlug, durably. | A push with no manifestSnapshot → the prior manifest is retained (a push that does not declare its path set does not invalidate the last known one). | Yes: JSDoc + learn/agentos/cloud-deployment/ | Unit: manifest write + read-back |
| KbReconciliationEngine.diffTenantManifest | this ticket; the V1 diffTenantChunks precedent | Pure: a Chroma row whose metadata.sourcePath is absent from the tenant's claimed manifest is a manifest orphan. | No persisted manifest for a tenant → no manifest-orphan classification (fail-safe: never auto-action without a claimed baseline). | Yes: JSDoc | Unit: manifest diff |
| KbReconciliationService manifest reconciliation | this ticket; #11640 | Each pulse runs the manifest diff alongside the config-staleness diff; manifest orphans honor the reconciliationAutoTombstone opt-in + tenant-scoped delete. | Same opt-in / fail-safe posture as V1. | Yes: daemon JSDoc | Unit + the AC8 force-push integration test |
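The first row's write and fallback behavior could be sketched as follows. This is an illustration only: the ManifestStore class, its method names, and the in-memory Map are hypothetical stand-ins for whichever durable storage shape is chosen at intake (a kb-manifest:<tenantId> node or a KnowledgeBaseTenantConfig field).

```javascript
// Hypothetical store: a Map stands in for the durable kb-manifest:<tenantId> node.
class ManifestStore {
    constructor() {
        this.nodes = new Map(); // node key -> Map(repoSlug -> sorted path array)
    }

    // Records the claimed path set for (tenantId, repoSlug) after a successful push.
    // Returns false when the push declared no path set, leaving the prior manifest as-is.
    recordPush({tenantId, repoSlug, manifestSnapshot}) {
        const paths = manifestSnapshot?.pathsAfterPush;
        if (!Array.isArray(paths)) return false;

        const key = `kb-manifest:${tenantId}`;
        if (!this.nodes.has(key)) this.nodes.set(key, new Map());
        // De-duplicate and sort so read-back is deterministic.
        this.nodes.get(key).set(repoSlug, [...new Set(paths)].sort());
        return true;
    }

    // Returns the claimed paths, or null when no baseline has ever been recorded.
    getClaimedPaths({tenantId, repoSlug}) {
        return this.nodes.get(`kb-manifest:${tenantId}`)?.get(repoSlug) ?? null;
    }
}

// Usage: a push without a manifestSnapshot does not invalidate the last known manifest.
const store = new ManifestStore();
store.recordPush({
    tenantId: 't1',
    repoSlug: 'acme/docs',
    manifestSnapshot: {pathsAfterPush: ['b.md', 'a.md', 'a.md']}
});
const wroteSecond = store.recordPush({tenantId: 't1', repoSlug: 'acme/docs'});
const retained = store.getClaimedPaths({tenantId: 't1', repoSlug: 'acme/docs'});
const unknown = store.getClaimedPaths({tenantId: 't2', repoSlug: 'acme/docs'});
console.log(wroteSecond, retained, unknown);
```

Returning null for a tenant with no recorded manifest gives the daemon an explicit "no baseline" signal, which matches the fail-safe in the second row.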
Related
- #11640 — Phase 4B reconciliation daemon (V1; this ticket is its V1.x increment). PR #11710.
- #11628 — Phase 4 epic (parent).
- #11637: Phase 2E KnowledgeBaseTenantConfig (the persisted-config-node precedent).
- #11633: Phase 2 KnowledgeBaseIngestionService (the ingestion path that will persist the manifest).
Origin Session ID
470c38e7-1ffc-4851-867d-d30c1b6fbdb2
Handoff Retrieval Hints
- The #11640 Contract Ledger (in the #11640 ticket body, Row 6) is the V1/V1.x scope boundary this ticket implements.
- query_raw_memories: "KB reconciliation V1.x force-push manifest persistence"