Context
Sub-issue of Phase 2 Epic #11626 (meta-Epic #11624). Graduated from Discussion #11623.
Substrate floor for Phase 2 facades. Both MCP facade (Phase 2B) + bulk facade (Phase 2C) consume this service.
The Problem
After Phase 0/1 contracts land, the substrate has stable schemas + registry + memorySharing port, but no actual ingestion entrypoint. Clients can't push code yet. Phase 2 needs a service-layer orchestrator BEFORE facades wire to it.
The Fix
New service: ai/services/knowledge-base/KnowledgeBaseIngestionService.mjs (Neo.core.Base extension; singleton).
Responsibilities:
- Validate tenant via AgentIdentity context (#9999 substrate)
- Apply tenant source/parser config from Phase 0/1B registry
- Server-side parsing for raw-file-delta payloads (when tenant's source uses server-shipped parser)
- parsed-chunk-v1 validation for client-side-parsed payloads (rejects records with embedding field outside explicit restore mode)
- Server-stamp {tenantId, visibility, originAgentIdentity?} per Phase 0/1C
- Apply tombstone/manifest/revision-boundary deletion-signaling per Phase 0/1A spec
- Route to VectorService.embed (existing content-hash delta + server embeds + Chroma upsert)
- Return structured ingestion summary {ingested, deleted, embeddingsGenerated, errors, tenantId, durationMs}
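The responsibilities above can be read as one pipeline that always ends in a summary. The following is a minimal sketch of that flow, not the service itself: the `deps` object (`parse`, `validate`, `embed`, `defaultVisibility`, `originAgentIdentity`), the `kind` discriminator on each file, and the per-error shape are all hypothetical stand-ins for the registry, the parsed-chunk-v1 schema check, and VectorService.embed. Tenant validation and deletion-signaling are left out to keep it short.

```javascript
// Hypothetical sketch: collaborators are injected so the flow runs standalone.
// None of the deps signatures are taken from the real registry or VectorService.
async function ingestSourceFiles({tenantId, files = [], deleted = []}, deps) {
    const started = Date.now();
    const summary = {ingested: 0, deleted: 0, embeddingsGenerated: 0, errors: [], tenantId, durationMs: 0};
    const records = [];

    for (const file of files) {
        try {
            // Raw deltas are parsed server-side; client-parsed chunks are only validated.
            const chunks = file.kind === 'raw-file-delta' ? deps.parse(file) : deps.validate(file);

            for (const chunk of chunks) {
                // Server-stamp: spread order means client-supplied values are overwritten.
                const stamped = {...chunk, tenantId, visibility: deps.defaultVisibility};
                if (deps.originAgentIdentity) {
                    stamped.originAgentIdentity = deps.originAgentIdentity;
                }
                records.push(stamped);
            }
        } catch (error) {
            // Partial failure: record it and continue with the remaining files.
            summary.errors.push({path: file.path, message: error.message});
        }
    }

    try {
        const result = await deps.embed({records, deleted});
        summary.ingested            = result.upserted;
        summary.deleted             = result.deleted;
        summary.embeddingsGenerated = result.embedded;
    } catch (error) {
        // A downstream failure still yields a summary with zero counts.
        summary.errors.push({path: null, message: error.message});
    }

    summary.durationMs = Date.now() - started;
    return summary;
}
```

Under these assumptions a caller never needs a try/catch around the call: it inspects `summary.errors` and the counts instead.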
Contract Ledger Matrix
KnowledgeBaseIngestionService introduces a service surface consumed by Phase 2B (MCP facade) and Phase 2C (bulk facade). Per learn/agentos/contract-ledger.md:
| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
|---|---|---|---|---|---|
| KnowledgeBaseIngestionService.ingestSourceFiles({tenantId, files, deleted?, manifestSnapshot?, baseRevision?, headRevision?}) | Discussion #11623 §7 Phase 2; Phase 0/1 contracts: parsed-chunk-v1 schema (#11629), source/parser registry (#11658), write-side stamping (#11631) | Orchestrates one ingestion: tenant validation via AgentIdentity → tenant source/parser config from the registry → server-parse raw-file deltas OR validate parsed-chunk-v1 for client-parsed payloads → server-stamp {tenantId, visibility, originAgentIdentity?} → apply tombstone/manifest/revision-boundary deletion-signaling → route to VectorService.embed. | No AgentIdentity context (single-tenant / offline daemon) → defaults to the neo-shared tenant, consistent with the Phase 0/1 single-tenant fallthrough. | Service JSDoc (Anchor & Echo); Phase 3 learn/agentos/cloud-deployment/HookWiring.md (#11627) documents the call contract for hook authors. | Per-AC happy-path + error-path unit tests; end-to-end integration test against a mock tenant fixture. |
| ingestSourceFiles return value: {ingested, deleted, embeddingsGenerated, errors, tenantId, durationMs} | #11633 "The Fix" §8 | Returns a structured ingestion summary: counts + a structured errors array + tenantId + durationMs. Partial failures populate errors[]; the summary is never lost to a thrown exception. | A fully-failed ingestion still returns the summary with errors[] populated and zero counts; callers branch on errors, not on a thrown exception. | JSDoc on the return shape; HookWiring.md (#11627). | Unit tests asserting the summary shape on happy + error paths. |
| embedding-field rejection: parsed-chunk-v1 payloads carrying an embedding field | Phase 0/1A parsed-chunk-v1 schema (#11629); #11633 AC | A record carrying an embedding field is REJECTED with a structured error, never silently routed. Ingestion is for un-embedded content; pre-embedded records belong to the restore path. | The rejection error names the restore path, manageDatabaseBackup({action: 'import'}), so the caller is routed correctly. | JSDoc; Phase 3 CustomParsers.md (#11627). | Error-path unit test asserting the rejection + the structured error shape. |
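The embedding-field row could be enforced with a check along these lines. This is a sketch under assumptions: `checkEmbeddingField`, the `restoreMode` option, and the error fields (`code`, `index`, `message`, `hint`) are hypothetical names, and the real parsed-chunk-v1 schema (#11629) may express the rule differently. Only the `embedding` field name and the `manageDatabaseBackup({action: 'import'})` pointer come from the ledger.

```javascript
// Hypothetical validator: splits records into accepted ones and structured errors.
function checkEmbeddingField(records, {restoreMode = false} = {}) {
    const accepted = [];
    const errors   = [];

    records.forEach((record, index) => {
        if (!restoreMode && 'embedding' in record) {
            errors.push({
                code   : 'EMBEDDING_FIELD_NOT_ALLOWED', // assumed error code
                index,
                message: 'Record carries an embedding field; ingestion accepts un-embedded content only.',
                // Names the restore path so the caller knows where pre-embedded records go.
                hint   : "manageDatabaseBackup({action: 'import'})"
            });
        } else {
            accepted.push(record);
        }
    });

    return {accepted, errors};
}
```

Returning the errors instead of throwing keeps the check compatible with the summary contract in the second row: rejected records end up in errors[] while the remaining records continue through the pipeline.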
Backfilled 2026-05-20 by @neo-opus-ada. #11633 was filed (origin session 7360e917) before the Contract Ledger discipline was applied to it; @neo-gpt's #11626 epic-review surfaced the gap. The ledger restates the KnowledgeBaseIngestionService consumed surface already specified in "The Fix" above — it adds no new scope.
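The Fallback cell of the first ledger row (no AgentIdentity context defaults to the neo-shared tenant) might be sketched as below. `resolveTenant`, the `identity.tenantId` property, and the mismatch rule are hypothetical; the ledger only specifies the fallthrough to neo-shared.

```javascript
const SHARED_TENANT = 'neo-shared';

// Hypothetical helper: `identity` stands in for the AgentIdentity context, if any.
function resolveTenant(identity, requestedTenantId) {
    // No AgentIdentity context (single-tenant / offline daemon): fall through to the shared tenant.
    if (!identity) {
        return {tenantId: SHARED_TENANT, error: null};
    }

    // Assumed rule: with a context present, the request may not name a different tenant.
    if (requestedTenantId && requestedTenantId !== identity.tenantId) {
        return {
            tenantId: null,
            error   : {code: 'TENANT_MISMATCH', message: 'Requested tenant does not match the AgentIdentity context.'}
        };
    }

    return {tenantId: identity.tenantId, error: null};
}
```

For example, `resolveTenant(null)` yields the neo-shared tenant, while `resolveTenant({tenantId: 'acme'}, 'other')` yields a structured error rather than an exception.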
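Acceptance Criteria
- KnowledgeBaseIngestionService class extends Neo.core.Base (per Neo conventions); singleton
- ingestSourceFiles({tenantId, files, deleted?, manifestSnapshot?, baseRevision?, headRevision?}) implemented
- parsed-chunk-v1 validation at service boundary (uses Phase 0/1A schema)
- Records carrying an embedding field REJECTED with structured error (NOT routed silently)
- Routing to VectorService.embed preserves existing content-hash delta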
Out of Scope
- MCP facade wiring → Phase 2B
- Bulk facade (CLI/HTTP) → Phase 2C
- Q12 hydration mode → Phase 2D
- Q5 tenant config storage → Phase 2E
- Synthetic external-workspace fixtures → Phase 2F
Related
- Parent: #11626
- Blocked-by: Phase 0/1 Epic #11625 completion
- Blocks: Phase 2B, 2C (facades consume this service)
- Discussion source: #11623 §1 #8 + §7 Phase 2
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- VectorService.embed (lines 188-274) is the downstream call target
- KBRecorderService.mjs is the sibling-service architectural pattern reference
- MemoryService.mjs queryMemories method is the AgentIdentity-context propagation reference pattern