Context
Sub of Phase 2 Epic #11626 (meta-Epic #11624). Graduated from Discussion #11623.
Agent-native command-plane facade for small-batch ingestion. Subject to #10572 work-volume gate; routes bulk to Phase 2C facade.
The Problem
Agents (operator, cross-family swarm, and external client agents) need an MCP-native invocation path for ingestion. A pure-bulk (CLI-only) design loses the agent-native affordance for small hook batches. However, the #10572 gate (VectorService.mjs:216-240) refuses viaMcp syncs larger than mcpSyncMaxChunks (default 50), so a bulk path is structurally necessary as well.
The Fix
New MCP tool registered via existing toolService.mjs pattern:
    ingestSourceFiles({
      tenantId: 'string',
      files: [{path, content, parser?, parsedChunks?}, ...],
      deleted?: [path, ...],
      manifestSnapshot?: {pathsAfterPush},
      baseRevision?: 'SHA',
      headRevision?: 'SHA'
    })

Wraps KnowledgeBaseIngestionService.ingestSourceFiles (Phase 2A) with:
- MCP tool-budget compliance (per #9903 precedent)
- Volume gate threading via viaMcp: true (per #10572)
- Structured volume-gate response when batch > threshold:
    {
      error: 'KB_INGEST_VOLUME_EXCEEDED',
      message: 'Batch size N exceeds MCP threshold M. Use bulk path.',
      code: 'KB_INGEST_VOLUME_EXCEEDED',
      bulkPath: 'npm run ai:ingest-tenant <tenantId>',
      batchSize: N,
      threshold: M
    }
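
The gate and its structured response could be sketched at the facade layer roughly as follows. This is a minimal illustration, not the shipped implementation: `buildVolumeExceeded`, `ingestSourceFilesViaMcp`, and the injected `ingest` callback (standing in for KnowledgeBaseIngestionService.ingestSourceFiles) are hypothetical names, and counting `files.length` as the batch size is an assumption, since the ticket leaves the file-versus-chunk count open. Substituting the tenant id into `bulkPath` is likewise assumed.

```javascript
// Assumed default, mirroring the mcpSyncMaxChunks default of 50 quoted above.
const DEFAULT_MCP_SYNC_MAX_CHUNKS = 50;

// Hypothetical helper: builds the structured volume-gate response.
function buildVolumeExceeded(tenantId, batchSize, threshold) {
    return {
        error: 'KB_INGEST_VOLUME_EXCEEDED',
        message: `Batch size ${batchSize} exceeds MCP threshold ${threshold}. Use bulk path.`,
        code: 'KB_INGEST_VOLUME_EXCEEDED',
        bulkPath: `npm run ai:ingest-tenant ${tenantId}`,
        batchSize,
        threshold
    };
}

// Hypothetical facade: refuses oversized batches before any service work,
// otherwise forwards the arguments unchanged to the injected `ingest` callback.
async function ingestSourceFilesViaMcp(args, ingest, threshold = DEFAULT_MCP_SYNC_MAX_CHUNKS) {
    // Assumption: batch size is the number of entries in `files`.
    const batchSize = Array.isArray(args.files) ? args.files.length : 0;

    if (batchSize > threshold) {
        return buildVolumeExceeded(args.tenantId, batchSize, threshold);
    }

    return ingest(args);
}
```

A calling agent can then branch on `'error' in result` and, when `result.code` is `KB_INGEST_VOLUME_EXCEEDED`, fall back to the command in `result.bulkPath` instead of retrying through MCP.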
Contract Ledger Matrix
Backfilled per the Phase 2 Epic #11626 epic-review (Stage 4 Prescription Layer revision — #11634 / #11635 flagged as consumed-surface facade subs lacking a Contract Ledger). Pins the ingest_source_files MCP wire contract before sub-work. Surfaces verified against origin/dev.
| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| ingest_source_files MCP tool (input envelope): {tenantId, files:[ {path, content, parser?, parsedChunks?} \| parsed-chunk-v1 record ], deleted?:[{sourcePath, repoSlug?}], manifestSnapshot?:{repoSlug, pathsAfterPush}, baseRevision?, headRevision?} | Discussion #11623 §4 Q3 + §7 Phase 2; KnowledgeBaseIngestionService.ingestSourceFiles JSDoc (KnowledgeBaseIngestionService.mjs:103); parser/parsed-chunk-v1.schema.json; parser/deletion-signaling-contract.md | makeSafe Zod schema validates the envelope; files entries accept a raw {path, content} OR a parsed-chunk-v1 record; a record carrying an embedding field is rejected per the spoof-rejection invariant. | Invalid envelope → the service's existing non-throwing structured errors (KB_INGEST_FILES_INVALID, KB_DELETE_SIGNAL_INVALID, KB_TOMBSTONE_INVALID, KB_MANIFEST_INVALID) surface in summary.errors. | openapi.yaml tool description; JSDoc on the toolService.mjs dispatch wrapper. | Unit: Zod accept/reject of both files shapes. Integration: end-to-end MCP invocation against a mock tenant. |
| ingest_source_files (success response): {ingested, deleted, embeddingsGenerated, errors, tenantId, durationMs} | KnowledgeBaseIngestionService.ingestSourceFiles return contract (KnowledgeBaseIngestionService.mjs:101 JSDoc) | The facade returns the service summary object unchanged; openapi.yaml response schema declares all six fields. | Service-internal failures are collected non-throwing into summary.errors; the tool still resolves with the summary. | openapi.yaml response schema. | Unit: response-shape assertion. Per the PR #11681 get_neighbors openapi schema-drift lesson, the openapi.yaml response schema MUST declare the six-field shape, with no schema drift. |
| KB_INGEST_VOLUME_EXCEEDED (structured volume-gate response): {error, message, code:'KB_INGEST_VOLUME_EXCEEDED', bulkPath, batchSize, threshold} | #10572 work-volume gate; VectorService.mjs KB_SYNC_VOLUME_EXCEEDED sibling precedent | When the batch (file / chunk count) exceeds aiConfig.mcpSyncMaxChunks (default 50), the facade refuses before heavy embedding work and returns this shape; the KB server's 'error' in result contract converts it to isError: true. Gate mechanism is an intake Stage-2 decision: KnowledgeBaseIngestionService.ingestSourceFiles currently has no viaMcp parameter and no internal volume gate (the #10572 gate lives in VectorService.embed); #11634 must EITHER thread a new viaMcp flag through ingestSourceFiles → embedChunkGroups → VectorService.embed, OR gate at the facade layer before dispatch. | n/a: this surface IS the gate; bulkPath hands the agent the CLI escape hatch. | openapi.yaml; the bulkPath field is self-documenting. | Unit: gate fires at threshold+1 with the exact structured shape; gate does NOT fire at threshold. |
| New ingest_source_files operation in ai/mcp/server/knowledge-base/openapi.yaml (path + input schema + success + KB_INGEST_VOLUME_EXCEEDED response schemas) | toolService.mjs openApiFilePath contract: ToolService loads every tool schema from openapi.yaml; the manage_knowledge_base entry is the sibling registration precedent | Add the operation; description single-line-preferred per pr-review §5.3. Tool registered in serviceMapping as ingest_source_files: args => KnowledgeBaseIngestionService.ingestSourceFiles({...args, viaMcp: true}); snake_case tool name, matching the verified ask_knowledge_base / manage_knowledge_base convention (the ticket's informal ingestSourceFiles is the service-method name, not the MCP tool name). | Absent openapi entry → tool not discoverable + schema-validation drift (the PR #11681 get_neighbors lesson). | self: the openapi.yaml operation IS the doc. | McpServerToolLimits test (≤ 1024-char description cap); cross-server listTools smoke coverage (cf. #11682). |
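
The accept/reject rule for files entries in the first ledger row can be illustrated with a dependency-free stand-in. This is only a sketch of the rule, not the makeSafe Zod schema itself: `classifyFilesEntry` and its return shape are hypothetical, rejecting `embedding` on every entry (rather than only on parsed-chunk-v1 records) is an assumption, and the real record constraints in parser/parsed-chunk-v1.schema.json are deliberately not reproduced here.

```javascript
// Hypothetical classifier for one `files` entry of the input envelope.
// Returns {ok: true, shape} on acceptance or {ok: false, reason} on rejection.
function classifyFilesEntry(entry) {
    if (entry === null || typeof entry !== 'object' || Array.isArray(entry)) {
        return {ok: false, reason: 'entry must be a plain object'};
    }

    // Spoof-rejection invariant: callers may not supply their own embeddings.
    // Assumption: applied to every entry, not only to parsed-chunk-v1 records.
    if ('embedding' in entry) {
        return {ok: false, reason: 'embedding field is not accepted'};
    }

    // Raw shape: a source path plus its content, to be parsed by the service.
    if (typeof entry.path === 'string' && typeof entry.content === 'string') {
        return {ok: true, shape: 'raw'};
    }

    // Anything else is treated as a candidate parsed-chunk-v1 record; its
    // detailed validation is left to the real schema and is not checked here.
    return {ok: true, shape: 'record'};
}
```

In the real tool the same distinction would presumably live in the schema rather than in hand-written checks; the sketch only makes the two accepted shapes and the rejected one concrete.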
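Acceptance Criteria
- ingestSourceFiles registered in toolService.mjs serviceMapping
- Wraps KnowledgeBaseIngestionService.ingestSourceFiles with viaMcp: true
- Volume gate at aiConfig.mcpSyncMaxChunks (default 50; reuse #10572 threshold)
- Input validated by a makeSafe Zod schema mirroring parsed-chunk-v1 constraints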
Out of Scope
- Bulk facade → Phase 2C
- Service-layer ingestion logic → Phase 2A (consumed here)
- Test fixtures → Phase 2F
Related
- Parent: #11626
- Blocked-by: Phase 2A (#11633), ✅ satisfied: KnowledgeBaseIngestionService.ingestSourceFiles shipped on dev (KnowledgeBaseIngestionService.mjs:103).
- Load-bearing dependency: #10572 (MCP work-volume gate)
- Tool-budget precedent: #9903 (script-over-tool pattern)
- Discussion source: #11623 §4 Q3 + §7 Phase 2
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- VectorService.mjs:216-240 is the #10572 volume gate reference
- ai/mcp/server/knowledge-base/toolService.mjs is the MCP registration pattern
- manageKnowledgeBase tool is a sibling MCP-tool architectural reference (DatabaseService.manageKnowledgeBase)