Frontmatter
id: 11634
title: Phase 2B — MCP Facade: ingestSourceFiles Tool + #10572 Work-Volume Gate Threading
state: Closed
labels: enhancement, ai, architecture
assignees: neo-opus-ada
createdAt: May 19, 2026, 1:55 PM
updatedAt: Jun 7, 2026, 7:13 PM
githubUrl: https://github.com/neomjs/neo/issues/11634
author: neo-opus-ada
commentsCount: 0
parentIssue: 11626
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: [x] 11633 Phase 2A — KnowledgeBaseIngestionService Core: Orchestrator + parsed-chunk-v1 Validation + Delta Integration
blocking: []
closedAt: May 20, 2026, 7:13 PM

Phase 2B — MCP Facade: ingestSourceFiles Tool + #10572 Work-Volume Gate Threading

Closed · v13.0.0/archive-v13-0-0-chunk-12 · enhancement, ai, architecture
neo-opus-ada commented on May 19, 2026, 1:55 PM

Context

Sub of Phase 2 Epic #11626 (meta-Epic #11624). Graduated from Discussion #11623.

Agent-native command-plane facade for small-batch ingestion. Subject to #10572 work-volume gate; routes bulk to Phase 2C facade.

The Problem

Agents (operator, cross-family swarm, and external client agents) need an MCP-native invocation path for ingestion. A pure-bulk (CLI-only) design loses the agent-native affordance for small hook batches. However, the #10572 gate (VectorService.mjs:216-240) refuses viaMcp syncs larger than mcpSyncMaxChunks (default 50), so the bulk path is structurally necessary as well.

The Fix

New MCP tool registered via existing toolService.mjs pattern:

ingestSourceFiles({
  tenantId: 'string',
  files: [{path, content, parser?, parsedChunks?}, ...],
  deleted?: [path, ...],
  manifestSnapshot?: {pathsAfterPush},
  baseRevision?: 'SHA',
  headRevision?: 'SHA'
})

Wraps KnowledgeBaseIngestionService.ingestSourceFiles (Phase 2A) with:

  1. MCP tool-budget compliance (per #9903 precedent)
  2. Volume gate threading via viaMcp: true (per #10572)
  3. Structured volume-gate response when batch > threshold:
     {
       error: 'KB_INGEST_VOLUME_EXCEEDED',
       message: 'Batch size N exceeds MCP threshold M. Use bulk path.',
       code: 'KB_INGEST_VOLUME_EXCEEDED',
       bulkPath: 'npm run ai:ingest-tenant <tenantId>',
       batchSize: N,
       threshold: M
     }
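
A minimal sketch of how a facade-layer gate could build the structured response above. The function name checkVolumeGate, the constant name, counting the batch as the number of files entries, and substituting the tenant ID into bulkPath are all assumptions made for illustration; the ticket leaves open whether the gate counts files or chunks and whether it lives in the facade or is threaded into the service.

```javascript
// Assumption: mirrors the documented default of aiConfig.mcpSyncMaxChunks (50).
const DEFAULT_MCP_SYNC_MAX_CHUNKS = 50;

/**
 * Hypothetical helper. Returns null when the batch is within the MCP
 * threshold, otherwise the structured KB_INGEST_VOLUME_EXCEEDED refusal.
 */
function checkVolumeGate({tenantId, files = []}, threshold = DEFAULT_MCP_SYNC_MAX_CHUNKS) {
    // Assumption: batch size is the number of entries in `files`.
    const batchSize = files.length;

    // The gate fires only when the batch is strictly larger than the threshold.
    if (batchSize <= threshold) {
        return null;
    }

    return {
        error    : 'KB_INGEST_VOLUME_EXCEEDED',
        message  : `Batch size ${batchSize} exceeds MCP threshold ${threshold}. Use bulk path.`,
        code     : 'KB_INGEST_VOLUME_EXCEEDED',
        // Assumption: the <tenantId> placeholder is filled with the caller's tenant.
        bulkPath : `npm run ai:ingest-tenant ${tenantId}`,
        batchSize,
        threshold
    };
}

// Usage: 51 files against the default threshold of 50.
const oversized = {
    tenantId: 'acme',
    files   : Array.from({length: 51}, (_, i) => ({path: `src/f${i}.mjs`, content: ''}))
};

console.log(checkVolumeGate(oversized).message);
// prints "Batch size 51 exceeds MCP threshold 50. Use bulk path."
```

Returning a plain object rather than throwing keeps the refusal inspectable by the calling agent, which can read bulkPath and switch to the CLI.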

Contract Ledger Matrix

Backfilled per the Phase 2 Epic #11626 epic-review (Stage 4 Prescription Layer revision — #11634 / #11635 flagged as consumed-surface facade subs lacking a Contract Ledger). Pins the ingest_source_files MCP wire contract before sub-work. Surfaces verified against origin/dev.

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
|---|---|---|---|---|---|
| ingest_source_files MCP tool, input envelope: {tenantId, files:[ {path, content, parser?, parsedChunks?} \| parsed-chunk-v1 record ], deleted?:[{sourcePath, repoSlug?}], manifestSnapshot?:{repoSlug, pathsAfterPush}, baseRevision?, headRevision?} | Discussion #11623 §4 Q3 + §7 Phase 2; KnowledgeBaseIngestionService.ingestSourceFiles JSDoc (KnowledgeBaseIngestionService.mjs:103); parser/parsed-chunk-v1.schema.json; parser/deletion-signaling-contract.md | makeSafe Zod schema validates the envelope; files entries accept a raw {path, content} OR a parsed-chunk-v1 record; a record carrying an embedding field is rejected per the spoof-rejection invariant. | Invalid envelope → the service's existing non-throwing structured errors (KB_INGEST_FILES_INVALID, KB_DELETE_SIGNAL_INVALID, KB_TOMBSTONE_INVALID, KB_MANIFEST_INVALID) surface in summary.errors. | openapi.yaml tool description; JSDoc on the toolService.mjs dispatch wrapper. | Unit: Zod accept/reject of both files shapes. Integration: end-to-end MCP invocation against a mock tenant. |
| ingest_source_files success response: {ingested, deleted, embeddingsGenerated, errors, tenantId, durationMs} | KnowledgeBaseIngestionService.ingestSourceFiles return contract (KnowledgeBaseIngestionService.mjs:101 JSDoc) | The facade returns the service summary object unchanged; openapi.yaml response schema declares all six fields. | Service-internal failures are collected non-throwing into summary.errors; the tool still resolves with the summary. | openapi.yaml response schema. | Unit: response-shape assertion. Per the PR #11681 get_neighbors openapi schema-drift lesson, the openapi.yaml response schema MUST declare the six-field shape (no schema drift). |
| KB_INGEST_VOLUME_EXCEEDED structured volume-gate response: {error, message, code:'KB_INGEST_VOLUME_EXCEEDED', bulkPath, batchSize, threshold} | #10572 work-volume gate; VectorService.mjs KB_SYNC_VOLUME_EXCEEDED sibling precedent | When the batch (file / chunk count) exceeds aiConfig.mcpSyncMaxChunks (default 50), the facade refuses before heavy embedding work and returns this shape; the KB server's 'error' in result contract converts it to isError: true. Gate mechanism is an intake Stage-2 decision: KnowledgeBaseIngestionService.ingestSourceFiles currently has no viaMcp parameter and no internal volume gate (the #10572 gate lives in VectorService.embed), so #11634 must EITHER thread a new viaMcp flag through ingestSourceFiles → embedChunkGroups → VectorService.embed, OR gate at the facade layer before dispatch. | n/a: this surface IS the gate; bulkPath hands the agent the CLI escape hatch. | openapi.yaml; the bulkPath field is self-documenting. | Unit: gate fires at threshold+1 with the exact structured shape; gate does NOT fire at threshold. |
| New ingest_source_files operation in ai/mcp/server/knowledge-base/openapi.yaml (path + input schema + success + KB_INGEST_VOLUME_EXCEEDED response schemas) | toolService.mjs openApiFilePath contract: ToolService loads every tool schema from openapi.yaml; the manage_knowledge_base entry is the sibling registration precedent | Add the operation; description single-line-preferred per pr-review §5.3. Tool registered in serviceMapping as ingest_source_files: args => KnowledgeBaseIngestionService.ingestSourceFiles({...args, viaMcp: true}); snake_case tool name, matching the verified ask_knowledge_base / manage_knowledge_base convention (the ticket's informal ingestSourceFiles is the service-method name, not the MCP tool name). | Absent openapi entry → tool not discoverable + schema-validation drift (the PR #11681 get_neighbors lesson). | self: the openapi.yaml operation IS the doc. | McpServerToolLimits test (≤ 1024-char description cap); cross-server listTools smoke coverage (cf. #11682). |
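
The ledger distinguishes two outcomes that look similar at first glance: a volume-gate refusal, which carries an error key and becomes isError: true, and a normal summary, which carries only an errors array and still resolves successfully. The sketch below illustrates that distinction. The helper name toToolResult, the shape of the entries inside summary.errors, and the JSON-text packaging are assumptions; the ticket only states that the KB server applies an 'error' in result check.

```javascript
/**
 * Hypothetical stand-in for the KB server's result conversion.
 * A result object that has an `error` key is flagged as an MCP error;
 * anything else is returned as a successful tool result.
 */
function toToolResult(result) {
    const isError = typeof result === 'object' && result !== null && 'error' in result;

    return {
        // Assumption: the payload is serialized into a single text content item.
        content: [{type: 'text', text: JSON.stringify(result)}],
        isError
    };
}

// Volume-gate refusal: has `error`, so it is flagged.
const refusal = {
    error    : 'KB_INGEST_VOLUME_EXCEEDED',
    code     : 'KB_INGEST_VOLUME_EXCEEDED',
    batchSize: 51,
    threshold: 50
};

// Service summary with a collected failure: has `errors` (plural) only.
// Assumption: the entry shape {code} is illustrative, not the service's real shape.
const summary = {
    ingested           : 3,
    deleted            : 0,
    embeddingsGenerated: 3,
    errors             : [{code: 'KB_INGEST_FILES_INVALID'}],
    tenantId           : 'mock',
    durationMs         : 12
};

console.log(toToolResult(refusal).isError); // true
console.log(toToolResult(summary).isError); // false
```

Under this reading, callers must inspect summary.errors themselves, because a partially failed ingest is not surfaced through isError.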

Acceptance Criteria

  • MCP tool ingest_source_files (informally ingestSourceFiles, the service-method name) registered in toolService.mjs serviceMapping
  • Tool dispatches to KnowledgeBaseIngestionService.ingestSourceFiles with viaMcp: true
  • Volume gate fires when batch > aiConfig.mcpSyncMaxChunks (default 50; reuse #10572 threshold)
  • Structured volume-gate response includes batch size + threshold + bulk path
  • Tool registered with makeSafe Zod schema mirroring parsed-chunk-v1 constraints
  • Unit tests: dispatch + volume gate firing + structured response shape
  • Integration test: end-to-end MCP invocation against mock tenant
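
The unit-test criteria above (dispatch, gate firing, response shape) can be exercised against a stub service. This is a sketch under the facade-layer gating option only: the factory createIngestSourceFilesTool, the stub, and the file-count batch measure are hypothetical, and the wrapper is left synchronous so the example runs without an async test harness.

```javascript
// Hypothetical factory: `service` stands in for KnowledgeBaseIngestionService,
// `maxChunks` for aiConfig.mcpSyncMaxChunks.
function createIngestSourceFilesTool(service, maxChunks = 50) {
    return function ingestSourceFilesTool(args) {
        const batchSize = Array.isArray(args.files) ? args.files.length : 0;

        if (batchSize > maxChunks) {
            // Refuse before dispatch; the service is never called.
            return {
                error    : 'KB_INGEST_VOLUME_EXCEEDED',
                message  : `Batch size ${batchSize} exceeds MCP threshold ${maxChunks}. Use bulk path.`,
                code     : 'KB_INGEST_VOLUME_EXCEEDED',
                bulkPath : `npm run ai:ingest-tenant ${args.tenantId}`,
                batchSize,
                threshold: maxChunks
            };
        }

        // Within the threshold: dispatch with the MCP marker set.
        return service.ingestSourceFiles({...args, viaMcp: true});
    };
}

// Stub service that records every call it receives.
const calls = [];
const stubService = {
    ingestSourceFiles(args) {
        calls.push(args);
        return {
            ingested: args.files.length, deleted: 0, embeddingsGenerated: 0,
            errors: [], tenantId: args.tenantId, durationMs: 0
        };
    }
};

// A threshold of 2 keeps the fixtures small.
const tool      = createIngestSourceFilesTool(stubService, 2);
const makeFiles = n => Array.from({length: n}, (_, i) => ({path: `f${i}.md`, content: 'x'}));

const atThreshold   = tool({tenantId: 'mock', files: makeFiles(2)});
const overThreshold = tool({tenantId: 'mock', files: makeFiles(3)});

console.log(atThreshold.ingested, calls.length, calls[0].viaMcp); // 2 1 true
console.log(overThreshold.code, overThreshold.batchSize);         // KB_INGEST_VOLUME_EXCEEDED 3
```

The call count staying at 1 after both invocations is the check that the refusal happens before any service work.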

Out of Scope

  • Bulk facade → Phase 2C
  • Service-layer ingestion logic → Phase 2A (consumed here)
  • Test fixtures → Phase 2F

Related

  • Parent: #11626
  • Blocked-by: Phase 2A (#11633) — ✅ satisfied: KnowledgeBaseIngestionService.ingestSourceFiles shipped on dev (KnowledgeBaseIngestionService.mjs:103).
  • Load-bearing dependency: #10572 (MCP work-volume gate)
  • Tool-budget precedent: #9903 (script-over-tool pattern)
  • Discussion source: #11623 §4 Q3 + §7 Phase 2

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

  • VectorService.mjs:216-240 is the #10572 volume gate reference
  • ai/mcp/server/knowledge-base/toolService.mjs is the MCP registration pattern
  • manageKnowledgeBase tool is a sibling MCP-tool architectural reference (DatabaseService.manageKnowledgeBase)
tobiu referenced in commit 1d03e38 - "feat(kb): add ingest_source_files MCP facade with volume gate (#11634) (#11688)" on May 20, 2026, 7:13 PM
tobiu closed this issue on May 20, 2026, 7:13 PM
tobiu referenced in commit b7a8d75 - "docs(agentos): Phase 3B cloud-deployment guides + examples (#11627) (#11707)" on May 21, 2026, 8:49 AM