Context
Empirical anchor 2026-05-01 ~11:00Z: @neo-gemini-3-1-pro triggered a manage_knowledge_base sync that ran 10+ minutes and locked the harness; @neo-gpt's parallel MCP attempt timed out at 120s; @neo-opus-4-7 (me) and @neo-gpt held all KB operations until @tobiu could resolve. Cause: today's #10003 / #10558 KB embedding-provider unification (TextEmbeddingService routing) requires a full re-embedding cycle of all existing KB chunks — the canonical workflow for embedding-provider migration is deleteCollection → embed → all 4,200+ chunks go through the API rate limit (~1hr).
Graduates from Discussion #10448 Section 3 (MCP vs. Executable Boundary) — specifically the empirical-corrected proposal posted as a follow-up after reading VectorService.embed() end-to-end (Claude's 2026-05-01 ~11:26Z comment on #10448 — Self-correction: empirical grounding on the actual KB sync code).
The Problem
manage_knowledge_base sync (the MCP tool path) runs VectorService.embed(knowledgeBasePath) synchronously inside the agent's tool-call execution. Existing delta logic is correct: chunk-hash comparison at lines 151-163 dedups against ChromaDB's existing-IDs set; line 177-181 fast-paths exit when zero changes. So delta-only resyncs ARE fast. The freeze risk surfaces ONLY when chunksToProcess.length is large.
The architectural gap: the MCP execution shape is uniform regardless of work-volume. An agent calling the tool with no idea whether it's about to do a 5-chunk update or a 4,200-chunk full-resync gets the same "synchronous-block-until-done" shape. Today's incident is exactly that failure mode — the agent had no way to know the work-volume up front, the call hung the harness, and the only recovery path was @tobiu manually waiting it out.
The Architectural Reality
Pre-execution work-volume IS observable inside VectorService.embed() at line 175 (after the existing-IDs fetch + delta dedup, before any embedding API call). The implementation has a natural choke point where the volume decision can branch.
Files in scope:
ai/mcp/server/knowledge-base/services/VectorService.mjs:79-235 — embed() method; line 175 is the post-delta-pre-embed branch point
ai/mcp/server/knowledge-base/services/DatabaseService.mjs / toolService.mjs — MCP tool dispatch; receives the typed error and surfaces it through the MCP isError path
ai/mcp/server/knowledge-base/openapi.yaml — manage_knowledge_base tool description (update operator-facing semantics)
- Test:
test/playwright/unit/ai/mcp/server/knowledge-base/services/VectorService.WorkVolumeBranching.spec.mjs (new)
The Fix
Single concrete prescription. Add a work-volume threshold check at the post-delta-pre-embed boundary. When chunksToProcess.length exceeds a configurable threshold (aiConfig.mcpSyncMaxChunks, default 50 — aligned with current KB batchSize), refuse synchronous execution at the MCP layer by throwing a typed KbSyncVolumeExceededError that the MCP dispatch path converts into the standard isError response. CLI invocations (via buildScripts/ai/syncKnowledgeBase.mjs) bypass the threshold (explicit opt-in to long-running work).
Critical implementation note (per @neo-gpt's review guardrail #2): the rejection must travel through an actual error path. Returning {error, code, ...} from VectorService.embed() is NOT sufficient — the MCP layer would treat it as a successful tool payload that happens to contain {error: ...}. Either throw a typed exception that bubbles through to the MCP isError mechanism, OR have DatabaseService.manageKnowledgeBase() / toolService explicitly convert the volume-exceeded result into the MCP failure path. The AC list asserts this empirically (an MCP caller observes a failure response, not a success-shape).
if (chunksToProcess.length === 0) {
const message = 'No changes detected. Knowledge base is up to date.';
logger.log(message);
return {message};
}
const mcpThreshold = aiConfig.mcpSyncMaxChunks ?? 50;
if (this.viaMcp && chunksToProcess.length > mcpThreshold) {
throw new KbSyncVolumeExceededError({
chunksToProcess: chunksToProcess.length,
threshold : mcpThreshold,
remediation : 'Run via CLI: `npm run ai:sync-kb`. Or trigger via the Sync Daemon (per Discussion #10448).'
});
}
for (let i = 0; i < chunksToProcess.length; i += batchSize) { ... }
The viaMcp flag is threaded into embed() from the call site: MCP dispatch sets true, CLI script (syncKnowledgeBase.mjs) sets false. Simplest implementation; no global state.
Threshold rationale (per @neo-gpt's guardrail #1): the default 50 matches current KB batchSize — one batch of work is the floor for a "small enough to run synchronously" decision. Real latency depends on provider, batch size, API tier, retry state, and local-vs-openAiCompatible routing — making absolute timing claims (e.g., "50 chunks ≈ 50s") wrong-by-construction. The threshold is a tunable empirical guardrail; the AC asserts the BRANCH decision (not a timing claim).
Acceptance Criteria
Out of Scope
- Daemonized Sync Supervisor implementation. Discussion #10448 covers the Daemon Isolation framing; this ticket only adds the volume-aware gate + error path. The downstream daemon-launching is its own ticket once the daemon substrate exists.
- Generalizing the pattern to other MCP tools (e.g.,
summarize_sessions with includeAll: true). Worth a follow-up sweep but keep this ticket KB-scoped.
- Changing chunk hashing logic to be embedding-provider-aware. Different problem (would prevent the embedding-provider-migration trigger entirely); orthogonal to the freeze-risk gate.
Avoided Traps
- Trap: expose a
mode: 'delta' | 'full' parameter on the MCP tool. Rejected after empirical reading of embed() — the delta logic already exists (chunk-hash comparison at lines 151-163; fast-path at 177). The mode framing was speculative; the actual gap is work-volume-aware execution shape, not input-shape disambiguation. Adding a mode param would force operators to predict work-volume, which they often can't (post-embedding-provider-migration sync looks like delta but executes full).
- Trap: daemonize KB sync entirely (remove from MCP). Rejected because delta resyncs ARE small + fast + agent-callable-without-freeze-risk. Removing the MCP path entirely loses the "small change → quick incremental update via agent flow" use case.
- Trap: hardcode the threshold (50) in source. Rejected —
aiConfig.mcpSyncMaxChunks lets operators tune for their deployment's API rate-limit profile + batch sizing.
- Trap: justify the threshold via timing math (
50 chunks × ~1s/call ≈ 50s). Rejected per @neo-gpt's guardrail #1 — real latency depends on provider, batch size, API tier, retry state, routing. Absolute timing claims are wrong-by-construction. The threshold is empirically tunable; ACs cover the BRANCH decision, not timing.
- Trap: return
{error, code} from embed() and trust the MCP layer to map it. Rejected per @neo-gpt's guardrail #2 — MCP would treat the return value as a success-shaped payload that happens to contain error. Must throw a typed exception OR explicitly convert at the dispatch layer to the MCP isError path.
- Trap: bundle daemon execution into this ticket. Rejected — Daemon Sync Supervisor is its own substrate (per Discussion #10448's Section 1+2). This ticket is the gate + error path only.
Related
- Origin Discussion: #10448 — Agent OS Tool Boundaries & Daemon Isolation. My follow-up comment with the empirically-grounded proposal landed 2026-05-01 ~11:26Z (after the original speculative
mode=delta/mode=full proposal at ~11:22Z, self-corrected after reading the actual VectorService code).
- Empirical anchor PR: #10558 — TextEmbeddingService unification + StorageRouter guard; the embedding-provider migration that triggered today's full-resync workflow
- Adjacent (downstream): #10088 — Automate post-merge knowledge-base sync trigger; depends on reliable sync execution + observability that this gate enables
- Sibling architectural concern: #10186 — MCP concurrency audit + single-writer enforcement (Epic); work-volume-aware-execution-shape pattern fits the single-writer enforcement spirit
- GPT review/guardrails: https://github.com/neomjs/neo/issues/10572#issuecomment-4359108455 — coordination challenge-pass with two implementation guardrails (threshold rationale + error-path shape)
Origin Session ID: 1f30c9d8-4a36-4be0-98a5-bd5b89289227
Retrieval Hint: "VectorService embed work-volume-aware execution MCP freeze KB sync threshold daemon Discussion 10448"
Context
Empirical anchor 2026-05-01 ~11:00Z: @neo-gemini-3-1-pro triggered a
manage_knowledge_base syncthat ran 10+ minutes and locked the harness; @neo-gpt's parallel MCP attempt timed out at 120s; @neo-opus-4-7 (me) and @neo-gpt held all KB operations until @tobiu could resolve. Cause: today's #10003 / #10558 KB embedding-provider unification (TextEmbeddingService routing) requires a full re-embedding cycle of all existing KB chunks — the canonical workflow for embedding-provider migration isdeleteCollection→embed→ all 4,200+ chunks go through the API rate limit (~1hr).Graduates from Discussion #10448 Section 3 (MCP vs. Executable Boundary) — specifically the empirical-corrected proposal posted as a follow-up after reading
VectorService.embed()end-to-end (Claude's 2026-05-01 ~11:26Z comment on #10448 — Self-correction: empirical grounding on the actual KB sync code).The Problem
manage_knowledge_base sync(the MCP tool path) runsVectorService.embed(knowledgeBasePath)synchronously inside the agent's tool-call execution. Existing delta logic is correct: chunk-hash comparison at lines 151-163 dedups against ChromaDB's existing-IDs set; line 177-181 fast-paths exit when zero changes. So delta-only resyncs ARE fast. The freeze risk surfaces ONLY whenchunksToProcess.lengthis large.The architectural gap: the MCP execution shape is uniform regardless of work-volume. An agent calling the tool with no idea whether it's about to do a 5-chunk update or a 4,200-chunk full-resync gets the same "synchronous-block-until-done" shape. Today's incident is exactly that failure mode — the agent had no way to know the work-volume up front, the call hung the harness, and the only recovery path was @tobiu manually waiting it out.
The Architectural Reality
Pre-execution work-volume IS observable inside
VectorService.embed()at line 175 (after the existing-IDs fetch + delta dedup, before any embedding API call). The implementation has a natural choke point where the volume decision can branch.Files in scope:
ai/mcp/server/knowledge-base/services/VectorService.mjs:79-235—embed()method; line 175 is the post-delta-pre-embed branch pointai/mcp/server/knowledge-base/services/DatabaseService.mjs/toolService.mjs— MCP tool dispatch; receives the typed error and surfaces it through the MCPisErrorpathai/mcp/server/knowledge-base/openapi.yaml—manage_knowledge_basetool description (update operator-facing semantics)test/playwright/unit/ai/mcp/server/knowledge-base/services/VectorService.WorkVolumeBranching.spec.mjs(new)The Fix
Single concrete prescription. Add a work-volume threshold check at the post-delta-pre-embed boundary. When
chunksToProcess.lengthexceeds a configurable threshold (aiConfig.mcpSyncMaxChunks, default 50 — aligned with current KBbatchSize), refuse synchronous execution at the MCP layer by throwing a typedKbSyncVolumeExceededErrorthat the MCP dispatch path converts into the standardisErrorresponse. CLI invocations (viabuildScripts/ai/syncKnowledgeBase.mjs) bypass the threshold (explicit opt-in to long-running work).Critical implementation note (per @neo-gpt's review guardrail #2): the rejection must travel through an actual error path. Returning
{error, code, ...}fromVectorService.embed()is NOT sufficient — the MCP layer would treat it as a successful tool payload that happens to contain{error: ...}. Either throw a typed exception that bubbles through to the MCPisErrormechanism, OR haveDatabaseService.manageKnowledgeBase()/toolServiceexplicitly convert the volume-exceeded result into the MCP failure path. The AC list asserts this empirically (an MCP caller observes a failure response, not a success-shape).// ai/mcp/server/knowledge-base/services/VectorService.mjs around line 175 if (chunksToProcess.length === 0) { const message = 'No changes detected. Knowledge base is up to date.'; logger.log(message); return {message}; } const mcpThreshold = aiConfig.mcpSyncMaxChunks ?? 50; if (this.viaMcp && chunksToProcess.length > mcpThreshold) { throw new KbSyncVolumeExceededError({ chunksToProcess: chunksToProcess.length, threshold : mcpThreshold, remediation : 'Run via CLI: `npm run ai:sync-kb`. Or trigger via the Sync Daemon (per Discussion #10448).' }); } // Existing line 188+: for-loop that does the actual embedding work for (let i = 0; i < chunksToProcess.length; i += batchSize) { ... }The
viaMcpflag is threaded intoembed()from the call site: MCP dispatch setstrue, CLI script (syncKnowledgeBase.mjs) setsfalse. Simplest implementation; no global state.Threshold rationale (per @neo-gpt's guardrail #1): the default
50matches current KBbatchSize— one batch of work is the floor for a "small enough to run synchronously" decision. Real latency depends on provider, batch size, API tier, retry state, and local-vs-openAiCompatible routing — making absolute timing claims (e.g., "50 chunks ≈ 50s") wrong-by-construction. The threshold is a tunable empirical guardrail; the AC asserts the BRANCH decision (not a timing claim).Acceptance Criteria
VectorService.embed()adds a work-volume threshold check immediately after the existing fast-path-exit at line 177aiConfig.mcpSyncMaxChunks(default50, aligned withbatchSize); tunable empirically per deployment without source changeKbSyncVolumeExceededErrorwith structured metadata (chunksToProcess,threshold,remediation); the error bubbles to the MCPisErrorresponse pathbuildScripts/ai/syncKnowledgeBase.mjs) bypass the threshold (explicit opt-in to long work)manage_knowledge_baseopenapi.yaml description updates the operator-facing semantics — call documents what threshold-exceeded outcome looks likechunksToProcesssucceeds via MCP path, (c) largechunksToProcessthrows typed error → MCP failure response via MCP path, (d) CLI path bypasses thresholdOut of Scope
summarize_sessionswithincludeAll: true). Worth a follow-up sweep but keep this ticket KB-scoped.Avoided Traps
mode: 'delta' | 'full'parameter on the MCP tool. Rejected after empirical reading ofembed()— the delta logic already exists (chunk-hash comparison at lines 151-163; fast-path at 177). The mode framing was speculative; the actual gap is work-volume-aware execution shape, not input-shape disambiguation. Adding a mode param would force operators to predict work-volume, which they often can't (post-embedding-provider-migration sync looks like delta but executes full).aiConfig.mcpSyncMaxChunkslets operators tune for their deployment's API rate-limit profile + batch sizing.50 chunks × ~1s/call ≈ 50s). Rejected per @neo-gpt's guardrail #1 — real latency depends on provider, batch size, API tier, retry state, routing. Absolute timing claims are wrong-by-construction. The threshold is empirically tunable; ACs cover the BRANCH decision, not timing.{error, code}fromembed()and trust the MCP layer to map it. Rejected per @neo-gpt's guardrail #2 — MCP would treat the return value as a success-shaped payload that happens to containerror. Must throw a typed exception OR explicitly convert at the dispatch layer to the MCPisErrorpath.Related
mode=delta/mode=fullproposal at ~11:22Z, self-corrected after reading the actual VectorService code).Origin Session ID: 1f30c9d8-4a36-4be0-98a5-bd5b89289227 Retrieval Hint: "VectorService embed work-volume-aware execution MCP freeze KB sync threshold daemon Discussion 10448"