Context
Phase 0/1B sub-ticket of #11625 (Phase 0/1 of Epic #11624 — Cloud-Native KB Ingestion). Continues the increment pattern set by Phase 0/1A (#11629 merged as PR #11647) which shipped parsed-chunk-v1.schema.json + backup-record-v1.schema.json + the path-identity tuple + deletion-signaling contract.
Phase 0/1A defined the data contracts. Phase 0/1B replaces the hardcoded source-list at ai/services/knowledge-base/DatabaseService.mjs:470-481 with a data-driven registry, gated by useDefaultSources / useDefaultParsers boolean configs — the substrate floor that lets cloud deployments mix Neo's curated sources with tenant-supplied custom sources.
The Problem
DatabaseService.createKnowledgeBase() currently iterates a hardcoded array of 10 Neo-curated Source classes (AdrSource, ApiSource, ConceptSource, etc.). Cloud-deployed Agent OS workspaces need to:
- Toggle Neo's defaults off: e.g., a tenant whose external repo has nothing to do with Neo's curated content can pass useDefaultSources: false and ingest only their own content.
- Register custom Source classes: a tenant with a .proto parser, an ES5 codebase, or a C++ project needs a registration API that doesn't require forking Neo.
- Override per-source paths: ApiSource.sourceMap etc. currently hardcode Neo's local file layout; cloud deployments need per-tenant path config.
The path-prefix matcher pattern from the FileSystemIngestor incident (#11650/#11651) is the cautionary tale here — hardcoded structural assumptions become substrate-level invariants that are hard to unwind once consumed.
The Architectural Reality
This phase touches:
| File | Change |
|---|---|
| NEW ai/services/knowledge-base/source/SourceRegistry.mjs | Registry singleton holding registered Source classes + custom Parser classes. Default sources auto-register on import when aiConfig.useDefaultSources !== false. Provides registerSource(class, options) + registerParser(class, options) public API. |
| ai/mcp/server/knowledge-base/config.mjs + config.template.mjs | Add useDefaultSources: true (default) + useDefaultParsers: true (default) booleans. Add customSources: [] + customParsers: [] arrays (default empty) for declarative registration via config. |
| ai/services/knowledge-base/DatabaseService.mjs:470-481 | Replace hardcoded sources array with SourceRegistry.getSources(). Honor aiConfig.useDefaultSources toggle. |
| ai/services/knowledge-base/source/index.mjs (or _export.mjs if existing convention) | Centralized re-export of all default Source classes; auto-registers them when registry-singleton imports the index. |
| Per-source sourceMap / path config | Externalized to aiConfig.knowledgeBase.sourcePaths.<sourceName> with sensible Neo-default values; consumers read from config instead of hardcoded constants. |
The registration API mirrors the established Neo.setupClass pattern. Source classes are singleton subclasses of Neo.core.Base, so SourceRegistry.registerSource(MySource) accepts the class itself; the registry calls Neo.setupClass(MySource) if that has not already been done.
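The register-by-class contract described above could look roughly like the following. This is a minimal standalone sketch, not the Neo implementation: `SourceRegistrySketch` and `ProtoSource` are hypothetical names, the Neo.setupClass call is omitted because Neo is not loaded here, and the "last registration wins" dedup policy is an assumption the ticket does not settle.

```javascript
// Minimal stand-in for the proposed SourceRegistry; not the Neo implementation.
class SourceRegistrySketch {
    #sources = new Map(); // sourceName → Source class

    // Registers a class under sourceName (falls back to the class name).
    // Re-registering an existing name replaces the earlier entry (assumed policy).
    registerSource(SourceClass, {sourceName = SourceClass.name} = {}) {
        if (typeof SourceClass !== 'function') {
            throw new TypeError('registerSource expects a class');
        }
        this.#sources.set(sourceName, SourceClass);
        return this;
    }

    getSources()    { return Array.from(this.#sources.values()); }
    hasSource(name) { return this.#sources.has(name); }
}

// Hypothetical tenant source; only the extract() signature comes from the ticket.
class ProtoSource {
    async extract(writeStream, createHashFn) { /* tenant-specific parsing */ }
}

const registry = new SourceRegistrySketch();
registry.registerSource(ProtoSource);
registry.registerSource(ProtoSource, {sourceName: 'ProtoSource'}); // same name, so no second entry
```

Keying the map by name rather than by class keeps hasSource(name) cheap and makes a repeated registration idempotent, which is one way to satisfy the dedup semantics the unit tests are expected to cover.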
The Fix
1. SourceRegistry singleton (new)
    class SourceRegistry extends Base {
        static config = {
            className: 'Neo.ai.services.knowledge-base.source.SourceRegistry',
            singleton: true
        }

        #sources = new Map(); // sourceName → Source class
        #parsers = new Map(); // parserId → Parser class

        registerSource(SourceClass, {sourceName} = {}) { /* ... */ }
        registerParser(ParserClass, {parserId} = {})   { /* ... */ }

        getSources()  { return Array.from(this.#sources.values()); }
        getParsers()  { return Array.from(this.#parsers.values()); }
        hasSource(n)  { return this.#sources.has(n); }
        hasParser(id) { return this.#parsers.has(id); }
    }

2. Auto-registration of defaults
ai/services/knowledge-base/source/index.mjs imports each default Source class + calls SourceRegistry.registerSource(...) for each, conditionally on aiConfig.useDefaultSources !== false.
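The gating rule above (`!== false`, so an unset flag still registers the defaults) can be sketched as follows. Everything here is illustrative: `registerDefaultSources`, `makeRegistry`, and the empty `AdrSource` / `ApiSource` stand-ins are hypothetical, and the real index.mjs would import the actual Source classes and the singleton registry.

```javascript
// Hypothetical stand-ins for two of Neo's default Source classes.
class AdrSource {}
class ApiSource {}

// Registers the defaults unless the toggle is explicitly false.
// `registry` only needs a registerSource(SourceClass) method.
function registerDefaultSources(registry, aiConfig, defaults) {
    if (aiConfig.useDefaultSources === false) {
        return 0;
    }
    defaults.forEach(SourceClass => registry.registerSource(SourceClass));
    return defaults.length;
}

// Tiny in-memory registry used only for this illustration.
const makeRegistry = () => {
    const sources = new Map();
    return {
        registerSource: SourceClass => sources.set(SourceClass.name, SourceClass),
        getSources    : () => Array.from(sources.values())
    };
};

const withDefaults    = makeRegistry();
const withoutDefaults = makeRegistry();

// An empty config leaves the flag undefined, which still counts as "on".
registerDefaultSources(withDefaults,    {},                         [AdrSource, ApiSource]);
registerDefaultSources(withoutDefaults, {useDefaultSources: false}, [AdrSource, ApiSource]);
```

Comparing against `false` instead of testing truthiness means existing deployments that never set the flag keep today's behavior.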
3. DatabaseService.createKnowledgeBase() refactor
    async createKnowledgeBase() {
        // ...
        const sources = SourceRegistry.getSources();
        // ... existing iteration logic unchanged
    }

4. Config additions
config.mjs + config.template.mjs get:
    knowledgeBase: {
        useDefaultSources: true,
        useDefaultParsers: true,
        customSources    : [], // [{className, sourceName, sourcePath}, ...]
        customParsers    : [], // [{className, parserId, parserVersion}, ...]
        sourcePaths      : {
            ApiSource     : 'docs/output/all.json',
            LearningSource: 'learn/tree.json',
            // ...
        }
    }

5. Byte-equivalence fixture test
A unit test that:
- Generates dist/ai-knowledge-base.jsonl with the pre-registry code path (snapshot fixture).
- Generates the same file with the post-registry code path under useDefaultSources: true.
- Asserts byte-for-byte equivalence (or chunk-set equivalence, since file order may differ if registry iteration semantics change).
This guarantees the refactor doesn't subtly change Neo's KB output.
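The two comparison modes mentioned above can be sketched like this. The helper names and sample lines are made up for illustration; a real test would read both files from disk. The relaxed check compares lines as raw strings, so it tolerates reordered chunks but not reordered keys inside a chunk.

```javascript
// Strict check: the two JSONL payloads are identical byte for byte.
function isByteEquivalent(before, after) {
    return Buffer.from(before, 'utf8').equals(Buffer.from(after, 'utf8'));
}

// Relaxed check: the same lines in any order (duplicates counted, blank lines ignored).
function isChunkSetEquivalent(before, after) {
    const normalize = text => text.split('\n').filter(line => line.trim() !== '').sort();
    const a = normalize(before);
    const b = normalize(after);
    return a.length === b.length && a.every((line, i) => line === b[i]);
}

// Made-up chunk lines; real fixtures would come from dist/ai-knowledge-base.jsonl.
const preRegistry  = '{"id":"a"}\n{"id":"b"}\n';
const postRegistry = '{"id":"b"}\n{"id":"a"}\n';
```

With these samples the strict check fails while the relaxed one passes, which is the situation the fallback is meant for: the registry iterates sources in a different order but emits the same chunks.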
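Acceptance Criteria
- SourceRegistry.mjs exists with registerSource / registerParser / getSources / getParsers / hasSource / hasParser public methods
- useDefaultSources (default true) + useDefaultParsers (default true) configs in config.mjs + config.template.mjs
- customSources + customParsers arrays in config (default []) for declarative-registration path
- Default Source classes auto-register via ai/services/knowledge-base/source/index.mjs
- DatabaseService.createKnowledgeBase() uses SourceRegistry.getSources() instead of hardcoded array
- Per-source paths externalized to aiConfig.knowledgeBase.sourcePaths.*
- Unit tests in test/playwright/unit/ai/services/knowledge-base/source/ for SourceRegistry: register/dedup/unregister/list semantics, useDefaultSources: false skip-defaults behavior, custom source registration round-trip
- Byte-equivalence fixture test passes (useDefaultSources: true)
- npm run ai:sync-kb produces unchanged output under default config (manual smoke test documented in PR)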
Out of Scope
- Cross-server push pipeline + MCP small-batch facade (Phase 2 #11626 — blocked by all of Phase 0/1)
- KB Tenant Isolation (write-side stamping + read-side filter; tracked separately in remaining Phase 0/1 ACs — will be a sibling Phase 0/1C sub-issue)
- Custom Parser registration end-to-end runtime (this PR ships the API surface + config; actual cross-language parser execution belongs to Phase 2 / Phase 3 demos)
- ES5 + C++ workspace integration fixtures (deferred to Phase 2 per #11625 body)
- HTTP / streaming transport for cross-server tenant push (Phase 2)
Avoided Traps
| Trap | Why rejected |
|---|---|
| Skipping the registry, keeping hardcoded array but adding useDefaultSources: false early-return | The hardcoded array is the substrate-level invariant. Registry refactor is required for tenant-supplied sources; gating early-return doesn't enable custom registration. |
| Putting config in aiConfig.knowledgeBase.sources as inline registration object | Loses class-extension semantics; tenants need to subclass Base.mjs and override its extract() method. Class registration matches the Source-class shape. |
| Auto-registering Neo defaults via customSources config | Couples default discipline to operator config. Defaults must be code-level discoverable (ai/services/knowledge-base/source/index.mjs) so adding a new Neo source doesn't require config sync. |
| Renaming Source.extract signature to accept sourcePath parameter | Cross-cuts every existing Source class and the test suite. Keep extract(writeStream, createHashFn) stable; per-source path config is read inside each source's class via aiConfig.knowledgeBase.sourcePaths.<name>. |
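The last row (stable extract signature, path read from config inside the class) might look like the sketch below. `resolveSourcePath`, `DEFAULT_SOURCE_PATHS`, `ConfiguredApiSource`, and the constructor-injected config are all hypothetical; the real sources are Neo singletons and would read the shared aiConfig instead. The written record shape is made up as well.

```javascript
// Neo-default path; the value is taken from the config example in this ticket.
const DEFAULT_SOURCE_PATHS = {ApiSource: 'docs/output/all.json'};

// Looks up aiConfig.knowledgeBase.sourcePaths.<name>, falling back to the default.
function resolveSourcePath(aiConfig, sourceName) {
    return aiConfig?.knowledgeBase?.sourcePaths?.[sourceName] ?? DEFAULT_SOURCE_PATHS[sourceName];
}

// Hypothetical source: extract() keeps its two-argument signature and resolves the path itself.
class ConfiguredApiSource {
    constructor(aiConfig) { this.aiConfig = aiConfig; }

    async extract(writeStream, createHashFn) {
        const path = resolveSourcePath(this.aiConfig, 'ApiSource');
        // Illustrative record only; real chunks follow the parsed-chunk schema.
        writeStream.write(JSON.stringify({path, hash: createHashFn(path)}) + '\n');
    }
}

// A tenant override replaces only the path it names.
const tenantConfig = {knowledgeBase: {sourcePaths: {ApiSource: 'tenant/api.json'}}};
```

Because the lookup lives inside the class, callers such as createKnowledgeBase() keep invoking extract(writeStream, createHashFn) unchanged, regardless of whether a tenant overrides the path.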
Related
- Parent Phase Epic: #11625 (Phase 0/1: Contracts, Source/Parser Registry, KB Tenant Isolation)
- Predecessor Sub-ticket: #11629 (Phase 0/1A: schemas + identity tuple + deletion-signaling) → merged as PR #11647
- Parent Epic: #11624
- Origin Discussion: #11623
- Sibling that depends on this: #11626 Phase 2 (Ingestion Service + MCP facade); KnowledgeBaseIngestionService consumes the registry for tenant-source discovery
- Substrate-audit-consumer-sweep anchor: Phase 0/1A repair cycles (GPT C1 + Gemini C1 + operator V-B-A) identified consumer surfaces that need stable contracts before this registry layer can land safely — those landed in Phase 0/1A first.
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- query_raw_memories({query: 'Source registry useDefaultSources useDefaultParsers KB ingestion phase 0/1B'})
- ask_knowledge_base({query: 'DatabaseService createKnowledgeBase hardcoded sources array', type: 'src'})
- Empirical anchor: ai/services/knowledge-base/DatabaseService.mjs:470-481 hardcoded source array, the substrate-mechanical surface to refactor
- Pattern reference: ai/services/knowledge-base/parser/ from Phase 0/1A, shipped in #11647 (schemas + JSDoc convention)