Context
Phase 0/1B-β sub-ticket of #11625 (Phase 0/1 of Epic #11624 — Cloud-Native KB Ingestion). Closes the deferred AC from #11658 (merged as PR #11659) which shipped the SourceRegistry + useDefaultSources / useDefaultParsers / customSources / customParsers configs but explicitly deferred per-source-path externalization to keep the PR diff reviewable.
#11658 stays open under #11625; this sub-issue ships the remaining slice.
The Problem
Each default Neo Source class hardcodes its source-of-truth paths inside the class body. For example:
- ApiSource.mjs hardcodes docs/output/all.json as the API-docs input
- LearningSource.mjs hardcodes learn/tree.json as the guides input
- AdrSource.mjs hardcodes learn/agentos/decisions/ as the ADR directory
- etc.
Cloud deployments where the tenant's filesystem layout differs from Neo's (e.g., guides under docs/guides/ instead of learn/) cannot reuse Neo's curated Source classes — they must fork the class to override the path. This breaks the Phase 0/1B contract of "default Neo sources can be reused alongside custom tenant sources" because the defaults assume Neo's exact layout.
The Architectural Reality
Phase 0/1B (#11658 / PR #11659) introduced the aiConfig.useDefaultSources boolean and the aiConfig.customSources declarative array. The natural sibling is aiConfig.sourcePaths, a map from source name to override path, with Neo's defaults pre-populated so that zero-config deployments behave identically and tenants override only the keys they need:

    aiConfig = {
        // ...
        useDefaultSources: true,
        customSources    : [],
        sourcePaths      : {
            // Neo defaults, overridable per source class
            ApiSource     : 'docs/output/all.json',
            LearningSource: 'learn/tree.json',
            AdrSource     : 'learn/agentos/decisions/',
            ConceptSource : 'learn/agentos/concepts/',
            SkillSource   : '.agents/skills/',
            // ... other defaults ...
        }
    };

Each Source class reads its own path via aiConfig.sourcePaths[this.constructor.sourceName] ?? <hardcoded-fallback>. The hardcoded fallback preserves the byte-equivalence contract: if a config.mjs is missing the new key, the source still works against the legacy Neo layout.
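The lookup described above can be sketched in isolation. This is a minimal illustration, not the final implementation: resolveSourcePath is a hypothetical helper name, the docs/decisions/ tenant path is made up for the example, and the fallback values are simply the Neo defaults listed in this ticket. Note that ?? falls back only on null or undefined, so an empty-string override would be passed through unchanged.

```javascript
// Hypothetical helper (not part of the Neo codebase): resolves a source path
// from a config object, falling back to the legacy hardcoded value.
function resolveSourcePath(config, sourceName, hardcodedFallback) {
    // Optional chaining tolerates a config without any sourcePaths key;
    // ?? falls back only when the lookup yields null or undefined.
    return config.sourcePaths?.[sourceName] ?? hardcodedFallback;
}

// A pre-existing config that has no sourcePaths key at all.
const legacyConfig = {useDefaultSources: true, customSources: []};

// Assumed tenant config that overrides a single source.
const tenantConfig = {sourcePaths: {AdrSource: 'docs/decisions/'}};

console.log(resolveSourcePath(legacyConfig, 'AdrSource', 'learn/agentos/decisions/')); // learn/agentos/decisions/
console.log(resolveSourcePath(tenantConfig, 'AdrSource', 'learn/agentos/decisions/')); // docs/decisions/
console.log(resolveSourcePath(tenantConfig, 'ApiSource', 'docs/output/all.json'));     // docs/output/all.json
```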
The Fix
1. Config additions
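ai/mcp/server/knowledge-base/config.mjs + config.template.mjs:

    /**
     * Phase 0/1B-β (#NEW-TICKET-N): per-source path overrides keyed by Source-class registry name.
     * Neo defaults pre-populated; tenant deployments override only the keys they need.
     * An empty object also works: each Source class falls through to its hardcoded fallback path
     * (preserves byte-equivalence with pre-Phase-0/1B behavior).
     * @type {Object<string,string>}
     */
    sourcePaths: {
        ApiSource         : 'docs/output/all.json',
        LearningSource    : 'learn/tree.json',
        AdrSource         : 'learn/agentos/decisions/',
        ConceptSource     : 'learn/agentos/concepts/',
        DiscussionSource  : 'resources/content/discussions/',
        PullRequestSource : 'resources/content/pulls/',
        ReleaseNotesSource: 'resources/release-notes/',
        SkillSource       : '.agents/skills/',
        TestSource        : 'test/playwright/unit/',
        TicketSource      : 'resources/content/issues/'
    },

The exact path defaults must match each Source class's current hardcoded behavior; the final values will be established empirically by reading each Source class during implementation.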
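A tenant deployment that only relocates its guides could, for example, layer its overrides on top of the Neo defaults. The sketch below is an assumption about how such a config might be composed: neoDefaultSourcePaths, tenantOverrides and the docs/guides/tree.json path are illustrative names and values, not part of config.template.mjs.

```javascript
// Neo defaults as listed above (abbreviated to three entries for the example).
const neoDefaultSourcePaths = {
    ApiSource     : 'docs/output/all.json',
    LearningSource: 'learn/tree.json',
    AdrSource     : 'learn/agentos/decisions/'
};

// Hypothetical tenant layout: guides live under docs/guides/ instead of learn/.
const tenantOverrides = {
    LearningSource: 'docs/guides/tree.json'
};

// Later spread entries win, so only the overridden key changes.
const sourcePaths = {...neoDefaultSourcePaths, ...tenantOverrides};

console.log(sourcePaths);
```

Spreading into a new object leaves the defaults untouched, which keeps the Neo values available as a reference when comparing against the tenant layout.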
2. Source class refactor (per-class)
Each of the 10 default Source classes is refactored to read its path from config with a hardcoded fallback:

    // In e.g. ApiSource.mjs
    import aiConfig from '../../../mcp/server/knowledge-base/config.mjs';

    class ApiSource extends Base {
        static get sourcePath() {
            return aiConfig.sourcePaths?.ApiSource ?? 'docs/output/all.json';
        }

        async extract(writeStream, createHashFn) {
            const sourcePath = this.constructor.sourcePath;
            // ... existing logic using `sourcePath` instead of the prior hardcoded value
        }
    }

The hardcoded fallback inside ?? is the byte-equivalence anchor: if a config.mjs lacks the new sourcePaths key, the source still resolves to the legacy Neo layout, so existing deployments don't break.
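Because the getter reads the config on every access rather than caching a value at class-definition time, a config change made before extract() runs is picked up without touching instance state. The following self-contained sketch illustrates that behavior under simplifying assumptions: aiConfig and Base are local stand-ins for the real imports, extract() is reduced to returning the resolved path, and tenant/api/all.json is a made-up override.

```javascript
// Stand-ins for the real config module and Source base class (assumptions for this sketch).
const aiConfig = {};
class Base {}

class ApiSource extends Base {
    // Evaluated on each access, so it always reflects the current config.
    static get sourcePath() {
        return aiConfig.sourcePaths?.ApiSource ?? 'docs/output/all.json';
    }

    // Simplified: the real extract() would read and chunk the file at this path.
    extract() {
        return this.constructor.sourcePath;
    }
}

const source = new ApiSource();

// No sourcePaths key yet, so the legacy fallback applies.
const before = source.extract();

// Hypothetical override applied after the instance already exists.
aiConfig.sourcePaths = {ApiSource: 'tenant/api/all.json'};
const after = source.extract();

console.log({before, after});
```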
3. Byte-equivalence fixture test
A unit test that:
- Generates dist/ai-knowledge-base.jsonl with aiConfig.sourcePaths populated (Neo defaults).
- Generates the same output with aiConfig.sourcePaths = {} (forces fallback paths).
- Asserts chunk-set equivalence between both outputs (same source files, same chunks, same hashes).
This guarantees the refactor doesn't subtly change Neo's KB output.
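The equivalence assertion could look roughly like the following. This is a sketch under assumptions: it presumes each JSONL line is a JSON object carrying a hash field that identifies the chunk, and chunkHashSet / assertChunkSetEquivalence are hypothetical helper names. The real test would read both generated files from disk rather than using inline strings.

```javascript
// Collects the chunk hashes from JSONL text (one JSON object per line).
// Assumes every chunk record has a `hash` property; adjust to the real schema.
function chunkHashSet(jsonl) {
    return new Set(
        jsonl.split('\n')
            .filter(line => line.trim() !== '')
            .map(line => JSON.parse(line).hash)
    );
}

// Throws if the two outputs do not contain exactly the same set of chunk hashes.
// The order of lines is deliberately ignored.
function assertChunkSetEquivalence(jsonlA, jsonlB) {
    const a = chunkHashSet(jsonlA),
          b = chunkHashSet(jsonlB);

    const missing = [...a].filter(hash => !b.has(hash)),
          extra   = [...b].filter(hash => !a.has(hash));

    if (missing.length > 0 || extra.length > 0) {
        throw new Error(`Chunk sets differ: ${missing.length} missing, ${extra.length} extra`);
    }
}

// Inline stand-ins for the two generated outputs.
const withDefaults = '{"hash":"a1","source":"ApiSource"}\n{"hash":"b2","source":"AdrSource"}\n';
const withFallback = '{"hash":"b2","source":"AdrSource"}\n{"hash":"a1","source":"ApiSource"}\n';

// Passes: same hashes, different line order.
assertChunkSetEquivalence(withDefaults, withFallback);
```

Comparing sets of hashes rather than raw file bytes keeps the check robust against ordering differences between runs while still failing on any added, dropped, or changed chunk.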
4. Per-source spec updates
Each existing Source class's spec under test/playwright/unit/ai/services/knowledge-base/source/*Source.spec.mjs gets one new test case verifying:
- aiConfig.sourcePaths.<SourceName> override is respected
- Missing config key falls through to hardcoded fallback (no throw)
Acceptance Criteria
- aiConfig.sourcePaths object added to config.mjs + config.template.mjs with all 10 Neo source defaults pre-populated
- Each default Source class reads its path via aiConfig.sourcePaths[<className-segment>] ?? <hardcoded-fallback> instead of hardcoded path constants
- Tests cover (a) populated sourcePaths (Neo defaults), (b) empty sourcePaths (fallback path), (c) custom override path
- Byte-equivalent npm run ai:sync-kb output (manual smoke test documented in PR)
Out of Scope
- Tenant identity / write-side stamping (Phase 0/1C-α, separate sub-ticket)
- Read-side tenant where clause filter (Phase 0/1C-β, separate sub-ticket)
- Custom-source path override via customSources array (already supported by Phase 0/1B; this ticket only externalizes the default Neo source paths)
- Per-source-path migration for ApiSource.sourceMap-style multi-path sources (covered as part of this work; the Fix section uses sourceMap-class single-path for clarity; multi-path Source classes will need a tuple/object shape in their respective config entry)
Avoided Traps
| Trap | Why rejected |
|---|---|
| Putting path overrides inside customSources array entries | customSources is for new tenant-supplied Source classes; overriding default Neo Source class paths is a separate concern with its own config shape |
| Externalizing per-source paths as instance properties | Source classes are singletons in Neo's Neo.setupClass shape; reading from config at static get sourcePath() time keeps instance-state clean |
| Removing the hardcoded fallback inside ?? | The hardcoded fallback IS the byte-equivalence anchor: pre-#NEW-TICKET-N config.mjs files without the new sourcePaths key still work identically. Removing it would break every existing deployment on first restart |
| Bundling all 10 Source class refactors into a single mega-commit | Each Source class refactor is mechanically identical but should be reviewable independently. A single commit is acceptable IF the PR body lists each Source class + before/after path for each |
Related
- Parent Phase Epic: #11625 (Phase 0/1: Contracts, Source/Parser Registry, KB Tenant Isolation)
- Predecessor Sub-ticket: #11658 (Phase 0/1B: Source/Parser registry + useDefaultSources configs) → merged as PR #11659; stays open for this remaining AC
- Parent Epic: #11624
- Origin Discussion: #11623
- Sibling Phase 0/1C-α work (parallel lane proposed to @neo-gpt): VectorService.embed write-side tenant stamping + tenant-aware chunkId hash. Different files, no merge collision risk.
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- query_raw_memories({query: 'per-source path externalization sourcePaths aiConfig Phase 0/1B-β'})
- ask_knowledge_base({query: 'ApiSource LearningSource hardcoded path config externalization', type: 'src'})
- Empirical anchor: 10 default Neo Source classes at ai/services/knowledge-base/source/*Source.mjs; each currently hardcodes its input path
- Pattern reference: Phase 0/1B SourceRegistry shipped in #11659; this PR's config-externalization shape mirrors that PR's customSources declarative pattern