Frontmatter
id: 11660
title: KB Ingestion Phase 0/1B-β: Externalize per-source paths to aiConfig.sourcePaths
state: Closed
labels: enhancement, ai, architecture
assignees: neo-opus-ada
createdAt: May 20, 2026, 3:01 AM
updatedAt: May 20, 2026, 9:48 AM
githubUrl: https://github.com/neomjs/neo/issues/11660
author: neo-opus-ada
commentsCount: 0
parentIssue: 11625
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 20, 2026, 9:48 AM

KB Ingestion Phase 0/1B-β: Externalize per-source paths to aiConfig.sourcePaths

Closed v13.0.0/archive-v13-0-0-chunk-12 enhancement ai architecture
neo-opus-ada
neo-opus-ada commented on May 20, 2026, 3:01 AM

Context

Phase 0/1B-β sub-ticket of #11625 (Phase 0/1 of Epic #11624: Cloud-Native KB Ingestion). Closes the deferred AC from #11658 (merged as PR #11659), which shipped the SourceRegistry and the useDefaultSources / useDefaultParsers / customSources / customParsers configs but explicitly deferred per-source-path externalization to keep the PR diff reviewable.

#11658 stays open under #11625; this sub-issue ships the remaining slice.

The Problem

Each default Neo Source class hardcodes its source-of-truth paths inside the class body. For example:

  • ApiSource.mjs hardcodes docs/output/all.json as the API-docs input
  • LearningSource.mjs hardcodes learn/tree.json as the guides input
  • AdrSource.mjs hardcodes learn/agentos/decisions/ as the ADR directory
  • etc.

Cloud deployments where the tenant's filesystem layout differs from Neo's (e.g., guides under docs/guides/ instead of learn/) cannot reuse Neo's curated Source classes — they must fork the class to override the path. This breaks the Phase 0/1B contract of "default Neo sources can be reused alongside custom tenant sources" because the defaults assume Neo's exact layout.

The Architectural Reality

Phase 0/1B (#11658 / PR #11659) introduced the aiConfig.useDefaultSources boolean and the aiConfig.customSources declarative array. The natural sibling is aiConfig.sourcePaths, a map from source name to override path, with Neo's defaults pre-populated so that zero-config deployments behave identically and tenants override only the keys they need:

aiConfig = {
    // ...
    useDefaultSources: true,
    customSources    : [],
    sourcePaths      : {
        // Neo defaults — overridable per source class
        ApiSource        : 'docs/output/all.json',
        LearningSource   : 'learn/tree.json',
        AdrSource        : 'learn/agentos/decisions/',
        ConceptSource    : 'learn/agentos/concepts/',
        SkillSource      : '.agents/skills/',
        // ... other defaults ...
    }
};

Each Source class reads its own path via aiConfig.sourcePaths[this.constructor.sourceName] ?? <hardcoded-fallback>. The hardcoded fallback preserves the byte-equivalence contract: if a config.mjs is missing the new key, the source still works against the legacy Neo layout.
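The lookup described above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the inline aiConfig object, the SourceBase class, and the fallbackPath static are hypothetical stand-ins, and docs/guides/tree.json is only an illustrative tenant path.

```javascript
// Hypothetical stand-in for the knowledge-base config; only `sourcePaths` matters here.
const aiConfig = {
    sourcePaths: {
        LearningSource: 'docs/guides/tree.json' // illustrative tenant override
        // ApiSource is intentionally omitted to exercise the fallback
    }
};

// Hypothetical base class: resolves the path via the subclass's registry name.
class SourceBase {
    static sourceName   = 'SourceBase';
    static fallbackPath = null;

    get sourcePath() {
        // `?.` tolerates a config without `sourcePaths`;
        // `??` falls back only when the lookup yields null or undefined.
        return aiConfig.sourcePaths?.[this.constructor.sourceName] ?? this.constructor.fallbackPath;
    }
}

class ApiSource extends SourceBase {
    static sourceName   = 'ApiSource';
    static fallbackPath = 'docs/output/all.json';
}

class LearningSource extends SourceBase {
    static sourceName   = 'LearningSource';
    static fallbackPath = 'learn/tree.json';
}

console.log(new ApiSource().sourcePath);      // fallback:  docs/output/all.json
console.log(new LearningSource().sourcePath); // override:  docs/guides/tree.json
```

Because the fallback lives next to the class, a config file that predates sourcePaths keeps resolving to the legacy layout without any migration step.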

The Fix

1. Config additions

ai/mcp/server/knowledge-base/config.mjs + config.template.mjs:

/**
 * Phase 0/1B-β (#NEW-TICKET-N): per-source path overrides keyed by Source-class registry name.
 * Neo defaults pre-populated; tenant deployments override only the keys they need.
 * Empty object also works — each Source class falls through to its hardcoded fallback path
 * (preserves byte-equivalence with pre-Phase-0/1B behavior).
 * @type {Object<string,string>}
 */
sourcePaths: {
    ApiSource         : 'docs/output/all.json',
    LearningSource    : 'learn/tree.json',
    AdrSource         : 'learn/agentos/decisions/',
    ConceptSource     : 'learn/agentos/concepts/',
    DiscussionSource  : 'resources/content/discussions/',
    PullRequestSource : 'resources/content/pulls/',
    ReleaseNotesSource: 'resources/release-notes/',
    SkillSource       : '.agents/skills/',
    TestSource        : 'test/playwright/unit/',
    TicketSource      : 'resources/content/issues/'
},

The exact path defaults must match each Source class's current hardcoded behavior; final values are to be established empirically by reading each Source class during implementation.

2. Source class refactor (per-class)

Each of the 10 default Source classes is refactored to read its path from config with a hardcoded fallback:

// In e.g. ApiSource.mjs
import aiConfig from '../../../mcp/server/knowledge-base/config.mjs';

class ApiSource extends Base {
    static get sourcePath() {
        return aiConfig.sourcePaths?.ApiSource ?? 'docs/output/all.json';
    }

    async extract(writeStream, createHashFn) {
        const sourcePath = this.constructor.sourcePath;
        // ... existing logic using `sourcePath` instead of the prior hardcoded value
    }
}

The hardcoded fallback inside ?? is the byte-equivalence anchor — if a config.mjs lacks the new sourcePaths key, the source still resolves to the legacy Neo layout, so existing deployments don't break.

3. Byte-equivalence fixture test

A unit test that:

  1. Generates dist/ai-knowledge-base.jsonl with aiConfig.sourcePaths populated (Neo defaults).
  2. Generates the same with aiConfig.sourcePaths = {} (forces fallback paths).
  3. Asserts chunk-set equivalence between both outputs (same source files, same chunks, same hashes).

This guarantees the refactor doesn't subtly change Neo's KB output.
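The comparison in step 3 could look roughly like the sketch below. It assumes each non-empty line of the JSONL output is a JSON object carrying a hash field; that field name, the helper names, and the sample data are assumptions for illustration and are not taken from the actual KB output format.

```javascript
// Parse JSONL text into a Set of chunk hashes.
// Assumption: every non-empty line is a JSON object with a `hash` field.
function chunkHashes(jsonlText) {
    return new Set(
        jsonlText
            .split('\n')
            .filter(line => line.trim() !== '')
            .map(line => JSON.parse(line).hash)
    );
}

// Two outputs are chunk-set equivalent when they contain exactly the same
// hashes, regardless of line order.
function chunkSetsEqual(jsonlA, jsonlB) {
    const a = chunkHashes(jsonlA),
          b = chunkHashes(jsonlB);

    return a.size === b.size && [...a].every(hash => b.has(hash));
}

// Made-up sample outputs: same chunks, different line order.
const withDefaults = '{"hash":"a1","source":"ApiSource"}\n{"hash":"b2","source":"LearningSource"}\n';
const withFallback = '{"hash":"b2","source":"LearningSource"}\n{"hash":"a1","source":"ApiSource"}\n';

console.log(chunkSetsEqual(withDefaults, withFallback)); // true
```

A set comparison ignores ordering and duplicate lines; if the fixture test is meant to be strictly byte-for-byte, comparing the raw file contents would be the stricter check.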

4. Per-source spec updates

Each existing Source class's spec under test/playwright/unit/ai/services/knowledge-base/source/*Source.spec.mjs gets one new test case verifying:

  • aiConfig.sourcePaths.<SourceName> override is respected
  • Missing config key falls through to hardcoded fallback (no throw)
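One way a spec might exercise both cases without leaking state between tests is to swap sourcePaths temporarily and restore it afterwards. The sketch below uses plain stand-ins rather than the real config module, Source class, or Playwright test API; withSourcePaths and tenant/api.json are hypothetical.

```javascript
// Hypothetical stand-ins; a real spec would import the actual config and Source class.
const aiConfig = {sourcePaths: {ApiSource: 'docs/output/all.json'}};

class ApiSource {
    static get sourcePath() {
        return aiConfig.sourcePaths?.ApiSource ?? 'docs/output/all.json';
    }
}

// Run `fn` with a temporary `sourcePaths` value, then restore the original,
// even if `fn` throws.
function withSourcePaths(sourcePaths, fn) {
    const original = aiConfig.sourcePaths;

    aiConfig.sourcePaths = sourcePaths;

    try {
        return fn();
    } finally {
        aiConfig.sourcePaths = original;
    }
}

// Case 1: the override is respected.
const overridden = withSourcePaths({ApiSource: 'tenant/api.json'}, () => ApiSource.sourcePath);

// Case 2: a missing key falls through to the hardcoded fallback.
const missingKey = withSourcePaths({}, () => ApiSource.sourcePath);

// Case 3: a config without `sourcePaths` at all does not throw either.
const missingMap = withSourcePaths(undefined, () => ApiSource.sourcePath);

console.log(overridden, missingKey, missingMap);
```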

Acceptance Criteria

  • aiConfig.sourcePaths object added to config.mjs + config.template.mjs with all 10 Neo source defaults pre-populated
  • All 10 default Source classes refactored to read aiConfig.sourcePaths[<className-segment>] ?? <hardcoded-fallback> instead of hardcoded path constants
  • Byte-equivalence test verifies output stable under (a) default config, (b) empty sourcePaths (fallback path), (c) custom override path
  • Existing per-Source spec files extended with config-override + fallback test cases
  • No regression to existing npm run ai:sync-kb output (manual smoke test documented in PR)
  • #11658 close-target updated — once this PR merges, #11658 can be closed as fully resolved (all #11625 Phase 0/1B ACs satisfied across #11658 + this sub-ticket)

Out of Scope

  • Tenant identity / write-side stamping (Phase 0/1C-α — separate sub-ticket)
  • Read-side tenant where clause filter (Phase 0/1C-β — separate sub-ticket)
  • Custom-source path override via customSources array (already supported by Phase 0/1B — this ticket only externalizes the default Neo source paths)
  • Per-source-path migration for ApiSource.sourceMap-style multi-path sources (covered as part of this work; the Fix section uses sourceMap-class single-path for clarity, multi-path Source classes will need a tuple/object shape in their respective config entry)
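For the multi-path case mentioned in the last bullet, one possible shape is to let a config entry be a string, an array, or an object and normalize it to a list before use. The ticket does not specify this shape; toPathList and the placeholder paths below are purely hypothetical.

```javascript
// Hypothetical helper: normalize a sourcePaths entry into a list of paths.
// Accepted shapes (all assumptions): 'a/b.json', ['a/', 'b/'], {first: 'a/', second: 'b/'}
function toPathList(entry, fallback) {
    const value = entry ?? fallback;

    if (value == null)             return [];
    if (typeof value === 'string') return [value];
    if (Array.isArray(value))      return [...value];

    return Object.values(value);
}

// Single-path entry, as used by the defaults in this ticket.
console.log(toPathList('learn/tree.json'));

// Missing entry: the fallback (here a tuple of placeholder paths) is used instead.
console.log(toPathList(undefined, ['first/dir/', 'second/dir/']));

// Object shape with named placeholder paths.
console.log(toPathList({first: 'first/dir/', second: 'second/dir/'}));
```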

Avoided Traps

Trap | Why rejected
Putting path overrides inside customSources array entries | customSources is for new tenant-supplied Source classes; overriding default Neo Source class paths is a separate concern with its own config shape
Externalizing per-source paths as instance properties | Source classes are singletons in Neo's Neo.setupClass shape; reading from config at static get sourcePath() time keeps instance-state clean
Removing the hardcoded fallback inside ?? | Hardcoded fallback IS the byte-equivalence anchor: pre-#NEW-TICKET-N config.mjs files without the new sourcePaths key still work identically. Removing it would break every existing deployment on first restart
Bundle all 10 Source class refactors into a single mega-commit | Each Source class refactor is mechanically identical but should be reviewable independently. Single commit is acceptable IF the PR body lists each Source class + before/after path for each

Related

  • Parent Phase Epic: #11625 (Phase 0/1: Contracts, Source/Parser Registry, KB Tenant Isolation)
  • Predecessor Sub-ticket: #11658 (Phase 0/1B: Source/Parser registry + useDefaultSources configs) → merged as PR #11659; stays open for this remaining AC
  • Parent Epic: #11624
  • Origin Discussion: #11623
  • Sibling Phase 0/1C-α work (parallel lane proposed to @neo-gpt): VectorService.embed write-side tenant stamping + tenant-aware chunkId hash. Different files, no merge collision risk.

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

  • query_raw_memories({query: 'per-source path externalization sourcePaths aiConfig Phase 0/1B-β'})
  • ask_knowledge_base({query: 'ApiSource LearningSource hardcoded path config externalization', type: 'src'})
  • Empirical anchor: 10 default Neo Source classes at ai/services/knowledge-base/source/*Source.mjs — each currently hardcodes its input path
  • Pattern reference: Phase 0/1B SourceRegistry shipped #11659 — this PR's config-externalization shape mirrors that PR's customSources declarative pattern
tobiu referenced in commit d1b3cf7 - "feat(ai): externalize per-source paths to aiConfig.sourcePaths (#11660) (#11661)" on May 20, 2026, 9:48 AM
tobiu closed this issue on May 20, 2026, 9:48 AM