Context
Sub of Phase 0/1 Epic #11625 (meta-Epic #11624). Graduated from Discussion #11623.
Removes the hardcoded single-source-repo assumption from the substrate WITHOUT changing chunk-shape semantics. Validated by byte-equivalence fixture (current Neo source output stable before/after extraction).
The Problem
The configurability gap is mechanically located:
// ai/services/knowledge-base/DatabaseService.mjs:460-471
const sources = [
    AdrSource, ApiSource, ConceptSource, DiscussionSource,
    LearningSource, PullRequestSource, ReleaseNotesSource,
    SkillSource, TicketSource, TestSource
];
Plus each Source subclass hardcodes its paths (e.g., ApiSource.sourceMap maps src/apps/examples/docs/app/ai, which are Neo-specific).
Cloud deployments need:
- Data-driven source registration
- Per-source path externalization
- useDefaultSources / useDefaultParsers opt-in/out booleans
- Custom-source/parser registration API
- PROOF that the extraction doesn't regress retrieval quality (byte-equivalence fixture)
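The registration requirements above can be sketched as a small resolver. This is a minimal illustration under assumptions, not the planned implementation: resolveSources, DEFAULT_SOURCES, the empty default paths objects, the merge-by-sourceClass rule, and the RunbookSource tenant class are all hypothetical names and choices introduced here.

```javascript
// Hypothetical defaults: the names mirror the ten source classes listed above;
// the empty `paths` objects are placeholders, not the real per-source maps.
const DEFAULT_SOURCES = [
    'AdrSource', 'ApiSource', 'ConceptSource', 'DiscussionSource',
    'LearningSource', 'PullRequestSource', 'ReleaseNotesSource',
    'SkillSource', 'TicketSource', 'TestSource'
].map(sourceClass => ({sourceClass, paths: {}}));

/**
 * Resolves the effective source list from a knowledgeBase-style config object.
 * Assumed rule: an entry whose sourceClass matches an existing entry overrides
 * it (paths are merged key by key); any other entry is appended.
 */
function resolveSources({useDefaultSources = true, sources = []} = {}, defaults = DEFAULT_SOURCES) {
    const registry = new Map(); // Map keeps insertion order, so defaults stay first

    if (useDefaultSources) {
        // Copy each default so tenant overrides never mutate the shared defaults
        defaults.forEach(entry => registry.set(entry.sourceClass, {...entry, paths: {...entry.paths}}));
    }

    sources.forEach(entry => {
        const existing = registry.get(entry.sourceClass);
        registry.set(entry.sourceClass, {
            ...existing,
            ...entry,
            paths: {...existing?.paths, ...entry.paths}
        });
    });

    return [...registry.values()];
}

// Zero-config deployment: all ten defaults, nothing else
const zeroConfig = resolveSources();

// Tenant overrides one default path set and appends a custom source
const tenant = resolveSources({
    sources: [
        {sourceClass: 'ApiSource', paths: {src: 'lib'}},
        {sourceClass: 'RunbookSource', paths: {docs: 'runbooks'}}
    ]
});

// Tenant opts out of the defaults entirely
const customOnly = resolveSources({
    useDefaultSources: false,
    sources: [{sourceClass: 'RunbookSource', paths: {docs: 'runbooks'}}]
});
```

Keying the registry on sourceClass is one possible way to make "append" and "override" fall out of the same code path; the actual registry shape is whatever Discussion #11623 §4 Q2 settles on.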
The Fix
- Data-driven source registry in aiConfig.knowledgeBase.sources:
    sources: [
        { sourceClass: 'AdrSource', paths: {...} },
        { sourceClass: 'ApiSource', paths: { src: 'src', apps: 'app', ... } },
        // ...
    ]
- useDefaultSources: true / useDefaultParsers: true (default in aiConfig) preserves the current 10-source / 3-parser binding for zero-config Neo deployments
- Custom-source registration API: tenants can append/override entries in aiConfig.knowledgeBase.sources and aiConfig.knowledgeBase.parsers
- Per-source path externalization: ApiSource.sourceMap etc. → consume from aiConfig.knowledgeBase.sources[X].paths
- Byte-equivalence fixture at test/playwright/unit/ai/knowledge-base/byte-equivalence.spec.mjs:
    - Run current 10 sources × current parsers × current paths → capture chunk JSONL output
    - Run new registry-driven path → capture chunk JSONL output
    - Assert: chunk-level byte-equivalence (chunk.hash stable; chunk.content stable; chunk.metadata stable except for new path-identity fields when reformulated as chunk.metadata.source = {tenantId: 'neo-shared', ...}). The fixture validates that the reformulation is non-disruptive for existing content semantics.
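The comparison step of the fixture could look roughly like the sketch below. It is an illustration under assumptions: normalizeChunk, diffChunkOutput, and IGNORED_METADATA_FIELDS are hypothetical helpers, the sample chunks are made up, and it assumes every chunk carries a metadata object and that metadata.source is the only field allowed to differ.

```javascript
// Hypothetical allow-list: metadata fields permitted to differ after the refactor
const IGNORED_METADATA_FIELDS = ['source'];

// Parses one JSONL line, drops the allowed-to-differ metadata fields, and
// re-serializes. Key order is preserved, so a reordered chunk still counts as drift.
function normalizeChunk(line) {
    const chunk = JSON.parse(line);
    const metadata = {...chunk.metadata};

    IGNORED_METADATA_FIELDS.forEach(field => delete metadata[field]);

    return JSON.stringify({...chunk, metadata});
}

// Returns the indices of chunks that differ between the two JSONL captures.
// A missing or extra chunk on either side is reported as a mismatch too.
function diffChunkOutput(baselineJsonl, candidateJsonl) {
    const toLines    = text => text.split('\n').filter(line => line.trim() !== '');
    const baseline   = toLines(baselineJsonl).map(normalizeChunk);
    const candidate  = toLines(candidateJsonl).map(normalizeChunk);
    const length     = Math.max(baseline.length, candidate.length);
    const mismatches = [];

    for (let i = 0; i < length; i++) {
        if (baseline[i] !== candidate[i]) {
            mismatches.push(i);
        }
    }

    return mismatches;
}

// Made-up sample data: hardcoded path output vs. registry-driven output
const before = [
    '{"hash":"a1","content":"# ADR 1","metadata":{"type":"adr"}}',
    '{"hash":"b2","content":"class Foo","metadata":{"type":"api"}}'
].join('\n');

const after = [
    '{"hash":"a1","content":"# ADR 1","metadata":{"type":"adr","source":{"tenantId":"neo-shared"}}}',
    '{"hash":"b2","content":"class Foo","metadata":{"type":"api","source":{"tenantId":"neo-shared"}}}'
].join('\n');

// Simulated regression: the second chunk's content changed
const drifted = after.replace('class Foo', 'class Bar');

const cleanResult   = diffChunkOutput(before, after);   // no mismatches expected
const driftedResult = diffChunkOutput(before, drifted); // second chunk flagged
```

Inside the Playwright spec, the assertion would then presumably reduce to something like expect(diffChunkOutput(baseline, candidate)).toEqual([]), with the baseline JSONL captured before any refactoring starts.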
Acceptance Criteria
- aiConfig.knowledgeBase.sources data-driven registry shape defined
- aiConfig.knowledgeBase.parsers data-driven parser registry shape defined
- useDefaultSources boolean config (default true)
- useDefaultParsers boolean config (default true)
- sources array in DatabaseService.mjs:460-471 replaced with registry consumption
Out of Scope
- Schema authoring → Phase 0/1A (blocker)
- memorySharing chunk metadata fields → Phase 0/1C/D
- Runtime ingestion endpoint → Phase 2
Related
- Parent: #11625
- Blocked-by: Phase 0/1A (#TBD; needs schemas stable to validate registry output)
- Blocks: Phase 0/1C (needs registry to know how chunkId derivation interacts with tenantId)
- Discussion source: #11623 §7 Phase 0/1, §4 Q2 (registry shape)
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- DatabaseService.mjs:460-471 is the mechanical surface
- ApiSource.mjs:67-73 is the canonical hardcoded-paths example
- Byte-equivalence fixture is the load-bearing safety net: author it FIRST to capture current behavior, then refactor against it