Frontmatter
id: 11317
title: Skill Semantic Search Ingestion Pipeline
state: Closed
labels: epic, ai, architecture
assignees: neo-gemini-3-1-pro
createdAt: May 13, 2026, 9:08 PM
updatedAt: May 17, 2026, 9:08 PM
githubUrl: https://github.com/neomjs/neo/issues/11317
author: neo-gemini-3-1-pro
commentsCount: 2
parentIssue: null
subIssues:
  11321 Implement SkillSource.mjs for KB Ingestion
  11322 Integrate SkillSource into ai:sync-kb Pipeline
  11323 Unit Tests for SkillSource.mjs
  11326 Expose skill content type in KB MCP query schema
  11334 Refine SkillSource sub-rule metadata via trigger pointers
subIssuesCompleted: 5
subIssuesTotal: 5
blockedBy: []
blocking: []
closedAt: May 17, 2026, 9:08 PM

Skill Semantic Search Ingestion Pipeline

Closed (labels: epic, ai, architecture)
neo-gemini-3-1-pro commented on May 13, 2026, 9:08 PM

Context

The Neo.mjs skill substrate is growing, and relying entirely on top-layer SKILL.md routers runs into context-window byte limits and truncation regressions. Agents need to extract specific sub-rules (the "Dense Ground Truth" bottom layer) without loading the entire monolith.

The Problem

Agents can currently use ask_knowledge_base and query_documents to find source code, issues, and guides, but the .agents/skills/**/*.md files are not ingested into the ChromaDB Knowledge Base. This prevents direct, single-query semantic discovery of operational ground truth.

The Architectural Reality

The Knowledge Base extraction pipeline is defined in ai/services/knowledge-base/source/*.mjs and uses a source-class pattern (e.g., IssueSource, CodeSource). The sync is triggered via npm run ai:sync-kb. The MC SkillGraph (governance topology) is stored separately in SQLite.
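The source-class pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the real base-class interface is not shown in this issue, so the class names (SketchBaseSource, SketchIssueSource, SketchCodeSource), the extract() method, the chunk shape, and syncKnowledgeBase() are all hypothetical, and the sketch is synchronous for brevity.

```javascript
// Stand-in base class. The real interface is not shown in this issue,
// so extract() and the {type, content, metadata} chunk shape are assumptions.
class SketchBaseSource {
    extract() {
        throw new Error(`${this.constructor.name} must implement extract()`);
    }
}

// Hypothetical source that turns issue records into typed chunks.
class SketchIssueSource extends SketchBaseSource {
    constructor(issues) {
        super();
        this.issues = issues;
    }

    extract() {
        return this.issues.map(issue => ({
            type    : 'issue',
            content : issue.body,
            metadata: {issueId: issue.id}
        }));
    }
}

// Hypothetical source that turns source files into typed chunks.
class SketchCodeSource extends SketchBaseSource {
    constructor(files) {
        super();
        this.files = files;
    }

    extract() {
        return this.files.map(file => ({
            type    : 'code',
            content : file.text,
            metadata: {path: file.path}
        }));
    }
}

// Batch sync: run every registered source once and collect all chunks.
function syncKnowledgeBase(sources) {
    return sources.flatMap(source => source.extract());
}
```

Under this shape, adding a new content type means adding one more class to the list passed to the batch sync, which is why a dedicated source class fits the existing pattern.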

The Fix

Implement a new SkillSource class that ingests all .agents/skills/**/*.md files into the Knowledge Base. Skill chunks then become semantically discoverable through the existing tools (query_documents, ask_knowledge_base) with type: 'skill'.
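To illustrate what filtering on type: 'skill' could look like at the ChromaDB layer, here is a small sketch that builds the argument object for a Chroma-style collection.query() call. The helper name buildSkillQuery and its options are hypothetical, and whether query_documents forwards filters in exactly this form is an assumption. Only the where-filter operators ($eq, $and) and the queryTexts / nResults / where keys follow Chroma's documented query interface.

```javascript
// Hypothetical helper: builds arguments for a Chroma-style collection.query().
// Restricting results to skill chunks relies on the `type` metadata field
// that the issue proposes to set on every extracted chunk.
function buildSkillQuery(queryText, {skillName = null, nResults = 5} = {}) {
    const filters = [{type: {$eq: 'skill'}}];

    // Optionally narrow the search to a single skill.
    if (skillName) {
        filters.push({skillName: {$eq: skillName}});
    }

    return {
        queryTexts: [queryText],
        nResults,
        // $and needs at least two operands, so a lone filter is passed as-is.
        where: filters.length === 1 ? filters[0] : {$and: filters}
    };
}
```

Because skill chunks live in the same collection as code and issue chunks, a metadata filter like this is all that separates them at query time; no second collection is needed.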

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
|---|---|---|---|---|---|
| SkillSource | Knowledge Base | Ingests .agents/skills/**/*.md files | N/A | Update KB README | Discussion #11316 |

Acceptance Criteria

  • Create SkillSource.mjs extending the BaseSource pattern in ai/services/knowledge-base/source/.
  • Configure SkillSource to extract and chunk .agents/skills/**/*.md files.
  • Assign type: 'skill' to all extracted chunks.
  • Attach sub-metadata to chunks: skillName, sectionAnchor, triggerCondition, and isAtlasMonolithSubRule.
  • Ensure SkillSource is registered and executed during npm run ai:sync-kb.
  • Verify separation of concerns: KB ingestion handles semantic recall only, leaving MC SkillGraph for routing topology.
  • Add unit tests for SkillSource demonstrating correct chunking and metadata attachment.
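The chunking and metadata criteria above could be approached along these lines. This is a sketch under stated assumptions, not the implementation from #11321: the helper names (toAnchor, skillNameFromPath, chunkSkillFile) are hypothetical, chunks are split at Markdown headings, the skill name is taken from a path shaped like .agents/skills/<skillName>/..., and triggerCondition is read from an assumed "Trigger:" line inside the section. The issue does not define how isAtlasMonolithSubRule is derived, so the sketch takes it as a caller-supplied flag.

```javascript
// Hypothetical helpers; only the metadata keys come from the issue.

// Turns heading text into an anchor, e.g. "Step 1: Setup" -> "step-1-setup".
function toAnchor(heading) {
    return heading.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Assumes a layout like ".agents/skills/<skillName>/<file>.md".
function skillNameFromPath(filePath) {
    const parts = filePath.split('/');
    const index = parts.indexOf('skills');
    return index >= 0 && parts.length > index + 2 ? parts[index + 1] : 'unknown';
}

// Splits one skill file into heading-delimited chunks with skill metadata.
function chunkSkillFile(filePath, markdown, {isAtlasMonolithSubRule = false} = {}) {
    const skillName = skillNameFromPath(filePath);
    const fence     = '`'.repeat(3); // code-fence marker
    const chunks    = [];
    let heading = '';
    let lines   = [];
    let inFence = false;

    // Emits the section collected so far, skipping empty sections.
    const flush = () => {
        const content = lines.join('\n').trim();
        if (content) {
            // Assumed convention: a "Trigger: ..." line inside the section.
            const trigger = /^Trigger:\s*(.+)$/im.exec(content);
            chunks.push({
                type: 'skill',
                content,
                metadata: {
                    skillName,
                    sectionAnchor   : toAnchor(heading),
                    triggerCondition: trigger ? trigger[1].trim() : '',
                    isAtlasMonolithSubRule
                }
            });
        }
        lines = [];
    };

    for (const line of markdown.split('\n')) {
        // Ignore "#" lines inside fenced code blocks.
        if (line.trimStart().startsWith(fence)) inFence = !inFence;
        const match = inFence ? null : /^#{1,6}\s+(.+)$/.exec(line);
        if (match) {
            flush();
            heading = match[1].trim();
        }
        lines.push(line);
    }
    flush();
    return chunks;
}
```

A unit test in the spirit of the last criterion would feed a small Markdown string through chunkSkillFile and assert on the chunk count, the type field, and each metadata key. An empty string is used for a missing triggerCondition because Chroma metadata values are limited to strings, numbers, and booleans.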

Out of Scope

  • Reactive file-system watchers for skills (sync remains batch-based).
  • Merging KB ingestion with MC SkillGraph topology in SQLite.
  • Changes to CodeSource or other existing sources.

Avoided Traps / Gold Standards Rejected

  • Reusing CodeSource: Rejected because skills need specific chunk-typing (type: 'skill') and metadata that CodeSource does not provide.
  • Reactive watcher: Rejected because it diverges from the existing ai:sync-kb batch-sync pattern and adds runtime overhead.
  • Separate ChromaDB collection: Rejected because it over-complicates agent tools (e.g., ask_knowledge_base would need multi-collection support).

Related

  • Discussion #11316 (Skill Semantic Search)
  • Discussion #11314 (Trigger-Aware Workflows)

Signal Ledger (From Discussion #11316)

  • @neo-opus-4-7: APPROVED @ 11316 comments
  • @neo-gpt: APPROVED @ 11316 comments
  • @neo-gemini-3-1-pro: APPROVED @ 11316 body

Unresolved Dissent

(empty: positive signal)

Unresolved Liveness

(empty: positive signal)

Origin Session ID: 2c4aa4df-2628-45ae-a9c2-156fd9308f21
Retrieval Hint: "Skill Semantic Search Ingestion Pipeline SkillSource ChromaDB"

tobiu referenced in commit bc53a0b - "feat(knowledge-base): integrate SkillSource into ai:sync-kb pipeline (#11322) (#11338)" on May 14, 2026, 12:23 AM