Frontmatter
id: 7903
title: Refactor Knowledge Base crawling logic into Source Providers
state: Closed
labels: ai, refactoring
assignees: tobiu
createdAt: Nov 25, 2025, 4:44 PM
updatedAt: Nov 25, 2025, 5:16 PM
githubUrl: https://github.com/neomjs/neo/issues/7903
author: tobiu
commentsCount: 1
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: Nov 25, 2025, 5:16 PM

Refactor Knowledge Base crawling logic into Source Providers

Closed v11.11.0 ai refactoring
tobiu commented on Nov 25, 2025, 4:44 PM

Goal: Further decouple DatabaseService.mjs by extracting the crawling and iteration logic into dedicated Source classes. This respects the Single Responsibility Principle: DatabaseService should manage the DB, while Sources should know how to find and extract content.

Architecture:

  • Create a new namespace Neo.ai.mcp.server.knowledge-base.source.
  • File Location: ai/mcp/server/knowledge-base/source/ (Sibling to services/ and parser/).
  • Each Source class should be a Neo.core.Base singleton.

New Classes:

  1. source/Base.mjs: Abstract base class defining the extract(writeStream) contract.
  2. source/ApiSource.mjs: Handles docs/output/all.json iteration -> Uses ApiParser.
  3. source/LearningSource.mjs: Handles learn/tree.json and learn/ traversal -> Uses DocumentationParser.
  4. source/ReleaseNotesSource.mjs: Handles .github/RELEASE_NOTES scanning.
  5. source/TicketSource.mjs: Handles .github/ISSUE_ARCHIVE scanning.
  6. source/TestSource.mjs: Handles test/playwright scanning -> Uses TestParser.
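
The extract(writeStream) contract listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the referenced commit: it uses plain ES classes instead of Neo.core.Base singletons so that it runs on its own, and the names SourceBase, StaticSource, MemoryStream, the chunk shape, and the numeric return value are all hypothetical. It is also kept synchronous for brevity, whereas a real source reading from disk would likely be asynchronous.

```javascript
// Plain-class stand-in for source/Base.mjs (assumption: the real class
// extends Neo.core.Base and is registered as a singleton).
class SourceBase {
    /**
     * Contract: write every chunk this source owns to writeStream.
     * @param {{write: function(string): boolean}} writeStream
     * @returns {number} count of chunks written (hypothetical return value)
     */
    extract(writeStream) {
        throw new Error(`${this.constructor.name} must implement extract(writeStream)`);
    }
}

// Hypothetical concrete source that emits one JSON line per item.
// A real source would crawl files and hand them to a parser instead.
class StaticSource extends SourceBase {
    constructor(items) {
        super();
        this.items = items;
    }

    extract(writeStream) {
        let count = 0;

        for (const item of this.items) {
            writeStream.write(JSON.stringify({type: 'static', ...item}) + '\n');
            count++;
        }

        return count;
    }
}

// In-memory stand-in for a file write stream, used only for this demo.
class MemoryStream {
    constructor() {
        this.lines = [];
    }

    write(text) {
        this.lines.push(text);
        return true;
    }
}

const stream  = new MemoryStream();
const written = new StaticSource([{name: 'a'}, {name: 'b'}]).extract(stream);
```

Calling extract() on SourceBase itself throws, which is one simple way to make the "abstract" part of the contract visible at runtime.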

Impact: DatabaseService.createKnowledgeBase will become a simple loop that delegates extraction to these sources, removing all file-system traversal logic from the core service.
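
The "simple loop" described above might look something like the following sketch. It is an assumption about the shape of the delegation, not the actual DatabaseService code: makeSource, the sample chunk names, and the signature createKnowledgeBase(sources, writeStream) are hypothetical, and the stream is an in-memory stub rather than a file stream.

```javascript
// Hypothetical factory producing objects that satisfy the extract(writeStream) contract.
// In the issue's design these would be the ApiSource, LearningSource, etc. singletons.
const makeSource = (type, names) => ({
    extract(writeStream) {
        for (const name of names) {
            writeStream.write(JSON.stringify({type, name}) + '\n');
        }
        return names.length;
    }
});

// Placeholder data, not real knowledge base content.
const sources = [
    makeSource('api', ['example-class']),
    makeSource('ticket', ['example-ticket-1', 'example-ticket-2'])
];

// The service only iterates and delegates; it has no file-system knowledge.
function createKnowledgeBase(sources, writeStream) {
    let total = 0;

    for (const source of sources) {
        total += source.extract(writeStream);
    }

    return total;
}

// In-memory stub standing in for the JSONL output stream.
const output = {
    lines: [],
    write(text) {
        this.lines.push(text);
        return true;
    }
};

const total = createKnowledgeBase(sources, output);
```

With this shape, supporting a new content type means adding one more entry to the sources list, without touching the loop.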

tobiu added the ai label on Nov 25, 2025, 4:44 PM
tobiu added the refactoring label on Nov 25, 2025, 4:44 PM
tobiu assigned to @tobiu on Nov 25, 2025, 4:44 PM
tobiu referenced in commit 9dd9549 - "Refactor Knowledge Base crawling logic into Source Providers #7903" on Nov 25, 2025, 5:13 PM
tobiu commented on Nov 25, 2025, 5:14 PM

Input from Neo Agent:

◆ I have successfully implemented the refactoring of the Knowledge Base parsing and crawling logic.

Verification Results:

  • Architecture: The system now uses dedicated Parser and Source singletons, decoupling content extraction from the DatabaseService.
  • Parsing: Playwright tests are correctly decomposed into granular chunks with line number metadata.
  • Crawling: The new Source classes (ApiSource, LearningSource, etc.) correctly iterate their respective domains.
  • Sync: A full manual database synchronization was performed, confirming that the new architecture correctly rebuilds the ai-knowledge-base.jsonl file and updates ChromaDB (7448 chunks).

The codebase is now significantly cleaner, adhering to SRP and better prepared for future content types.

tobiu closed this issue on Nov 25, 2025, 5:16 PM