Frontmatter
| id | 7903 |
| title | Refactor Knowledge Base crawling logic into Source Providers |
| state | Closed |
| labels | ai, refactoring |
| assignees | tobiu |
| createdAt | Nov 25, 2025, 4:44 PM |
| updatedAt | Nov 25, 2025, 5:16 PM |
| githubUrl | https://github.com/neomjs/neo/issues/7903 |
| author | tobiu |
| commentsCount | 1 |
| parentIssue | null |
| subIssues | [] |
| subIssuesCompleted | 0 |
| subIssuesTotal | 0 |
| blockedBy | [] |
| blocking | [] |
| closedAt | Nov 25, 2025, 5:16 PM |
Refactor Knowledge Base crawling logic into Source Providers
tobiu assigned to @tobiu on Nov 25, 2025, 4:44 PM

tobiu
Nov 25, 2025, 5:14 PM
Input from Neo Agent:
◆ I have successfully implemented the refactoring of the Knowledge Base parsing and crawling logic.
Verification Results:
- Architecture: The system now uses dedicated `Parser` and `Source` singletons, decoupling content extraction from the `DatabaseService`.
- Parsing: Playwright tests are correctly decomposed into granular chunks with line number metadata.
- Crawling: The new `Source` classes (`ApiSource`, `LearningSource`, etc.) correctly iterate their respective domains.
- Sync: A full manual database synchronization was performed, confirming that the new architecture correctly rebuilds the `ai-knowledge-base.jsonl` file and updates ChromaDB (7448 chunks).

The codebase is now significantly cleaner, adhering to SRP and better prepared for future content types.
tobiu closed this issue on Nov 25, 2025, 5:16 PM
Goal: Further decouple `DatabaseService.mjs` by extracting the crawling and iteration logic into dedicated `Source` classes. This respects the Single Responsibility Principle: `DatabaseService` should manage the DB, while Sources should know how to find and extract content.

Architecture:
- `Neo.ai.mcp.server.knowledge-base.source`
- `ai/mcp/server/knowledge-base/source/` (Sibling to `services/` and `parser/`).

New Classes:
- `source/Base.mjs`: Abstract base class defining the `extract(writeStream)` contract.
- `source/ApiSource.mjs`: Handles `docs/output/all.json` iteration -> Uses `ApiParser`.
- `source/LearningSource.mjs`: Handles `learn/tree.json` and `learn/` traversal -> Uses `DocumentationParser`.
- `source/ReleaseNotesSource.mjs`: Handles `.github/RELEASE_NOTES` scanning.
- `source/TicketSource.mjs`: Handles `.github/ISSUE_ARCHIVE` scanning.
- `source/TestSource.mjs`: Handles `test/playwright` scanning -> Uses `TestParser`.

Impact:
`DatabaseService.createKnowledgeBase` will become a simple loop that delegates extraction to these sources, removing all file-system traversal logic from the core service.
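
The delegation described above could look roughly like the following. This is a minimal sketch, not the actual Neo.mjs code: only the `extract(writeStream)` contract and the idea of a simple loop come from this ticket. `InMemorySource`, `createMemoryStream`, the standalone `createKnowledgeBase(sources, writeStream)` signature, the `{type, content}` chunk shape, and the returned chunk count are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the Source contract; names other than
// Base and extract(writeStream) are illustrative assumptions.

// Abstract base: every concrete source must override extract(writeStream).
class Base {
    async extract(writeStream) {
        throw new Error(`${this.constructor.name} must implement extract(writeStream)`);
    }
}

// Stand-in for a concrete source such as TicketSource. The items are passed in
// directly; a real source would scan its own directory and call its parser.
class InMemorySource extends Base {
    constructor(type, items) {
        super();
        this.type  = type;
        this.items = items;
    }

    async extract(writeStream) {
        let count = 0;

        for (const content of this.items) {
            // One JSON object per line (JSONL); the chunk shape is assumed
            writeStream.write(JSON.stringify({type: this.type, content}) + '\n');
            count++;
        }

        return count; // assumed return value: number of chunks written
    }
}

// The service-side loop: no file-system traversal, only delegation.
async function createKnowledgeBase(sources, writeStream) {
    let total = 0;

    for (const source of sources) {
        total += await source.extract(writeStream);
    }

    return total;
}

// Minimal stand-in for a writable stream that only exposes write().
function createMemoryStream() {
    const lines = [];
    return {lines, write: chunk => { lines.push(chunk); return true; }};
}
```

Under this shape, supporting a new content type means adding one more `Base` subclass to the list passed to the loop, and the service itself stays unchanged.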