Frontmatter
| id | 8485 |
| title | Implement and Integrate Knowledge Base Defragmentation Script |
| state | Closed |
| labels | enhancement, ai, build |
| assignees | tobiu |
| createdAt | Jan 10, 2026, 12:12 AM |
| updatedAt | Jan 10, 2026, 12:17 AM |
| githubUrl | https://github.com/neomjs/neo/issues/8485 |
| author | tobiu |
| commentsCount | 1 |
| parentIssue | null |
| subIssues | [] |
| subIssuesCompleted | 0 |
| subIssuesTotal | 0 |
| blockedBy | [] |
| blocking | [] |
| closedAt | Jan 10, 2026, 12:17 AM |
Implement and Integrate Knowledge Base Defragmentation Script
tobiu assigned to @tobiu on Jan 10, 2026, 12:16 AM

tobiu
Jan 10, 2026, 12:17 AM
Input from Gemini 3 Pro:
✦ I have implemented the knowledge base defragmentation script (`buildScripts/defragKnowledgeBase.mjs`) and added the `ai:defrag-kb` script to `package.json`.

Implementation Details:
- In-Memory Buffer: The script loads all vector data into memory before resetting the database, avoiding complex temp files for now (safe for datasets < 1GB).
- Orphan Cleanup: It automatically identifies and removes orphaned HNSW index directories, which were the primary cause of the 400MB+ bloat.
- Data Validation: It ensures that `documents` passed to ChromaDB are strictly strings, handling `null` values that previously caused crashes.
- Verification: The script logs the final database directory size, confirming the reduction from ~466MB to ~56MB.
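
For illustration, the Data Validation and Orphan Cleanup points above could look roughly like the two pure helpers below. This is a minimal sketch, not the code in `buildScripts/defragKnowledgeBase.mjs`: the function names `normalizeDocuments` and `findOrphanedIndexDirs` are hypothetical, and the sketch assumes that index directories are named by UUID and that the caller already knows which segment IDs are still in use.

```javascript
// Hypothetical helper (not from the actual script): coerce every document
// to a string so that null or undefined entries cannot reach the vector store.
function normalizeDocuments(documents) {
    return documents.map(doc => {
        if (doc === null || doc === undefined) {
            return '';
        }
        return typeof doc === 'string' ? doc : String(doc);
    });
}

// Assumption: index directories are named by UUID.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: given the entry names found in the database directory
// and the IDs of the segments still in use, return the UUID-named entries
// that no longer belong to any active segment.
function findOrphanedIndexDirs(dirNames, activeSegmentIds) {
    const active = new Set(activeSegmentIds);

    return dirNames.filter(name => UUID_PATTERN.test(name) && !active.has(name));
}

// Usage with made-up values
const docs = normalizeDocuments(['intro', null, 42]);
// docs is ['intro', '', '42']

const orphans = findOrphanedIndexDirs(
    [
        '11111111-1111-1111-1111-111111111111',
        '22222222-2222-2222-2222-222222222222',
        'chroma.sqlite3'
    ],
    ['11111111-1111-1111-1111-111111111111']
);
// orphans is ['22222222-2222-2222-2222-222222222222']
```

Keeping both helpers free of file system and database calls makes them easy to check in isolation; deleting the returned directories and re-adding the normalized documents would be separate steps.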
The script is now available for use in the release pipeline.
tobiu closed this issue on Jan 10, 2026, 12:17 AM
The knowledge base vector index (ChromaDB) suffers from severe fragmentation (bloat) when large numbers of documents are updated or deleted, as HNSW indices do not immediately reclaim disk space. This resulted in the DB growing from ~100MB to ~500MB despite constant content size.
This ticket tracks the implementation of `buildScripts/defragKnowledgeBase.mjs` to:

This script will be exposed as `npm run ai:defrag-kb` and should be run before releases to ensure compact artifacts.