Frontmatter
id: 8344
title: Fix Duplicate ID Error in VectorService
state: Closed
labels: bug, ai
assignees: tobiu
createdAt: Jan 6, 2026, 12:24 AM
updatedAt: Jan 6, 2026, 12:32 AM
githubUrl: https://github.com/neomjs/neo/issues/8344
author: tobiu
commentsCount: 1
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: Jan 6, 2026, 12:32 AM

Fix Duplicate ID Error in VectorService

Closed · v11.19.0 · bug, ai
tobiu
tobiu commented on Jan 6, 2026, 12:24 AM

The VectorService is encountering a "duplicate ID" error during batch upsert. This occurs when the generated ai-knowledge-base.jsonl file contains duplicate entries (same content hash), causing the same ID to be pushed into chunksToProcess multiple times within a single batch.

Fix: Update VectorService.mjs so that chunksToProcess only contains unique IDs, by checking each chunk against a local processedIds Set during the filter loop.

tobiu added the bug label on Jan 6, 2026, 12:24 AM
tobiu added the ai label on Jan 6, 2026, 12:24 AM
tobiu self-assigned this on Jan 6, 2026, 12:25 AM
tobiu referenced this issue in commit bd28172 - "Bugfix: Prevent Duplicate IDs in Vector Upsert #8344" on Jan 6, 2026, 12:31 AM
tobiu
tobiu Jan 6, 2026, 12:31 AM

Input from Gemini 3 Pro:

✦ I have applied the fix to VectorService.mjs to prevent duplicate IDs from being added to the upsert batch.

Fix Details:

  • Introduced a processedIds Set in the embed method.
  • When iterating through knowledgeBase chunks, I now check if the chunkId (hash) has already been processed in the current run (processedIds.has(chunkId)).
  • This ensures that even if the source generation produced duplicate chunks (same content, same hash), only the first occurrence is sent to ChromaDB, satisfying the unique ID constraint.

This resolves the crash and ensures robust handling of potentially overlapping source scans.
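The dedup guard described above can be sketched as follows. This is a hypothetical, simplified illustration, not the actual VectorService.mjs code: the function name `collectUniqueChunks` and the chunk shape (`hash`, `text` fields) are assumptions; the real `embed` method additionally interacts with ChromaDB.

```javascript
/**
 * Hypothetical sketch of the filter loop: chunk IDs are content hashes,
 * so duplicate source entries produce identical IDs. A local Set ensures
 * each ID is queued at most once per run, satisfying ChromaDB's unique
 * ID constraint on upsert.
 * @param {Object[]} knowledgeBase Parsed entries from ai-knowledge-base.jsonl
 * @param {Set<String>} existingIds IDs already stored in the collection
 * @returns {Object[]} chunks safe to upsert in one batch
 */
function collectUniqueChunks(knowledgeBase, existingIds = new Set()) {
    const processedIds    = new Set(); // IDs already seen in this run
    const chunksToProcess = [];

    for (const chunk of knowledgeBase) {
        const chunkId = chunk.hash;

        // Skip chunks that are already stored, or already queued in this batch
        if (existingIds.has(chunkId) || processedIds.has(chunkId)) {
            continue;
        }

        processedIds.add(chunkId);
        chunksToProcess.push(chunk);
    }

    return chunksToProcess;
}
```

With this guard, overlapping source scans that emit the same chunk twice only send the first occurrence to the upsert batch.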

tobiu closed this issue on Jan 6, 2026, 12:32 AM