Context
KB restore from backup-2026-05-16T13-08-06.565Z and backup-2026-05-19T13-08-14.283Z fails with:
```text
❌ Restore failed: Error: DATABASE_IMPORT_ERROR: Expected each document to be a string, but got object
    at DatabaseService.importDatabase (file:///Users/Shared/github/neomjs/neo/ai/services/knowledge-base/DatabaseService.mjs:345:33)
```

Discovered while executing operator-directed MC recovery on 2026-05-19. KB restore is the first embedded substrate in the bundle order; the error fired before MC could be touched. Worked around with --only-substrate=mc for the urgent recovery, but KB restore is now structurally broken for any bundle.
The Problem
KB chunks are stored in Chroma with the chunk body in metadata.content, NOT in Chroma's primary document field. This is the intentional architectural pattern, verified empirically:

| Field | Inspection of backup-2026-05-19T13-08-14.283Z/kb/...jsonl (24,418 records) |
|---|---|
| document: null | 24,418 records (100%) |
| document: string | 0 records |
| metadata | contains content, source, name, hash, kind, className, etc. |
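The inspection above can be reproduced with a small Node script. This is a minimal sketch, assuming each line of a KB backup .jsonl file holds one record in the export-side shape { id, embedding, metadata, document }; summarizeDocumentShapes is a hypothetical helper, not part of DatabaseService.mjs.

```javascript
// Hypothetical helper, not part of DatabaseService.mjs: tallies how each backup
// record stores its chunk body. Assumes one JSON record per line in the
// export-side shape { id, embedding, metadata, document }.
function summarizeDocumentShapes(jsonlText) {
    const counts = { nullDocs: 0, stringDocs: 0, otherDocs: 0, total: 0 };

    for (const line of jsonlText.split('\n')) {
        if (line.trim() === '') continue; // skip blank lines, e.g. a trailing newline

        const record = JSON.parse(line);
        counts.total++;

        if (record.document == null) {
            counts.nullDocs++;   // null, or the key is absent from the JSON
        } else if (typeof record.document === 'string') {
            counts.stringDocs++; // body in Chroma's primary field (MC memory shape)
        } else {
            counts.otherDocs++;  // anything else would not satisfy Chroma's string check
        }
    }
    return counts;
}

// Two KB-style records: the body sits in metadata.content, document is null.
const sample = [
    { id: 'k1', embedding: [0.1], metadata: { content: 'chunk one', kind: 'src' }, document: null },
    { id: 'k2', embedding: [0.2], metadata: { content: 'chunk two', kind: 'src' }, document: null }
].map(r => JSON.stringify(r)).join('\n');

console.log(summarizeDocumentShapes(sample));
```

For a real bundle, pass the file contents instead, e.g. fs.readFileSync(pathToKbJsonl, 'utf8'), where pathToKbJsonl is a placeholder for the elided .../kb/...jsonl path.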
The export side at ai/services/knowledge-base/DatabaseService.mjs:200 correctly writes document: batch.documents[i], which is the actual Chroma state (null). The import side at DatabaseService.mjs:332 forwards those nulls into the documents array of Chroma's upsert(), and Chroma rejects the call with the error above because every element of documents must be a string.
This means no KB backup ever taken can be restored through the current importDatabase code path, since KB's steady-state Chroma shape always has null documents.
The Architectural Reality
Export-side (correct, no change):
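```javascript
// ai/services/knowledge-base/DatabaseService.mjs:195-201
const record = {
    id       : batch.ids[i],
    embedding: batch.embeddings[i],
    metadata : batch.metadatas[i],
    document : batch.documents[i] // null for KB chunks, by design
};
```

Import-side (broken at the boundary):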
```javascript
// ai/services/knowledge-base/DatabaseService.mjs:326-334
const BATCH_SIZE = 500;
for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const batch = records.slice(i, i + BATCH_SIZE);
    await collection.upsert({
        ids       : batch.map(r => r.id),
        embeddings: batch.map(r => r.embedding),
        metadatas : batch.map(r => r.metadata),
        documents : batch.map(r => r.document) // ← null array → Chroma rejects
    });
    imported += batch.length;
}
```

Architectural anchor: KB chunks live alongside the recent Phase 0/1A parsed-chunk-v1 contract, which also routes chunk text through metadata.content. The pattern is uniform: Chroma's document field is unused on the KB side. MC memories are the inverse (document is the memory body string), which is why MC restore works fine.
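The contrast between KB and MC restore can be reproduced without a Chroma server. This is a minimal sketch: createStrictCollection is a hypothetical stand-in that mimics only Chroma's per-element string check on documents (the message text is copied from the report, not from the chromadb client), and importRecordsCurrent is the current import loop wrapped in a function for testing.

```javascript
// Hypothetical stand-in for a Chroma collection, NOT the chromadb client. It
// mimics only the check that matters here: when `documents` is passed, every
// element must be a string.
function createStrictCollection() {
    const rows = new Map();
    return {
        rows,
        async upsert({ ids, embeddings, metadatas, documents }) {
            if (documents !== undefined) {
                for (const doc of documents) {
                    if (typeof doc !== 'string') {
                        // typeof null === 'object', hence "got object" for KB records
                        throw new Error(`Expected each document to be a string, but got ${typeof doc}`);
                    }
                }
            }
            ids.forEach((id, i) => rows.set(id, {
                embedding: embeddings[i],
                metadata : metadatas[i],
                document : documents ? documents[i] : null
            }));
        }
    };
}

// The current import loop, wrapped in a function so it can be exercised in isolation.
async function importRecordsCurrent(collection, records) {
    const BATCH_SIZE = 500;
    let imported = 0;
    for (let i = 0; i < records.length; i += BATCH_SIZE) {
        const batch = records.slice(i, i + BATCH_SIZE);
        await collection.upsert({
            ids       : batch.map(r => r.id),
            embeddings: batch.map(r => r.embedding),
            metadatas : batch.map(r => r.metadata),
            documents : batch.map(r => r.document) // nulls are forwarded unchanged
        });
        imported += batch.length;
    }
    return imported;
}

const kbRecords = [{ id: 'k1', embedding: [0.1], metadata: { content: 'chunk' }, document: null }];

importRecordsCurrent(createStrictCollection(), kbRecords)
    .then(() => console.log('unexpected success'))
    .catch(err => console.log(err.message)); // Expected each document to be a string, but got object
```

A KB-shaped record reproduces the reported message because typeof null is 'object'; an MC-shaped record whose document is a string passes the same check, matching the observation that MC restore works.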
The Fix
Make importDatabase Chroma-shape-agnostic at the boundary: for each batch, pass documents only when at least one record in that batch has a non-null document; otherwise omit the field. This handles both shapes uniformly:
```javascript
// ai/services/knowledge-base/DatabaseService.mjs:326-334
const BATCH_SIZE = 500;
for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const batch = records.slice(i, i + BATCH_SIZE);
    const upsertArgs = {
        ids       : batch.map(r => r.id),
        embeddings: batch.map(r => r.embedding),
        metadatas : batch.map(r => r.metadata)
    };
    // Pass documents only if this batch carries at least one real body (MC shape or mixed).
    const hasDocs = batch.some(r => r.document != null);
    if (hasDocs) {
        // Chroma requires a string per element, so remaining nulls become ''.
        upsertArgs.documents = batch.map(r => r.document ?? '');
    }
    // All-null batches (KB shape) omit the documents key entirely.
    await collection.upsert(upsertArgs);
    imported += batch.length;
}
```

The empty-string fallback (?? '') inside the hasDocs branch handles mixed batches, which are rare but possible: a single non-null document forces the field, and the remaining nulls become empty strings to satisfy Chroma's requirement that every array element be a string. Pure-null batches skip the documents key entirely, so Chroma never sees a null.
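The three batch shapes the fix has to handle (all-null KB batches, all-string MC-style batches, and mixed batches) can be checked in isolation. This is a minimal sketch, assuming the boundary logic is extracted into a hypothetical pure helper, buildUpsertArgs; the ticket itself keeps the logic inline in importDatabase.

```javascript
// Mirrors the proposed boundary logic as a pure function so each batch shape
// can be checked without a running Chroma instance. The name buildUpsertArgs
// is hypothetical; the ticket inlines this logic in importDatabase.
function buildUpsertArgs(batch) {
    const upsertArgs = {
        ids       : batch.map(r => r.id),
        embeddings: batch.map(r => r.embedding),
        metadatas : batch.map(r => r.metadata)
    };
    // `!= null` treats both null and undefined (key absent in the JSONL) as "no document".
    const hasDocs = batch.some(r => r.document != null);
    if (hasDocs) {
        upsertArgs.documents = batch.map(r => r.document ?? '');
    }
    return upsertArgs;
}

// Builds a record in the backup shape; the body also lives in metadata.content.
const rec = (id, document) => ({ id, embedding: [0], metadata: { content: `body ${id}` }, document });

const allNull   = buildUpsertArgs([rec('k1', null), rec('k2', null)]);       // KB shape
const allString = buildUpsertArgs([rec('m1', 'mem 1'), rec('m2', 'mem 2')]); // MC-style
const mixed     = buildUpsertArgs([rec('x1', 'text'), rec('x2', null)]);     // mixed

console.log('documents' in allNull); // false: key omitted, Chroma never sees null
console.log(allString.documents);    // [ 'mem 1', 'mem 2' ]
console.log(mixed.documents);        // [ 'text', '' ]
```

Keeping the decision per batch rather than per bundle means a single MC-style record only forces the documents field for the 500-record batch it sits in.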
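Acceptance Criteria
- importDatabase (ai/services/knowledge-base/DatabaseService.mjs:326-334) handles document: null records without throwing.
- npm run ai:restore -- <bundle-path> --mode merge --only-substrate=kb succeeds against backup-2026-05-19T13-08-14.283Z (24,418 KB chunks restored). [L3-deferred: operator handoff needed] The agent sandbox cannot drive a real chromadb-restore against the live bundle at PR-build time; PR #11657 ships L2 unit-test coverage at the import boundary, including a real-backup-shape reproducer.
- Unit tests under test/playwright/unit/ai/services/knowledge-base/ cover three batch shapes: all-null documents (KB shape), all-string documents (MC-style hypothetical), and mixed null+string.
- Memory_DatabaseService.importDatabase is a peer method; verify that whichever import path MC uses has the same fix if it has the same shape.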
Out of Scope
- Re-engineering KB to store chunk content in Chroma's document field instead of metadata.content (the current pattern is intentional and symmetric with the parsed-chunk-v1 / Phase 0/1A contracts).
- Backup-shape schema validation (backup-record-v1.schema.json, introduced in #11647, already declares document as nullable; this ticket fixes the consumer, not the contract).
- The unrelated MC wipe incident (separate recovery executed this turn; substrate hardening tracked in #11652).
Avoided Traps
| Trap | Why rejected |
|---|---|
| Fix export side to substitute null → '' | Mutates a true representation of Chroma state into a synthetic one. Round-trip identity-tuple checks would diverge. Import-side fix preserves source-of-truth. |
| Require document non-null in backup-record-v1.schema.json | Would invalidate every backup ever taken (KB shape always has null). The schema correctly declares document as ["string", "null"]. |
| Throw a clearer error message but require manual fix per backup | Backups are operator-recovery substrate. They must restore without manual intervention. |
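The first rejected trap can be made concrete. This is a minimal sketch, assuming a round-trip check compares an (id, metadata, document) tuple; the ticket does not spell out its identity tuple, so identityTuple and its choice of fields are hypothetical.

```javascript
// Hypothetical identity tuple for one record: (id, metadata, document).
// The ticket does not define its round-trip check; this is an assumed stand-in.
function identityTuple(record) {
    return JSON.stringify([record.id, record.metadata, record.document]);
}

// What Chroma actually holds for a KB chunk.
const chromaState = { id: 'k1', embedding: [0.1], metadata: { content: 'chunk' }, document: null };

// Rejected trap: the export side rewrites null to '' before writing the backup.
const exportedWithSubstitution = { ...chromaState, document: chromaState.document ?? '' };

// Chosen path: export writes Chroma state verbatim; only the import boundary adapts.
const exportedVerbatim = { ...chromaState };

console.log(identityTuple(exportedWithSubstitution) === identityTuple(chromaState)); // false
console.log(identityTuple(exportedVerbatim) === identityTuple(chromaState));         // true
```

The substituted record no longer matches the live state it was taken from, which is the divergence the table describes; the verbatim export keeps the backup a faithful copy and confines the adaptation to importDatabase.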
Related
- Sibling architectural pattern: Phase 0/1A parsed-chunk-v1 schema (#11625 / #11647 merged); chunk text routes through metadata, not Chroma's document.
- Backup-record schema: ai/services/knowledge-base/parser/backup-record-v1.schema.json (shipped in #11647)
- Export side (no change needed): ai/services/knowledge-base/DatabaseService.mjs:195-201
- Discovered during: MC wipe recovery this turn (#11652 hardening covers the underlying wipe-prevention).
Origin Session ID
7360e917-1733-4cdd-a6f3-5ac51c34b838
Handoff Retrieval Hints
- query_raw_memories({query: 'KB importDatabase document null Chroma upsert error'})
- ask_knowledge_base({query: 'knowledge base chunk content metadata not document', type: 'src'})
- Empirical anchor: 24,418/24,418 KB backup records have document: null in backup-2026-05-19T13-08-14.283Z
- Restore reproducer: npm run ai:restore -- .neo-ai-data/backups/<any-bundle> --mode merge --only-substrate=kb → DATABASE_IMPORT_ERROR