Frontmatter

id	11653
title	KB importDatabase rejects backups with `document: null` (steady-state KB shape)
state	Closed
labels	bug, ai
assignees	neo-opus-ada
createdAt	May 19, 2026, 5:37 PM
updatedAt	Jun 7, 2026, 7:13 PM
githubUrl	https://github.com/neomjs/neo/issues/11653
author	neo-opus-ada
commentsCount	0
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 20, 2026, 2:50 AM

KB importDatabase rejects backups with `document: null` (steady-state KB shape)

Closed v13.0.0/archive-v13-0-0-chunk-12 bug ai

neo-opus-ada commented on May 19, 2026, 5:37 PM

Context

KB restore from backup-2026-05-16T13-08-06.565Z and backup-2026-05-19T13-08-14.283Z fails with:

❌ Restore failed: Error: DATABASE_IMPORT_ERROR: Expected each document to be a string, but got object
    at DatabaseService.importDatabase (file:///Users/Shared/github/neomjs/neo/ai/services/knowledge-base/DatabaseService.mjs:345:33)

Discovered while executing operator-directed MC recovery on 2026-05-19. KB restore is the first embedded substrate in the bundle order; the error fired before MC could be touched. Worked around with --only-substrate=mc for the urgent recovery, but KB restore is now structurally broken for any bundle.

The Problem

KB chunks are stored in Chroma with the chunk body in metadata.content, NOT in Chroma's primary document field. This is the intentional architectural pattern — verified empirically:

Inspection of backup `backup-2026-05-19T13-08-14.283Z/kb/...jsonl` (24,418 records):
`document: null` — 24,418 records (100%)
`document: string` — 0 records
`metadata` contains `content`, `source`, `name`, `hash`, `kind`, `className`, etc.

The export side at ai/services/knowledge-base/DatabaseService.mjs:200 correctly writes document: batch.documents[i] (the actual Chroma state — null). The import side at DatabaseService.mjs:332 forwards null into Chroma's upsert() documents array; Chroma rejects with the error above (Chroma requires document strings, not nulls).

This means every KB backup ever taken cannot be restored with the current importDatabase code path, since KB's steady-state Chroma shape always has null documents.
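The inspection above can be reproduced with a short script. The following is a minimal sketch, assuming the backup file holds one JSON record per line with a `document` field; `countDocumentShapes` and the sample records are hypothetical and not part of the repository.

```javascript
// Sketch only: tallies the shape of the `document` field across JSONL backup records.
// `countDocumentShapes` is a hypothetical helper, not part of DatabaseService.mjs.
function countDocumentShapes(jsonlText) {
    const counts = {nullDocs: 0, stringDocs: 0, other: 0};

    for (const line of jsonlText.split('\n')) {
        if (line.trim() === '') continue; // tolerate blank lines and a trailing newline

        const record = JSON.parse(line);

        if (record.document == null) {
            counts.nullDocs++;   // null or missing: the KB shape
        } else if (typeof record.document === 'string') {
            counts.stringDocs++; // string body: the MC-style shape
        } else {
            counts.other++;      // anything else would be unexpected
        }
    }

    return counts;
}

// Two made-up records in the KB shape: body in metadata.content, document null.
const sample = [
    JSON.stringify({id: 'a', embedding: [0.1], metadata: {content: 'chunk a'}, document: null}),
    JSON.stringify({id: 'b', embedding: [0.2], metadata: {content: 'chunk b'}, document: null})
].join('\n');

console.log(countDocumentShapes(sample)); // { nullDocs: 2, stringDocs: 0, other: 0 }
```

To run it against a real bundle, the file contents would be read into a string first (for example with `fs.readFileSync`) and passed in; the elided path above is left as-is.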

The Architectural Reality

Export-side (correct, no change):

// ai/services/knowledge-base/DatabaseService.mjs:195-201
const record = {
    id       : batch.ids[i],
    embedding: batch.embeddings[i],
    metadata : batch.metadatas[i],
    document : batch.documents[i]  // null for KB chunks — by design
};

Import-side (broken at the boundary):

// ai/services/knowledge-base/DatabaseService.mjs:326-334
const BATCH_SIZE = 500;
for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const batch = records.slice(i, i + BATCH_SIZE);
    await collection.upsert({
        ids       : batch.map(r => r.id),
        embeddings: batch.map(r => r.embedding),
        metadatas : batch.map(r => r.metadata),
        documents : batch.map(r => r.document)  // ← null array → Chroma rejects
    });
    imported += batch.length;
}

Architectural anchor: KB chunks live alongside the recent Phase 0/1A parsed-chunk-v1 contract which also routes chunk text through metadata.content. The pattern is uniform — Chroma's document field is unused on the KB side. MC memories are the inverse (document is the memory body string), which is why MC restore works fine.
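The two shapes described above can be read uniformly. This is a small sketch, assuming the record layout from the export code (`document` plus `metadata.content`); `recordBody` is a hypothetical helper used only for illustration.

```javascript
// Hypothetical helper: returns the text body of a backup record regardless of substrate.
function recordBody(record) {
    // MC shape: the memory body is stored in Chroma's document field.
    if (typeof record.document === 'string') {
        return record.document;
    }
    // KB shape: document is null and the chunk text lives in metadata.content.
    return record.metadata?.content ?? null;
}

// Made-up records, one per shape.
const kbRecord = {id: 'kb-1', metadata: {content: 'chunk text'}, document: null};
const mcRecord = {id: 'mc-1', metadata: {}, document: 'memory text'};

console.log(recordBody(kbRecord)); // chunk text
console.log(recordBody(mcRecord)); // memory text
```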

The Fix

Make importDatabase Chroma-shape-agnostic at the boundary: pass documents only when at least one record has a non-null document; otherwise omit the field. This handles both shapes uniformly:

// ai/services/knowledge-base/DatabaseService.mjs:326-334
const BATCH_SIZE = 500;
for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const batch      = records.slice(i, i + BATCH_SIZE);
    const upsertArgs = {
        ids       : batch.map(r => r.id),
        embeddings: batch.map(r => r.embedding),
        metadatas : batch.map(r => r.metadata)
    };
    const hasDocs = batch.some(r => r.document != null);
    if (hasDocs) {
        upsertArgs.documents = batch.map(r => r.document ?? '');
    }
    await collection.upsert(upsertArgs);
    imported += batch.length;
}

The empty-string fallback (?? '') inside the hasDocs branch handles mixed batches, which are rare but possible: a single non-null document forces the documents field, so the remaining nulls become empty strings to satisfy Chroma's requirement that every element of the array be a string. Pure-null batches skip the documents key entirely, so Chroma never sees a null.
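The behaviour for the three batch shapes can be checked in isolation by pulling the argument construction into a pure function. This is a sketch under that assumption: `buildUpsertArgs` is a hypothetical extraction of the loop body above, not a function that exists in DatabaseService.mjs, and the batches are made-up data.

```javascript
// Hypothetical helper mirroring the loop body of the proposed fix.
function buildUpsertArgs(batch) {
    const upsertArgs = {
        ids       : batch.map(r => r.id),
        embeddings: batch.map(r => r.embedding),
        metadatas : batch.map(r => r.metadata)
    };

    // Only attach `documents` when at least one record carries a document.
    if (batch.some(r => r.document != null)) {
        // Remaining nulls become '' so every array element is a string.
        upsertArgs.documents = batch.map(r => r.document ?? '');
    }

    return upsertArgs;
}

const kbBatch = [
    {id: 'k1', embedding: [0.1], metadata: {content: 'kb one'}, document: null},
    {id: 'k2', embedding: [0.2], metadata: {content: 'kb two'}, document: null}
];
const mcBatch = [
    {id: 'm1', embedding: [0.3], metadata: {}, document: 'memory one'},
    {id: 'm2', embedding: [0.4], metadata: {}, document: 'memory two'}
];
const mixedBatch = [mcBatch[0], kbBatch[0]];

console.log('documents' in buildUpsertArgs(kbBatch)); // false
console.log(buildUpsertArgs(mcBatch).documents);      // [ 'memory one', 'memory two' ]
console.log(buildUpsertArgs(mixedBatch).documents);   // [ 'memory one', '' ]
```

Keeping the construction pure means the three shapes can be asserted without a running Chroma instance; the actual `collection.upsert(upsertArgs)` call stays in the loop.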

Acceptance Criteria

ai/services/knowledge-base/DatabaseService.mjs:326-334 importDatabase handles document: null records without throwing.
End-to-end: npm run ai:restore -- <bundle-path> --mode merge --only-substrate=kb succeeds against backup-2026-05-19T13-08-14.283Z (24,418 KB chunks restored). [L3-deferred — operator handoff needed] (agent sandbox cannot drive a real chromadb-restore against the live bundle at PR-build time; PR #11657 ships L2 unit-test coverage at the import-boundary including a real-backup-shape reproducer).
Unit test: new spec under test/playwright/unit/ai/services/knowledge-base/ covers three batch shapes — all-null documents (KB shape), all-string documents (MC-style hypothetical), mixed null+string.
Round-trip parity: export → restore on a fresh test collection produces an identical chunk count and identity-tuple membership. [L3-deferred — operator handoff needed] (sandbox cannot drive live KB collection export-then-restore round-trip at PR-build time; PR #11657 verifies the import-boundary semantics that the round-trip depends on).
No regression to MC restore path (Memory_DatabaseService.importDatabase is a peer-method; verify whichever import path MC uses has the same fix if it has the same shape).
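The round-trip parity criterion above could be checked along these lines. This is a rough sketch: the issue does not define the identity tuple, so `(id, metadata.hash)` is an assumption, and `identityKey` and `compareRoundTrip` are hypothetical names.

```javascript
// Assumption: a record's identity tuple is (id, metadata.hash); the issue does not define it.
function identityKey(record) {
    return JSON.stringify([record.id, record.metadata?.hash ?? null]);
}

// Compares exported and restored records by count and by identity-tuple membership.
function compareRoundTrip(exported, restored) {
    const exportedKeys = new Set(exported.map(identityKey));
    const restoredKeys = new Set(restored.map(identityKey));

    return {
        countMatches: exported.length === restored.length,
        missing     : [...exportedKeys].filter(k => !restoredKeys.has(k)),
        unexpected  : [...restoredKeys].filter(k => !exportedKeys.has(k))
    };
}

// Made-up records in the KB shape.
const exported = [
    {id: 'c1', metadata: {hash: 'h1', content: 'one'}, document: null},
    {id: 'c2', metadata: {hash: 'h2', content: 'two'}, document: null}
];
const restoredOk  = [exported[1], exported[0]]; // order does not matter
const restoredBad = [exported[0], {id: 'c2', metadata: {hash: 'changed'}, document: null}];

console.log(compareRoundTrip(exported, restoredOk));
// { countMatches: true, missing: [], unexpected: [] }
console.log(compareRoundTrip(exported, restoredBad).missing.length); // 1
```

Comparing sets of keys rather than arrays keeps the check independent of the order in which Chroma returns records.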

Out of Scope

Re-engineering KB to store chunk content in Chroma's document field instead of metadata.content (the current pattern is intentional + symmetric with parsed-chunk-v1 / Phase 0/1A contracts).
Backup-shape schema validation (backup-record-v1.schema.json introduced in #11647 already declares document as nullable; this ticket fixes the consumer, not the contract).
The unrelated MC wipe incident (separate recovery executed this turn; substrate hardening tracked in #11652).

Avoided Traps

Trap	Why rejected
Fix export side to substitute `null` → `''`	Mutates a true representation of Chroma state into a synthetic one. Round-trip identity-tuple checks would diverge. Import-side fix preserves source-of-truth.
Require `document` non-null in `backup-record-v1.schema.json`	Would invalidate every backup ever taken (KB shape always has null). The schema correctly declares document as `["string", "null"]`.
Throw a clearer error message but require manual fix per backup	Backups are operator-recovery substrate. They must restore without manual intervention.

Sibling architectural pattern: Phase 0/1A parsed-chunk-v1 schema (#11625 / #11647 merged) — chunk text routes through metadata, not Chroma's document.
Backup-record schema: ai/services/knowledge-base/parser/backup-record-v1.schema.json (shipped in #11647)
Export side (no change needed): ai/services/knowledge-base/DatabaseService.mjs:195-201
Discovered during: MC wipe recovery this turn (#11652 hardening covers the underlying wipe-prevention).

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

query_raw_memories({query: 'KB importDatabase document null Chroma upsert error'})
ask_knowledge_base({query: 'knowledge base chunk content metadata not document', type: 'src'})
Empirical anchor: 24,418/24,418 KB backup records have document: null in backup-2026-05-19T13-08-14.283Z
Restore reproducer: npm run ai:restore -- .neo-ai-data/backups/<any-bundle> --mode merge --only-substrate=kb → DATABASE_IMPORT_ERROR

tobiu referenced in commit fdfb48f - "fix(ai): KB importDatabase tolerates null document field (#11653) (#11657)" on May 20, 2026, 2:50 AM

tobiu closed this issue on May 20, 2026, 2:50 AM