Frontmatter

id	11625
title	KB Ingestion Phase 0/1: Contracts, Source/Parser Registry, KB Tenant Isolation
state	Closed
labels	epic, ai, architecture
assignees	neo-opus-ada
createdAt	May 19, 2026, 1:34 PM
updatedAt	Jun 7, 2026, 7:13 PM
githubUrl	https://github.com/neomjs/neo/issues/11625
author	neo-opus-ada
commentsCount	3
parentIssue	11624
subIssues	11629 Phase 0/1A — Ingestion Contracts: parsed-chunk-v1 + backup-record-v1 + Path-Identity Tuple + Tombstone Spec 11630 Phase 0/1B — Source/Parser Registry Extraction + Per-Source Path Externalization + Byte-Equivalence Fixture 11631 Phase 0/1C — KB Tenant Isolation Write-Side: VectorService Server-Stamping + Tenant-Aware chunkId + Spoof-Rejection 11632 Phase 0/1D — KB Tenant Isolation Read-Side: QueryService/SearchService where-Filter + Fail-Closed Test Suite 11658 KB Ingestion Phase 0/1B: Source/Parser registry + useDefaultSources/useDefaultParsers configs 11660 KB Ingestion Phase 0/1B-β: Externalize per-source paths to aiConfig.sourcePaths
subIssuesCompleted	6
subIssuesTotal	6
blockedBy	[]
blocking	[x] 11626 KB Ingestion Phase 2: Ingestion Service + MCP Small-Batch Facade + Bulk Facade
closedAt	May 20, 2026, 12:57 PM

KB Ingestion Phase 0/1: Contracts, Source/Parser Registry, KB Tenant Isolation

Closed v13.0.0/archive-v13-0-0-chunk-12 epic, ai, architecture

neo-opus-ada commented on May 19, 2026, 1:34 PM

Context

Phase 0/1 Epic (child of meta-Epic #11624: Cloud-Native KB Ingestion for External Workspaces). Graduated from Discussion #11623. This phase ships the contracts (parsed-chunk-v1, backup-record-v1, path-identity tuple, tombstone/manifest/revision-boundary, registry, and KB tenant isolation, i.e. the memorySharing pattern applied to the knowledge-base Chroma collection) BEFORE the Phase 2 ingestion service implementation. This is the substrate-correct ordering per cross-family peer convergence.

Topology anchor: Per ADR 0003 — Chroma Topology Unified Only, KB + MC are SEPARATE MCP servers sharing ONE Chroma daemon but maintaining SEPARATE collections (knowledge-base, neo-agent-memory, neo-agent-sessions). This Phase adds tenant-scoping metadata + filter logic to the knowledge-base collection ONLY — no topology mutation, no storage relocation, no collection sharing.

Standalone win: enables same-server custom workspaces (useDefaultSources / useDefaultParsers configurability + custom source/parser registration) without network substrate. Sets the stable contract floor that Phase 2 facades can build on.

The Problem

The current KB substrate hardcodes a single-repo assumption at multiple layers (see the "Architectural Reality" section of Epic #11624). This phase removes the structural blockers in the substrate FIRST, defining stable contracts before any new transport endpoint exists:

Hardcoded source-class array at DatabaseService.mjs:460-471
Per-Source hardcoded paths (e.g., ApiSource.sourceMap maps Neo-specific paths)
memorySharing enum is Memory-Core-only today (zero references in KB code; verified via grep). The pattern is reused; the infrastructure is new.
Path-determinism baked into chunk emit + hydration (ApiSource.mjs:101-105 + SearchService.mjs:118-120)
Content-hash delta only deletes under full-corpus sync (VectorService.mjs:198-207); incremental push needs explicit deletion-signaling
importDatabase is conflated with ingest but is actually RESTORE-only (it skips re-embedding), so it needs a contract distinct from ingest

The Architectural Reality

This phase touches:

File	Change
`DatabaseService.mjs`	Replace hardcoded source array with data-driven registry; thread `useDefaultSources` / `useDefaultParsers` config
`source/Base.mjs`	Abstract `extract(writeStream, createHashFn)` contract PRESERVED — already clean
`source/ApiSource.mjs` + 9 sibling sources	Externalize hardcoded paths to config; data-driven registration
`VectorService.mjs`	Write-side: inject server-derived `{tenantId, visibility, originAgentIdentity?}` at `embed` upsert; tenant-aware `chunkId` hash derivation; reject/overwrite client-supplied tenant fields
`QueryService.mjs`	Read-side: inject `where: {tenantId: {$in: [<requester>, '<team-namespace>']}}` from authenticated AgentIdentity into every `collection.query()` call
`SearchService.mjs`	Tenant-aware hydration per Q12 (chunk-metadata-embedded vs server-mirror — choice deferred to Phase 2; Phase 0/1 only marks the boundary)
NEW `ai/services/knowledge-base/parser/parsed-chunk-v1.schema.json`	JSON Schema for client-side parser output
NEW `ai/services/knowledge-base/parser/backup-record-v1.schema.json`	JSON Schema formalizing existing `importDatabase` `{id, embedding, metadata, document}` shape

The Fix

1. Source/Parser Registry Extraction

Replace the hardcoded sources array in DatabaseService.createKnowledgeBase() with a data-driven registry configured via aiConfig.knowledgeBase.sources (or similar). With useDefaultSources: true, the default value preserves the current 10-source set; useDefaultParsers: true similarly preserves the current parser binding. Custom sources/parsers register via an explicit API.
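A minimal sketch of how such a registry could behave, offered as an illustration rather than the shipped design. The SourceRegistry class, its register/resolve methods, and the placeholder source ids are hypothetical; only the useDefaultSources flag comes from this ticket.

```javascript
// Hypothetical sketch: a data-driven source registry honoring useDefaultSources.
// Class, method, and source names are illustrative, not the Neo.mjs API.
class SourceRegistry {
    constructor(defaultSources = []) {
        this.defaults = new Map(defaultSources.map(source => [source.id, source]));
        this.custom   = new Map();
    }

    // Custom sources register explicitly; a duplicate id is treated as an error.
    register(source) {
        if (typeof source?.id !== 'string' || typeof source?.extract !== 'function') {
            throw new TypeError('A source needs a string id and an extract() method');
        }
        if (this.defaults.has(source.id) || this.custom.has(source.id)) {
            throw new Error(`Source id already registered: ${source.id}`);
        }
        this.custom.set(source.id, source);
        return this;
    }

    // Defaults come first (in their original order) when enabled, then custom sources.
    resolve({useDefaultSources = true} = {}) {
        const defaults = useDefaultSources ? [...this.defaults.values()] : [];
        return [...defaults, ...this.custom.values()];
    }
}

// Usage with placeholder sources standing in for the shipped default set.
const makeSource = id => ({id, extract: async () => {}});
const registry   = new SourceRegistry([makeSource('api'), makeSource('guides')]);

registry.register(makeSource('my-workspace-docs'));

const withDefaults    = registry.resolve().map(source => source.id);
const withoutDefaults = registry.resolve({useDefaultSources: false}).map(source => source.id);
console.log(withDefaults, withoutDefaults);
```

Keeping defaults and custom entries in separate maps means the default set stays untouched when useDefaultSources is switched off, which is the property the byte-equivalence fixture relies on.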

2. `parsed-chunk-v1` JSON Schema

Define at ai/services/knowledge-base/parser/parsed-chunk-v1.schema.json:

{
  "$id": "neo:parsed-chunk-v1",
  "schemaVersion": "1.0.0",
  "tenantId": "string",
  "repoSlug": "string",
  "rootKind": "neo-workspace | bare-repo | ...",
  "sourcePath": "string",
  "content": "string",
  "hashInputs": ["type","name","content","..."],
  "parserId": "string",
  "parserVersion": "semver-like",
  "kind": "module-context | class-properties | class-config | method | doc-section | skill-section | ...",
  "name": "string",
  "line_start": "integer?",
  "line_end": "integer?",
  "className": "string?",
  "extends": "string?",
  "customMeta": "object?"
}

Server-side validator rejects records missing required identity fields OR carrying an embedding field (routed to restore-only path).
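The validator rule above could be sketched roughly as follows. This is an assumption-laden illustration: validateParsedChunk is a hypothetical name, and treating every field without a trailing "?" in the shape above as required is a guess, since the ticket only says "required identity fields".

```javascript
// Hypothetical validator sketch for parsed-chunk-v1 records.
// Assumption: every field shown without a trailing "?" is required.
const REQUIRED_STRING_FIELDS = [
    'schemaVersion', 'tenantId', 'repoSlug', 'rootKind', 'sourcePath',
    'content', 'parserId', 'parserVersion', 'kind', 'name'
];

function validateParsedChunk(record) {
    if (record === null || typeof record !== 'object' || Array.isArray(record)) {
        return {valid: false, errors: ['record must be an object']};
    }

    const errors = [];

    // Pre-embedded records belong to the restore-only path, not to ingest.
    if ('embedding' in record) {
        errors.push('embedding is not allowed on ingest; use the restore-only backup path');
    }

    for (const field of REQUIRED_STRING_FIELDS) {
        if (typeof record[field] !== 'string' || record[field].length === 0) {
            errors.push(`missing or empty required field: ${field}`);
        }
    }

    if (!Array.isArray(record.hashInputs) || record.hashInputs.length === 0) {
        errors.push('hashInputs must be a non-empty array');
    }

    return {valid: errors.length === 0, errors};
}

// Usage with an illustrative record (all values are made up).
const sampleChunk = {
    schemaVersion: '1.0.0', tenantId: 'tenant-a', repoSlug: 'acme/app', rootKind: 'bare-repo',
    sourcePath: 'src/Main.mjs', content: 'class Main {}', hashInputs: ['type', 'name', 'content'],
    parserId: 'custom-js', parserVersion: '0.1.0', kind: 'module-context', name: 'Main'
};

const okResult       = validateParsedChunk(sampleChunk);
const restoreAttempt = validateParsedChunk({...sampleChunk, embedding: [0.1, 0.2]});
console.log(okResult.valid, restoreAttempt.errors);
```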

3. `backup-record-v1` JSON Schema

Define at ai/services/knowledge-base/parser/backup-record-v1.schema.json formalizing existing {id, embedding, metadata, document} shape. Distinct contract from ingest. Used only by manageDatabaseBackup({action: 'import'}) (restore-only path).

4. Path-Identity Tuple

Document {tenantId, repoSlug, rootKind, sourcePath} semantics in learn/agentos/cloud-deployment/ (the placeholder doc lands in Phase 3; Phase 0/1 lands inline JSDoc references). chunk.metadata.source becomes the structured tuple, NOT the bare neoRootDir-relative string.

5. Tombstone / Manifest / Revision-Boundary Contract

Spec'd in same schema directory: three mutually-supporting deletion-signaling mechanisms:

Explicit tombstones ({deleted: [paths]})
Manifest snapshot ({manifestSnapshot: {pathsAfterPush}})
Revision boundary ({baseRevision, headRevision})

Phase 0/1 spec only; Phase 2 wires into endpoint.
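One way the three signals might combine on the server is sketched below. The semantics are assumptions, not part of the spec: resolveDeletions is a hypothetical helper, serverState ({paths, headRevision}) is an invented shape, and treating a mismatched baseRevision as a stale push is one possible reading of the revision boundary.

```javascript
// Hypothetical sketch: deriving the set of paths to delete from one push payload.
// serverState = {paths: Set<string>, headRevision: string} is an assumed shape.
function resolveDeletions(push, serverState) {
    const {deleted = [], manifestSnapshot, baseRevision} = push;

    // Revision boundary (assumed semantics): refuse pushes computed against an outdated base.
    if (baseRevision !== undefined && baseRevision !== serverState.headRevision) {
        throw new Error(`Stale push: base ${baseRevision}, server head ${serverState.headRevision}`);
    }

    // Explicit tombstones: only paths the server actually knows need deleting.
    const toDelete = new Set(deleted.filter(path => serverState.paths.has(path)));

    // Manifest snapshot: anything the server holds that is absent after the push is gone.
    if (manifestSnapshot) {
        const keep = new Set(manifestSnapshot.pathsAfterPush);

        for (const path of serverState.paths) {
            if (!keep.has(path)) {
                toDelete.add(path);
            }
        }
    }

    return [...toDelete].sort();
}

// Usage with made-up paths and revisions.
const serverState = {paths: new Set(['a.md', 'b.md', 'c.md']), headRevision: 'r1'};
const push = {
    deleted         : ['a.md'],
    manifestSnapshot: {pathsAfterPush: ['b.md', 'd.md']},
    baseRevision    : 'r1',
    headRevision    : 'r2'
};

const pathsToDelete = resolveDeletions(push, serverState);
console.log(pathsToDelete);
```

In this reading the tombstones and the manifest are redundant on purpose: either one alone is enough to find removed paths, and the revision check guards both against being applied to the wrong baseline.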

6. memorySharing KB Port — Two Halves

Write-side (server-stamping invariant):

VectorService.embed upsert injects {tenantId, visibility, originAgentIdentity?} from server-authenticated AgentIdentity context
Ingestion path REJECTS or server-OVERWRITES client-supplied tenant/visibility fields (configurable: overwrite + warning log; REJECT escalation on spoof-rate telemetry threshold)
Tenant-aware chunkId hash derivation: hash includes tenantId + repoSlug so same source content under two tenants yields distinct ids
Neo's curated content tagged with shared tenantId constant (e.g., 'neo-shared'); per-tenant content tagged with <tenantId>
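The write-side invariant can be illustrated with a short sketch. Everything here is hypothetical: stampChunk and deriveChunkId are invented names, the identity shape ({tenantId, visibility, agentId}) and the 'private' visibility value are assumptions, and the exact hash inputs are a guess beyond the stated requirement that tenantId and repoSlug are included.

```javascript
const {createHash} = require('node:crypto');

// Hypothetical id derivation: tenantId + repoSlug are part of the hash input,
// so identical content under two tenants cannot collide.
function deriveChunkId({tenantId, repoSlug, sourcePath, content}) {
    // JSON.stringify on an array keeps field boundaries unambiguous.
    const input = JSON.stringify([tenantId, repoSlug, sourcePath, content]);
    return createHash('sha256').update(input).digest('hex');
}

// Hypothetical stamping step: tenant fields always come from the authenticated identity.
function stampChunk(chunk, identity, {mode = 'overwrite', warn = console.warn} = {}) {
    const stamp = {
        tenantId           : identity.tenantId,
        visibility         : identity.visibility,
        originAgentIdentity: identity.agentId
    };
    const metadata = chunk.metadata ?? {};
    const spoofed  = Object.keys(stamp).filter(key => key in metadata && metadata[key] !== stamp[key]);

    if (spoofed.length > 0) {
        if (mode === 'reject') {
            throw new Error(`Client-supplied tenant fields rejected: ${spoofed.join(', ')}`);
        }
        warn(`Overwriting client-supplied tenant fields: ${spoofed.join(', ')}`);
    }

    const stampedMetadata = {...metadata, ...stamp};

    return {
        ...chunk,
        metadata: stampedMetadata,
        id      : deriveChunkId({
            tenantId  : stamp.tenantId,
            repoSlug  : stampedMetadata.repoSlug,
            sourcePath: stampedMetadata.sourcePath,
            content   : chunk.document
        })
    };
}

// Usage: the same client payload (with a forged tenantId) arriving under two identities.
const identityA = {tenantId: 'tenant-a', visibility: 'private', agentId: 'agent-1'};
const identityB = {tenantId: 'tenant-b', visibility: 'private', agentId: 'agent-2'};
const incoming  = {
    document: 'class Main {}',
    metadata: {repoSlug: 'acme/app', sourcePath: 'src/Main.mjs', tenantId: 'tenant-b'}
};

const warnings = [];
const stampedA = stampChunk(incoming, identityA, {warn: message => warnings.push(message)});
const stampedB = stampChunk(incoming, identityB);
console.log(stampedA.metadata.tenantId, stampedA.id !== stampedB.id);
```

The default mode here mirrors the "overwrite + warning log" default named above; passing {mode: 'reject'} shows the stricter escalation path.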

Read-side (retrieval filter):

QueryService.queryDocuments + SearchService inject where: {tenantId: {$in: [<requester>, '<team-namespace>']}} into every Chroma collection.query call
Filter context derived from server-side authenticated AgentIdentity, NOT untrusted client payload
Mirrors MemoryService.mjs:391-410 query-time policy filter pattern
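A minimal sketch of the read-side filter construction, assuming the caller passes the resulting object as the where argument of the Chroma query. buildTenantWhere is a hypothetical helper, and 'neo-shared' stands in for the team namespace following the example constant mentioned for the write side.

```javascript
// Hypothetical shared-tenant constant, following the 'neo-shared' example above.
const SHARED_TENANT_ID = 'neo-shared';

// Builds the where clause from the authenticated identity, never from the client payload.
function buildTenantWhere(identity, clientWhere = null) {
    // Fail closed: no authenticated tenant means no query at all.
    if (!identity || typeof identity.tenantId !== 'string' || identity.tenantId === '') {
        throw new Error('Refusing to query without an authenticated tenantId');
    }

    const tenantFilter = {tenantId: {$in: [identity.tenantId, SHARED_TENANT_ID]}};

    if (!clientWhere || Object.keys(clientWhere).length === 0) {
        return tenantFilter;
    }

    // $and can only narrow the result set, so a client-supplied tenantId
    // condition cannot widen access beyond the requester's own scope.
    return {$and: [tenantFilter, clientWhere]};
}

// Usage: a client filter on an illustrative "type" field is combined with the tenant scope.
const scopedWhere   = buildTenantWhere({tenantId: 'tenant-a'}, {type: 'src'});
const unscopedQuery = buildTenantWhere({tenantId: 'tenant-a'});
console.log(JSON.stringify(scopedWhere));
```

Wrapping the client filter in $and rather than merging objects is a deliberate choice in this sketch: a plain object spread would let a client-supplied tenantId key replace the server-derived one.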

7. Byte-Equivalence Fixture

Test fixture: current Neo source output (10 sources × current parsers × current paths) BEFORE registry extraction === output AFTER. Verifies migration doesn't regress retrieval quality + adding tenantId field doesn't perturb chunk-hash semantics for existing content (hash-derivation function backwards-compatible for Neo's tenantId).
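The comparison at the heart of such a fixture could look roughly like this. It is a sketch under assumptions: compareSourceOutputs is a hypothetical helper, and capturing each pipeline run as a single JSONL string is an assumed fixture format, not something this ticket specifies.

```javascript
// Hypothetical fixture helper: assumes each pipeline run is captured as one JSONL string.
function compareSourceOutputs(beforeText, afterText) {
    const beforeBytes = Buffer.from(beforeText, 'utf8');
    const afterBytes  = Buffer.from(afterText, 'utf8');

    // Byte-level equality is the pass condition.
    if (beforeBytes.equals(afterBytes)) {
        return {equal: true, firstDifferingLine: null};
    }

    // On failure, report the first differing line (1-based) to make the regression easy to find.
    const beforeLines = beforeText.split('\n');
    const afterLines  = afterText.split('\n');
    const lineCount   = Math.max(beforeLines.length, afterLines.length);

    for (let index = 0; index < lineCount; index++) {
        if (beforeLines[index] !== afterLines[index]) {
            return {equal: false, firstDifferingLine: index + 1};
        }
    }

    return {equal: false, firstDifferingLine: null};
}

// Usage with two made-up chunk records.
const baseline  = '{"name":"Neo.button.Base","kind":"class-config"}\n{"name":"focus","kind":"method"}\n';
const unchanged = baseline;
const regressed = baseline.replace('"focus"', '"blur"');

const stableRun    = compareSourceOutputs(baseline, unchanged);
const regressedRun = compareSourceOutputs(baseline, regressed);
console.log(stableRun, regressedRun);
```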

8. Fail-Closed Test Suite

Required tenant-isolation tests:

Forged client tenantId rejected/overwritten through every public KB query facade
Forged client visibility rejected/overwritten
Tenant A cannot retrieve tenant B private chunks
Neo team (shared) chunks visible across tenants
Same sourcePath under two tenants → distinct chunk ids (no shadow attack)
parsed-chunk-v1 validation: rejects records carrying an embedding field outside restore mode (forces routing through manageDatabaseBackup)

Acceptance Criteria

useDefaultSources / useDefaultParsers boolean configs in aiConfig (default true for zero-config Neo deployments)
Hardcoded source array at DatabaseService.mjs:460-471 replaced with data-driven registry
Per-source paths (ApiSource.sourceMap etc.) externalized to config
Custom source / custom parser registration API documented + tested
ai/services/knowledge-base/parser/parsed-chunk-v1.schema.json created with validator at service boundary
ai/services/knowledge-base/parser/backup-record-v1.schema.json created formalizing existing shape
Path-identity tuple {tenantId, repoSlug, rootKind, sourcePath} in chunk metadata
Tombstone / manifest / revision-boundary contract spec'd in parsed-chunk-v1 companion schema
VectorService.embed write-side server-stamping ({tenantId, visibility, originAgentIdentity?}) from authenticated AgentIdentity
VectorService.embed rejects or server-OVERWRITES client-supplied tenant fields (mode configurable; default: overwrite + warning log)
Tenant-aware chunkId hash derivation (hash includes tenantId + repoSlug)
QueryService.queryDocuments injects tenant/visibility where clause from authenticated AgentIdentity
SearchService hydration tenant-aware (Phase 0/1 only marks the boundary; Phase 2 must choose the Q12 hydration mode before implementing the retrieval flow)
Byte-equivalence fixture passes: current Neo source output stable before/after registry extraction
Fail-closed test suite (the 6 cases listed above) passes
Unit tests under test/playwright/unit/ai/knowledge-base/
Integration tests with synthetic external-workspace fixtures (at minimum: mini-neo-workspace/, mini-custom-source/); ES5 + C++ fixtures can defer to Phase 2 if they require client-side parser-runner infrastructure

Out of Scope

KnowledgeBaseIngestionService singleton implementation → Phase 2
MCP tool ingestSourceFiles → Phase 2
Bulk facade (CLI/HTTP/streaming) → Phase 2
Q12 hydration mode choice (chunk-metadata-embedded vs server-mirror vs hybrid) → Phase 2 sub-ticket
ES5 + C++ workspace fixtures requiring client-side parser-runner → Phase 2 (depends on push pipeline)
Cloud deployment guide → Phase 3
Runtime tenant-registered server-side parser code (operator-installed/Neo-shipped/signed-package only; future WASM/tree-sitter sandboxing is a separate Discussion)

Avoided Traps

Trap	Why rejected
Implementing Phase 0/1 + Phase 2 in same PR	Contract-risk: endpoint-shape can drift before contracts stabilize. Substrate-correct ordering: contracts first.
Skipping byte-equivalence fixture	Adding `tenantId` field to chunk-hash inputs without verification = silent retrieval-quality regression for existing content
Application-layer retrieval filter as final shape	Vulnerable to bug-bypass (forgotten filter call = data leak). V1 application-layer is fine; Phase 2 reconsiders Chroma-layer hardening per Q13b lean Option C (hybrid).
Skipping fail-closed test suite	Server-stamping + spoof-rejection is a load-bearing SECURITY invariant; untested = unverified

Parent Epic: #11624
Origin Discussion: #11623 (archaeological source post-graduation)
Sibling Epic: #9999 (read-side multi-tenancy; pattern source for memorySharing KB port)
Related substrates: #10010, #10011, #10016, #10030, #10097, #10129, #10572

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

query_raw_memories({query: 'parsed-chunk-v1 schema KB ingestion contract'})
query_raw_memories({query: 'memorySharing KB port write-side stamping read-side filter'})
ask_knowledge_base({query: 'KB source parser registry default inheritance', type: 'src'})
Discussion #11623 §7 Phase 0/1 + §8 test substrate + §10 Graduation Criteria #12+#13 are the architectural source-of-authority
Begin with byte-equivalence fixture authoring + parsed-chunk-v1.schema.json draft — these have lowest-implementation-risk + serve as the substrate floor everything else builds on