What is the Neural Link?

The Neural Link is a bi-directional bridge that connects AI agents directly to the Neo.mjs runtime. It lets agents inspect the Scene Graph, component state, event listeners, computed styles, and DOM rectangles, and mutate the running application in real time.

Why is Neo.mjs called an Application Engine instead of a framework?

Neo.mjs maintains persistent application objects in a worker-backed Scene Graph instead of compiling application state away into ephemeral DOM nodes. That architecture enables multi-window orchestration, runtime permutation, and deep AI introspection.

What is Context Engineering?

Context Engineering shapes the information and tool environment around AI agents. Neo.mjs implements it through Knowledge Base, Memory Core, GitHub Workflow, and Neural Link MCP servers for frontier harnesses, plus a File System MCP server for internal Neo.ai.Agent local loops.

What is the Neo.mjs Agent OS?

The Neo.mjs Agent OS is the repository Brain: source code and services for Memory Core, Knowledge Base, Active Hybrid GraphRAG, DreamService, Golden Path synthesis, A2A coordination, and Neural Link tooling.

Frontmatter

id	11628
title	KB Ingestion Phase 4: Operations + Observability for Cloud-Native Deployments
state	Closed
labels	epic, ai, architecture
assignees	[]
createdAt	May 19, 2026, 1:52 PM
updatedAt	Jun 7, 2026, 7:13 PM
githubUrl	https://github.com/neomjs/neo/issues/11628
author	neo-opus-ada
commentsCount	5
parentIssue	11624
subIssues	11639 Phase 4A — Per-Tenant Ingestion Observability Daemon (KBRecorderService Extension)
	11640 Phase 4B — Manifest Reconciliation Daemon: Tenant-State vs Chroma-Actual Sync
	11641 Phase 4C — Stale-Chunk Garbage Collection Daemon: Orphan Detection + Retention Enforcement
	11642 Phase 4D — Operator Alerting Surface: Telemetry Thresholds → A2A + External Notification
	11649 Per-substrate retention policy configuration for KB/MC backup mechanisms
	11663 KB Ingestion Phase 4: Configurable bundle retention policy via aiConfig.backupRetention
	11665 KB Ingestion Phase 4A: Multi-tenant ingestion observability daemon scaffold + telemetry schema
	11711 KB reconciliation: force-push & manifest-orphan drift detection
	11716 Audit kb-config daemon readability under GraphService RLS
subIssuesCompleted	9
subIssuesTotal	9
blockedBy	11626 KB Ingestion Phase 2: Ingestion Service + MCP Small-Batch Facade + Bulk Facade (resolved)
blocking	[]
closedAt	May 21, 2026, 3:00 PM

KB Ingestion Phase 4: Operations + Observability for Cloud-Native Deployments

Closed | v13.0.0/archive-v13-0-0-chunk-12 | epic, ai, architecture

neo-opus-ada commented on May 19, 2026, 1:52 PM

Context

Phase 4 sub-Epic of meta-Epic #11624 (Cloud-Native KB Ingestion for External Workspaces). This phase was NOT in the original Discussion #11623; it surfaced on 2026-05-19 during an operator-directed sub-decomposition review after the Discussion graduated. The operator's mention of "daemons etc." prompted an explicit decomposition of the operational substrate.

Blocked by Phase 2 (#11626): most observability and reconciliation work needs the ingestion service and facades to be stable. Operator alerting and dashboard surfacing may begin in parallel once the Phase 0/1 contracts (#11625) land, and daemon scaffolding can start before Phase 2.

The Problem

Phase 2 ships the push pipeline; Phase 3 ships the operator-facing guide. What's missing for production cloud Agent OS deployments:

No per-tenant ingestion observability (push frequency, error rates, ingestion latency, embedding-budget burn)
No periodic state-reconciliation (catches missed tombstones; force-push detection beyond per-push payload)
No stale-chunk garbage collection (orphans from source removal, parser swap, retention policy)
No operator alerting surface (quota/error thresholds → A2A or external notification)

These concerns aren't covered by any existing phase. Without them, cloud operators have no visibility into the ingestion fleet's health and no recovery story for tenant-state drift.

Architectural Substrate Precedent (V-B-A grounded)

Neo already has daemon-pattern substrate this Epic builds on:

ai/scripts/orchestrator-daemon.mjs — cross-daemon orchestration pattern
ai/scripts/swarm-heartbeat-daemon.mjs — A2A liveness pattern
ai/scripts/bridge-daemon.mjs — bridge pattern
ai/services/knowledge-base/KBRecorderService.mjs — already captures KB query telemetry; per its @summary: "persists every KB MCP tool invocation into the shared Memory Core SQLite database, then projects repeated ask_knowledge_base / query_documents questions into kb_query_faqs". Daemon-adjacent; observability-substrate-ready.
Neo.ai.daemons.services.GapInferenceEngine (referenced in KBRecorderService) — daemon namespace ai.daemons.services confirms architectural pattern

New observability daemons EXTEND KBRecorderService (multi-tenant telemetry collection) rather than introduce a new substrate; reconciliation + GC daemons follow the orchestrator-daemon pattern.

KB-as-Cache vs MC-as-Store (load-bearing invariant for Phase 4 daemons)

(Added 2026-05-19 following operator V-B-A on the backup-substrate symmetry framing, after PR #11647 merged.)

A structural distinction between the two substrates this Epic operates on:

Knowledge Base is a cache+index over external sources. Neo's KB content is derivable from the Neo repo via npm run ai:sync-kb (full re-sync regenerates all chunks). Phase 2 cross-tenant ingestion preserves this property: each tenant's content originates from the tenant's repo and is recoverable via tenant-side re-push (hook re-run OR npm run ai:ingest-tenant <tenantId> bulk facade). A KB wipe is recoverable from external sources at any version. The asymmetry-collapse-at-v13 framing initially proposed was incorrect.
Memory Core is a primary store. Conversations, agent-thoughts, and session-summaries are unique runtime artifacts with no external source-of-truth. MC wipe between backups IS amnesia — daily-daemon-driven JSONL bundles minimize the amnesia window to ≤24h but cannot eliminate it.

Phase 4 daemon value-prop reset (per the cache vs store distinction)

Phase 4B reconciliation daemon (#11640) value for KB is reducing the operational cost of recovery, NOT preventing data loss: the daemon catches tenant-state drift WITHOUT requiring full re-sync orchestration across N tenants. For MC, its value is likewise operational-cost reduction: it catches missed delete signals without requiring a restore from a bundle.
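At its core, the Phase 4B check is a set difference between what a tenant claims is indexed and what Chroma actually holds. A minimal sketch under stated assumptions: the manifest shape (source path to chunk IDs), the flat list of Chroma chunk IDs, and the `reconcileTenant` name are all hypothetical, not existing Neo.mjs APIs.

```javascript
// Hypothetical reconciliation sketch (not a Neo.mjs API).
// manifest:  {sourcePath: [chunkId, ...]}, what the tenant claims is indexed
// actualIds: [chunkId, ...], every chunk ID Chroma holds for that tenant
function reconcileTenant(manifest, actualIds) {
    const claimed = new Set();
    for (const ids of Object.values(manifest)) {
        for (const id of ids) claimed.add(id);
    }
    const actual = new Set(actualIds);

    // In Chroma but not claimed: e.g. a missed tombstone or force-push drift.
    const orphaned = [...actual].filter(id => !claimed.has(id)).sort();
    // Claimed but absent from Chroma: the tenant needs to re-push these chunks.
    const missing  = [...claimed].filter(id => !actual.has(id)).sort();

    return {orphaned, missing, inSync: orphaned.length === 0 && missing.length === 0};
}

const report = reconcileTenant(
    {'src/a.mjs': ['a#0', 'a#1'], 'src/b.mjs': ['b#0']},
    ['a#0', 'a#1', 'c#0']
);
// report.orphaned → ['c#0'], report.missing → ['b#0'], report.inSync → false
```

Orphans map to the missed-tombstone and force-push drift classes; missing IDs can be fixed by a targeted tenant re-push instead of a full re-sync, which is where the operational-cost saving comes from.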

Phase 4C GC daemon (#11641) value is symmetric across KB and MC — both benefit from automated stale-chunk cleanup. The cache/store distinction doesn't change GC semantics.
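Because GC semantics are the same for KB and MC, the three orphan classes named for Phase 4C (source-config changes, parser swaps, retention-policy expiration) can be written as substrate-agnostic predicates over chunk metadata. A sketch assuming hypothetical chunk fields (`source`, `parserVersion`, `ingestedAt`) and a hypothetical policy object; it only selects candidates and deletes nothing.

```javascript
// Hypothetical GC candidate selection (not a Neo.mjs API).
// chunk:  {id, source, parserVersion, ingestedAt (ms since epoch)}
// policy: {activeSources: Set, parserVersion: string, retentionDays: number}
const DAY_MS = 24 * 60 * 60 * 1000;

function selectGcCandidates(chunks, policy, now = Date.now()) {
    const candidates = [];
    for (const chunk of chunks) {
        let reason = null;
        if (!policy.activeSources.has(chunk.source)) {
            reason = 'source-removed';    // source dropped from the tenant config
        } else if (chunk.parserVersion !== policy.parserVersion) {
            reason = 'parser-swapped';    // produced by a superseded parser
        } else if (now - chunk.ingestedAt > policy.retentionDays * DAY_MS) {
            reason = 'retention-expired'; // older than the retention window
        }
        if (reason) candidates.push({id: chunk.id, reason});
    }
    return candidates;
}

const now = Date.UTC(2026, 4, 21);
const candidates = selectGcCandidates([
    {id: 'a#0', source: 'src', parserVersion: 'v2', ingestedAt: now - 1 * DAY_MS},
    {id: 'b#0', source: 'old', parserVersion: 'v2', ingestedAt: now},
    {id: 'c#0', source: 'src', parserVersion: 'v1', ingestedAt: now},
    {id: 'd#0', source: 'src', parserVersion: 'v2', ingestedAt: now - 40 * DAY_MS}
], {activeSources: new Set(['src']), parserVersion: 'v2', retentionDays: 30}, now);
// → b#0 source-removed, c#0 parser-swapped, d#0 retention-expired
```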

Phase 4D alerting (#11642) surfaces wipe / drift events for BOTH substrates, but the severity calculus differs: a KB wipe alert is "orchestrate N tenant re-syncs"; an MC wipe alert is "amnesia event — cannot fully recover post-last-backup". Per-substrate severity threshold configuration follows from this.
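One way to express the per-substrate severity calculus is a lookup keyed by substrate and event type, so the same wipe event yields a different severity for KB than for MC. The severity levels, action strings, and `classifyEvent` helper below are hypothetical placeholders for whatever configuration Phase 4D settles on.

```javascript
// Hypothetical severity table: the same event type maps to different severities
// depending on whether the substrate is a cache (KB) or a primary store (MC).
const SEVERITY = {
    knowledgeBase: {drift: 'info',    wipe: 'warning'},
    memoryCore   : {drift: 'warning', wipe: 'critical'}
};

const ACTION = {
    knowledgeBase: {wipe: 'Orchestrate re-sync for affected tenants'},
    memoryCore   : {wipe: 'Amnesia event: restore latest bundle; data after it is lost'}
};

function classifyEvent(substrate, type) {
    const severity = SEVERITY[substrate]?.[type];
    if (!severity) throw new Error(`Unknown event: ${substrate}/${type}`);
    return {substrate, type, severity, action: ACTION[substrate]?.[type] ?? null};
}

classifyEvent('knowledgeBase', 'wipe').severity; // 'warning'
classifyEvent('memoryCore', 'wipe').severity;    // 'critical'
```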

Retention policy implications (per-substrate, not symmetric)

KB JSONL bundle: lighter retention is acceptable (weekly cadence; defrag pre-nuke 7d unchanged), because the backup is a cost optimization for re-sync orchestration, not data-loss prevention
MC JSONL bundle: daily cadence and 3-30 day retention (status quo), because the backup IS data-loss prevention

This per-substrate retention asymmetry should be configurable via aiConfig.{knowledgeBase,memoryCore}.backupRetention.*. That is follow-up ticket scope, deferred to Phase 4 implementation so the implementer can shape it while context is hot.
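A sketch of how per-substrate retention might resolve from `aiConfig`, assuming the proposed `aiConfig.{knowledgeBase,memoryCore}.backupRetention.*` layout with hypothetical `cadence` and `keepDays` keys. The defaults are illustrative, and clamping MC retention to the 3-30 day range is a design assumption rather than stated policy. The commit for #11663 uses `aiConfig.backupRetention`, whose actual shape is not shown in this issue.

```javascript
// Hypothetical defaults mirroring the cadences above; real key names are
// left to the follow-up ticket.
const DEFAULT_RETENTION = {
    knowledgeBase: {cadence: 'weekly', keepDays: 7},
    memoryCore   : {cadence: 'daily',  keepDays: 30}
};

// Assumption: MC backups are data-loss prevention, so keep them within 3-30 days.
const MC_MIN_DAYS = 3;
const MC_MAX_DAYS = 30;

function resolveRetention(aiConfig, substrate) {
    const defaults = DEFAULT_RETENTION[substrate];
    if (!defaults) throw new Error(`Unknown substrate: ${substrate}`);

    // Per-substrate overrides win over defaults (shallow merge).
    const overrides = aiConfig?.[substrate]?.backupRetention ?? {};
    const merged    = {...defaults, ...overrides};

    if (substrate === 'memoryCore') {
        merged.keepDays = Math.min(MC_MAX_DAYS, Math.max(MC_MIN_DAYS, merged.keepDays));
    }
    return merged;
}

const aiConfig = {memoryCore: {backupRetention: {keepDays: 1}}};
resolveRetention(aiConfig, 'memoryCore');    // {cadence: 'daily', keepDays: 3}
resolveRetention(aiConfig, 'knowledgeBase'); // {cadence: 'weekly', keepDays: 7}
```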

The Fix

Four sub-tickets:

Phase 4A — Per-tenant ingestion observability daemon (extends KBRecorderService with multi-tenant telemetry; persists per-tenant metrics to Memory Core SQLite; surfaces via existing portal app OR sandman_handoff.md health section)
Phase 4B — Manifest reconciliation daemon (periodic tenant-claimed-state vs Chroma-actual-state reconciliation; catches missed tombstones; handles force-push edge cases)
Phase 4C — Stale-chunk garbage collection daemon (detects chunks orphaned by source-config changes, parser swaps, retention-policy expiration)
Phase 4D — Operator alerting surface (telemetry thresholds → A2A notification OR external channel; per-tenant quota tracking; error-rate alerts)
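For Phase 4A, the per-tenant metrics named in the problem statement (push frequency, error rates, ingestion latency, embedding-budget burn) can be derived from raw ingestion events with a simple aggregation. This sketch covers only the in-memory step and assumes a hypothetical event shape, with the push count over a window standing in for push frequency. Persistence to Memory Core SQLite through the KBRecorderService extension (the `recordIngestionMetric` API from commit 5d64a1f) is not modeled, and that API's signature is not shown here.

```javascript
// Hypothetical ingestion event: {tenantId, ok, latencyMs, embeddingTokens}.
function summarizeTenants(events) {
    const byTenant = new Map();
    for (const e of events) {
        let s = byTenant.get(e.tenantId);
        if (!s) {
            s = {pushes: 0, errors: 0, latencies: [], embeddingTokens: 0};
            byTenant.set(e.tenantId, s);
        }
        s.pushes++;
        if (!e.ok) s.errors++;
        s.latencies.push(e.latencyMs);
        s.embeddingTokens += e.embeddingTokens;
    }

    const summary = {};
    for (const [tenantId, s] of byTenant) {
        const sorted = [...s.latencies].sort((a, b) => a - b);
        // Nearest-rank 95th percentile.
        const p95 = sorted[Math.ceil(0.95 * sorted.length) - 1];
        summary[tenantId] = {
            pushes         : s.pushes,
            errorRate      : s.errors / s.pushes,
            p95LatencyMs   : p95,
            embeddingTokens: s.embeddingTokens
        };
    }
    return summary;
}

const summary = summarizeTenants([
    {tenantId: 'acme', ok: true,  latencyMs: 120, embeddingTokens: 800},
    {tenantId: 'acme', ok: false, latencyMs: 900, embeddingTokens: 0},
    {tenantId: 'beta', ok: true,  latencyMs: 80,  embeddingTokens: 300}
]);
// summary.acme → {pushes: 2, errorRate: 0.5, p95LatencyMs: 900, embeddingTokens: 800}
```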

Acceptance Criteria

Cross-phase ACs:

All 4 sub-tickets filed with explicit cross-references back to this Epic
Daemons follow existing ai/scripts/ daemon pattern (orchestrator-daemon precedent)
Telemetry persists to shared Memory Core SQLite (KBRecorderService extension pattern, NOT new database)
Operator-facing surfaces integrate with existing portal app OR sandman_handoff.md (not new dashboard infrastructure)
Per-tenant retention policy enforceable via tenant config (Q5 from Discussion #11623 — tenant config storage)
Test coverage: each daemon has unit tests + integration tests against synthetic multi-tenant ingestion fixtures (reuse Phase 2 test fixtures)

Out of Scope

New dashboard infrastructure (extend portal app or sandman_handoff; don't build standalone)
Per-tenant SLA / quota enforcement engine (this Epic surfaces the data; SLA enforcement is post-V1 commercialization scope)
Cross-deployment fleet management (single-deployment scope for V1; multi-deployment is separate substrate)
ML-driven anomaly detection (rule-based thresholds for V1; ML can be layered later)

Avoided Traps

Trap	Why rejected
New telemetry database	KBRecorderService already uses Memory Core SQLite; reuse the substrate
Standalone dashboard infrastructure	Portal app + sandman_handoff exist; surface there rather than fork the operability story
Reconciliation as user-on-demand only	Force-push + missed-tombstone classes need PROACTIVE detection; on-demand reconciliation misses production-class failures
Mixing observability + alerting in one daemon	Concerns separate: telemetry collection ≠ threshold alerting; split into independent daemons for testability
Building before Phase 2 service exists	Observability daemon needs ingestion-service hooks; reconciliation needs push-pipeline state. Sequence: Phase 2 first, then Phase 4 (most subs blocked-by Phase 2)
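The observability/alerting trap argues for keeping threshold alerting separate from telemetry collection. A minimal sketch of the alerting half as a pure function: it consumes a per-tenant metrics summary produced elsewhere and returns alert objects, leaving dispatch to A2A or an external channel out of scope. The threshold names and values are hypothetical.

```javascript
// Hypothetical thresholds; real values would come from per-substrate / per-tenant config.
const THRESHOLDS = {errorRate: 0.2, p95LatencyMs: 5000, embeddingTokens: 1000000};

// Pure function: no collection and no I/O, so it can be unit-tested in isolation.
// summary: {tenantId: {metricName: number, ...}, ...}
function evaluateAlerts(summary, thresholds = THRESHOLDS) {
    const alerts = [];
    for (const [tenantId, metrics] of Object.entries(summary)) {
        for (const [metric, limit] of Object.entries(thresholds)) {
            const value = metrics[metric];
            if (typeof value === 'number' && value > limit) {
                alerts.push({tenantId, metric, value, limit});
            }
        }
    }
    return alerts;
}

const alerts = evaluateAlerts({
    acme: {errorRate: 0.5, p95LatencyMs: 900,  embeddingTokens: 800},
    beta: {errorRate: 0,   p95LatencyMs: 8000, embeddingTokens: 300}
});
// → one errorRate alert for acme, one p95LatencyMs alert for beta
```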

Parent meta-Epic: #11624
Blocked-by: Phase 2 #11626 (ingestion service + facades must stabilize)
Sibling phases: Phase 0/1 #11625, Phase 3 #11627
Daemon precedents: orchestrator-daemon, swarm-heartbeat-daemon, bridge-daemon
Telemetry substrate: KBRecorderService.mjs (extension target)
Origin Discussion: #11623 (Phase 4 not in original §7 decomposition; surfaced post-graduation)

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

query_raw_memories({query: 'Phase 4 KB ingestion observability daemon multi-tenant'})
ask_knowledge_base({query: 'KBRecorderService GapInferenceEngine daemon pattern', type: 'src'})
Existing daemons: ai/scripts/orchestrator-daemon.mjs, swarm-heartbeat-daemon.mjs, bridge-daemon.mjs are pattern references
Operator framing 2026-05-19: "future sessions have amnesia ... better iron out subs now while context is hot" — applied to surfacing operational-substrate that the Discussion didn't decompose

tobiu referenced in commit 3c47411 - "feat(ai): configurable bundle retention via aiConfig.backupRetention (#11663) (#11664)" on May 20, 2026, 8:00 AM

tobiu referenced in commit 5d64a1f - "feat(ai): KB ingestion telemetry schema + recordIngestionMetric API (#11639) (#11667)" on May 20, 2026, 8:01 AM

tobiu referenced in commit ad4a108 - "feat(ai): KB Multi-Tenant Health section in Sandman handoff (#11639) (#11708)" on May 21, 2026, 8:03 AM