Frontmatter

id	11642
title	Phase 4D — Operator Alerting Surface: Telemetry Thresholds → A2A + External Notification
state	Closed
labels	enhancement, ai, architecture
assignees	neo-opus-ada
createdAt	May 19, 2026, 1:57 PM
updatedAt	Jun 7, 2026, 7:13 PM
githubUrl	https://github.com/neomjs/neo/issues/11642
author	neo-opus-ada
commentsCount	2
parentIssue	11628
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[x] 11639 Phase 4A — Per-Tenant Ingestion Observability Daemon (KBRecorderService Extension)
blocking	[]
closedAt	May 21, 2026, 8:07 AM

Phase 4D — Operator Alerting Surface: Telemetry Thresholds → A2A + External Notification

Closed v13.0.0/archive-v13-0-0-chunk-12 enhancement ai architecture

neo-opus-ada commented on May 19, 2026, 1:57 PM

Context

Sub of Phase 4 Epic #11628 (meta-Epic #11624).

Closes the operability loop — telemetry (Phase 4A) without alerting is just data. Alerting surfaces actionable issues to cloud operators.

The Problem

Phase 4A collects telemetry; Phase 4B reconciles; Phase 4C garbage-collects. But cloud operators need proactive notification when a threshold is breached:

Tenant quota exhausted (push frequency > threshold; chunk count > threshold)
Tenant error rate spike (errors/min > threshold)
Tenant schema-version drift (old schemaVersion seen → deprecation warning)
Reconciliation finds drift > threshold (tenant-side push pipeline broken?)
Embedding-budget burn threatens provider quota

Without alerting, operators must manually poll telemetry tables. A production-grade cloud Agent OS needs push-based operations.

The Fix

New daemon: ai/scripts/kb-alerting-daemon.mjs. The alternative, embedding the alerting logic in the Phase 4A daemon, was considered; the decision is to split for testability, per the Phase 4 Epic Avoided Trap.

Threshold-rule engine:

// aiConfig.knowledgeBase.alertRules
[
  {
    metric: 'tenant.error_rate_5min',
    threshold: 0.1,  // 10% error rate
    severity: 'warning',
    channels: ['a2a:AGENT:*', 'console']
  },
  {
    metric: 'tenant.chunk_count',
    threshold: 100000,  // chunks per tenant
    severity: 'critical',
    channels: ['a2a:operator', 'webhook:https://...']
  },
  // ...
]
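
The rule shape above can be checked before the engine uses it. The following is a minimal sketch, not the shipped daemon code: `validateAlertRules`, the `knownMetrics` set, and the injected `logger` are hypothetical names introduced for illustration. It follows the Contract Ledger's stated behavior that a missing rule list is a no-op and a malformed rule is skipped with a warning.

```javascript
// Hypothetical helper for illustration; not part of the Neo.mjs codebase.
const SEVERITIES = new Set(['warning', 'critical']);

/**
 * Returns only the well-formed rules. Malformed rules are skipped with a
 * warning so that one bad entry does not stop the daemon.
 * @param {Object[]|undefined} rules        Value of alertRules (may be missing)
 * @param {Set<String>}        knownMetrics Metric names the telemetry source exposes (assumed input)
 * @param {Object}             logger       Anything with a warn() method
 * @returns {Object[]}
 */
function validateAlertRules(rules, knownMetrics, logger = console) {
    if (!Array.isArray(rules)) {
        return []; // missing alertRules: nothing fires, and this is not an error
    }

    return rules.filter((rule, index) => {
        const problems = [];

        if (rule === null || typeof rule !== 'object') {
            problems.push('rule is not an object');
        } else {
            if (!knownMetrics.has(rule.metric)) {
                problems.push(`unknown metric "${rule.metric}"`);
            }
            if (!Number.isFinite(rule.threshold)) {
                problems.push('threshold must be a number');
            }
            if (!SEVERITIES.has(rule.severity)) {
                problems.push(`bad severity "${rule.severity}"`);
            }
            if (!Array.isArray(rule.channels) || rule.channels.length === 0) {
                problems.push('channels must be a non-empty array');
            }
        }

        if (problems.length > 0) {
            logger.warn(`alertRules[${index}] skipped: ${problems.join('; ')}`);
            return false;
        }

        return true;
    });
}
```

Collecting every problem for a rule before logging keeps the output to one warning per skipped rule, which is easier to read in server logs than one line per failed field.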

Channels:

A2A — add_message({to: '@<operator-identity>' | 'AGENT:*', subject: '[alert] ...', body: ...}). Reuses existing A2A substrate.
Console — logger.warn / logger.error in KB server logs
Webhook (V1.5) — POST to external URL per tenant config (Slack, PagerDuty, etc.)
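
The three channel types can be routed by the prefix of the channel spec. The sketch below is one possible shape, under stated assumptions: `dispatchAlert` is a hypothetical name, and `addMessage` and `logger` are injected stand-ins so the example runs without the real A2A substrate or KB logger. The subject format, the severity-to-log-level mapping, and the `wakeSuppressed` flag follow the Contract Ledger; checking that an A2A target is a registered identity is left out.

```javascript
/**
 * Dispatches one alert to one channel spec and reports what happened.
 * `addMessage` and `logger` are injected stand-ins (assumptions).
 * @param {Object} alert   {tenantId, metric, value, threshold, severity, deliveryMode?}
 * @param {String} channel 'console', 'a2a:<target>' or 'webhook:<url>'
 * @param {Object} deps    {addMessage, logger}
 * @returns {Promise<String>} 'console' | 'a2a' | 'failed' | 'skipped'
 */
async function dispatchAlert(alert, channel, {addMessage, logger}) {
    const subject = `[alert] ${alert.severity}: ${alert.metric} over ${alert.threshold} (tenant ${alert.tenantId})`;

    if (channel === 'console') {
        // warning -> logger.warn, critical -> logger.error
        const level = alert.severity === 'critical' ? 'error' : 'warn';
        logger[level](`${subject} value=${alert.value}`);
        return 'console';
    }

    if (channel.startsWith('a2a:')) {
        const message = {to: channel.slice('a2a:'.length), subject, body: JSON.stringify(alert)};

        if (alert.deliveryMode === 'audit') {
            message.wakeSuppressed = true; // mailbox-only record, no wake
        }

        try {
            await addMessage(message);
            return 'a2a';
        } catch (error) {
            // Best-effort delivery: log and let the caller continue
            logger.error(`A2A dispatch to ${message.to} failed: ${error.message}`);
            return 'failed';
        }
    }

    if (channel.startsWith('webhook:')) {
        // V1 recognizes the spec but makes no network call
        logger.warn('webhook channel deferred to V1.5');
        return 'skipped';
    }

    logger.warn(`unknown channel "${channel}" skipped`);
    return 'skipped';
}
```

Returning a status string instead of throwing keeps a failure in one channel from blocking the remaining channels of the same rule.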

Acceptance Criteria

aiConfig.knowledgeBase.alertRules schema defined
Rule engine: matches telemetry metrics against rules; fires alerts when threshold crossed
A2A channel: alerts emit add_message per rule.channels[i]
Console channel: alerts emit logger.warn / logger.error
Webhook channel (V1.5 if shipped): POSTs to per-tenant URL
Hysteresis: alerts don't re-fire within configurable cooldown window (default 1h)
Unit tests: rule matching, channel dispatch, hysteresis
Integration test: simulate threshold breach → alert delivered
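
The rule-matching criterion can be illustrated with a small evaluation function. This is a sketch under assumptions, not the daemon's actual code: `evaluateRules` is a hypothetical name, and the rollup is assumed to be an array with one plain object per tenant whose keys are metric names. A rule fires when the metric value exceeds its threshold.

```javascript
/**
 * Compares every rule against every tenant rollup.
 * @param {Object[]} rules   Validated rules: {metric, threshold, severity, channels}
 * @param {Object[]} rollups One object per tenant (assumed shape), e.g. {tenantId, errorRate}
 * @returns {Object[]} One alert per (tenant, rule) pair whose metric exceeds the threshold
 */
function evaluateRules(rules, rollups) {
    const alerts = [];

    for (const rollup of rollups) {
        for (const rule of rules) {
            const value = rollup[rule.metric];

            // A metric that is absent for this tenant cannot breach anything
            if (typeof value === 'number' && value > rule.threshold) {
                alerts.push({
                    tenantId : rollup.tenantId,
                    metric   : rule.metric,
                    value,
                    threshold: rule.threshold,
                    severity : rule.severity,
                    channels : rule.channels
                });
            }
        }
    }

    return alerts;
}

// Simulated breach: tenant-b exceeds the 10% error-rate threshold, tenant-a does not
const demoAlerts = evaluateRules(
    [{metric: 'errorRate', threshold: 0.1, severity: 'warning', channels: ['console']}],
    [{tenantId: 'tenant-a', errorRate: 0.02}, {tenantId: 'tenant-b', errorRate: 0.4}]
);
console.log(demoAlerts.map(alert => alert.tenantId)); // [ 'tenant-b' ]
```

Keeping evaluation free of side effects means the unit tests for rule matching need no A2A or logger mocks; dispatch and cooldown can be layered on the returned alerts.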

Out of Scope

Threshold tuning per tenant (V1: global thresholds; per-tenant tuning future ticket)
ML-driven anomaly detection (rule-based for V1)
Alert UI / dashboard (sandman_handoff + portal app surface for V1)
Webhook channel may defer to V1.5 if measurement justifies

Contract Ledger

Added at intake by @neo-opus-ada (Claude Code) 2026-05-21 — satisfies the ticket-intake §7 Contract Completeness readiness gate (intake comment: https://github.com/neomjs/neo/issues/11642#issuecomment-4504320783). The original author session is inactive; per ticket-intake §7 the claiming maintainer authors the missing ledger. This ledger folds in @neo-gpt's "Phase 4D Alerting Channel Overlay" (comment 2026-05-20T03:04Z) — its 6 substrate-grounded recommendations are the binding contract here. Tier target: T3 (Explicit Matrix). The ledger is the precise contract; the loose Acceptance Criteria checklist above is unchanged and is refined by these rows.

Target Surface	Source of Authority	Proposed Behavior	Fallback / Edge Case	Docs	Evidence
`aiConfig.knowledgeBase.alertRules` — config schema	#11628 Phase 4D; this ticket; @neo-gpt overlay #1/#3	An array of rule objects `{metric, threshold, severity, channels, deliveryMode?}`. `metric` is a per-tenant telemetry field name from `KBRecorderService.getTenantIngestionRollup` (`errorRate`, `eventCount`, `errorEvents`, `chunksEmbedded`, …). `threshold` is a number; a rule fires when the metric value exceeds it. `severity` is `warning` or `critical`. `channels` is an explicit, non-empty array per rule — there is no implicit default channel. `deliveryMode` is `wake` (default) or `audit`. The daemon evaluates every rule against the current per-tenant rollup each tick.	Missing / empty `alertRules` → the daemon runs and fires nothing (no-op, not an error). A malformed rule (unknown `metric`, non-numeric `threshold`, bad `severity`, empty `channels`) → skip that rule with a `logger.warn`; the daemon continues. No `a2a:AGENT:*` default — a rule that does not explicitly list a broadcast channel never broadcasts.	Yes — `aiConfig` template + JSDoc + a `learn/agentos/` note	Unit: rule-schema parse + validation (valid / malformed / empty)
A2A alert channel — `a2a:<target>`	this ticket; the `add_message` MCP contract; @neo-gpt overlay #2/#3	A `channels` entry `a2a:<target>` dispatches `add_message({to: <target>, subject: '[alert] <severity>: <metric> over <threshold> (tenant <tenantId>)', body: <detail>})`. `<target>` is a canonical `@<identity>` direct recipient (first-class) OR `AGENT:*` for broadcast — broadcast occurs only when a rule explicitly lists `a2a:AGENT:*`. `deliveryMode: wake` → wakeful (`wakeSuppressed` omitted); `deliveryMode: audit` → `wakeSuppressed: true` (durable mailbox-only record, no wake).	An invalid `<target>` (not a registered AgentIdentity and not `AGENT:*`) → skip the channel with a `logger.warn` before dispatch; never dispatch to an unresolved target. An `add_message` failure → `logger.error`, best-effort; the daemon continues.	Yes — daemon / service JSDoc	Unit: channel dispatch — direct-DM, explicit broadcast, invalid-target rejection, wake vs. audit delivery mode
Console alert channel — `console`	this ticket; `ai/mcp/server/knowledge-base/logger.mjs`	A `channels` entry `console` dispatches `logger.warn` for `severity: warning` and `logger.error` for `severity: critical`, with the alert detail.	The logger is always available; a throw inside logging is swallowed (best-effort).	Yes — JSDoc	Unit: console dispatch maps severity → logger level
Webhook alert channel — `webhook:<url>`	this ticket Out-of-Scope; @neo-gpt overlay #6	V1.5-deferred. V1 recognizes a `webhook:` channel spec but does NOT POST — it emits a `logger.warn` ("webhook channel deferred to V1.5") and skips. V1.5 (a separate ticket) ships the POST path only alongside an allowlisted-target + secret-handling story — arbitrary-URL POST from alert config is a higher-blast surface than the A2A / console paths.	V1: a `webhook:` channel spec → warn + skip; no network call is made.	Yes — when V1.5 ships	Unit: a V1 `webhook:` spec produces warn-skip with no network call
Alert cadence / hysteresis	this ticket AC; @neo-gpt overlay #4	A fired alert is suppressed from re-firing within a cooldown window (default 1h, configurable via `aiConfig`). The cooldown key is the tuple `(tenantId, metric, severity, channelTarget)` — a single noisy tenant cannot cause a wake-storm, while distinct tenants / metrics / targets alert independently.	Cooldown state is in-memory per daemon process; a daemon restart resets it (acceptable — at most one extra alert per key after a restart).	Yes — JSDoc	Unit: cooldown suppresses a re-fire within the window and permits it after; per-key independence
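
The cooldown row above can be sketched as an in-memory map keyed by the `(tenantId, metric, severity, channelTarget)` tuple. This is an illustrative sketch only: the class name `AlertCooldown` and the injectable `now` clock are assumptions added to make the behavior testable, not names from the codebase.

```javascript
// Hypothetical sketch of per-key alert cooldown; state lives in memory only,
// so a process restart resets it.
class AlertCooldown {
    /**
     * @param {Number}   windowMs Cooldown window in milliseconds (default 1h)
     * @param {Function} now      Clock function, injectable for tests
     */
    constructor(windowMs = 60 * 60 * 1000, now = Date.now) {
        this.windowMs  = windowMs;
        this.now       = now;
        this.lastFired = new Map();
    }

    /**
     * Returns true and records the firing time if the key is outside its
     * cooldown window; returns false if the alert should be suppressed.
     */
    shouldFire({tenantId, metric, severity, channelTarget}) {
        // JSON.stringify on an array gives an unambiguous composite key
        const key     = JSON.stringify([tenantId, metric, severity, channelTarget]);
        const last    = this.lastFired.get(key);
        const current = this.now();

        if (last !== undefined && current - last < this.windowMs) {
            return false;
        }

        this.lastFired.set(key, current);
        return true;
    }
}

// Usage with a fake clock: the second alert inside the window is suppressed
let fakeTime = 0;
const cooldown = new AlertCooldown(1000, () => fakeTime);
const alertKey = {tenantId: 't1', metric: 'errorRate', severity: 'warning', channelTarget: '@operator'};

console.log(cooldown.shouldFire(alertKey)); // true
fakeTime = 500;
console.log(cooldown.shouldFire(alertKey)); // false
fakeTime = 1500;
console.log(cooldown.shouldFire(alertKey)); // true
```

Because a suppressed attempt does not update the stored timestamp, the window is measured from the last alert that was actually delivered, so a continuously breaching metric still re-alerts once per window.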

Parent: #11628
Blocked-by: Phase 4A (telemetry source)
A2A substrate: existing add_message MCP tool
Pattern reference: swarm-heartbeat-daemon.mjs (existing daemon-with-A2A-output pattern)

Origin Session ID

7360e917-1733-4cdd-a6f3-5ac51c34b838

Handoff Retrieval Hints

add_message is the A2A delivery primitive
logger substrate in ai/mcp/server/knowledge-base/logger.mjs is the console channel
swarm-heartbeat-daemon.mjs emits A2A messages — pattern reference for alert delivery

tobiu referenced in commit 5d64a1f - "feat(ai): KB ingestion telemetry schema + recordIngestionMetric API (#11639) (#11667)" on May 20, 2026, 8:01 AM

tobiu referenced in commit b1b7005 - "feat(ai): KB operator-alerting daemon — Phase 4D (#11642) (#11709)" on May 21, 2026, 8:07 AM

tobiu closed this issue on May 21, 2026, 8:07 AM