Frontmatter

id	10650
title	Codify wake restart and incident protocol
state	Closed
labels	documentation, enhancement, ai, regression, architecture
assignees	neo-opus-ada
createdAt	May 3, 2026, 4:56 PM
updatedAt	May 4, 2026, 12:49 AM
githubUrl	https://github.com/neomjs/neo/issues/10650
author	neo-gpt
commentsCount	6
parentIssue	10647
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 3, 2026, 7:00 PM

Codify wake restart and incident protocol

Closed v13.0.0/archive-v13-0-0-chunk-7 documentation, enhancement, ai, regression, architecture

neo-gpt commented on May 3, 2026, 4:56 PM

Context

Child of #10647. The 2026-05-03 wake regression got worse immediately after the bridge/MCP restart that the previous session expected to improve the system. That is the operational smell: restart is not a neutral act for this substrate. It changes daemon process state, MCP cache state, harness focus/keybindings, subscriptions, and heartbeat process population.

Current user/operator stance:

A2A message save/read still works.
Wakeups are deactivated intentionally.
Heartbeat processes were killed to stop fresh-session spawning while active sessions were still ongoing.
We need to coordinate through durable messages for now, not rely on wake delivery.

This protocol ticket turns that incident handling into a repeatable release/restart checklist.

Duplicate Sweep Notes

Creation sweep performed as part of #10647:

The latest 20 open GitHub issues were read live, with number/title/author/labels/URL. Adjacent tickets include #10646 (ticket-create live sweep), #10645 (bootstrap cache), #10644 (Antigravity prompt landing), #10643 (checkSunsetted ordering), #10633/#10627 (heartbeat/fresh-session correctness), #10601 (parent epic), and #10517 (routing semantics). None own the restart/incident protocol for wake-substrate reactivation.
Local resource search found process discussions (#10547 track budget, #10629 unattended driver-not-passenger) and prior validation gaps (#10440), but no current incident protocol ticket.
ask_knowledge_base(type: 'ticket') found no equivalent ticket.

The Problem

The swarm has been treating wake-substrate changes as ordinary code changes: merge, restart, assume better, then react to the next breakage. That is no longer acceptable for a subsystem that can:

paste into files;
spawn new agent sessions;
steal focus;
mutate the Memory Core session grouping;
interrupt active work;
create repeated coordination noise while @tobiu is not actively watching.

The recurring cause is not one model or one file. It is missing operational discipline around restart/re-enable moments.

The Architectural Reality

Wake restart touches at least these layers:

bridge-daemon.mjs process and its app/focus permissions.
Memory Core MCP server process and GraphService/AgentIdentity cache hydration.
WAKE_SUBSCRIPTION nodes and harnessTargetMetadata templates.
swarm-heartbeat.sh PIDs and cooldown/idempotency state.
resumeHarness.mjs fresh-session boot behavior.
Harness app UI state/keybindings for Claude Desktop, Antigravity, and Codex Desktop.
Agent protocol state: sunset is terminal; fresh-session spawn requires explicit sunset/unsubscribe; normal A2A messages should remain durable mailbox records while wakes are disabled.

This ticket should be documentation/protocol first. If implementation hooks are needed later, create narrow follow-ups.
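
The layers listed above can be turned into an explicit inventory that a restart checklist walks through. The following is a minimal sketch, not the repository's implementation: the layer keys and the `missingInventory` helper are hypothetical names chosen for illustration, and a layer counts as inventoried only once someone has recorded a non-empty observation for it.

```javascript
// Hypothetical layer keys, named after the layers listed in this ticket.
const RESTART_LAYERS = [
    'bridgeDaemon',      // bridge-daemon.mjs process and its app/focus permissions
    'memoryCoreMcp',     // Memory Core MCP server process and cache hydration
    'wakeSubscriptions', // WAKE_SUBSCRIPTION nodes and harnessTargetMetadata templates
    'heartbeat',         // swarm-heartbeat.sh PIDs and cooldown/idempotency state
    'resumeHarness',     // resumeHarness.mjs fresh-session boot behavior
    'harnessUi',         // harness app UI state/keybindings
    'agentProtocol'      // sunset/unsubscribe rules and mailbox state
];

/**
 * Returns the layers that have no recorded observation yet.
 * A missing key or an empty/whitespace-only note both count as "not inventoried".
 */
function missingInventory(inventory) {
    return RESTART_LAYERS.filter(layer => {
        const note = inventory[layer];
        return typeof note !== 'string' || note.trim() === '';
    });
}

// Placeholder observations only; real notes would come from a live check.
const partialInventory = {
    bridgeDaemon : '<observed daemon state>',
    heartbeat    : '<observed heartbeat PIDs>',
    memoryCoreMcp: ''
};

console.log(missingInventory(partialInventory));
```

Treating an empty note the same as a missing one keeps the checklist honest: a layer cannot be ticked off without a recorded observation.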

The Fix

Codify a wake-substrate restart and incident protocol in a repo-visible place, likely under learn/agentos/ or the wake-substrate ADR/runbook area.

The protocol should define:

Incident declaration: when repeated unsafe wake symptoms require heartbeat/wake deactivation.
Freeze rule: while incident is active, do not reactivate heartbeat/wake delivery from individual local fixes.
Restart checklist: bridge daemon, Memory Core MCP, subscriptions, harness apps, and heartbeat PIDs must be inventoried explicitly.
Reactivation evidence: #10649 prompt-landing matrix and #10648 safety gate must be green or @tobiu must explicitly override.
Coordination mode: use add_message / list_messages durable mailbox while wakeups are disabled; do not assume wake interrupts will arrive.
Ownership split: local bug owners fix their tickets; one coordinator owns the reactivation gate.
Post-incident retrospective: record what layer failed and whether a new guard/test/protocol was added.
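
The freeze rule and the reactivation evidence item can be read as one fail-closed decision. The sketch below assumes hypothetical field names (`promptLandingMatrixGreen`, `safetyGateGreen`, `operatorOverride`) and a hypothetical `evaluateReactivationGate` function; it only illustrates the rule as stated in this ticket and does not reflect how #10648 or #10649 are actually implemented.

```javascript
/**
 * Evaluates the reactivation gate during an active incident.
 * Every input defaults to false, so missing evidence keeps wake delivery disabled.
 * All field names are assumptions made for this sketch.
 */
function evaluateReactivationGate({
    promptLandingMatrixGreen = false, // evidence from the #10649 matrix
    safetyGateGreen          = false, // evidence from the #10648 safety gate
    operatorOverride         = false  // explicit override by @tobiu
} = {}) {
    if (operatorOverride) {
        return {allowed: true, reason: 'explicit operator override'};
    }

    const blockers = [];

    if (!promptLandingMatrixGreen) {
        blockers.push('#10649 prompt-landing matrix not green');
    }
    if (!safetyGateGreen) {
        blockers.push('#10648 safety gate not green');
    }

    return blockers.length === 0
        ? {allowed: true, reason: 'all evidence green'}
        : {allowed: false, reason: blockers.join('; ')};
}

// A local fix alone supplies no evidence, so the gate stays closed.
console.log(evaluateReactivationGate());
```

Note that an individual bug fix has no input of its own here: only the two evidence items or the explicit override can open the gate, which is the point of the freeze rule.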

Acceptance Criteria

A repo-visible wake restart/incident protocol exists and is linked from #10647.
The protocol states that restarting bridge/MCP/harness processes is a release event requiring validation, not a neutral cleanup step.
The protocol defines when to disable/keep disabled heartbeat and wake delivery.
The protocol requires live inventory of bridge daemon PID, heartbeat PIDs, Memory Core MCP state, wake subscriptions, and harness targets before reactivation.
The protocol requires #10649 prompt-landing matrix evidence before declaring wake delivery healthy.
The protocol requires #10648 safety gate/circuit breaker to be in place before heartbeat is re-enabled.
The protocol names durable A2A mailbox (add_message/list_messages) as the coordination fallback while wake delivery is disabled.
The protocol includes a short incident-retrospective template: symptom, failed layer, proof, fix ticket, validation evidence, and the guard added to avoid recurrence.
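
The retrospective template in the last criterion could be enforced with a small completeness check. This is a sketch under assumptions: the key names and the `renderRetrospective` helper are hypothetical, and the values in the usage example are placeholders rather than data from the actual incident.

```javascript
// Keys mirror the template items named in the acceptance criteria; the identifiers are hypothetical.
const RETROSPECTIVE_FIELDS = [
    'symptom',
    'failedLayer',
    'proof',
    'fixTicket',
    'validationEvidence',
    'recurrenceGuard'
];

/**
 * Renders a retrospective as plain "key: value" lines.
 * Throws when any template field is missing or empty, so a postmortem
 * without a recurrence guard cannot be recorded.
 */
function renderRetrospective(entry) {
    const missing = RETROSPECTIVE_FIELDS.filter(field => !entry[field]);

    if (missing.length > 0) {
        throw new Error(`Retrospective incomplete, missing: ${missing.join(', ')}`);
    }

    return RETROSPECTIVE_FIELDS.map(field => `${field}: ${entry[field]}`).join('\n');
}

// Placeholder values only.
const draftRetrospective = {
    symptom           : '<observed symptom>',
    failedLayer       : '<layer that failed>',
    proof             : '<log excerpt or reproduction>',
    fixTicket         : '<ticket reference>',
    validationEvidence: '<validation run>',
    recurrenceGuard   : '<test, guard, or protocol change>'
};

console.log(renderRetrospective(draftRetrospective));
```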

Out of Scope

Implementing the #10648 circuit breaker.
Implementing the #10649 prompt-landing matrix itself.
Fixing local regressions #10643/#10644/#10645/#10633/#10627.
Editing Progressive Disclosure skill routers unless a later implementation explicitly requires it. If .agents/skills/** are touched, the implementer must invoke create-skill and keep heavy content in references.
Reactivating heartbeat.

Avoided Traps

Trap: treat restart as cleanup. Rejected. Restart changes substrate state and must be validated.
Trap: rely on wakeups to coordinate during a wake incident. Rejected. Durable mailbox works and should be the fallback path.
Trap: no single coordinator. Rejected. Local bug owners can fix local tickets, but reactivation needs one gate owner.
Trap: write a postmortem without a guard. Rejected. Retrospective must link to a test, guard, or protocol change.
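
The second trap implies a pull-based coordination loop: agents read the mailbox on their own schedule instead of waiting to be interrupted. The sketch below uses an in-memory stand-in, so it is not durable and is not the Memory Core API. The method names only echo the add_message / list_messages tools; their signatures, the `MailboxSketch` class, and the `pollMailbox` helper are assumptions made for illustration.

```javascript
// In-memory stand-in for the durable mailbox (hypothetical; real storage is not modeled).
class MailboxSketch {
    constructor() {
        this.messages = [];
    }

    // Stores a message and returns its monotonically increasing id.
    addMessage(to, body) {
        const message = {id: this.messages.length + 1, to, body};
        this.messages.push(message);
        return message.id;
    }

    // Returns messages for one recipient with an id greater than afterId.
    listMessages(to, afterId = 0) {
        return this.messages.filter(message => message.to === to && message.id > afterId);
    }
}

/**
 * Pull-based read: the agent asks for everything after its cursor and
 * advances the cursor itself. No wake interrupt is expected or required.
 */
function pollMailbox(mailbox, agent, cursor = 0) {
    const unread     = mailbox.listMessages(agent, cursor);
    const nextCursor = unread.length > 0 ? unread[unread.length - 1].id : cursor;

    return {unread, nextCursor};
}

const mailbox = new MailboxSketch();
mailbox.addMessage('agent-a', '<status update>');
mailbox.addMessage('agent-b', '<handover note>');

const firstPoll = pollMailbox(mailbox, 'agent-a');
console.log(firstPoll.unread.length, firstPoll.nextCursor);
```

Keeping the cursor on the reader side means a missed or disabled wake costs only latency: the message is still there on the next poll.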

Parent epic: #10647.
Grandparent epic: #10601.
Safety gate: #10648.
Prompt-landing validation: #10649.
Local regressions: #10643, #10644, #10645, #10633, #10627.

Origin Session ID: 89b259c3-27ec-4afb-baaf-fd39b55bffe1

Retrieval Hint: wake substrate restart incident protocol heartbeat disabled bridge MCP harness reactivation durable mailbox.

tobiu referenced in commit 8f5bdc4 - "feat(ai): add wake safety gate and circuit breaker (#10648) (#10653)" on May 3, 2026, 5:40 PM

tobiu referenced in commit 7a7d362 - "docs(agentos): add wake substrate incident protocol (#10650) (#10655)" on May 3, 2026, 7:00 PM

tobiu closed this issue on May 3, 2026, 7:00 PM

tobiu referenced in commit 73dfaf6 - "fix(ai): extend focus seed default to Codex (#10662) (#10663)" on May 3, 2026, 10:00 PM

tobiu referenced in commit 60b9c7b - "fix(ai): fail closed for Codex UI wake (#10664) (#10665)" on May 3, 2026, 10:42 PM