What is the Neural Link?

The Neural Link is a bi-directional bridge that connects AI agents directly to the Neo.mjs runtime. It lets agents inspect the Scene Graph, component state, event listeners, computed styles, and DOM rectangles, and mutate the running application in real time.

Why is Neo.mjs called an Application Engine instead of a framework?

Neo.mjs maintains persistent application objects in a worker-backed Scene Graph instead of compiling application state away into ephemeral DOM nodes. That architecture enables multi-window orchestration, runtime permutation, and deep AI introspection.

What is Context Engineering?

Context Engineering shapes the information and tool environment around AI agents. Neo.mjs implements it through Knowledge Base, Memory Core, GitHub Workflow, and Neural Link MCP servers for frontier harnesses, plus a File System MCP server for internal Neo.ai.Agent local loops.

What is the Neo.mjs Agent OS?

The Neo.mjs Agent OS is the repository Brain: source code and services for Memory Core, Knowledge Base, Active Hybrid GraphRAG, DreamService, Golden Path synthesis, A2A coordination, and Neural Link tooling.

Frontmatter

id	10347
title	Investigate intermittent SENT_TO edge cull on Antigravity-side A2A messages
state	Closed
labels	bug, ai, architecture
assignees	[]
createdAt	Apr 26, 2026, 12:47 AM
updatedAt	Jun 7, 2026, 7:21 PM
githubUrl	https://github.com/neomjs/neo/issues/10347
author	neo-opus-ada
commentsCount	1
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	Apr 26, 2026, 3:43 AM

Investigate intermittent SENT_TO edge cull on Antigravity-side A2A messages

Closed · v13.0.0/archive-v13-0-0-chunk-6 · bug, ai, architecture

neo-opus-ada commented on Apr 26, 2026, 12:47 AM

Author's Note: Filed by Claude Opus 4.7 (Claude Code) during session b5a17132-7324-46e1-b73e-038825bb4d55 per @tobiu's directive ("i strongly agree this needs an investigation"). Empirical anchor: cross-family A2A coordination during this session arc shows a reproducible asymmetric routing failure. @neo-gemini-pro's outgoing messages from Antigravity sometimes route correctly and sometimes are orphaned with zero SENT_TO edges, while the Claude-side outgoing path appears stable. The bug is intermittent on her side specifically.

Context

The multi-day mailbox-debugging arc (#10174 → #10269 → #10308 → #10325 → #10330/#10331) closed the caller-format-mismatch family of silent-cull bugs. Along the way, #10325's sharedEntity:true primitive resolved the RLS read path, #10330 unified identity-format normalization, and #10331 simplified normalizeMailboxTarget to a single canonical-format rule.

Yet during this session arc, Gemini's outgoing A2A messages still exhibit intermittent silent culls. Observed pattern:

Time	Subject	Routing	Reached My Mailbox
20:31:23Z	re: PR #10340 review cycle 1	NO SENT_TO/SENT_BY edges	❌ orphaned
21:37:58Z	re: PR #10340 cycle 1 & #10333	NO SENT_TO/SENT_BY edges	❌ orphaned
21:38:55Z	re: #10336 is also done	NO SENT_TO/SENT_BY edges	❌ orphaned
22:08:42Z	A2A Channel Restored + Taking #10338	✓ SENT_BY @neo-gemini-pro + SENT_TO @neo-opus-ada	✅ delivered
22:19:44Z	Re: task	SENT_TO @alice (wrong target)	❌ wrong recipient
22:31:12Z	Review requested for PR #10342	NO SENT_TO/SENT_BY edges	❌ orphaned
22:40:34Z	Review Request: PRs #10345 and #10346	NO SENT_TO/SENT_BY edges	❌ orphaned

Pattern: her A2A traffic had exactly one clean-routing window, right after she ran git pull origin dev and restarted Antigravity (the 22:08:42Z message). Subsequent messages failed again: one went to the wrong recipient, and the rest were silently culled.

During regression intervals, @tobiu's "postman ping" relay is forced to become load-bearing, which undermines the autonomy paradigm and the "swarm evolution when tobi not relaying" goal.

The Problem

Three classes of failure observed on her side:

No SENT_TO/SENT_BY edges (silent cull at the FK check): the most common failure; #10284-class
Wrong SENT_TO target (e.g., @alice instead of @neo-opus-ada): the caller passes the wrong identifier, possibly because a stale state-machine context fixture is leaking
Brief working window, then regression: suggests state mutation between calls

The Claude-side outgoing path remains stable (verified via direct SQL: my outgoing messages consistently have the correct SENT_BY @neo-opus-ada and SENT_TO @neo-gemini-pro edges). The asymmetry is the diagnostic anchor.
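The direct-SQL check can be sketched with Python's built-in sqlite3 module. The two-table schema (nodes, plus edges with source, target and type columns pointing from a message to an agent) is a hypothetical stand-in, not the actual Memory Core schema. The query reports each message's SENT_BY and SENT_TO targets and sorts it into the outcome classes seen in the table above.

```python
import sqlite3

# Hypothetical stand-in schema; the real Memory Core tables may differ.
SCHEMA = """
CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE edges (source TEXT NOT NULL, target TEXT NOT NULL, type TEXT NOT NULL);
"""

# One row per message: its SENT_BY target and SENT_TO target (NULL if missing).
ROUTING_QUERY = """
SELECT m.id,
       (SELECT e.target FROM edges e WHERE e.source = m.id AND e.type = 'SENT_BY'),
       (SELECT e.target FROM edges e WHERE e.source = m.id AND e.type = 'SENT_TO')
FROM nodes m
WHERE m.kind = 'message'
ORDER BY m.id
"""


def classify(conn, sender, recipient):
    """Map each message id to 'delivered', 'orphaned' or 'wrong_route'."""
    outcomes = {}
    for msg_id, sent_by, sent_to in conn.execute(ROUTING_QUERY):
        if sent_to is None:
            outcomes[msg_id] = "orphaned"      # no SENT_TO edge: silent cull
        elif (sent_by, sent_to) != (sender, recipient):
            outcomes[msg_id] = "wrong_route"   # edges exist but point elsewhere
        else:
            outcomes[msg_id] = "delivered"
    return outcomes


def demo_connection():
    """In-memory database reproducing the three observed outcome classes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
        ("@neo-gemini-pro", "agent"), ("@neo-opus-ada", "agent"), ("@alice", "agent"),
        ("msg-1", "message"), ("msg-2", "message"), ("msg-3", "message"),
    ])
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
        ("msg-2", "@neo-gemini-pro", "SENT_BY"), ("msg-2", "@neo-opus-ada", "SENT_TO"),
        ("msg-3", "@alice", "SENT_TO"),
    ])
    return conn


print(classify(demo_connection(), "@neo-gemini-pro", "@neo-opus-ada"))
# {'msg-1': 'orphaned', 'msg-2': 'delivered', 'msg-3': 'wrong_route'}
```

Running the same query for the opposite direction (sender @neo-opus-ada, recipient @neo-gemini-pro) is how the asymmetry between the two paths would show up.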

The Architectural Reality

Possible mechanisms (NOT prescribing — investigation should empirically narrow):

MC singleton state divergence in Antigravity: Antigravity's MC server may load multiple agent contexts that share state; RequestContextService.getAgentIdentityNodeId() may bind to the wrong agent identity for some addMessage calls
Caller-format regression in test paths or sub-tools: Gemini's MC may be calling addMessage with stale AGENT:bare-name format (the format #10331 normalizer was supposed to fix) — possibly via test fixtures leaking, role/human prefix routes, or other paths bypassing the normalizer
Cache invalidation on identity binding: the brief-working-window pattern suggests post-restart binding works, then some subsequent operation invalidates the binding
AGENT:* sentinel handling regression: if to: 'AGENT:*' is being normalized to @AGENT:* or similar by the new #10331 normalizer logic, broadcast routing breaks
Vicinity-cache miss under specific call patterns: getAdjacentNodes may not load the recipient identity into vicinity cache for FK check, causing cull even with correct format
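Hypotheses 2 and 4 both come down to rule ordering inside the normalizer. The sketch below is hypothetical: it assumes the canonical form is @name (as in the SENT_TO targets in the table above) and the legacy form is AGENT:bare-name, and normalize_mailbox_target is a Python stand-in, not the real normalizeMailboxTarget.

```python
BROADCAST_SENTINEL = "AGENT:*"   # broadcast target; must never be rewritten
LEGACY_PREFIX = "AGENT:"         # assumed pre-#10331 bare-name format


def normalize_mailbox_target(to):
    """Map a caller-supplied `to:` value onto the assumed canonical '@name' form.

    Hypothetical sketch: the sentinel passes through unchanged, the legacy
    'AGENT:' prefix is stripped, and exactly one leading '@' is ensured.
    Empty input raises instead of producing a value the FK check would cull.
    """
    if to is None or not to.strip():
        raise ValueError("empty mailbox target")
    to = to.strip()
    if to == BROADCAST_SENTINEL:
        return to                # checked first, so the sentinel is never '@'-prefixed
    if to.startswith(LEGACY_PREFIX):
        to = to[len(LEGACY_PREFIX):]
    return "@" + to.lstrip("@")
```

Checking the sentinel before stripping the prefix is the point of the example: with the two rules swapped, 'AGENT:*' would come out as '@*', which is the broadcast breakage hypothesis 4 describes. Raising on empty input, rather than returning a value that later fails the FK check, is a design choice of this sketch.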

The Fix

Diagnostic-first; this ticket is investigation, not prescription. Phases:

Phase 1 — Empirical bisection: capture concrete failure cases on the Antigravity side. Each failure case logs:

Caller-side to: parameter passed to addMessage
Pre-normalize value
Post-normalize value
FK-check verifyStmt result (actual count vs. the expected 2)
Identity-binding me value
Whether cull-warning emitted

This is #10284 Phase 1 (make-failure-loud) territory: surface the cull at write time so we have concrete signals.
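A minimal sketch of that Phase 1 instrumentation, in Python with the built-in sqlite3 module. It assumes hypothetical nodes/edges tables as a stand-in for the real store; link_sent_to, add_message_edge and the payload keys are illustrative names, not the Memory Core API. The edge write and the endpoint check are folded into one conditional INSERT ... SELECT statement, so no other write can land between check and insert. When the write is culled, a verification query records the endpoint count against the expected 2, and a single JSON warning carries the fields listed above.

```python
import json
import logging
import sqlite3

log = logging.getLogger("mailbox.cull")

# Hypothetical stand-in schema; the real Memory Core tables may differ.
SCHEMA = """
CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE edges (source TEXT NOT NULL, target TEXT NOT NULL, type TEXT NOT NULL);
"""


def link_sent_to(conn, message_id, target_id):
    """Insert a SENT_TO edge only if both endpoints exist, in one statement.

    Returns the number of rows written: 1 on success, 0 when culled.
    """
    cur = conn.execute(
        """
        INSERT INTO edges (source, target, type)
        SELECT ?, ?, 'SENT_TO'
        WHERE (SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)) = 2
        """,
        (message_id, target_id, message_id, target_id),
    )
    conn.commit()
    return cur.rowcount


def add_message_edge(conn, message_id, caller_to, pre_normalize, post_normalize, me):
    """Write the edge; on a cull, log and return one structured payload."""
    if link_sent_to(conn, message_id, post_normalize) == 1:
        return None
    # Post-write verification: how many of the two endpoints exist right now?
    found = conn.execute(
        "SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)", (message_id, post_normalize)
    ).fetchone()[0]
    payload = {
        "event": "sent_to_cull",
        "caller_to": caller_to,          # `to:` as passed to addMessage
        "pre_normalize": pre_normalize,
        "post_normalize": post_normalize,
        "fk_check": {"found": found, "expected": 2},
        "me": me,                        # identity binding at call time
        "cull_warning_emitted": True,    # always True in this sketch; kept to match the checklist
    }
    log.warning(json.dumps(payload))
    return payload


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                     [("msg-1", "message"), ("@neo-opus-ada", "agent")])
    # A legacy-format target that was never normalized: the edge is culled and logged.
    add_message_edge(conn, "msg-1", "AGENT:neo-opus-ada", "AGENT:neo-opus-ada",
                     "AGENT:neo-opus-ada", "@neo-gemini-pro")
```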

Phase 2 — Root-cause narrow: based on Phase 1 logs, identify which of the 5 hypotheses (or others not listed) actually fires. Fix targets the substrate, not the symptom.

Phase 3 — Test coverage: a regression-class test exercising the specific failure pattern, plus cross-process scenarios using identity-binding fixtures.
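One possible shape for an identity-binding regression test, sketched in Python under the assumption that the caller identity is bound per request (here with the standard contextvars module, as an analogue for whatever RequestContextService does). add_message, agent_session and the outbox list are hypothetical stand-ins; the test interleaves two agents' sends in one process and fails if any message carries the other agent's identity.

```python
import asyncio
import contextvars

# Request-scoped identity binding. Each asyncio task runs in its own copy of the
# context, so one agent's set() cannot leak into another agent's calls.
current_agent = contextvars.ContextVar("current_agent")

EXPECTED_ROUTES = {
    ("@neo-gemini-pro", "@neo-opus-ada"),
    ("@neo-opus-ada", "@neo-gemini-pro"),
}


async def add_message(outbox, to):
    await asyncio.sleep(0)   # yield so the two agents' calls interleave
    outbox.append((current_agent.get(), to))


async def agent_session(outbox, me, to, sends):
    current_agent.set(me)    # bind identity once per session
    for _ in range(sends):
        await add_message(outbox, to)


async def interleaved_run(sends):
    outbox = []
    await asyncio.gather(
        agent_session(outbox, "@neo-gemini-pro", "@neo-opus-ada", sends),
        agent_session(outbox, "@neo-opus-ada", "@neo-gemini-pro", sends),
    )
    return outbox


def test_identity_binding_survives_interleaving(sends=5):
    outbox = asyncio.run(interleaved_run(sends))
    assert len(outbox) == 2 * sends
    for route in outbox:
        assert route in EXPECTED_ROUTES, f"leaked identity: {route}"


test_identity_binding_survives_interleaving()
```

Replacing the ContextVar with a module-level variable makes this test fail: both sessions write the same global, so after the first interleaving one agent's messages go out with the other agent's sender, the kind of leak hypothesis 1 describes.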

Acceptance Criteria

Phase 1 #10284-style observability landed (post-linkNodes verification with structured logging on cull events)
At least 3 failure cases captured empirically on the Antigravity side, with structured log payloads
Root cause identified (hypothesis 1-5 confirmed/refuted, OR new hypothesis surfaced empirically)
Substrate fix lands at the right layer (identity-binding / normalizer / vicinity-cache / sentinel-handling — depends on Phase 2)
Regression test covers the specific failure pattern
@neo-gemini-pro's outgoing A2A messages route correctly across at least 5 consecutive sends from her Antigravity MC
Claude-side stability preserved (no regression on Claude Code → Gemini path)
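The consecutive-send criterion can be checked mechanically from a send log. A small sketch, assuming each send has already been classified as 'delivered' or given a failure label; it reads the criterion as any run of five clean sends, whereas a stricter reading would require the most recent five. The function names are hypothetical.

```python
def longest_clean_streak(outcomes):
    """Length of the longest run of consecutive 'delivered' sends."""
    best = run = 0
    for outcome in outcomes:
        run = run + 1 if outcome == "delivered" else 0
        best = max(best, run)
    return best


def meets_routing_criterion(outcomes, required=5):
    """True if at least `required` consecutive sends were delivered."""
    return longest_clean_streak(outcomes) >= required


# The seven sends from the Context table, in order.
observed = ["orphaned", "orphaned", "orphaned", "delivered",
            "wrong_recipient", "orphaned", "orphaned"]
print(longest_clean_streak(observed), meets_routing_criterion(observed))  # 1 False
```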

Out of Scope

#10284 Phase 1 implementation itself — sibling ticket; this ticket is a downstream consumer. Coordinate via a parent link if useful
Replacing optimistic-concurrency with pessimistic locking — unrelated to this routing bug
Multi-tenant RLS hardening — orthogonal; routing-edge persistence is the scope here
Antigravity harness internals — we don't have direct visibility into Antigravity's MC process. Investigation works from the outside via direct SQLite queries and cross-process behavior diagnosis
Generic flaky-test instability — this is a routing-edge persistence issue, not test-pollution

Avoided Traps

"Just restart Antigravity again" — rejected per memory feedback_verify_effect_not_just_success; a restart restores routing briefly, then it regresses. Not a stable mitigation.
Diagnose by intuition — rejected; the 5 hypotheses are surface speculation. Phase 1 empirical capture must precede prescription.
Single-fix assumption — rejected; the failure pattern shows brief-working window + regression. Could be multiple stacked bugs (intermittent state corruption + caller-format edge case + cache invalidation).
Premature optimization — rejected; investigation first, fix second. Make the failure loud, then catch it precisely.
Cross-family-blame framing — rejected per @tobiu's "we don't blame, we learn from incidents." This is a substrate gap, not a Gemini-side authorial mistake.

Related

#10284 — Phase 1 make-failure-loud (sibling ticket; this consumer relies on its observability)
#10174 — original normalizeMailboxTarget introduction (load-bearing primitive)
#10325 — sharedEntity:true primitive (read-path RLS)
#10330 / #10331 — single-canonical identity format migration (caller-format normalization)
#10257 — getAdjacentNodes lazy-load pattern (vicinity-cache substrate)
feedback_verify_effect_not_just_success — durable anchor for the verify-mechanism-not-claim discipline
#10311 — Epic: Institutionalizing Swarm Autonomy (this bug undermines the autonomy goal until resolved)

Origin Session ID: b5a17132-7324-46e1-b73e-038825bb4d55
Retrieval Hint: "Antigravity SENT_TO silent-cull intermittent A2A routing investigation MC singleton state divergence vicinity cache identity binding regression empirical bisection #10284 phase-1 makefailure-loud"

tobiu referenced in commit 96eee03 - "fix(ai): Atomic Load-Check-Insert for GraphService linkNodes (#10353)" on Apr 26, 2026, 3:43 AM

tobiu closed this issue on Apr 26, 2026, 3:43 AM

tobiu referenced in commit 54600d7 - "refactor(ai-services): align Neo classNames with flat SDK locations across M6 servers (#11005) (#11007)" on May 9, 2026, 1:18 PM

tobiu referenced in commit 3748019 - "fix(memory-core/mailbox): reject or resolve invalid to: in add_message instead of silent null-storage (#11417) (#11615)" on May 19, 2026, 8:36 AM