Frontmatter
id: 10649
title: Add cross-harness wake prompt-landing matrix
state: Closed
labels: enhancement, ai, testing, architecture
assignees: neo-gemini-3-1-pro
createdAt: May 3, 2026, 4:55 PM
updatedAt: May 3, 2026, 10:08 PM
githubUrl: https://github.com/neomjs/neo/issues/10649
author: neo-gpt
commentsCount: 5
parentIssue: 10647
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 3, 2026, 6:59 PM

Add cross-harness wake prompt-landing matrix

neo-gpt commented on May 3, 2026, 4:55 PM

Context

Child of #10647. The 2026-05-03 regression showed that the team repeatedly accepted layer-local success as full-loop wake success. The clearest example is #10644: the bridge believed it delivered to Antigravity because osascript exited 0, but the payload landed in editor/file content instead of the agent prompt surface.

The wake substrate needs a validation matrix that says exactly what must be true before anyone can call wake delivery working again.

Duplicate Sweep Notes

Creation sweep performed as part of #10647:

  • The 20 most recent open GitHub issues were read live, including number, title, author, labels, and URL. Adjacent entries included #10644 (Antigravity-specific prompt-surface bug), #10645 (Codex bootstrap), #10633/#10627 (heartbeat/recovery dependencies), #10601 (parent recovery epic), and #10517 (routing semantics). None of them defines a cross-harness prompt-landing acceptance matrix.
  • Local resource search found #10440 noting the old trap: a bootstrap check passed even though full-loop wake delivery had not been validated. That is historical adjacency, not an open matrix ticket.
  • ask_knowledge_base(type: 'ticket') found no equivalent ticket for prompt-landing validation.

The Problem

"Wake delivered" is currently ambiguous. It can mean any of:

  • the A2A message was persisted;
  • list_messages can read it;
  • a wake subscription exists;
  • a coalesced/raw event was emitted;
  • bridge-daemon attempted an adapter;
  • osascript exited 0;
  • the correct app became frontmost;
  • the payload actually reached the agent prompt input;
  • the agent accepted/submitted the prompt without corrupting files or spawning a wrong session.

Only the last two prove the user-visible effect. The substrate repeatedly regressed because earlier layers were treated as sufficient proof.
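The distinction can be made mechanical. The sketch below is illustrative only: the enum names paraphrase the list above and `wake_delivered` is a hypothetical helper, not an existing function in the repo. It treats every earlier layer as necessary context but never as proof.

```python
from __future__ import annotations

from enum import IntEnum


class WakeLayer(IntEnum):
    """Layers from the list above, in order. Names are illustrative."""
    MESSAGE_PERSISTED = 1
    MESSAGE_LISTABLE = 2
    SUBSCRIPTION_EXISTS = 3
    EVENT_EMITTED = 4
    ADAPTER_ATTEMPTED = 5
    OSASCRIPT_EXIT_ZERO = 6
    APP_FRONTMOST = 7
    PAYLOAD_IN_PROMPT_INPUT = 8
    PROMPT_ACCEPTED_CLEANLY = 9


# Only these two layers count as proof of the user-visible effect.
PROOF_LAYERS = frozenset({
    WakeLayer.PAYLOAD_IN_PROMPT_INPUT,
    WakeLayer.PROMPT_ACCEPTED_CLEANLY,
})


def wake_delivered(observed: set[WakeLayer]) -> bool:
    """True only when both proof layers were actually observed."""
    return PROOF_LAYERS <= observed


# Hypothetical observation set: everything up to "osascript exited 0"
# succeeded, but prompt landing was never observed.
layer_local = {layer for layer in WakeLayer
               if layer <= WakeLayer.OSASCRIPT_EXIT_ZERO}

print(wake_delivered(layer_local))     # False
print(wake_delivered(set(WakeLayer)))  # True
```

Under this rule a green bridge log alone can never flip the result to true, which is the failure mode described for #10644.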

Historical Memory Context

Relevant Memory Core anchors:

  • summary_0763a9bf-1052-4a2f-99f3-a8e0e14f1671 — wake delivery previously required multiple fixes across permissions, tab focus, atomic paste, raw/digest envelope, and clone state.
  • summary_a8592d87-f132-42c5-af6f-0e066fd5f428 — a raw/coalescing change broke another delivery path, showing that wire-format and adapter compatibility must be tested as a loop.
  • summary_bf59d6c4-e250-44a2-b4b2-5bffae40ab5f — wake substrate reviews uncovered silent metadata stripping and led to mailbox protocol discipline.

The Architectural Reality

The validation surface spans:

  • Memory Core A2A storage/listing (add_message, list_messages).
  • WakeSubscriptionService bootstrap/active subscription state.
  • Coalescing/raw wake event shape.
  • Bridge daemon or native harness transport.
  • Harness-specific prompt surface behavior:
    • Claude Desktop / Claude app tab or prompt field.
    • Antigravity IDE agent composer, not editor files.
    • Codex Desktop prompt/thread surface.

No single unit test can prove all of this for all harnesses. The matrix should combine deterministic unit/integration tests with explicit manual or semi-automated live validation evidence.

The Fix

Create a cross-harness wake validation matrix and wire it into the #10647 reactivation gate.

Minimum matrix columns:

  1. message persisted;
  2. unread/list state correct;
  3. subscription bootstrap and metadata correct;
  4. wake event emitted with expected envelope;
  5. adapter selected correct app/session target;
  6. prompt payload lands in the agent prompt/composer surface;
  7. no editor/file content is modified;
  8. no fresh session is spawned unless explicit sunset/unsubscribe permits it;
  9. recipient can act on the prompt or a clear blocked signal is emitted;
  10. evidence artifact captured in PR/comment/log.

Minimum rows:

  • Claude Desktop / Claude app wake path.
  • Antigravity IDE / Gemini wake path.
  • Codex Desktop wake path.
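One way to picture the matrix is as a grid of cells, each needing both a pass and a named evidence artifact. This is a minimal sketch, not the format adopted in the repo: `WakeMatrix`, `Cell`, the column identifiers, and the evidence string are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Identifiers paraphrase the columns and rows listed above (illustrative only).
COLUMNS = (
    "message_persisted",
    "unread_list_state_correct",
    "subscription_bootstrap_and_metadata_correct",
    "wake_event_emitted_with_expected_envelope",
    "adapter_selected_correct_target",
    "payload_lands_in_prompt_surface",
    "no_editor_or_file_content_modified",
    "no_unauthorized_fresh_session",
    "recipient_acts_or_blocked_signal_emitted",
    "evidence_artifact_captured",
)
HARNESSES = ("claude_desktop", "antigravity_ide", "codex_desktop")


@dataclass
class Cell:
    passed: bool = False
    evidence: str = ""  # e.g. a test name, log excerpt, or PR comment link


@dataclass
class WakeMatrix:
    cells: dict = field(default_factory=lambda: {
        (h, c): Cell() for h in HARNESSES for c in COLUMNS
    })

    def record(self, harness, column, passed, evidence):
        if (harness, column) not in self.cells:
            raise KeyError(f"unknown cell: {harness}/{column}")
        self.cells[(harness, column)] = Cell(passed, evidence)

    def open_cells(self):
        """Cells that failed or lack evidence."""
        return sorted(key for key, cell in self.cells.items()
                      if not (cell.passed and cell.evidence.strip()))

    def gate_open(self):
        """The gate opens only when no cell is left open."""
        return not self.open_cells()


matrix = WakeMatrix()
# Hypothetical evidence string for a single cell.
matrix.record("antigravity_ide", "message_persisted", True,
              "unit test: add_message round-trip")
print(matrix.gate_open())        # False
print(len(matrix.open_cells()))  # 29
```

Requiring evidence per cell, rather than a single pass/fail per harness, keeps a storage-layer success from standing in for a prompt-landing success in the same row.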

Acceptance Criteria

  • A repo doc or test-runbook defines the matrix columns above and names the evidence required for each cell.
  • The matrix explicitly states that A2A storage success and bridge adapter success are not sufficient proof of prompt delivery.
  • The matrix covers Claude Desktop, Antigravity IDE, and Codex Desktop as distinct harness rows.
  • Antigravity row includes a negative assertion: the payload must not land in editor/file content (#10644 regression class).
  • Every harness row includes a negative fresh-session assertion: no fresh session may spawn unless explicit sunset/unsubscribe state authorizes it.
  • At least one deterministic test or mock validates bridge adapter intent for each supported app name/strategy where feasible.
  • Live/manual evidence requirements are documented for UI-only assertions that cannot be proven in headless unit tests.
  • #10647 heartbeat reactivation gate references this matrix as required evidence before reset.
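The two negative assertions above can be checked against a recorded observation, whether it comes from a mock adapter or from manual live validation. The following is a sketch under assumptions: `Observation`, its field names, and the surface labels are hypothetical and do not describe the bridge daemon's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What was seen after one wake attempt (hypothetical record shape)."""
    harness: str
    landed_surface: str          # e.g. "agent_prompt", "editor", "none"
    files_modified: bool
    fresh_session_spawned: bool
    sunset_or_unsubscribe: bool  # explicit state authorizing a fresh session


def violations(obs: Observation) -> list:
    """Return human-readable violations; an empty list means the row holds."""
    found = []
    if obs.landed_surface != "agent_prompt":
        found.append(
            f"payload landed in {obs.landed_surface!r}, not the agent prompt")
    if obs.files_modified:
        found.append("editor/file content was modified")
    if obs.fresh_session_spawned and not obs.sunset_or_unsubscribe:
        found.append(
            "fresh session spawned without sunset/unsubscribe authorization")
    return found


# Hypothetical observation in the #10644 regression class.
regression = Observation("antigravity_ide", "editor", True, False, False)
for problem in violations(regression):
    print(problem)

# A fresh session is acceptable only when explicitly authorized.
authorized = Observation("codex_desktop", "agent_prompt", False, True, True)
print(violations(authorized))  # []
```

Returning a list of violations instead of a boolean gives the evidence artifact something concrete to quote when a row fails.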

Out of Scope

  • Fixing any individual harness row failure. Failures should create or link specific tickets such as #10644/#10645.
  • Building a full native UI automation layer for every harness in this ticket.
  • Replacing #10517 HarnessPresence/wakePolicy routing semantics.
  • Reactivating heartbeat.

Avoided Traps

  • Trap: bridge log says delivered, therefore done. Rejected. Prompt landing is the observable effect.
  • Trap: require impossible full automation before documenting the gate. Rejected. Manual evidence is acceptable where harness UI automation is brittle, but the matrix must make the gap explicit.
  • Trap: test only the currently broken harness. Rejected. Cross-harness regression is the pattern.
  • Trap: collapse storage/listing into wake delivery. Rejected. A2A mailbox can work while wake delivery is broken.

Related

  • Parent epic: #10647.
  • Grandparent epic: #10601.
  • Antigravity prompt-surface regression: #10644.
  • Codex bootstrap regression: #10645.
  • Routing model: #10517.
  • Historical full-loop validation miss: #10440.

Origin Session ID: 89b259c3-27ec-4afb-baaf-fd39b55bffe1

Retrieval Hint: wake prompt landing matrix bridge delivered not enough Antigravity file write Codex Claude harness validation.

tobiu referenced in commit 8f5bdc4 - "feat(ai): add wake safety gate and circuit breaker (#10648) (#10653)" on May 3, 2026, 5:40 PM
tobiu referenced in commit 704a33c - "docs(ai): add wake prompt landing matrix (#10649) (#10654)" on May 3, 2026, 6:59 PM
tobiu closed this issue on May 3, 2026, 6:59 PM
tobiu referenced in commit 7a7d362 - "docs(agentos): add wake substrate incident protocol (#10650) (#10655)" on May 3, 2026, 7:00 PM
tobiu referenced in commit 73dfaf6 - "fix(ai): extend focus seed default to Codex (#10662) (#10663)" on May 3, 2026, 10:00 PM