Frontmatter
id: 10517
title: Define HarnessPresence and wakePolicy routing for A2A wakes
state: Open
labels: documentation, enhancement, ai, agent-role:dev, agent-task:blocked, architecture
assignees: []
createdAt: Apr 30, 2026, 2:56 AM
updatedAt: May 25, 2026, 6:54 AM
githubUrl: https://github.com/neomjs/neo/issues/10517
author: neo-gpt
commentsCount: 2
parentIssue: 10311
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []

Define HarnessPresence and wakePolicy routing for A2A wakes

neo-gpt commented on Apr 30, 2026, 2:56 AM

Context

Recent wake-substrate analysis found that our Shape D architecture is directionally correct, but missing one critical layer: transport selection alone cannot decide whether a wake should interrupt an active harness turn or wait until the next turn. The user explicitly validated the value of documenting the three existing shapes and approved filing this follow-up ticket after the current open-ticket sweep.

The current architecture already documents the transport ladder:

  • Shape A: MCP notifications / push when a harness supports it.
  • Shape B: A2A webhook push notifications.
  • Shape C: bridge-daemon / OS-level fallback.
  • Shape D: hybrid standards-first routing with bridge fallback.

The new insight is that Shape D needs an explicit presence and policy model, not just transport capability detection.

The Problem

The wake service currently has durable messages, subscriptions, priorities, and bridge delivery, but urgent-vs-next-turn semantics are underspecified. This creates two failure modes:

  1. Normal coordination messages can wake or interrupt agents unnecessarily.
  2. Truly urgent coordination, such as scope-creep prevention while a peer is mid-PR, has no explicit routing contract.

The hardest part is not the priority flag itself but knowing whether the target harness is idle, actively generating, waiting on approval, or holding user-typed input that must not be clobbered.

The Architectural Reality

This ticket sits under the swarm-autonomy / wake-substrate line, not under a specific PR review or Memory Core bugfix.

Relevant existing anchors:

  • Parent epic: #10311
  • Shape D wake-substrate epic: #10357
  • Standards-alignment ADR: learn/agentos/decisions/0002-phase3-wake-substrate-standards-alignment.md
  • MCP Shape A follow-up: #10358
  • A2A webhook Shape B follow-up: #10359
  • AppleScript focus-steal safety issue: #10422
  • Wake-substrate integrity issue: #10515

Current public-source anchors from the April 2026 analysis:

The Fix

Define and document a routing layer with two explicit primitives:

  1. HarnessPresence

    • Suggested state set: unknown | idle | active | waitingOnApproval | userTyping?
    • Suggested metadata: activeTurnId, capabilities, lastSeenAt, source
    • Purpose: describe the receiving harness state independently from message importance.
  2. wakePolicy

    • Suggested values: silent | next_turn | immediate
    • Purpose: describe delivery behavior independently from semantic priority.
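As a rough illustration of the two primitives, the shapes could be written down as follows. This is a minimal sketch, not a settled schema: the field types, the `effectiveState` helper, and the idea of treating a stale `lastSeenAt` as `unknown` are assumptions added for illustration.

```typescript
// Hypothetical type sketch; only the state names, policy values, and
// metadata field names come from the ticket. Field types are assumptions.
type HarnessPresenceState =
  | "unknown"
  | "idle"
  | "active"
  | "waitingOnApproval"
  | "userTyping"; // listed as tentative ("userTyping?") in the ticket

type WakePolicy = "silent" | "next_turn" | "immediate";

interface HarnessPresence {
  state: HarnessPresenceState;
  activeTurnId?: string;  // assumed: present only while a turn is running
  capabilities: string[]; // assumed: free-form capability tags
  lastSeenAt: string;     // assumed: ISO 8601 timestamp
  source: string;         // assumed: which adapter reported this state
}

// Assumption: a presence report older than maxAgeMs is no longer trusted
// and is downgraded to "unknown".
function effectiveState(
  presence: HarnessPresence,
  nowMs: number,
  maxAgeMs: number
): HarnessPresenceState {
  const seenMs = Date.parse(presence.lastSeenAt);
  if (isNaN(seenMs) || nowMs - seenMs > maxAgeMs) {
    return "unknown";
  }
  return presence.state;
}

const examplePresence: HarnessPresence = {
  state: "active",
  activeTurnId: "turn-42",
  capabilities: ["codex-app-server"],
  lastSeenAt: new Date(0).toISOString(),
  source: "app-server"
};

const examplePolicy: WakePolicy = "next_turn";
```

Keeping `WakePolicy` out of `HarnessPresence` mirrors the separation described above: presence is reported by the receiver, policy is chosen by the sender.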

Routing should then be capability-aware:

  • silent: store only; no wake.
  • next_turn: store unread; picked up by turn-start mailbox checks.
  • immediate + Codex active: prefer Codex app-server turn/steer.
  • immediate + Codex idle: prefer Codex app-server turn/start.
  • immediate + MCP/A2A-capable harness: use standards-backed push.
  • immediate + no native control plane: bridge-daemon / OS fallback.
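The routing table above could be sketched as one pure function. The capability tags and route labels are hypothetical names chosen for this example, not existing identifiers. Two behaviors are assumptions the ticket does not specify: a Codex harness in any state other than `active` or `idle` falls through to the remaining rules, and the bridge fallback downgrades to a stored unread message when the user is typing.

```typescript
type WakePolicy = "silent" | "next_turn" | "immediate";
type PresenceState = "unknown" | "idle" | "active" | "waitingOnApproval" | "userTyping";

// Hypothetical capability tags; the ticket does not fix their names.
type Capability = "codex-app-server" | "mcp-push" | "a2a-webhook";

// Hypothetical route labels, one per bullet in the routing list.
type Route =
  | "store-only"
  | "store-unread"
  | "codex-turn-steer"
  | "codex-turn-start"
  | "standards-push"
  | "bridge-fallback";

function has(capabilities: Capability[], wanted: Capability): boolean {
  return capabilities.indexOf(wanted) !== -1;
}

function selectRoute(
  policy: WakePolicy,
  state: PresenceState,
  capabilities: Capability[]
): Route {
  if (policy === "silent") return "store-only";
  if (policy === "next_turn") return "store-unread";

  // policy === "immediate" from here on.
  if (has(capabilities, "codex-app-server")) {
    if (state === "active") return "codex-turn-steer";
    if (state === "idle") return "codex-turn-start";
    // Assumption: other states fall through to the rules below.
  }
  if (has(capabilities, "mcp-push") || has(capabilities, "a2a-webhook")) {
    return "standards-push";
  }
  // Assumption: never drive the OS-level bridge while the user is typing.
  if (state === "userTyping") return "store-unread";
  return "bridge-fallback";
}
```

Because the function takes policy, presence, and capabilities as separate arguments, message priority never enters the decision, which is the point of the split.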

The OS fallback may infer active vs idle from the harness submit button state, e.g. Send versus Stop / Cancel, but this must be treated as a last resort. If used, polling should be conservative; the initial design target is a 5s interval with focus checks, timeout limits, and a hard guard against clobbering active user input.
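To make the last-resort heuristic concrete, here is a minimal sketch of how one poll observation might be classified. It assumes that `Send` means idle and `Stop` / `Cancel` means a turn is running. The `PollObservation` fields, the 60s timeout, and the helper names are hypothetical. Only the 5s interval comes from the design target above, and the actual adapter remains out of scope.

```typescript
type InferredState = "unknown" | "idle" | "active" | "userTyping";

// Hypothetical snapshot produced by one poll of the harness UI.
interface PollObservation {
  buttonLabel: string;       // visible label of the submit button
  focusCheckPassed: boolean; // result of whatever focus check the adapter runs
  inputFieldText: string;    // current content of the prompt input
}

const POLL_INTERVAL_MS = 5000; // initial design target from the ticket
const POLL_TIMEOUT_MS = 60000; // assumption: the ticket only says "timeout limits"

// Assumed mapping: "Send" means idle, "Stop" or "Cancel" means active.
function inferStateFromButton(label: string): InferredState {
  const normalized = label.trim().toLowerCase();
  if (normalized === "send") return "idle";
  if (normalized === "stop" || normalized === "cancel") return "active";
  return "unknown";
}

function classifyObservation(obs: PollObservation): InferredState {
  // A failed focus check means the reading cannot be trusted.
  if (!obs.focusCheckPassed) return "unknown";
  // Hard guard: pending user input wins over any button-state reading.
  if (obs.inputFieldText.trim().length > 0) return "userTyping";
  return inferStateFromButton(obs.buttonLabel);
}

// Returns the delay before the next poll, or null once the timeout is reached.
function nextPollDelay(elapsedMs: number): number | null {
  return elapsedMs >= POLL_TIMEOUT_MS ? null : POLL_INTERVAL_MS;
}
```

Checking the input field before the button label means a half-typed prompt is never mistaken for an idle harness.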

Acceptance Criteria

  • Existing Shape A/B/C/D terminology is preserved and extended, not replaced.
  • Documentation defines HarnessPresence and wakePolicy separately from transport capability and message priority.
  • Documentation explains why OS-level button-state polling is last-resort only.
  • Documentation specifies a conservative initial OS polling interval of 5s if this fallback is implemented.
  • Routing examples cover Codex native app-server steering/start, MCP/A2A standards push, and bridge fallback.
  • Scope boundary is explicit: this ticket does not implement wake-substrate GC/subscription integrity fixes from #10515.
  • Scope boundary is explicit: this ticket does not solve AppleScript focus-steal mechanics from #10422 beyond referencing their safety constraints.

Out of Scope

  • Implementing the actual OS polling adapter.
  • Changing existing wake subscription persistence or GC behavior.
  • Expanding #10515 with routing semantics.
  • Treating OS button polling as a first-class architecture.
  • Assuming all harnesses support active-turn preemption.

Avoided Traps

  • Priority-only design: priority: high is not the same thing as wakePolicy: immediate. Some high-importance messages should wait; some short guardrail messages may need immediate delivery.
  • Transport-only design: MCP/A2A/bridge capability does not answer whether the target is safe to interrupt.
  • OS polling as architecture: Send vs Stop/Cancel can be useful empirically, but it is brittle UI automation and must remain a fallback adapter.
  • Scope creep into #10515: wake-substrate integrity must stay focused on subscription survival and test isolation. Routing policy is a separate layer.
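The priority-only trap is easier to see with two concrete messages. The envelope below is a hypothetical sketch: the `WakeMessage` shape and the `low` / `normal` priority values are assumptions, since the ticket only names `priority: high`.

```typescript
type Priority = "low" | "normal" | "high"; // assumed value set
type WakePolicy = "silent" | "next_turn" | "immediate";

// Hypothetical message envelope carrying both fields independently.
interface WakeMessage {
  id: string;
  priority: Priority;     // how important the content is
  wakePolicy: WakePolicy; // how delivery should behave
  body: string;
}

// High importance, but safe to read at the start of the next turn.
const designDecision: WakeMessage = {
  id: "msg-1",
  priority: "high",
  wakePolicy: "next_turn",
  body: "Architecture decision recorded; please review before your next task."
};

// Ordinary importance, but only useful if it lands mid-turn.
const scopeGuardrail: WakeMessage = {
  id: "msg-2",
  priority: "normal",
  wakePolicy: "immediate",
  body: "The file you are editing is outside the ticket scope."
};

// The wake decision reads wakePolicy only; priority is never consulted.
function requiresImmediateWake(message: WakeMessage): boolean {
  return message.wakePolicy === "immediate";
}
```

A router keyed on `priority` alone would get both of these messages wrong.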

Related

  • Parent epic: #10311
  • Prior Shape D wake-substrate epic: #10357
  • MCP notification path: #10358
  • A2A webhook path: #10359
  • AppleScript focus safety: #10422
  • Wake-substrate integrity: #10515

Origin Session ID: 3b0c3e6f-21a2-4b16-babd-3c4e208c2926
Retrieval Hint: wake substrate HarnessPresence wakePolicy urgent next-turn routing Codex app-server OS button polling MCP push A2A webhook bridge fallback

tobiu referenced in commit 73dfaf6 - "fix(ai): extend focus seed default to Codex (#10662) (#10663)" on May 3, 2026, 10:00 PM
tobiu referenced in commit 60b9c7b - "fix(ai): fail closed for Codex UI wake (#10664) (#10665)" on May 3, 2026, 10:42 PM