Frontmatter
id: 10636
title: Wake subscription validAppNames omits Codex, breaks @neo-gpt bootstrap
state: Closed
labels: bug, ai, regression, architecture
assignees: neo-opus-4-7
createdAt: May 3, 2026, 3:13 PM
updatedAt: May 3, 2026, 6:03 PM
githubUrl: https://github.com/neomjs/neo/issues/10636
author: neo-opus-4-7
commentsCount: 0
parentIssue: 10601
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 3, 2026, 4:31 PM

Wake subscription validAppNames omits Codex, breaks @neo-gpt bootstrap

neo-opus-4-7 commented on May 3, 2026, 3:13 PM

Context

Live regression discovered 2026-05-03 ~12:58Z after @tobiu restarted bridge-daemon + MCP servers post-#10632 merge ("agreed restart should make things BETTER, instead full-blown regression").

Empirical anchors (verified independently by @neo-opus-4-7 + @neo-gpt cross-family memory-mining):

  1. bridge-daemon log PID 43092 shows 3 osascript exit-1 failures with no successes:

    • 2026-05-03T12:58:14.073Z — failed delivery (corresponds to GPT→Opus msg sent 12:57:41Z, ~33s coalescing window)
    • 2026-05-03T12:58:17.093Z — failed delivery (corresponds to GPT→Gemini msg sent 12:57:45Z)
    • 2026-05-03T12:59:35.101Z — failed delivery (corresponds to Gemini→Opus high-priority msg sent 12:59:03Z)
  2. SQLite WAKE_SUBSCRIPTION rows show only 2 active Shape C subs after restart, NOT 3:

    • WAKE_SUB:2ac01429-...@neo-gemini-3-1-pro / appName Antigravity / harnessTarget bridge-daemon
    • WAKE_SUB:4df2d514-...@neo-opus-4-7 / appName Claude / harnessTarget bridge-daemon (auto-bootstrapped fresh on MCP restart at 12:51:59Z)
    • No active @neo-gpt Shape C subscription — the prior WAKE_SUB:2257d6ee-... (Codex) is gone
  3. @neo-gpt's AgentIdentity in raw SQLite has the canonical subscriptionTemplate with harnessTargetMetadata.appName: 'Codex' and harnessTarget: 'bridge-daemon' — the data is fine.

  4. manage_wake_subscription({action:'bootstrap'}) invoked from a fresh @neo-gpt Codex MCP session rejects with: "Cannot bootstrap subscription: no subscriptionTemplate found on AgentIdentity '@neo-gpt'" — even though raw SQLite shows the template DOES exist. Two possible failure surfaces here: live MCP GraphService cache staleness AND/OR the canonical allow-list rejecting 'Codex' at the validation gate.

  5. Static analysis of WakeSubscriptionService.mjs: PR #10628 (closed #10624) added validAppNames = ['Antigravity', 'Claude'] and the validation throws "Invalid appName 'Codex'. Must be one of: Antigravity, Claude" when the bootstrap path attempts to write the canonical Codex sub.

The Problem

#10624's Fix section literally specified the initial allow-list: "currently: 'Antigravity', 'Claude'; future identities added as harness registry expands". The implementation in PR #10628 codified that initial scope. Codex/@neo-gpt was deferred at write-time, and the deferral carried through implementation as an oversight, even though @neo-gpt is a canonical trio member (e.g., checkAllAgentIdle.mjs, shipped via PR #10631, hardcodes NEO_TRIO_IDENTITIES || '@neo-gemini-3-1-pro,@neo-opus-4-7,@neo-gpt').

The regression chain on MCP server restart:

  • Auto-bootstrap (per #10437/#10438) fires for each AgentIdentity with a subscriptionTemplate
  • For @neo-gpt, the template specifies appName: 'Codex'
  • WakeSubscriptionService.validateMetadata throws because 'Codex' is not in validAppNames
  • The sub is silently NOT created; @neo-gpt becomes unreachable via Shape C wake
  • Codex agent receives messages via list_messages (mailbox-DB layer is fine) but no wake injection
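The chain above can be reproduced in isolation. The following is a minimal sketch, not the real service: the class name WakeSubscriptionSketch, the bootstrapAll loop, and the template helper are hypothetical stand-ins. Only the validAppNames list, the error message, and the identity/appName pairs come from this ticket. It assumes a rejected template is caught and skipped, which is what makes the missing sub silent.

```javascript
// Hypothetical stand-in for the validation gate; not the real WakeSubscriptionService.
class WakeSubscriptionSketch {
    // Pre-fix allow-list as quoted in this ticket.
    validAppNames = ['Antigravity', 'Claude'];

    // Exact-match check; throws with the message quoted in the ticket.
    validateMetadata({appName}) {
        if (!this.validAppNames.includes(appName)) {
            throw new Error(`Invalid appName '${appName}'. Must be one of: ${this.validAppNames.join(', ')}`);
        }
    }

    // Assumed bootstrap loop: a rejected template is skipped, so no sub is created.
    bootstrapAll(identities) {
        const created = [];
        const skipped = [];

        for (const {id, subscriptionTemplate} of identities) {
            if (!subscriptionTemplate) continue;

            try {
                this.validateMetadata(subscriptionTemplate.harnessTargetMetadata);
                created.push(id);
            } catch (error) {
                skipped.push({id, reason: error.message});
            }
        }

        return {created, skipped};
    }
}

// Hypothetical helper building a template in the shape described in the ticket.
const template = appName => ({
    harnessTarget: 'bridge-daemon',
    harnessTargetMetadata: {appName}
});

const identities = [
    {id: '@neo-gemini-3-1-pro', subscriptionTemplate: template('Antigravity')},
    {id: '@neo-opus-4-7', subscriptionTemplate: template('Claude')},
    {id: '@neo-gpt', subscriptionTemplate: template('Codex')}
];

const result = new WakeSubscriptionSketch().bootstrapAll(identities);

console.log(result.created); // the two identities whose appName passes the gate
console.log(result.skipped); // @neo-gpt, with the "Invalid appName 'Codex'" reason
```

With the pre-fix list, only two identities end up in created, matching the two active Shape C rows observed in SQLite.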

This is a substrate-truth gap of the #10624 family — case-handling/whitelist-completeness defects. Same pattern as #10619 Cycle 1 (AGENT_MEMORY vs MEMORY label drift), #10623 Cycle 1 ($.type vs $.label query drift): "silently-rejected-because-list-is-incomplete" rather than "silently-accepted-but-wrong-form" — symmetric failure mode in the validation direction.

The Architectural Reality

  • Single touchpoint: ai/mcp/server/memory-core/services/WakeSubscriptionService.mjs:60 — the validAppNames field declaration.
  • Validation site: Same file, line ~464 (validateMetadata method).
  • Consumer test surface: test/playwright/unit/ai/mcp/server/memory-core/services/WakeSubscriptionService.spec.mjs (negative-case rejection test exists per #10624 AC; needs symmetric positive-case for Codex).
  • No other allow-list duplicates: grep confirms validAppNames is the single source; openapi.yaml accepts appName as free-form string per pre-#10628 design.
  • Trio canonicalization elsewhere: ai/scripts/checkAllAgentIdle.mjs:18 hardcodes @neo-gemini-3-1-pro,@neo-opus-4-7,@neo-gpt — independent confirmation that Codex is a canonical trio member.

The Fix

Add 'Codex' to the validAppNames allow-list in WakeSubscriptionService.mjs:60:

// Before:
validAppNames = ['Antigravity', 'Claude']

// After:
validAppNames = ['Antigravity', 'Claude', 'Codex']

Single-line code change. Ordering matches the chronological harness onboarding (Antigravity → Claude → Codex per swarm history).

Spec test extension in the corresponding WakeSubscriptionService.spec.mjs: add positive-case assertion that appName: 'Codex' is accepted (mirroring the existing 'Antigravity' and 'Claude' positive cases).
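The intended assertions might look roughly like the sketch below. It is written in plain JavaScript so it runs standalone; the real spec would go through the Playwright unit harness and the actual service. The functions validateAppName and rejects are hypothetical, and 'UnknownApp' is an arbitrary placeholder for an unlisted name.

```javascript
// Hypothetical validator mirroring the post-fix allow-list from this ticket.
const validAppNames = ['Antigravity', 'Claude', 'Codex'];

function validateAppName(appName) {
    if (!validAppNames.includes(appName)) {
        throw new Error(`Invalid appName '${appName}'. Must be one of: ${validAppNames.join(', ')}`);
    }
    return true;
}

// Returns true when the validator throws for the given value.
function rejects(appName) {
    try {
        validateAppName(appName);
        return false;
    } catch {
        return true;
    }
}

// Positive cases: every canonical name, including the newly added 'Codex'.
const positiveCases = ['Antigravity', 'Claude', 'Codex'].map(name => [name, !rejects(name)]);

// Negative cases: case variants and unknown names must still be rejected.
const negativeCases = ['codex', 'CODEX', 'UnknownApp'].map(name => [name, rejects(name)]);

console.log(positiveCases);
console.log(negativeCases);
```

Keeping the lowercase 'codex' case in the negative set guards the exact-match rule from #10624 while the list grows.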

Operator step (post-merge, NOT auto-migration): after MCP server restart with the fix landed, the next auto-bootstrap pass will create @neo-gpt's Shape C sub from the canonical template. If the MCP GraphService cache is also stale on @neo-gpt's AgentIdentity (per @neo-gpt's evidence in cross-family A2A), a separate cache-invalidation ticket will be needed — that's out of scope here.

Acceptance Criteria

  • validAppNames in WakeSubscriptionService.mjs:60 includes 'Codex'
  • Spec test asserts appName: 'Codex' accepted (positive case mirroring 'Antigravity' / 'Claude')
  • Existing negative-case rejection test (lowercase / unknown appName) still passes
  • PR body cites empirical anchor: bridge.log timestamps + GPT's cross-family A2A diagnostic (MESSAGE:6b8c7086) + the #10624 "future identities added as harness registry expands" deferral marker
  • Post-merge: MCP server restart triggers auto-bootstrap recreation of @neo-gpt's Shape C sub via canonical template (verified by SQLite query showing 3 active subs across the trio)
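The post-merge check in the last criterion could be automated along these lines. This is a sketch under assumptions: the row shape (identity, appName, harnessTarget, active) and the function missingShapeCSubs are hypothetical, and the real WAKE_SUBSCRIPTION columns may differ. The trio list is the one quoted in this ticket.

```javascript
// Trio identities as listed in this ticket.
const TRIO = ['@neo-gemini-3-1-pro', '@neo-opus-4-7', '@neo-gpt'];

// Hypothetical row shape; returns the trio members without an active bridge-daemon sub.
function missingShapeCSubs(rows, trio = TRIO) {
    const covered = new Set(
        rows
            .filter(row => row.active && row.harnessTarget === 'bridge-daemon')
            .map(row => row.identity)
    );

    return trio.filter(identity => !covered.has(identity));
}

// State observed after the regressing restart: two subs only.
const afterRegression = [
    {identity: '@neo-gemini-3-1-pro', appName: 'Antigravity', harnessTarget: 'bridge-daemon', active: true},
    {identity: '@neo-opus-4-7', appName: 'Claude', harnessTarget: 'bridge-daemon', active: true}
];

// Expected state once the fix has landed and auto-bootstrap has re-run.
const afterFix = [
    ...afterRegression,
    {identity: '@neo-gpt', appName: 'Codex', harnessTarget: 'bridge-daemon', active: true}
];

console.log(missingShapeCSubs(afterRegression)); // [ '@neo-gpt' ]
console.log(missingShapeCSubs(afterFix));        // []
```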

Out of Scope

  • MCP GraphService cache staleness on AgentIdentity at restart — @neo-gpt's separate evidence (bootstrap returns "no subscriptionTemplate found" while raw SQLite shows the template). If post-fix bootstrap still fails after MCP restart, file as a separate substrate ticket; do not bundle here.
  • bridge-daemon stderr capture (stdio:'ignore' swallows osascript stderr) — orthogonal observability gap, separate ticket scope. The 3 failures in this regression were initially undiagnosable because of this; fixing it is defense-in-depth that benefits all future osascript exit-N forensics.
  • OS-level TCC keystroke permission — addressed externally by @tobiu System Settings update; not a code concern.
  • Bulk validAppNames → registry refactor — premature. Static list with explicit additions per harness onboarding matches #10624 precedent.

Avoided Traps

  • Auto-allow case-insensitive variants — same trap rejected by #10624 (silent normalization hides write-side defects). Throw on mismatch; require canonical exact match.
  • Bundle the MCP cache-staleness fix — different substrate (GraphService cache vs validation list); different reviewer surface; bundling violates ticket-create §1 single-scope discipline.
  • Bundle the bridge-daemon stderr observability fix — different file (ai/scripts/bridge-daemon.mjs), different concern (post-hoc forensics vs validation list completeness). Filing separately preserves clean PR review surfaces.
  • Replace static list with dynamic registry-lookup — premature abstraction. The trio is currently 3; explicit listing has lower drift risk than a derived list and is easier to grep.
  • Auto-migrate @neo-gpt's existing sub from SQLite at validation rollout — operator-coordinated re-bootstrap is safer (mirrors #10624 AC pattern).

Related

  • Parent: #10601 (substrate-stack Epic) — this is a substrate-truth regression in lane #5 (canonical wake routes).
  • Direct precursor: #10624 (canonical appName enforcement) / PR #10628 (implementation that introduced the gap by literal "future identities" deferral marker).
  • Substrate-truth class siblings: #10619 Cycle 1 (AGENT_MEMORY label drift), #10623 Cycle 1 ($.label query drift); same substrate-truth-not-honored failure family.
  • Trio canonicalization confirmation: PR #10631 (#10625 all-agent-idle detection) — NEO_TRIO_IDENTITIES env default lists @neo-gpt as canonical member.
  • Cross-family memory anchor: @neo-gpt's MESSAGE:6b8c7086-6e66-4323-a9b8-4c8d3f6c929b (2026-05-03T13:06:06Z), the empirical evidence for the timestamps and the Codex-bootstrap-rejection trace.

Origin Session ID: 9766f91c-51f8-44fe-ac34-d79f61a0e1bf

Retrieval Hint: query_summaries("Codex bootstrap excluded validAppNames wake substrate regression 2026-05-03") + query_raw_memories("validAppNames Antigravity Claude Codex omitted #10624 #10628")

tobiu referenced in commit b2071d5 - "fix(memory-core): include Codex in wake subscription canonical appNames (#10636) (#10637)" on May 3, 2026, 4:31 PM
tobiu closed this issue on May 3, 2026, 4:31 PM