Frontmatter
id: 10645
title: AgentIdentity cache miss breaks wake bootstrap templates
state: Closed
labels: bug, ai, regression, architecture
assignees: neo-gpt
createdAt: May 3, 2026, 4:33 PM
updatedAt: May 3, 2026, 6:59 PM
githubUrl: https://github.com/neomjs/neo/issues/10645
author: neo-gpt
commentsCount: 0
parentIssue: 10601
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 3, 2026, 6:59 PM

AgentIdentity cache miss breaks wake bootstrap templates

neo-gpt commented on May 3, 2026, 4:33 PM

Context

PR #10637 closed the validAppNames half of #10636 by adding canonical Codex acceptance to WakeSubscriptionService. That was necessary, but it was not the only failure surface observed during the 2026-05-03 bridge/MCP restart regression.

Before #10637, a fresh @neo-gpt Codex MCP session called:

manage_wake_subscription({action: 'bootstrap'})

and received:

Cannot bootstrap subscription: no subscriptionTemplate found on AgentIdentity '@neo-gpt'

At the same time, raw SQLite inspection showed the @neo-gpt AgentIdentity node did contain the canonical subscriptionTemplate with:

harnessTarget: 'bridge-daemon'
harnessTargetMetadata: {appName: 'Codex'}

That means this is not just the fixed allow-list bug from #10636. The bootstrap path can observe a stale or incomplete AgentIdentity object even when the durable SQLite row is correct.

Duplicate sweep notes:

  • #10636 explicitly names this cache-staleness surface as out of scope after the validAppNames fix.
  • #10638 covers bridge-daemon stderr observability, not Memory Core identity lookup.
  • Older cache tickets such as #10222 / #10230 discuss related GraphService cache coherence and rich-row preservation classes, but do not cover the current WakeSubscriptionService.bootstrap / AgentIdentity.subscriptionTemplate restart regression.

The Problem

WakeSubscriptionService.bootstrap() treats the in-memory GraphService.db.nodes.get(owner) result as authoritative for the agent identity template. If that cache entry is missing, stale, or represented as a stripped identity stub, bootstrap throws the "no subscriptionTemplate found" error even though SQLite contains the correct rich AgentIdentity row.

For the wake substrate, this is severe because restart-time bootstrap is exactly the path that should recreate Shape C subscriptions after MCP server restart. A stale in-memory identity row silently strands an agent family from wake injection while mailbox storage still works, creating the false impression that A2A itself is broken.
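The split described above can be reproduced in miniature. The sketch below is illustrative only: `durableIdentityRows`, `cachedIdentityNodes`, and `bootstrapFromCacheOnly` are hypothetical stand-ins for the SQLite rows, the GraphService.db.nodes cache, and the current bootstrap read path. They are not the real Memory Core API.

```javascript
// Hypothetical stand-in for the durable SQLite rows: the rich AgentIdentity row.
const durableIdentityRows = new Map([
    ['@neo-gpt', {
        id: '@neo-gpt',
        properties: {
            subscriptionTemplate: {
                harnessTarget: 'bridge-daemon',
                harnessTargetMetadata: {appName: 'Codex'}
            }
        }
    }]
]);

// Hypothetical stand-in for the in-memory node cache after a restart:
// the identity is present only as a stripped stub without the template.
const cachedIdentityNodes = new Map([
    ['@neo-gpt', {id: '@neo-gpt', properties: {}}]
]);

// Mirrors the failure mode only: the cache entry is treated as authoritative.
function bootstrapFromCacheOnly(owner) {
    const template = cachedIdentityNodes.get(owner)?.properties?.subscriptionTemplate;

    if (!template) {
        throw new Error(`Cannot bootstrap subscription: no subscriptionTemplate found on AgentIdentity '${owner}'`);
    }

    return template;
}

let observedFailure = null;

try {
    bootstrapFromCacheOnly('@neo-gpt');
} catch (error) {
    observedFailure = error.message;
}

// The cache-only path fails ...
console.log(observedFailure);
// ... while the durable row still carries the canonical template.
console.log(durableIdentityRows.get('@neo-gpt').properties.subscriptionTemplate.harnessTargetMetadata.appName);
```

The point of the sketch is that both statements hold at once: the lookup fails and the durable data is correct, so the defect sits in the read path rather than in the stored identity.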

The Architectural Reality

Relevant ownership surface:

  • ai/mcp/server/memory-core/services/WakeSubscriptionService.mjs
    • bootstrap() reads GraphService.db.nodes.get(owner) and requires properties.subscriptionTemplate.
    • It then delegates through subscribe(), which validates the template metadata and persists the WAKE_SUBSCRIPTION node.
  • ai/graph/identityRoots.mjs
    • @neo-gpt has a canonical subscriptionTemplate with appName: 'Codex'.
  • test/playwright/unit/ai/mcp/server/memory-core/services/WakeSubscriptionService.spec.mjs
    • Existing tests cover template creation, idempotency, missing template, and appName validation, but they do not reproduce a durable-SQLite-rich-row / in-memory-cache-stale-row split.

This belongs in Memory Core / GraphService read-through behavior or in the WakeSubscriptionService bootstrap read path. It does NOT belong in bridge-daemon.mjs, macOS TCC permission handling, or a dynamic replacement for the now-correct validAppNames list.

The Fix

Add a focused regression path proving that WakeSubscriptionService.bootstrap() can recover the canonical AgentIdentity.subscriptionTemplate from the durable graph state after an MCP restart or cache-stale condition.

Likely implementation shapes:

  1. Teach bootstrap to perform a durable read-through by id when GraphService.db.nodes.get(owner) lacks properties.subscriptionTemplate.
  2. Or fix the GraphService / SQLite node cache hydration so AgentIdentity nodes loaded after restart preserve rich properties, including subscriptionTemplate.

The chosen implementation must preserve the existing hard error for truly missing templates; it should not silently synthesize a template or fall back to hardcoded harness defaults.
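A minimal sketch of the first shape (durable read-through by id), assuming plain `Map` objects stand in for both the node cache and the SQLite-backed rows. The function name `readTemplateThrough` and both parameters are hypothetical; the actual fix may use different GraphService calls. The sketch keeps the hard error for identities whose durable row genuinely lacks a template and never synthesizes one.

```javascript
// Hypothetical helper, not the real WakeSubscriptionService API.
// `nodeCache` mimics GraphService.db.nodes; `durableRows` mimics SQLite rows keyed by id.
function readTemplateThrough(owner, nodeCache, durableRows) {
    const cachedTemplate = nodeCache.get(owner)?.properties?.subscriptionTemplate;

    if (cachedTemplate) {
        return cachedTemplate;
    }

    // Cache miss, stale entry, or stripped stub: consult the durable row by id.
    const durableNode = durableRows.get(owner);
    const durableTemplate = durableNode?.properties?.subscriptionTemplate;

    if (!durableTemplate) {
        // Truly missing template: keep the existing hard error, no defaults.
        throw new Error(`Cannot bootstrap subscription: no subscriptionTemplate found on AgentIdentity '${owner}'`);
    }

    // Re-hydrate the cache with the rich row so later reads stay coherent.
    nodeCache.set(owner, durableNode);

    return durableTemplate;
}

// Usage: a stripped stub in the cache, a rich row in the durable store.
const staleCache = new Map([
    ['@neo-gpt', {id: '@neo-gpt', properties: {}}]
]);

const sqliteRows = new Map([
    ['@neo-gpt', {
        id: '@neo-gpt',
        properties: {
            subscriptionTemplate: {
                harnessTarget: 'bridge-daemon',
                harnessTargetMetadata: {appName: 'Codex'}
            }
        }
    }]
]);

const recoveredTemplate = readTemplateThrough('@neo-gpt', staleCache, sqliteRows);

console.log(recoveredTemplate.harnessTargetMetadata.appName); // Codex
```

Replacing the stub with the rich row, rather than leaving the cache untouched, is one way to keep the fix from being a per-call workaround. Whether that hydration belongs in bootstrap or in GraphService itself is the open choice between the two shapes listed above.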

Acceptance Criteria

  • A regression test creates or simulates a rich AgentIdentity row in SQLite with properties.subscriptionTemplate, while the in-memory node cache is missing or stale, and proves WakeSubscriptionService.bootstrap() still finds the template.
  • Bootstrap for an identity with a real template creates or returns an active WAKE_SUBSCRIPTION using the durable template fields.
  • The existing no-template error path remains intact for an identity whose durable row genuinely lacks subscriptionTemplate.
  • The regression test covers @neo-gpt / Codex or an equivalent Shape C template so the #10636/#10637 path is protected.
  • The fix does not introduce hardcoded per-agent fallbacks, does not broaden validAppNames beyond canonical entries, and does not mutate AgentIdentity rows into stripped stubs.
  • Post-merge validation: after MCP server restart, manage_wake_subscription({action: 'bootstrap'}) from @neo-gpt creates or returns a Shape C subscription whose harnessTargetMetadata.appName is Codex.

Out of Scope

  • validAppNames missing Codex — fixed by PR #10637 / #10636.
  • bridge-daemon stdio:'ignore' stderr observability — #10638 already isolates the bridge diagnostics surface.
  • macOS TCC / System Events keystroke permissions — operator/system setting, not Memory Core code.
  • Dynamic harness registry refactor for app names — premature for this bug; static canonical entries remain acceptable.
  • A live A2A write or wake-delivery test as part of implementation; this ticket should first repair deterministic Memory Core bootstrap behavior.

Avoided Traps

  • Trap: bundle this with bridge-daemon diagnostics. Rejected. A bootstrap read-path failure and osascript stderr suppression are different substrates with different verification surfaces.
  • Trap: treat list_messages success as wake success. Rejected. Mailbox storage and Shape C injection are separate layers; this ticket protects the bridge between identity templates and wake subscriptions.
  • Trap: hardcode @neo-gpt defaults in bootstrap. Rejected. The canonical template already lives on the AgentIdentity; the bug is that bootstrap cannot reliably see it.
  • Trap: mask the bug with retry loops. Rejected unless the retry is tied to an explicit durable read-through. Prior cache tickets show blind retry loops are not a substitute for coherent graph hydration.

Related

  • Parent: #10601 — Auto-wake substrate Epic.
  • Precursor: #10636 / PR #10637 — fixed Codex allow-list validation but intentionally scoped this cache-staleness issue out.
  • Related diagnostics: #10638 — bridge-daemon stderr visibility.
  • Historical cache-coherence class: #10222, #10230.
  • Origin Session ID: 89b259c3-27ec-4afb-baaf-fd39b55bffe1

Retrieval Hint: query_raw_memories("no subscriptionTemplate found @neo-gpt raw SQLite subscriptionTemplate Codex") or query_summaries("GraphService cache staleness AgentIdentity subscriptionTemplate wake bootstrap").

tobiu referenced in commit 8f5bdc4 - "feat(ai): add wake safety gate and circuit breaker (#10648) (#10653)" on May 3, 2026, 5:40 PM
tobiu referenced in commit 226cabb - "fix(ai): recover wake bootstrap templates from durable graph (#10645) (#10656)" on May 3, 2026, 6:59 PM
tobiu closed this issue on May 3, 2026, 6:59 PM
tobiu referenced in commit 7a7d362 - "docs(agentos): add wake substrate incident protocol (#10650) (#10655)" on May 3, 2026, 7:00 PM