Frontmatter
id: 11182
title: Wake subscription duplicate-accumulation + GPT zero-delivery (post-#10624 recurrence)
state: Open
labels: bug, ai, regression, architecture, model-experience
assignees: neo-opus-4-7
createdAt: May 11, 2026, 7:07 AM
updatedAt: May 20, 2026, 9:39 PM
githubUrl: https://github.com/neomjs/neo/issues/11182
author: neo-opus-4-7
commentsCount: 4
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []

Wake subscription duplicate-accumulation + GPT zero-delivery (post-#10624 recurrence)

neo-opus-4-7 commented on May 11, 2026, 7:07 AM

Context

2026-05-11: Operator surfaced that @neo-gpt does not receive A2A messages in his prompt field, despite having TWO active wake subscriptions ("which should not be possible"). V-B-A on my own substrate confirms the duplicate-accumulation generalizes:

  • manage_wake_subscription({action: 'list'}) returned 2 active subscriptions for @neo-opus-4-7:
    • WAKE_SUB:fc6eace1-b0a3-49fd-9fa1-9f3418db8289 — created 2026-05-08 17:57:22Z, last updated 2026-05-09 11:00:50Z
    • WAKE_SUB:a05420a2-526a-44ea-a9ae-d485bd4b6c36 — created 2026-05-10 17:09:58Z
  • Both status: active, same agentIdentity, identical route-tuple (trigger: SENT_TO_ME, harnessTarget: bridge-daemon, appName: Claude, tabShortcut: 3, focusSeedKey: space)
  • Newer subscription delivers wakes (WAKE_SUB:a05420a2 appears in every wake-event notification). Older is stale-but-not-cleaned-up.

The substrate-class has 8+ prior tickets (already-filed, some merged, evidently not yet closing the loop):

  • #10624 — "Wake subscription canonicalization: one canonical appName per identity, retire stale duplicate" — most-directly-on-point
  • #10636 — "Wake subscription validAppNames omits Codex, breaks @neo-gpt bootstrap" — GPT-specific
  • #10407 — "manage_wake_subscription Zod schema silently strips harnessTargetMetadata extension fields" — read-path-strip class
  • #10410 / #10414 — "Wake Substrate post-#10404 hardening: idempotency + duplicate deduplication"
  • #10430 — "Wake Substrate: enforce duplicate-wake deduplication in coalescing window"
  • #10515 — "Stabilize A2A Wake Substrate Integrity"
  • #10717 — "Wake subscription API hides stale active rows that bridge still dispatches"
  • Graph note TOOLING_GAP-bdf4945d: empirically observed list action returns {subscriptions:[]} while bridge daemon delivers via a real subscription in SQLite — same silent-strip class as #10407 but on the LIST read-path

The recurrence pattern is empirically confirmed: prior fixes hardened parts of the substrate but the duplicate-accumulation and the GPT-zero-delivery failure modes are still live in production.

The Problem

Three connected failure modes:

Failure mode 1: Cross-session subscription accumulation

Each fresh agent-session boot calls WakeSubscriptionService.bootstrap() which is supposed to be idempotent via _findActiveSubscriptionByRoute(...) (lookup-then-create-if-not-exists, source ai/services/memory-core/WakeSubscriptionService.mjs:226-267). Empirically the lookup returns null even when a matching subscription exists in SQLite, causing a NEW subscription to be created on every boot. Old subscriptions accumulate.

My empirical data: 2 subscriptions, 2 days apart, NOT multi-MCP-instance-race-within-session (would be seconds apart). Cross-session accumulation is the dominant mechanism.
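
The intended lookup-then-create-if-not-exists contract can be sketched as follows. This is a minimal illustration over an in-memory array, not the code in WakeSubscriptionService.mjs: the helper names `routeKey`, `findActiveSubscriptionByRoute`, and `bootstrap`, the row shape, and the id format are all assumptions made for the example. The route-tuple fields are the ones listed in the Context section.

```javascript
// Route-tuple fields as listed in this ticket's Context section.
const ROUTE_KEYS = ['trigger', 'harnessTarget', 'appName', 'tabShortcut', 'focusSeedKey'];

// Build a stable comparison key from the route-tuple fields (hypothetical helper).
function routeKey(route) {
    return JSON.stringify(ROUTE_KEYS.map(key => route[key] ?? null));
}

// Lookup half: find an active row for the same identity and the same route tuple.
function findActiveSubscriptionByRoute(store, agentIdentity, route) {
    return store.find(sub =>
        sub.status === 'active' &&
        sub.agentIdentity === agentIdentity &&
        routeKey(sub) === routeKey(route)
    ) ?? null;
}

// Create half: only reached when the lookup finds nothing.
function bootstrap(store, agentIdentity, route) {
    const existing = findActiveSubscriptionByRoute(store, agentIdentity, route);
    if (existing) {
        return existing; // reuse, so repeated boots do not accumulate rows
    }
    const created = {
        id: `WAKE_SUB:${store.length + 1}`, // placeholder id, not the real UUID scheme
        agentIdentity,
        status: 'active',
        createdAt: new Date().toISOString(),
        ...route
    };
    store.push(created);
    return created;
}

// Two "session boots" with an identical route must leave exactly one row.
const store = [];
const route = {
    trigger: 'SENT_TO_ME', harnessTarget: 'bridge-daemon',
    appName: 'Claude', tabShortcut: 3, focusSeedKey: 'space'
};
bootstrap(store, '@neo-opus-4-7', route);
bootstrap(store, '@neo-opus-4-7', route);
console.log(store.length); // 1
```

If the lookup half ever returns null for a row that does exist, every boot falls through to the create half, which matches the accumulation observed above.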

Failure mode 2: Read-path silent-strip (likely root cause of mode 1)

The list action of manage_wake_subscription empirically returns {subscriptions:[]} despite the bridge daemon dispatching wakes from real WAKE_SUB:* rows in SQLite (TOOLING_GAP-bdf4945d). If _findActiveSubscriptionByRoute shares any code path or Zod schema with list, it would also fail to find existing subscriptions → bootstrap creates fresh → accumulation.
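
To make the suspected mechanism concrete, the sketch below shows how a read path that silently drops fields can turn a matching row into a lookup miss. It is an illustration only: `parseWithAllowList` is a hand-rolled stand-in for a schema that strips unknown keys, and the choice of which fields get dropped (`tabShortcut`, `focusSeedKey`) is an assumption, since the ticket has not yet identified what the real read path strips.

```javascript
// Hypothetical read-path schema: an allow-list that drops unknown keys without raising an error.
function parseWithAllowList(row, allowedKeys) {
    return Object.fromEntries(
        Object.entries(row).filter(([key]) => allowedKeys.includes(key))
    );
}

// Route comparison as a lookup would perform it (hypothetical helper).
function sameRoute(a, b, routeKeys) {
    return routeKeys.every(key => a[key] === b[key]);
}

const routeKeys = ['trigger', 'harnessTarget', 'appName', 'tabShortcut', 'focusSeedKey'];

// Row as it sits in storage.
const storedRow = {
    id: 'WAKE_SUB:a05420a2-526a-44ea-a9ae-d485bd4b6c36',
    agentIdentity: '@neo-opus-4-7',
    status: 'active',
    trigger: 'SENT_TO_ME',
    harnessTarget: 'bridge-daemon',
    appName: 'Claude',
    tabShortcut: 3,
    focusSeedKey: 'space'
};

// Assumed incomplete schema: omits tabShortcut and focusSeedKey.
const incompleteSchema = ['id', 'agentIdentity', 'status', 'trigger', 'harnessTarget', 'appName'];
const parsedRow = parseWithAllowList(storedRow, incompleteSchema);

const requestedRoute = {
    trigger: 'SENT_TO_ME', harnessTarget: 'bridge-daemon',
    appName: 'Claude', tabShortcut: 3, focusSeedKey: 'space'
};

console.log(sameRoute(storedRow, requestedRoute, routeKeys)); // true
console.log(sameRoute(parsedRow, requestedRoute, routeKeys)); // false
```

The row exists and matches, yet the comparison against the stripped copy fails, so a bootstrap built on that read path would create a fresh subscription.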

Failure mode 3: @neo-gpt zero-delivery despite multiple subscriptions

@neo-gpt has 2+ subscriptions per operator-observation. ZERO wake-event delivery to his prompt field. Confirmed asymmetry: GPT CAN send A2A (just received his Discussion #11180 ping as a wake on my side) but cannot RECEIVE wakes in his prompt. One-way break.

Likely root cause class: intersection of:

  • validAppNames/canonical appName mismatch for Codex (per #10636, even though that ticket was supposed to fix it)
  • harness target metadata not aligned with Codex Desktop's prompt-field-injection contract
  • Possibly bridge-daemon's Codex routing path is silently degraded

Multi-agent canary observation

@neo-gpt is the empirical canary because he joined the swarm latest (2026-04-27) — historical canonicalization paths are tuned for @neo-opus-4-7 + @neo-gemini-3-1-pro. Same identity-shape family also surfaces in #11181 shared-summary-visibility (GPT-zero-summaries despite 980 in Memory Core), where 934 summary rows lack userId and 0 are tagged neo-gpt. Common root: agent-identity not canonically equivalent across MemoryCore substrates (wake-subscriptions, summaries, possibly more).

The Architectural Reality

  • ai/services/memory-core/WakeSubscriptionService.mjs:226-267, where bootstrap() calls _findActiveSubscriptionByRoute(...) for idempotency
  • ai/services/memory-core/WakeSubscriptionService.mjs _findActiveSubscriptionByRoute — the lookup whose semantics need V-B-A audit against list semantics
  • ai/mcp/server/memory-core/openapi.yaml (or wherever manage_wake_subscription Zod schema lives) — read-path schema may silently strip per #10407
  • ai/daemons/bridge-daemon.mjs (or per-harness equivalents) — routing path for Codex-targeted wakes
  • ai/services/memory-core/AgentIdentity graph nodes — canonical agent-identity registry; subscriptionTemplate field is the source-of-truth for bootstrap
  • .agents/skills/session-sunset/references/session-sunset-workflow.md Step 9 — "Disable Harness Routing (Unsubscribe Primitive)" — agents are supposed to unsubscribe at sunset; empirically this isn't preventing accumulation (either step is skipped or unsubscribe fails silently)

The Fix

Layer 1 — Read-path-strip audit:

  1. V-B-A read-path symmetry: confirm _findActiveSubscriptionByRoute and the list action use the same backing query against SQLite. If list strips and lookup also strips, the bug is in the shared layer.
  2. Audit Zod schemas + middleware on the read path. Fix the silent-strip class (companion to #10407 on read-path).

Layer 2 — Cross-session cleanup:

  3. Add a startup-time SQLite-direct reconciler that, before bootstrap, scans SELECT * FROM Nodes WHERE type = 'WAKE_SUBSCRIPTION' AND agentIdentity = ? AND status = 'active'. If multiple matches, retire all-but-newest (or all-but-canonical-route) durably.
  4. Make the reconciler idempotent + safe to call repeatedly. Tests for: 0 subscriptions → bootstrap creates 1; 1 subscription → no-op; 2+ subscriptions → reconcile to 1.
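
A minimal sketch of the all-but-newest reconciler, assuming rows shaped like the result of the SELECT above and held in a plain array rather than SQLite. The function name `reconcileActiveSubscriptions`, the `retired` status value, and the ISO `createdAt` format are assumptions for illustration; the real retirement status and persistence step are not specified in this ticket.

```javascript
// Retire every active subscription for one identity except the newest (hypothetical helper).
function reconcileActiveSubscriptions(rows, agentIdentity) {
    const active = rows
        .filter(row =>
            row.type === 'WAKE_SUBSCRIPTION' &&
            row.agentIdentity === agentIdentity &&
            row.status === 'active'
        )
        // Newest first, so the first element is the one to keep.
        .sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt));

    const [keeper, ...stale] = active;
    for (const row of stale) {
        row.status = 'retired'; // assumed status name; a real fix would persist this durably
    }
    return {kept: keeper?.id ?? null, retired: stale.map(row => row.id)};
}

// Seeded with the two subscriptions reported in the Context section.
const rows = [
    {id: 'WAKE_SUB:fc6eace1-b0a3-49fd-9fa1-9f3418db8289', type: 'WAKE_SUBSCRIPTION',
     agentIdentity: '@neo-opus-4-7', status: 'active', createdAt: '2026-05-08T17:57:22Z'},
    {id: 'WAKE_SUB:a05420a2-526a-44ea-a9ae-d485bd4b6c36', type: 'WAKE_SUBSCRIPTION',
     agentIdentity: '@neo-opus-4-7', status: 'active', createdAt: '2026-05-10T17:09:58Z'}
];

const firstPass = reconcileActiveSubscriptions(rows, '@neo-opus-4-7');
const secondPass = reconcileActiveSubscriptions(rows, '@neo-opus-4-7');
console.log(firstPass.retired.length);  // 1
console.log(secondPass.retired.length); // 0
```

The second pass retires nothing because only one active row remains, which is the idempotency property item 4 asks for.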

Layer 3 — GPT zero-delivery:

  5. V-B-A @neo-gpt's subscription state via direct SQLite query (bypassing the strip-prone list action). Examine each subscription's appName, harnessTarget, harnessTargetMetadata.
  6. If appName != 'Codex' for any: fix bootstrap to use canonical Codex appName for Codex-bound identities.
  7. Verify bridge-daemon's Codex-routing path empirically: does it actually deliver to the Codex Desktop prompt-field? End-to-end probe.
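
The appName check in items 5 and 6 could look roughly like this once the rows have been read directly from SQLite. Everything here is illustrative: the `CANONICAL_APP_NAMES` map, the `findAppNameMismatches` helper, and the sample rows (including their ids and the lowercase `codex` value) are invented for the example and do not reflect @neo-gpt's actual subscription state, which still needs to be verified.

```javascript
// Assumed identity-to-appName mapping; the real source of truth would be the AgentIdentity registry.
const CANONICAL_APP_NAMES = {
    '@neo-gpt': 'Codex',
    '@neo-opus-4-7': 'Claude'
};

// Report active subscriptions whose appName differs from the canonical one (hypothetical helper).
function findAppNameMismatches(subscriptions, canonical) {
    return subscriptions
        .filter(sub => sub.status === 'active' && sub.agentIdentity in canonical)
        .filter(sub => sub.appName !== canonical[sub.agentIdentity])
        .map(sub => ({
            id: sub.id,
            agentIdentity: sub.agentIdentity,
            actual: sub.appName,
            expected: canonical[sub.agentIdentity]
        }));
}

// Invented sample rows, only to exercise the check.
const sampleSubscriptions = [
    {id: 'WAKE_SUB:example-1', agentIdentity: '@neo-gpt', status: 'active', appName: 'codex'},
    {id: 'WAKE_SUB:example-2', agentIdentity: '@neo-gpt', status: 'active', appName: 'Codex'},
    {id: 'WAKE_SUB:example-3', agentIdentity: '@neo-opus-4-7', status: 'active', appName: 'Claude'}
];

const mismatches = findAppNameMismatches(sampleSubscriptions, CANONICAL_APP_NAMES);
console.log(mismatches.map(m => m.id)); // [ 'WAKE_SUB:example-1' ]
```

A strict string comparison is used on purpose, so case or spelling variants surface as mismatches instead of being normalized away.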

Layer 4 — Cross-substrate identity canonicalization audit (broader scope; this ticket OR follow-up):

  8. Audit all MemoryCore substrates that key on agent-identity: wake-subscriptions, summaries (#11181), raw-memories, sessions, mailbox. Confirm canonical identity-shape is enforced consistently.
  9. Establish single-source-of-truth for canonical identity → all consumers derive from it.

Acceptance Criteria

  • AC1: manage_wake_subscription({action: 'list'}) returns the canonical set of active subscriptions from SQLite truth — no silent-strip class regression
  • AC2: bootstrap() is idempotent on repeated calls (no new subscription created when route-tuple matches existing) — verified via test fixture that runs bootstrap N times and asserts subscription count remains 1
  • AC3: Startup-time SQLite-direct reconciler retires duplicate subscriptions (all-but-canonical) and is itself idempotent
  • AC4: @neo-gpt receives wake-event delivery to his prompt field after fix — empirical verification via test A2A from another agent
  • AC5: Test coverage for the duplicate-accumulation scenario: seed 2 subscriptions in SQLite, run bootstrap, assert exactly 1 remains
  • AC6: Test coverage for the GPT-specific path: simulate Codex-Desktop harness bootstrap, verify delivery contract works end-to-end
  • AC7 (verification — post-merge): After 7 days, no agent has more than 1 active subscription per identity. No fresh duplicates accumulate across sessions
  • AC8 (cross-substrate awareness): Reference #11181 + investigation note: is the identity-canonicalization audit (Layer 4) in-scope here or a follow-up ticket?

Out of Scope

  • Broad re-architecting of MemoryCore identity model — focus is wake-substrate + the specific GPT delivery contract, not a multi-tenant privacy redesign
  • Bridge-daemon refactoring beyond Codex-routing audit — fix only the Codex-specific delivery contract here; broader bridge-daemon changes out of scope
  • Discussion-archive / version-folder substrate work — that's Discussion #11180's lane (unrelated)
  • Migration of historical summaries — that's #11181's lane; this ticket references but doesn't subsume

Avoided Traps

  • "Just unsubscribe at sunset more reliably" — rejected because empirically the unsubscribe step is unreliable, AND making it more reliable doesn't fix the bootstrap-create-vs-reuse race that creates the accumulation in the first place. The fix has to be at bootstrap-side AND reconciler-side.
  • Pickup of #10624 directly without filing this — rejected because #10624 was filed before the GPT-zero-delivery scenario emerged + before the cross-substrate connection to #11181 was visible. This ticket synthesizes the recurrence with current empirical anchors; #10624 can be linked as Related.
  • Refile as Ideation Sandbox Discussion — rejected because empirical anchors are concrete (my 2 subs + GPT zero-delivery) + root causes are identified + 8+ prior tickets have already done the design work. This is an implementation ticket, not an architecture-exploration.
  • Mass-rewriting the entire wake substrate — rejected; the substrate has been actively hardening across 8+ tickets; surgical Layer-1 (strip), Layer-2 (reconciler), Layer-3 (GPT-specific) fixes are sufficient. Don't compound risk.

Related

  • #10624 — Wake subscription canonicalization (one canonical appName per identity, retire stale duplicate) — DIRECTLY ON-POINT
  • #10636 — Wake subscription validAppNames omits Codex (GPT-specific)
  • #10407 — manage_wake_subscription Zod schema silently strips harnessTargetMetadata extension fields
  • #10410 / #10414 — Wake Substrate post-#10404 hardening: idempotency + duplicate deduplication
  • #10430 — Wake Substrate: enforce duplicate-wake deduplication in coalescing window
  • #10515 — Stabilize A2A Wake Substrate Integrity
  • #10717 — Wake subscription API hides stale active rows that bridge still dispatches
  • #11181 — Restore shared summary visibility for swarm identities (SAME identity-canonicalization family, different substrate)
  • TOOLING_GAP-bdf4945d (Memory Core graph node) — empirical anchor for read-path silent-strip class

Origin Session ID

c2912891-b459-4a03-b2af-154d5e264df1

Handoff Retrieval Hints

  • query_raw_memories(query="wake subscription duplicate accumulation cross session GPT zero delivery")
  • query_raw_memories(query="WAKE_SUB bootstrap idempotency findActiveSubscriptionByRoute")
  • ask_knowledge_base(query="wake subscription bootstrap silent strip read path")
  • Git commit-range anchor: git log --oneline --grep="wake" --since="2026-04-01" for substrate-evolution history
tobiu closed this issue on May 11, 2026, 8:52 AM