Context
With the successful merge of PR #10221 (resolving #10190) and empirical validation via full swarm restart cycles, the underlying Database cache coherence vulnerabilities have been resolved. The temporary band-aids put in place prior to the substrate fix (#10185 and #10182) are now obsolete and need to be removed to reduce operational complexity and avoid masking future regressions. Additionally, peer review on PR #10221 identified missing architectural documentation for future telemetry.
The Problem
During the cache coherence diagnostic phase, two temporary mitigations were introduced into Server.mjs:
- #10185 Retry Loop: Added to bindAgentIdentity to poll for graph nodes, in an attempt to mask the symptom of fresh-boot cache divergence.
- #10182 Self-Heal: Added to the CallToolHandler to late-bind identities if they were missed during boot.
Since #10190 fixed the cache coherence issue at its source by stripping the lastSyncId guard, GraphService.getNode now returns accurate results once GraphService.ready() has resolved, with no need for artificial delays or retries. Keeping these band-aids adds technical debt and makes the code harder to reason about.
The Architectural Reality
ai/mcp/server/memory-core/Server.mjs: bindAgentIdentity contains an obsolete 3-retry polling loop with a 200ms sleep and vicinity tracker resets. The tool call handler contains the #10182 late-binding fallback.
ai/graph/Database.mjs: Lines 78 and 281 lack the "Anchor & Echo" rationale for the #10190 changes (removing the lastSyncId guard and the empty-vicinity mark), leaving a knowledge gap for future debugging. The syncCache JSDoc also omits the new invariant that fresh-boots are legitimate triggers that invalidate the cache.
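The Bug A semantics and the missing syncCache invariant can be illustrated with a minimal sketch. It assumes a simplified model that is not the actual Database.mjs code: the graph log is an array of {syncId, nodeId} entries, the node cache is a Map, and Bug A refers to the lastSyncId guard. syncCacheWithGuard, the entry shape, and the return value are hypothetical. Bug B (the empty-vicinity mark) is not modeled. The sketch contrasts a guard that skips when lastSyncId is 0 with the post-#10190 behavior, and shows where the JSDoc invariant and an Anchor & Echo comment would sit.

```javascript
// Hypothetical reduction of the Database.mjs cache sync; not the real code.
// Assumed shapes: log entries are {syncId, nodeId}; the cache is a Map keyed by nodeId.

// Pre-#10190 behavior (assumed): a falsy lastSyncId short-circuits the sync,
// so a fresh boot (lastSyncId = 0) never catches up and stale entries survive.
function syncCacheWithGuard(cache, log, lastSyncId) {
  if (!lastSyncId) return lastSyncId; // the guard that #10190 removed
  return syncCache(cache, log, lastSyncId);
}

/**
 * Brings the node cache in line with the graph log.
 *
 * Invariants (per #10190):
 * - lastSyncId = 0 (fresh boot) is a legitimate trigger: it causes a full
 *   catch-up over the log rather than a skip.
 * - Log entries newer than lastSyncId invalidate (evict) the cached node;
 *   they are never upserted into the cache here.
 *
 * @param {Map<string, object>} cache node cache keyed by nodeId
 * @param {Array<{syncId: number, nodeId: string}>} log ordered change log
 * @param {number} lastSyncId highest syncId already applied (0 on fresh boot)
 * @returns {number} the new lastSyncId
 */
function syncCache(cache, log, lastSyncId) {
  // Anchor & Echo (#10190 / ADR 0001 Bug A): no early return on lastSyncId = 0.
  // A fresh boot must replay the whole log, otherwise the cache diverges.
  let newest = lastSyncId;
  for (const entry of log) {
    if (entry.syncId > lastSyncId) {
      cache.delete(entry.nodeId); // invalidate, do not upsert
      newest = Math.max(newest, entry.syncId);
    }
  }
  return newest;
}
```

With the guard, a fresh boot returns immediately and the stale entry stays cached; without it, the same call evicts the entry and advances lastSyncId.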
The Fix
- Remove Retry Loop: Strip the 3-attempt polling logic from Server.mjs:bindAgentIdentity. A single GraphService.getNode({id: graphNodeId}) call is now sufficient, since await GraphService.ready() ensures the cache is coherent.
- Remove Self-Heal: Strip the !this.stdioIdentity.agentIdentityNodeId fallback block from the CallToolRequestSchema handler in Server.mjs.
- Anchor & Echo: Add inline comments citing #10190 / ADR 0001 Bug A (at Database.mjs:78) and Bug B (at Database.mjs:281-283).
- JSDoc Extension: Update the syncCache JSDoc to state explicitly that lastSyncId=0 triggers a legitimate catch-up rather than a skip, and to clarify that it invalidates stale entries rather than upserting new ones.
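The two Server.mjs removals above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Server.mjs code: createGraphServiceStub, MemoryCoreServerSketch, handleCallTool, and the error text are hypothetical, and the real handler is registered against CallToolRequestSchema rather than exposed as a method. getNode is awaited so the sketch holds whether the real call is synchronous or asynchronous.

```javascript
// Hypothetical sketch of the post-cleanup identity flow in Server.mjs.
// The stub, class name, method names, and error text are assumptions.

// Stand-in for GraphService: ready() resolves once the cache is coherent and
// getNode({id}) is a single lookup (awaited below, so sync or async both work).
function createGraphServiceStub(nodes) {
  const cache = new Map(nodes.map((node) => [node.id, node]));
  return {
    ready: async () => {},
    getNode: ({ id }) => cache.get(id) ?? null,
  };
}

class MemoryCoreServerSketch {
  constructor(graphService) {
    this.graphService = graphService;
    this.stdioIdentity = { agentIdentityNodeId: null };
  }

  // #10185 removed: one lookup after ready(), with no polling loop, no 200ms
  // sleep, and no vicinity tracker resets.
  async bindAgentIdentity(graphNodeId) {
    await this.graphService.ready();
    const node = await this.graphService.getNode({ id: graphNodeId });
    if (!node) {
      // Fail fast: a missing node now signals a coherence regression to fix
      // in the substrate, not a timing issue to retry around.
      throw new Error(`Identity node ${graphNodeId} not found after ready()`);
    }
    this.stdioIdentity.agentIdentityNodeId = node.id;
    return node;
  }

  // #10182 removed: the tool-call path no longer checks for a missing
  // agentIdentityNodeId and late-binds; it uses whatever boot bound.
  async handleCallTool(request) {
    return {
      tool: request.params.name,
      agentIdentityNodeId: this.stdioIdentity.agentIdentityNodeId,
    };
  }
}
```

Because the sketch throws instead of retrying, an incoherent cache shows up as a boot-time error, which is the behavior the Avoided Traps section argues for.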
Acceptance Criteria
- bindAgentIdentity in Server.mjs no longer contains a polling loop or vicinity cache deletion.
- Server.mjs no longer contains the #10182 self-heal logic.
- ai/graph/Database.mjs contains explicit inline comments detailing the "WHY" for both the Bug A and Bug B resolutions.
- The syncCache JSDoc explicitly documents its invariant behavior regarding fresh boots and invalidation vs. upserts.
Out of Scope
- Modifying the underlying GraphLog or Database.mjs execution logic (this is strictly a cleanup and documentation pass).
- Any modifications to the MailboxService or A2A handshakes.
Avoided Traps
- Trap: Leaving the band-aids in place as "defense in depth." Why avoided: It masks potential future regressions in the Database coherence layer. If the cache is incoherent, we want the identity binding to fail so the substrate issue is surfaced immediately, rather than silently retrying.
Related
Origin Session ID: e068b094-fcae-436a-a9ab-c513246f7f71