Frontmatter
id: 11698
title: Sandman hides apoptosis null-node failure as success
state: Closed
labels: bug, ai, regression, architecture
assignees: neo-gpt
createdAt: May 20, 2026, 11:15 PM
updatedAt: May 21, 2026, 12:16 AM
githubUrl: https://github.com/neomjs/neo/issues/11698
author: neo-gpt
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 21, 2026, 12:16 AM

Sandman hides apoptosis null-node failure as success

Closed v13.0.0/archive-v13-0-0-chunk-12 bug ai regression architecture
neo-gpt commented on May 20, 2026, 11:15 PM

Context

The operator ran npm run ai:run-sandman on 2026-05-20. The run reached GraphMaintenanceService vector apoptosis, detected 8344 orphaned nodes, then logged a DreamService error, yet still printed "✅ Sandman cycle complete." and exited with code 0.

Observed stack:

[INFO] [GraphMaintenanceService] Apoptosis detected 8344 orphaned nodes. Commencing eradication...
[ERROR] [DreamService] Failed to process undigested sessions: TypeError: Cannot read properties of null (reading 'id')
    at Store.getKey (src/data/Store.mjs:554:25)
    at Store.splice (src/collection/Base.mjs:1483:45)
    at Store.splice (ai/graph/Store.mjs:161:30)
    at Store.remove (src/collection/Base.mjs:1396:14)
    at Database.removeNode (ai/graph/Database.mjs:413:18)
    at GraphService.mjs:986:43
    at Array.forEach (<anonymous>)
    at GraphService.mjs:986:21
    at Database.transaction (ai/graph/Database.mjs:474:13)
    at GraphService.removeNodes (ai/services/memory-core/GraphService.mjs:985:17)
...
✅ Sandman cycle complete.
Process finished with exit code 0

The Problem

The REM maintenance path can fail during apoptosis deletion and still look successful to the caller. This is dangerous for operator runs and future automation because the process-level success signal no longer means the REM cycle completed.

There are two coupled failure surfaces:

  1. GraphService.removeNodes() can pass an invalid/null node id into Database.removeNode() during apoptosis cleanup.
  2. DreamService.processUndigestedSessions() catches the error and logs it, but neither rethrows nor returns a failure result; buildScripts/ai/runSandman.mjs then prints success and sets process.exitCode = 0.
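
How the two surfaces combine can be reduced to a few lines. This is a minimal sketch, not the actual DreamService or runSandman code: processSwallowing, runCycle, failingApoptosis, and the log array are hypothetical stand-ins, and only the catch-and-log shape is taken from the description above.

```javascript
// Hypothetical stand-in for a method that catches and logs but does not rethrow.
async function processSwallowing(work, log) {
    try {
        await work();
    } catch (error) {
        log.push(`[ERROR] Failed to process undigested sessions: ${error.message}`);
        // No rethrow and no failure result: the caller cannot tell that work() failed.
    }
}

// Hypothetical stand-in for a runner that reports success unless an exception escapes.
async function runCycle(processFn, work) {
    const log = [];
    try {
        await processFn(work, log);
        log.push('Sandman cycle complete.');
        return {exitCode: 0, log};
    } catch (error) {
        log.push(`[FATAL] ${error.message}`);
        return {exitCode: 1, log};
    }
}

// Simulates apoptosis deletion reading a property of a null node.
async function failingApoptosis() {
    const node = null;
    return node.id; // throws a TypeError
}
```

Calling runCycle(processSwallowing, failingApoptosis) resolves with exitCode 0 even though the log contains an [ERROR] line, which is the same shape as the observed run.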

The Architectural Reality

  • ai/daemons/services/GraphMaintenanceService.mjs:49-53 calls GraphService.getOrphanedNodes() and then GraphService.removeNodes(orphaned).
  • ai/services/memory-core/GraphService.mjs:952-975 selects orphan rows from SQLite Nodes and pushes row.id into the deletion list without validating the id.
  • ai/services/memory-core/GraphService.mjs:982-988 wraps nodeIds.forEach(id => this.db.removeNode(id)) in Database.transaction().
  • ai/graph/Database.mjs:407-414 calls me.nodes.remove(nodeId) directly.
  • src/collection/Base.mjs:1294-1297 treats null as an item because typeof null === 'object'.
  • src/data/Store.mjs:542-554 then calls item[keyProperty], which throws for item === null.
  • ai/daemons/DreamService.mjs:248-254 catches and logs the failure without propagating it.
  • buildScripts/ai/runSandman.mjs:218-224 awaits DreamService.processUndigestedSessions(), then unconditionally synthesizes Golden Path, prints success, and sets exit code 0 if no exception escapes.
  • ai/graph/storage/SQLite.mjs:83-87 declares Nodes.id TEXT PRIMARY KEY without explicit NOT NULL; existing SQLite rowid-table semantics can allow historical null primary-key rows unless explicitly constrained or guarded at insert time.
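
The typeof null and item[keyProperty] steps in the list above are a well-known JavaScript pitfall. A minimal sketch follows; isItemNaive, isItemStrict, getKey, and describeValue are hypothetical helpers written for illustration and are not the code in src/collection/Base.mjs or src/data/Store.mjs.

```javascript
// Naive check: null passes, because typeof null === 'object'.
function isItemNaive(value) {
    return typeof value === 'object';
}

// Stricter check that excludes null explicitly.
function isItemStrict(value) {
    return value !== null && typeof value === 'object';
}

// Reads the key from an item the way a keyed store might.
function getKey(item, keyProperty = 'id') {
    return item[keyProperty]; // throws a TypeError when item is null
}

// Only reads the key after the strict check has passed.
function describeValue(value) {
    if (!isItemStrict(value)) {
        return 'not an item';
    }
    return `item with key ${getKey(value)}`;
}
```

With the naive check, null is classified as an item and the subsequent key lookup throws; with the strict check, the lookup is never reached. This only illustrates the mechanism and does not address how a null id entered the deletion list.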

The Fix

Implement a narrow hardening pass across the owning boundaries:

  1. Add validation/repair handling so GraphService apoptosis never calls Database.removeNode(null) or otherwise routes invalid ids through Store.remove().
  2. Prevent future invalid graph node ids from entering SQLite through SQLite.addNodes() / graph upsert paths, with an explicit diagnostic rather than a latent corrupt row.
  3. Ensure DreamService.processUndigestedSessions() exposes failure to callers, either by rethrowing after logging or by returning a structured failure result consumed by runSandman.
  4. Make runSandman fail non-zero whenever REM processing failed, while preserving the heavy-maintenance lease semantics and the held-lease early-exit behavior.
  5. Add focused unit coverage for the null-id apoptosis guard and the Sandman/DreamService failure propagation contract.
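
Steps 1 and 2 could be sketched as a pair of small guards. This is an illustration under assumptions, not the merged fix: isValidNodeId, partitionNodeIds, assertValidNodeId, and removeNodesGuarded are hypothetical names, the removal function is injected rather than taken from Database, and the "non-empty string" rule is the one stated in the Contract Ledger Matrix.

```javascript
// A node id is treated as valid only if it is a non-empty string.
function isValidNodeId(id) {
    return typeof id === 'string' && id.length > 0;
}

// Splits a deletion list so that only valid ids reach the removal path.
function partitionNodeIds(nodeIds) {
    const valid = [];
    const invalid = [];
    for (const id of nodeIds) {
        (isValidNodeId(id) ? valid : invalid).push(id);
    }
    return {valid, invalid};
}

// Insert-time guard: fail loudly instead of persisting a corrupt row.
function assertValidNodeId(node) {
    const id = node == null ? undefined : node.id;
    if (!isValidNodeId(id)) {
        throw new TypeError(`Invalid graph node id: ${String(id)}`);
    }
    return id;
}

// Hypothetical removal wrapper: invalid ids are reported, never passed to removeNode.
function removeNodesGuarded(nodeIds, removeNode, warn = () => {}) {
    const {valid, invalid} = partitionNodeIds(nodeIds);
    if (invalid.length > 0) {
        warn(`Skipping ${invalid.length} invalid node id(s) during apoptosis`);
    }
    valid.forEach(id => removeNode(id));
    return {removed: valid.length, skipped: invalid.length};
}
```

Whether invalid ids should be skipped with a diagnostic, as here, or repaired at the row level is left open by the ticket; the sketch takes the simpler option.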

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| GraphService.removeNodes(nodeIds) | This ticket + 2026-05-20 Sandman stack | Invalid/null ids are rejected or repaired before Database.removeNode(); no Store.getKey(null) TypeError | Emit explicit diagnostic naming invalid node ids / corrupt rows | JSDoc on removeNodes() and/or getOrphanedNodes() | Unit test passes [null] or corrupt orphan row and proves no Store.getKey(null) crash |
| SQLite node insert path | This ticket + schema audit | New node writes require a non-empty string id | Throw explicit invalid-node-id error before insert | JSDoc on SQLite.addNodes() | Unit test proves null/undefined node id is rejected before SQLite persistence |
| DreamService.processUndigestedSessions() | This ticket + runSandman caller contract | REM failures are observable to callers | Structured failure result if rethrow would break existing caller | JSDoc on method return/error contract | Unit test proves GraphMaintenanceService failure reaches caller |
| npm run ai:run-sandman | Operator CLI contract | Fatal REM failure exits non-zero and does not print successful completion | Held lease still exits 0 without mutation, as today | Script comments near exitCode handling | Script-level test or unitized dependency-injected test proves failure => exitCode 1 |
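
The last two rows of the matrix amount to a small exit-code contract. The following sketch assumes the structured-result fallback rather than a rethrow; processSessions, runSandmanCycle, the {ok, error} result shape, and the leaseHeld flag are all hypothetical and chosen for illustration, with dependencies injected so the contract can be tested without the real services.

```javascript
// Hypothetical: wraps REM work and reports failure as data instead of hiding it.
async function processSessions(work, logError = () => {}) {
    try {
        await work();
        return {ok: true, error: null};
    } catch (error) {
        logError(error);
        return {ok: false, error};
    }
}

// Hypothetical runner: returns the exit code and the lines it would print.
async function runSandmanCycle({leaseHeld, work}) {
    const output = [];
    if (leaseHeld) {
        // Held lease: exit cleanly without running REM work.
        output.push('Heavy-maintenance lease is held; skipping cycle.');
        return {exitCode: 0, output};
    }
    const result = await processSessions(work, e => output.push(`[ERROR] ${e.message}`));
    if (!result.ok) {
        // Fatal REM failure: non-zero exit and no success line.
        return {exitCode: 1, output};
    }
    output.push('Sandman cycle complete.');
    return {exitCode: 0, output};
}
```

A real script would assign the returned exitCode to process.exitCode and print the collected lines; returning them keeps the sketch free of process-level side effects.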

Acceptance Criteria

  • Regression test reproduces the Store.getKey(null) path from apoptosis deletion and verifies the new guard prevents the TypeError.
  • Invalid/null graph node ids cannot be inserted through the normal SQLite addNodes() persistence path without an explicit error.
  • DreamService.processUndigestedSessions() no longer silently hides a GraphMaintenanceService failure from its callers.
  • buildScripts/ai/runSandman.mjs exits non-zero and does not print "✅ Sandman cycle complete." when REM processing fails.
  • Held heavy-maintenance lease behavior remains unchanged: held lease exits cleanly without running REM or decay.
  • Post-merge validation: rerun npm run ai:run-sandman; the previous "Cannot read properties of null (reading 'id')" error is gone, and any future fatal REM error returns non-zero.

Out of Scope

  • Manually deleting or rewriting all current orphan rows outside the tested repair/guard path.
  • Reworking vector-dimension mismatch configuration (gemini-embedding-001 3072 vs configured 4096); that warning is visible in the same log but is a separate configuration/collection-dimension concern.
  • Changing apoptosis retention policy or protected node labels.
  • Reworking heavy-maintenance lease acquisition semantics.

Avoided Traps

  • Treating the exit code 0 as success — rejected. The logged DreamService error falsifies the success claim.
  • Folding this into #11595 — rejected. #11595 fixed string-shaped rollback payloads; this stack reaches Store.getKey(null) before that rollback failure class.
  • Only changing Collection.Base.isItem(null) — rejected as an insufficient sole fix. It may avoid this TypeError, but it does not explain or prevent invalid graph ids entering apoptosis or fix the swallowed REM failure.
  • Only skipping null ids in apoptosis — rejected as incomplete. It leaves Sandman able to hide future fatal REM failures behind exit code 0.

Related

  • #11595 / PR #11611 — adjacent prior apoptosis rollback shape bug, closed/merged.
  • ai/daemons/services/GraphMaintenanceService.mjs:49-53
  • ai/services/memory-core/GraphService.mjs:952-988
  • ai/graph/Database.mjs:407-414
  • src/collection/Base.mjs:1294-1297
  • src/data/Store.mjs:542-554
  • ai/daemons/DreamService.mjs:248-254
  • buildScripts/ai/runSandman.mjs:218-224

Origin Session ID: d13c94dd-e721-4e28-ac9e-4d0b3c0f66de
Retrieval Hint: query_raw_memories("Sandman apoptosis null node Store.getKey exit code 0 DreamService")
Retrieval Hint: query_raw_memories("GraphService removeNodes null id runSandman success after error")

tobiu referenced in commit 2f92821 - "fix(ai): harden Sandman REM failure handling (#11698) (#11699)" on May 21, 2026, 12:16 AM
tobiu closed this issue on May 21, 2026, 12:16 AM