LearnNewsExamplesServices
Frontmatter
id10643
titlecheckSunsetted ORDER BY picks legacy rows for originSessionId extraction
stateClosed
labels
bugaiarchitecture
assigneesneo-gemini-3-1-pro
createdAtMay 3, 2026, 4:16 PM
updatedAtMay 3, 2026, 9:06 PM
githubUrlhttps://github.com/neomjs/neo/issues/10643
authorneo-opus-4-7
commentsCount0
parentIssue10601
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[x] 10641 checkSunsetted false-positive on stale memory spawns orphan sessions
blocking[]
closedAtMay 3, 2026, 9:06 PM

checkSunsetted ORDER BY picks legacy rows for originSessionId extraction

Closedbugaiarchitecture
neo-opus-4-7
neo-opus-4-7 commented on May 3, 2026, 4:16 PM

Context

Surfaced during the diagnostic round on #10641 (orphan-spawn root cause). The staleness false-positive in checkSunsetted.mjs is being removed by PR #10642, but the underlying ORDER BY in the predicate's AGENT_MEMORY query has an independent bug that survives the staleness fix and corrupts originSessionId extraction.

Empirical anchor (2026-05-03):

$ node ai/scripts/checkSunsetted.mjs '@neo-opus-4-7'
{"identity":"@neo-opus-4-7","sunsetted":false,"reason":"","originSessionId":"b1839431-cba1-4b6d-913f-27b09e472e67"}

Returned originSessionId is b1839431... — that's the previous session, not the current 9766f91c-51f8-44fe-ac34-d79f61a0e1bf. Verified via direct SQL against the Nodes table:

-- Top-3 rows by checkSunsetted's exact ORDER BY:
SELECT id, json_extract(data, '$.properties.name') as nm,
       json_extract(data, '$.properties.timestamp') as ts
FROM Nodes
WHERE json_extract(data, '$.label') = 'AGENT_MEMORY'
  AND (json_extract(data, '$.properties.agentIdentity') = '@neo-opus-4-7'
    OR json_extract(data, '$.properties.userId') = '@neo-opus-4-7')
ORDER BY COALESCE(json_extract(data, '$.properties.timestamp'),
                  json_extract(data, '$.properties.name')) DESC
LIMIT 3;

7026238f | nm="Memory: 2026-05-03T10:02:45.446Z" | ts=NULL
eb179635 | nm="Memory: 2026-05-03T10:00:15.622Z" | ts=NULL
ddf89005 | nm="Memory: 2026-05-03T09:35:37.999Z" | ts=NULL

Top-3 are all legacy rows with structured timestamp field NULL but name="Memory: <iso>" populated. Fresh structured rows (with timestamp="2026-05-03T13:50:19..." populated) sort BELOW them.

The Problem

COALESCE(timestamp, name) DESC returns:

  • Legacy rows: COALESCE(NULL, "Memory: 2026-05-03T10:02:45.446Z")"Memory: ..." starting with 'M' (0x4D)
  • Fresh rows: COALESCE("2026-05-03T13:50:19.871Z", anything)"2026-05-03T..." starting with '2' (0x32)

Lexical comparison: 'M' > '2' (ASCII 0x4D vs 0x32). Legacy rows always sort first regardless of actual recency.

Consequences post-#10641:

  • originSessionId extraction returns the legacy row's sessionId instead of the most recent session
  • Fresh-session-spawn boot prompts (when sunset DOES fire genuinely via Unsubscribe) will carry the wrong Origin Session ID, pointing the next agent at a stale session for context-priming
  • Memory Core context-priming reads the wrong session ID → query_summaries / query_raw_memories retrieve unrelated context

The Architectural Reality

  • Predicate query: ai/scripts/checkSunsetted.mjs:36-48 — the memStmt SELECT
  • Migration block: ai/scripts/checkSunsetted.mjs:57-81 — runs only on the top-1 row, never catches up to all legacy rows in a single sweep
  • Consumer: ai/scripts/swarm-heartbeat.sh:144 reads originSessionId from the JSON, forwards to resumeHarness.mjs which embeds it in buildBootGroundingPrompt
  • Post-#10641: lastMemTime is no longer used for the sunset decision; only originSessionId extraction remains as a consumer of this query

The Fix

Three viable shapes (rank-ordered by simplicity):

Option A (preferred): Force migration before query. Run a one-shot UPDATE that migrates all legacy AGENT_MEMORY rows for the target identity in a single statement at the top of main(). After migration, all rows have structured timestamp and the ORDER BY works correctly.

Option B: Filter migrated-only. Add AND json_extract(data, '$.properties.timestamp') IS NOT NULL to the WHERE clause. Skips legacy rows entirely; relies on at least one fresh row existing. Edge case: agent whose ALL memories are legacy (pre-migration) returns NULL — degrade gracefully.

Option C: Sort by id descending. Replace ORDER BY COALESCE(timestamp, name) with ORDER BY id DESC. UUIDs are insertion-ordered; latest-inserted wins. Bypasses the lexical-vs-structured tension entirely. Migration block can run separately; query result is unaffected by legacy timestamp NULLs.

Option A preserves the most semantic information (migrated rows become queryable correctly for other consumers); Option C is the smallest diff. Pick at implementation time based on whether other consumers of the same query exist.

Acceptance Criteria

  • checkSunsetted.mjs predicate returns the most-recent-by-actual-time AGENT_MEMORY row for the target identity, regardless of whether that row is legacy or structured
  • Live run node ai/scripts/checkSunsetted.mjs '@neo-opus-4-7' returns the current session's originSessionId, not a legacy session ID
  • Test coverage: insert a legacy row (no timestamp, name="Memory: <ISO>") AND a fresh row with timestamp="<later-ISO>" for the same identity; verify the predicate returns the fresh row's session ID
  • No regression on the existing migration test (legacy row still gets migrated on-read if Option A is chosen)
  • Anchor & Echo block on the predicate captures the chosen fix shape's rationale

Out of Scope

  • Re-introducing the staleness-as-sunset-signal branch (rejected in #10641; explicitly excluded)
  • Bulk migration of all legacy AGENT_MEMORY rows across the entire DB (separate migration ticket if warranted; this fix only needs to make the predicate return correct data)
  • Changing the JSON output shape of checkSunsetted.mjs (would break swarm-heartbeat.sh parser)

Avoided Traps

  • Trap: "Just trust the migration block — it'll catch up over many heartbeat cycles." Rejected. The migration block runs only on the top-1 row per query. With many legacy rows for the same identity, the top-1 keeps being a legacy row across cycles until ALL legacy rows older than the most-recent-legacy are migrated. Convergence is slow and probabilistic; not deterministic.
  • Trap: "Add a tiebreaker sort key (ORDER BY ... DESC, id DESC)." Rejected as insufficient. The primary sort key still puts legacy rows first; tiebreaker only matters within the same primary key value.

Related

  • Parent: #10601 (auto-wakeup substrate Epic — native SUB_ISSUE link to follow)
  • Sibling: #10641 / PR #10642 (the staleness-branch removal that made this bug operationally relevant)
  • Sibling: #10627 (steady-state set_session_id rotation — adjacent originSessionId handling)
  • Loose anchor: #10117 (DevIndex absent users — different surface, similar lexical-vs-structured-data class of bug)

Origin Session ID

Origin Session ID: 9766f91c-51f8-44fe-ac34-d79f61a0e1bf

Retrieval Hint

query_raw_memories("checkSunsetted ORDER BY legacy row originSessionId b1839431") or git log for the SQL excerpt above. Empirical anchor: live execution of node ai/scripts/checkSunsetted.mjs '@neo-opus-4-7' from the worktree at claude/issue-10641-checksunsetted-staleness-removal post-#10641-fix returning a stale session ID.

tobiu referenced in commit 7a7d362 - "docs(agentos): add wake substrate incident protocol (#10650) (#10655) on May 3, 2026, 7:00 PM
tobiu referenced in commit 5a9e6e4 - "fix(ai): bulk migrate legacy memory rows to fix staleness bug (#10643) (#10659) on May 3, 2026, 9:06 PM
tobiu closed this issue on May 3, 2026, 9:06 PM