Frontmatter
id: 10494
title: Fix DreamService token exhaustion and enforce PR priority hierarchy
state: Open
labels: bug, ai
assignees: neo-opus-4-7
createdAt: Apr 29, 2026, 4:01 PM
updatedAt: May 18, 2026, 11:18 PM
githubUrl: https://github.com/neomjs/neo/issues/10494
author: neo-gemini-3-1-pro
commentsCount: 3
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []

Fix DreamService token exhaustion and enforce PR priority hierarchy

neo-gemini-3-1-pro commented on Apr 29, 2026, 4:01 PM

Context

During the Sandman daemon's REM-cycle execution (runSandman.mjs), the SemanticGraphExtractor repeatedly hit token exhaustion (FINAL EXHAUSTED RAW LLM DUMP) when generating the session_artifact.graph.nodes array. Additionally, we identified a need to structurally prioritize swarm task routing so that PRs and Issues take precedence over Discussions.

The Problem

  1. Token Exhaustion: The Tri-Vector extraction prompt asks the model to generate a large JSON payload. When generation hits the model's token ceiling, the payload is truncated (e.g., ending abruptly at "type": "SESSION",). Strict validation rejects the truncated payload, and the retry loop asks the LLM to fix it, which hits the ceiling again and produces an extraction deadlock.
  2. Task Prioritization: The GoldenPathSynthesizer lacked mathematical enforcement of a workflow hierarchy, occasionally causing the Dream Pipeline to prioritize stagnant discussions over actionable PRs.

The Architectural Reality

  • ai/daemons/services/SemanticGraphExtractor.mjs: Handles Tri-Vector synthesis. Strict schema validation does not gracefully handle truncated graph.nodes or graph.edges.
  • ai/daemons/services/GoldenPathSynthesizer.mjs: Handles SQL queries identifying the active frontier but missed a deterministic priority layering mechanism.

The Fix

  1. SemanticGraphExtractor: Relaxed JSON validation. If session_artifact is extracted but graph nodes/edges are missing/truncated, default them to [] instead of rejecting the payload, preventing the recursive retry crash.
  2. GoldenPathSynthesizer: Added tiered priority bonuses (+1000 for PRs, +500 for Issues, +100 for Discussions) and updated the SQL query to reliably detect OPEN status by checking the root-level state.
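
The relaxed validation in Fix 1 can be sketched roughly as follows. This is a minimal illustration, not the code in SemanticGraphExtractor.mjs: the function name normalizeTriVectorPayload and the returned {ok, reason} shape are assumptions made for the example. Only the session_artifact, graph.nodes, and graph.edges keys come from the ticket.

```javascript
// Hypothetical helper; the real logic in SemanticGraphExtractor.mjs may differ.
function normalizeTriVectorPayload(rawContent) {
    let payload;

    try {
        payload = JSON.parse(rawContent);
    } catch (e) {
        // Output that does not parse at all is still a failure.
        return {ok: false, reason: 'invalid_json'};
    }

    if (!payload || typeof payload !== 'object' || !payload.session_artifact) {
        return {ok: false, reason: 'missing_session_artifact'};
    }

    const artifact = payload.session_artifact;
    const graph    = (artifact.graph && typeof artifact.graph === 'object') ? artifact.graph : {};

    // Relaxed step: default missing or non-array collections to [] instead of rejecting.
    artifact.graph = {
        ...graph,
        nodes: Array.isArray(graph.nodes) ? graph.nodes : [],
        edges: Array.isArray(graph.edges) ? graph.edges : []
    };

    return {ok: true, payload};
}
```

Note that this sketch only helps once a session_artifact object has been recovered. A payload cut off mid-token still fails JSON.parse, so any repair of truncated text would have to happen before this step.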

Acceptance Criteria

  • SemanticGraphExtractor digests sparse/truncated JSON payloads without exhaustion loops.
  • GoldenPathSynthesizer mathematically ranks PRs above Discussions.
  • Sandman daemon completes full backlog extraction without shape validation errors.
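
To make the ranking criterion concrete, here is a small sketch of the tiered bonuses as proposed in this ticket body. The bonus values (+1000, +500, +100) are from the ticket; the node shape, the baseScore field, and the rankFrontier function are hypothetical and do not reflect how GoldenPathSynthesizer.mjs computes scores.

```javascript
// Bonus values come from the ticket body; node shape and baseScore are assumptions.
const TYPE_BONUS = {PR: 1000, ISSUE: 500, DISCUSSION: 100};

function rankFrontier(nodes) {
    return nodes
        // Unknown types get no bonus.
        .map(node => ({...node, score: node.baseScore + (TYPE_BONUS[node.type] ?? 0)}))
        // Highest score first.
        .sort((a, b) => b.score - a.score);
}

const ranked = rankFrontier([
    {id: 'discussion-1', type: 'DISCUSSION', baseScore: 350},
    {id: 'pr-1',         type: 'PR',         baseScore: 5},
    {id: 'issue-1',      type: 'ISSUE',      baseScore: 40}
]);

console.log(ranked.map(node => node.id)); // pr-1, issue-1, discussion-1
```

One design caveat: additive bonuses only guarantee the PR > Issue > Discussion order while base scores differ by less than the gap between tiers. A discussion with a base score of 950 would score 1050 and outrank the PR above.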

Out of Scope

  • Modifying the underlying SQLite graph schema.
  • Increasing the max_tokens configuration in OpenAiCompatible.mjs (this ticket addresses the structural resilience, not the token ceiling itself).

Origin Session ID: 9b6e7550-a8fa-4ab0-b502-5f5c97180068

tobiu referenced in commit 6df0d23 - "fix(daemons): relax Tri-Vector validation and codify Golden Path priority (#10495)" on Apr 30, 2026, 3:15 AM
tobiu May 5, 2026, 6:21 PM

Evaluate whether we should drop this one, since it conflicts strongly with Hebbian decay.

  • 2026-05-05T19:17:17Z @neo-opus-4-7 assigned to @neo-opus-4-7

@neo-opus-4-7 - 2026-05-05T19:30:02Z

Input from @neo-opus-4-7:

Empirical state reconciliation: ticket is in stale shape post-#10495

Investigated 2026-05-05 per @tobiu's hint about the DreamMode regression. Empirical findings:

Fix #1 (relax JSON validation) — IMPLEMENTED via #10495

PR #10495 (commit 6df0d2366, 2026-04-30, @neo-gemini-3-1-pro): "fix(daemons): relax Tri-Vector validation and codify Golden Path priority (#10494)". Sub-commits explicitly reference (#10494). The relaxed-validation logic is present in ai/daemons/services/SemanticGraphExtractor.mjs:137-144 — defaults graph.nodes and graph.edges to [] when missing/truncated rather than rejecting the payload. This matches this ticket's body-proposed Fix #1 exactly.

Fix #2 (+1000/+500/+100 priority bonuses) — REJECTED by current code

ai/daemons/services/GoldenPathSynthesizer.mjs:41-44 (docstring): "We avoid hardcoded multiplier bonuses (e.g., for PRs) to prevent zeroing out the natural physics simulation of the queue. Task state is queried via both $.properties.state and $.state to ensure reliable OPEN detection across varying JSON schemas."

The WHERE clause check (line 126: (json_extract(n.data, '$.properties.state') = 'OPEN' OR json_extract(n.data, '$.state') = 'OPEN')) IS present — that's the OPEN-state-detection portion of this ticket's Fix #2. But the priority-multiplier portion (+1000 PR / +500 Issue / +100 Discussion) is deliberately not applied, per the docstring's anti-multiplier reasoning. #10495 implemented the OPEN-state check but rejected the multiplier shape.
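
For readers less familiar with json_extract, the two-path OPEN check in that WHERE clause behaves like the JavaScript predicate below. This is only an illustrative mirror of the SQL condition quoted above; the function name isOpenTask is hypothetical, and it assumes data is the already-parsed JSON stored in n.data.

```javascript
// JavaScript mirror of the quoted SQL condition; not code from the repository.
function isOpenTask(data) {
    // Accept the state at either $.properties.state or the root-level $.state.
    return data?.properties?.state === 'OPEN' || data?.state === 'OPEN';
}

console.log(isOpenTask({properties: {state: 'OPEN'}})); // true
console.log(isOpenTask({state: 'OPEN'}));               // true
console.log(isOpenTask({state: 'CLOSED'}));             // false
```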

Current operator-side state

autoDream AND autoGoldenPath are deliberately disabled in operator-local ai/mcp/server/memory-core/config.mjs:35,41 via false && short-circuits (config.mjs last modified 2026-05-05 13:53; deliberate manual edit per @tobiu). The deployed config bypasses both env vars regardless of AUTO_DREAM / AUTO_GOLDEN_PATH settings.
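
The effect of a false && short-circuit can be illustrated as follows. The config shape and the buildConfig function are assumptions for the example; only the flag names autoDream / autoGoldenPath and the env var names AUTO_DREAM / AUTO_GOLDEN_PATH come from the comment above, and the real expressions in config.mjs may look different.

```javascript
// Hypothetical config shape; the real expressions in config.mjs may differ.
function buildConfig(env) {
    return {
        // Normal form, shown for contrast: the flag follows the environment variable.
        autoDreamFromEnv: env.AUTO_DREAM === 'true',

        // Disabled form: `false &&` short-circuits, so the env var is never consulted.
        autoDream     : false && env.AUTO_DREAM === 'true',
        autoGoldenPath: false && env.AUTO_GOLDEN_PATH === 'true'
    };
}

const config = buildConfig({AUTO_DREAM: 'true', AUTO_GOLDEN_PATH: 'true'});
console.log(config.autoDream, config.autoGoldenPath); // false false
```

Because the left operand is the literal false, no environment setting can re-enable the feature until the expression itself is edited.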

The disable is not documented in this ticket's body — body still proposes the rejected multiplier fix as if active.

Recommended ticket close shape

This ticket should close as RESOLVED_BY_RELATED_PR (#10495) with two residuals tracked elsewhere:

  1. Operator-local autoDream + autoGoldenPath disable (the actual current "DreamMode inactive" symptom @tobiu flagged) — sibling concern, not within this ticket's original scope. Cause-of-disable investigation would benefit from its own ticket.
  2. Body-vs-code drift detection (Fix #2 still proposed in body despite implementation rejection) — friction-into-gold candidate; ticket-body-staleness pattern is filing-worthy substrate-improvement (sibling family with #10646 Require live open-queue sweep).

Substrate-improvement follow-ups I'm filing

  • features.dream healthcheck observability — operators (and agents) currently can't see autoDream / autoGoldenPath actual state without grepping the gitignored config. Symmetric to #10723 providers.embedding / #10770 providers.auth.
  • DreamPipeline.md backup-first operational discipline, per @tobiu's hint on 2026-05-05: "before running dreammode / sandman, to backup MC. we have scripts for it" (npm run ai:backup, buildScripts/ai/backup.mjs atomic bundle). Currently tribal knowledge; should be codified in learn/agentos/DreamPipeline.md.

Standing by — if @tobiu wants this ticket closed as RESOLVED_BY_RELATED_PR (#10495), the residuals above are the structural follow-up shape. Otherwise: investigation into the actual cause-of-disable would need explicit operator GO since it requires controlled re-enable + observation under backup-first discipline.

Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7

  • 2026-05-05T19:30:41Z @neo-opus-4-7 cross-referenced by #10779
  • 2026-05-05T19:31:14Z @neo-opus-4-7 cross-referenced by #10780
  • 2026-05-05T19:45:18Z @neo-opus-4-7 cross-referenced by #10671
  • 2026-05-09T21:58:38Z @neo-opus-4-7 cross-referenced by #11065

@neo-gpt - 2026-05-18T21:18:15Z

Input from GPT-5 (Codex):

Live Sandman V-B-A update: Tri-Vector AC still failing, but not the old partial-graph shape

Operator ran /opt/homebrew/bin/npm run ai:run-sandman from /Users/Shared/github/neomjs/neo on dev == origin/dev (3271fb0281b2cd81fca84f403c0696b76e74d930) on 2026-05-18.

Observed behavior: every processed session hit three SemanticGraphExtractor validation failures, then logged FINAL EXHAUSTED RAW LLM DUMP with no visible payload before continuing. This means the current live failure is not the previously-fixed #10495 case where session_artifact exists but graph.nodes / graph.edges are malformed or truncated. The relaxed-validation guard at SemanticGraphExtractor.mjs only runs after payload.session_artifact exists.

Code evidence:

  • ai/daemons/services/SemanticGraphExtractor.mjs still retries only on !payload || !payload.session_artifact and logs raw result.content after attempt 3.
  • OpenAiCompatible.generate() delegates to stream(), concatenates yielded delta chunks, and returns {content: fullContent}. Empty provider output currently becomes a schema-validation failure, not a provider-boundary diagnostic.
  • TopologyInferenceEngine also uses OpenAiCompatible.generate(), so any provider/streaming-boundary issue could silently turn into empty/no-op extraction there too.
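
The generate-over-stream behavior described in the code evidence can be sketched like this. It is a simplified stand-in, not the OpenAiCompatible implementation: fakeStream replaces the real HTTP stream, and the generate signature here (taking an async iterable) is an assumption made to keep the example self-contained.

```javascript
// Stand-in for a provider stream; a real one would read chunks from an HTTP response.
async function* fakeStream(chunks) {
    for (const delta of chunks) {
        yield delta;
    }
}

// Mirrors the described behavior: concatenate yielded deltas, return {content}.
async function generate(stream) {
    let fullContent = '';

    for await (const delta of stream) {
        fullContent += delta;
    }

    return {content: fullContent};
}
```

With this shape, a stream that yields nothing resolves to {content: ''} without any error, so the caller cannot tell a silent provider from a model that produced bad JSON unless it checks for blank content explicitly.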

Ticket-shape implication: #10494 remains the right existing anchor for the Tri-Vector validation AC. Do not duplicate it as a new ticket yet. The next author should narrow this ticket or add a successor only if they decide to retire the stale PR-priority half. The concrete implementation target is provider-output observability / blank-output handling, not another graph.nodes relaxation.

Suggested AC refresh:

  • SemanticGraphExtractor distinguishes malformed JSON from blank provider output and logs an actionable diagnostic with provider host/model and response-shape class.
  • Unit coverage stubs OpenAiCompatible.generate() returning {content: ''} and verifies the failure is classified without pretending this is a graph-schema repair case.
  • If the OpenAI-compatible server returns non-SSE JSON while stream: true, OpenAiCompatible.stream() should either parse the response correctly or throw a provider-protocol error; it must not silently yield empty content.
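
A minimal sketch of the classification asked for in the first two criteria might look like the following. Everything named here is hypothetical: classifyExtractionResult, the responseClass labels, and the provider object with host and model fields are illustrations, not existing APIs in the repository.

```javascript
// Hypothetical classifier; names and labels are illustrative only.
function classifyExtractionResult(result, provider) {
    const content = typeof result?.content === 'string' ? result.content : '';
    let responseClass;

    if (content.trim() === '') {
        // Provider-boundary problem: nothing came back, so there is no schema to repair.
        responseClass = 'blank_provider_output';
    } else {
        try {
            const payload = JSON.parse(content);
            responseClass = payload && payload.session_artifact ? 'ok' : 'missing_session_artifact';
        } catch (e) {
            responseClass = 'malformed_json';
        }
    }

    // Diagnostic fields a log line could include.
    return {
        responseClass,
        host         : provider.host,
        model        : provider.model,
        contentLength: content.length
    };
}
```

A unit test along the lines of the second criterion could stub the generate call to return {content: ''} and assert that the result is classified as blank provider output rather than as a JSON or schema failure.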

Related new surfaces from the same Sandman run are separate and should be tracked separately: graph apoptosis rollback crash and Golden Path embedding-dimension drift.