Frontmatter
id: 10460
title: DreamService marks graphDigested:true on partial MemorySessionIngestor failures
state: Closed
labels: bug, ai, architecture, core
assignees: neo-gpt
createdAt: Apr 28, 2026, 10:49 AM
updatedAt: May 1, 2026, 3:46 PM
githubUrl: https://github.com/neomjs/neo/issues/10460
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 1, 2026, 3:46 PM

DreamService marks graphDigested:true on partial MemorySessionIngestor failures

neo-opus-4-7 commented on Apr 28, 2026, 10:49 AM

Context

Discovered during a 2026-04-28 runSandman cycle with empirical evidence that failed memory ingestion is silently masked as full ingestion, leading to permanently invisible memory rows in the graph. The trigger today was the post-FileSystemIngestor SQLite IN-clause overflow (separate ticket), but the corruption pattern itself is independent of any specific trigger — it's a gating bug in DreamService.processUndigestedSessions.

The Problem

When MemorySessionIngestor.syncSessionToGraph partially fails — e.g., 100 of 133 memories upserted, 33 errored on individual GraphService.upsertNode calls — those 33 errors are caught silently in a per-memory try/catch and accumulated into stats.errors. The session-level loop in DreamService then proceeds to call SemanticGraphExtractor.executeTriVectorExtraction(session), which reads session.document (text-only, no graph dependency). If the LLM extraction succeeds, DreamService marks the session graphDigested: true regardless of how many memories actually made it into the graph.

On the next REM cycle, findUndigestedSessions filters that session out (graphDigested === true). The 33 missing memories never get re-attempted. Lazy back-fill (#10153) only triggers when a future extraction emits an edge referencing a missing memory:<id> — if no such reference materialises, the memories stay orphaned permanently.

This produces the observed pattern: "latest session summaries missing" — memories silently dropped during ingestion of recent sessions, no log signal loud enough to surface, no re-attempt path.

The Architectural Reality

Three converging conditions:

  1. ai/daemons/services/MemorySessionIngestor.mjs:196-231 — per-memory try/catch swallows errors:

    for (let i = 0; i < rawMemories.ids.length; i++) {
        try {
            // upsertNode + linkNodes
            stats.memoriesUpserted++;
        } catch (e) {
            stats.errors.push(`[${rawMemories.ids[i]}] ${e.message}`);
        }
    }
  2. ai/daemons/DreamService.mjs:202-205 — never reads ingestStats.errors:

    const ingestStats = await MemorySessionIngestor.syncSessionToGraph(session);
    logger.info(`[DreamService]   -> Memory/Session graph ingestion took: ${ingestTime}s (${ingestStats.memoriesUpserted} upserted, ${ingestStats.memoriesSkipped} skipped)`);

    Log format omits errors count entirely.

  3. ai/daemons/DreamService.mjs:224-228 - graphDigested gate is LLM-extraction-only:

    if (success) {
        await this.sessionsCollection.update({
            ids: [session.id],
            metadatas: [{ ...session.meta, graphDigested: true }]
        });
    }

    success reflects only SemanticGraphExtractor.executeTriVectorExtraction's return value; memory-ingestion errors don't gate this.

SemanticGraphExtractor's own error path is structurally safe — non-fetch errors return null (line 264), so partial graph-extraction writes don't poison graphDigested. The dangerous path is exclusively MemorySessionIngestor → DreamService.
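
The three conditions above can be reproduced in miniature. The sketch below is not the real service code: graph, syncSessionToGraph, and findUndigestedSessions are simplified, synchronous, in-memory stand-ins (the real calls are async), and the failingIds parameter is a hypothetical hook for injecting a failure. It only illustrates how a swallowed per-memory error plus an extraction-only gate leaves a memory permanently out of reach.

```javascript
// Hypothetical in-memory stand-ins; none of these are the real neo.mjs services.
const graph = new Set();

// Mimics the per-memory try/catch: failures are recorded in stats, never thrown.
function syncSessionToGraph(session, failingIds) {
    const stats = {memoriesUpserted: 0, errors: []};

    for (const id of session.memoryIds) {
        try {
            if (failingIds.has(id)) {
                throw new Error('simulated upsertNode failure');
            }
            graph.add(id);
            stats.memoriesUpserted++;
        } catch (e) {
            stats.errors.push(`[${id}] ${e.message}`);
        }
    }

    return stats;
}

// Mimics the filter described in the ticket: digested sessions are skipped.
function findUndigestedSessions(sessions) {
    return sessions.filter(s => s.meta.graphDigested !== true);
}

const session = {id: 's1', memoryIds: ['m1', 'm2', 'm3'], meta: {graphDigested: false}};
const stats   = syncSessionToGraph(session, new Set(['m3']));

// Buggy gate: only the extraction result is consulted, stats.errors is ignored.
const extractionSucceeded = true;

if (extractionSucceeded) {
    session.meta = {...session.meta, graphDigested: true};
}

const remaining = findUndigestedSessions([session]);

// One error recorded, m3 absent from the graph, and no session left to retry.
console.log(stats.errors.length, graph.has('m3'), remaining.length); // 1 false 0
```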

The Fix

Tighten the gate in ai/daemons/DreamService.mjs per-session loop:

const ingestStats = await MemorySessionIngestor.syncSessionToGraph(session);
const ingestErrors = ingestStats.errors?.length ?? 0;

if (ingestErrors > 0) {
    logger.warn(`[DreamService] Session ${session.meta.sessionId} had ${ingestErrors} memory-ingestion error(s); graphDigested will NOT be set this cycle.`);
}

// ... existing extractor + topology + gap inference calls ...

if (success && ingestErrors === 0) {
    await this.sessionsCollection.update({
        ids: [session.id],
        metadatas: [{ ...session.meta, graphDigested: true }]
    });
    logger.info(`[DreamService] Session ${session.meta.sessionId} marked as graphDigested in Memory Core.`);
}

This makes ingestion errors self-healing: a session with failed memories stays undigested, is re-attempted on the next REM cycle, and the offending error (e.g., transient SQLite saturation) typically clears between runs.

Also surface the error count in the existing INFO-tier ingestion log so operators have a fast-path signal:

logger.info(`[DreamService]   -> Memory/Session graph ingestion took: ${ingestTime}s (${ingestStats.memoriesUpserted} upserted, ${ingestStats.memoriesSkipped} skipped, ${ingestErrors} errors)`);
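
To make the cross-cycle behaviour of the tightened gate concrete, here is a minimal simulation. It assumes extraction always succeeds and replaces the real services with synchronous stubs. shouldMarkDigested and runCycle are hypothetical names introduced for illustration only; the ticket inlines the condition directly in the DreamService loop.

```javascript
// Hypothetical helper mirroring the proposed condition: success AND zero ingestion errors.
function shouldMarkDigested(extractionSucceeded, ingestStats) {
    const ingestErrors = ingestStats.errors?.length ?? 0;
    return Boolean(extractionSucceeded) && ingestErrors === 0;
}

// Simulated REM cycle over in-memory sessions; returns how many sessions were attempted.
// `ingest` is a stub standing in for MemorySessionIngestor.syncSessionToGraph.
function runCycle(sessions, ingest) {
    let attempted = 0;

    for (const session of sessions) {
        if (session.meta.graphDigested === true) {
            continue; // already digested, filtered out
        }

        attempted++;

        const ingestStats = ingest(session);
        const success     = true; // assumption: LLM extraction succeeds every time

        if (shouldMarkDigested(success, ingestStats)) {
            session.meta = {...session.meta, graphDigested: true};
        }
    }

    return attempted;
}

const sessions = [{id: 's1', meta: {graphDigested: false}}];

// Cycle 1: one memory fails, so the session must stay undigested.
const first              = runCycle(sessions, () => ({memoriesUpserted: 2, errors: ['[m3] simulated failure']}));
const digestedAfterFirst = sessions[0].meta.graphDigested;

// Cycle 2: the transient error has cleared, so the session is re-attempted and marked.
const second              = runCycle(sessions, () => ({memoriesUpserted: 3, errors: []}));
const digestedAfterSecond = sessions[0].meta.graphDigested;

// Cycle 3: nothing left to process.
const third = runCycle(sessions, () => ({memoriesUpserted: 0, errors: []}));

console.log(first, digestedAfterFirst, second, digestedAfterSecond, third); // 1 false 1 true 0
```

The `?? 0` fallback treats a missing errors array as "no errors". That matches the snippet in the ticket, but it does mean an ingestor that never reports errors would pass the gate silently.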

Acceptance Criteria

  • DreamService.processUndigestedSessions reads ingestStats.errors.length and only sets graphDigested: true if extraction succeeded AND ingestion errors are zero
  • WARN-level log emitted when ingestStats.errors.length > 0, including session ID and error count
  • INFO-level ingestion log includes error count in the existing format
  • Existing test test/playwright/unit/ai/daemons/services/MemorySessionIngestor.spec.mjs extended (or new sibling spec) verifies that simulated per-memory errors block graphDigested propagation
  • Empirical verification: a post-fix runSandman run with a simulated upsertNode failure injected on 1 memory shows that the session is NOT marked graphDigested and is re-attempted on the next cycle
  • Commit subject ends with (#TICKET_ID) per AGENTS.md §3 Gate 1; type=fix, scope=ai
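
The gate and logging criteria above could be checked along the following lines. This is a sketch with hand-rolled test doubles and plain checks, not the project's Playwright spec. createSpies and processSession are hypothetical names, processSession is a synchronous stand-in for the per-session loop body, and the INFO message is shortened relative to the real log format.

```javascript
// Hypothetical test doubles that record every call for later inspection.
function createSpies() {
    const calls = {update: [], warn: [], info: []};

    return {
        calls,
        sessionsCollection: {update: payload => calls.update.push(payload)},
        logger: {
            warn: msg => calls.warn.push(msg),
            info: msg => calls.info.push(msg)
        }
    };
}

// Synchronous stand-in for the per-session loop body with the tightened gate.
function processSession(session, ingestStats, success, {sessionsCollection, logger}) {
    const ingestErrors = ingestStats.errors?.length ?? 0;

    // Shortened INFO line; the real format also carries timing and the skipped count.
    logger.info(`[DreamService]   -> ingestion: ${ingestStats.memoriesUpserted} upserted, ${ingestErrors} errors`);

    if (ingestErrors > 0) {
        logger.warn(`[DreamService] Session ${session.meta.sessionId} had ${ingestErrors} memory-ingestion error(s); graphDigested will NOT be set this cycle.`);
    }

    if (success && ingestErrors === 0) {
        sessionsCollection.update({
            ids: [session.id],
            metadatas: [{...session.meta, graphDigested: true}]
        });
    }
}

const session = {id: 's1', meta: {sessionId: 'abc'}};

// Case 1: one simulated per-memory error, extraction succeeded.
const failing = createSpies();
processSession(session, {memoriesUpserted: 2, errors: ['[m3] boom']}, true, failing);

// Case 2: clean ingestion, extraction succeeded.
const clean = createSpies();
processSession(session, {memoriesUpserted: 3, errors: []}, true, clean);

console.log(failing.calls.update.length, failing.calls.warn.length); // 0 1
console.log(clean.calls.update.length, clean.calls.warn.length);     // 1 0
```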

Out of Scope

  • Backfilling already-orphaned memories from prior runs. A separate substrate (graph-integrity audit, filed as sibling ticket) handles detection + backfill of pre-existing silent drops.
  • Eliminating the per-memory try/catch itself. The catch is correct in shape — the bug is the missing downstream gate, not the catch's existence.
  • Distinguishing transient from permanent ingestion errors. Future work could classify error severity (e.g., SQLITE_BUSY retryable vs schema-mismatch fatal); for now, treat any error as "session re-attempt needed."

Avoided Traps

  • Re-throwing per-memory errors out of MemorySessionIngestor. Rejected. The session-level loop should continue processing other memories even when one fails — partial progress is real progress as long as the session isn't prematurely marked digested. Bubble the count, not the throw.
  • Adding a new state value graphDigested: 'partial'. Rejected. Three-state flag adds query complexity downstream (findUndigestedSessions would need to handle 'partial' specifically). Boolean is sufficient if the gate is correct.
  • Auto-retry within the same REM cycle. Rejected. A same-cycle retry is unlikely to succeed, because the transient SQLite saturation that made MemorySessionIngestor fail in the first place has not had time to clear. Cross-cycle retry is the correct interval: typically the delta log gets consumed, the saturation clears, and the next run succeeds.

Related

  • Trigger context: SQLite IN-clause overflow in getDeltaLog (sibling ticket filed same session — saturated GraphLog post-FileSystemIngestor produces the per-memory upsertNode errors)
  • Detection substrate: graph-integrity audit (sibling ticket — periodic SESSION→memory completeness check to surface historical silent drops)
  • Adjacent: #10153 (lazy back-fill mechanism — only triggers on edge reference, doesn't cover orphaned memories that are never referenced)
  • Adjacent: #10143 (Memory + Session as first-class graph nodes — shipped 2026-04-21; this ticket is a gating bug in the consumer of that substrate)

Origin Session ID: 4bb6859b-860f-440d-9055-320e20b0ee22

Retrieval Hint: MemorySessionIngestor silent per-memory error swallow graphDigested premature true partial-ingestion mask

tobiu referenced in commit 105d0d9 - "fix(ai): block graphDigested on ingestion errors (#10460) (#10581)" on May 1, 2026, 3:46 PM
tobiu closed this issue on May 1, 2026, 3:46 PM