Frontmatter
id: 10780
title: Codify backup-first operational discipline before DreamMode/Sandman invocation
state: Closed
labels: documentation, enhancement, ai, model-experience
assignees: []
createdAt: May 5, 2026, 9:31 PM
updatedAt: May 9, 2026, 11:41 PM
githubUrl: https://github.com/neomjs/neo/issues/10780
author: neo-opus-4-7
commentsCount: 2
parentIssue: 10671
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 9, 2026, 11:41 PM

Codify backup-first operational discipline before DreamMode/Sandman invocation

neo-opus-4-7 commented on May 5, 2026, 9:31 PM

Context

Surfaced 2026-05-05 during DreamMode-inactivity investigation per @tobiu's explicit hint: "before running dreammode / sandman, to backup MC. we have scripts for it."

The Memory Core has an atomic-bundle backup primitive (npm run ai:backup, backed by buildScripts/ai/backup.mjs) that snapshots the unified MC + KB + graph + concepts + trajectories state into .neo-ai-data/backups/backup-<ISO-timestamp>/. Running a backup BEFORE invoking DreamMode/Sandman (autoDream / autoGoldenPath / runSandman) is currently tribal knowledge: the operator named it in session 2026-05-05, but it is not codified in any committed substrate.

learn/agentos/DreamPipeline.md exists as the canonical operator-facing substrate for the Dream Pipeline architecture. It does NOT currently document the pre-run backup discipline.

The Problem

Without codified pre-run discipline:

  1. Agents operating autonomously (e.g., night-shift driver per #10763 Leased Driver Pattern, or follow-up agents picking up DreamPipeline-adjacent work) may invoke runSandman / re-enable autoDream without backup, risking the same regression that triggered @tobiu's manual disable in the first place.

  2. Operators picking up DreamMode work after a multi-day gap must remember the discipline by recall rather than reading documented procedure. Tribal knowledge accumulates; new contributors hit the regression and lose state.

  3. The substrate IS observable through npm run ai:backup — but the discipline gap is between knowing the script exists and knowing to ALWAYS run it before DreamMode invocation.

The Architectural Reality

  • learn/agentos/DreamPipeline.md — canonical operator-facing substrate; natural home for the discipline
  • buildScripts/ai/backup.mjs — the atomic-bundle backup orchestrator (covers KB JSONL + MC memories+summaries JSONL + graph SQLite JSONL + concepts JSONL + trajectories JSONL)
  • package.json:18 ("ai:backup" npm script)
  • ai/mcp/server/memory-core/config.mjs:35,41 — current operator-disabled state (false && short-circuits on autoDream + autoGoldenPath)
  • Empirical anchor: session 2026-05-05 — @tobiu manually disabled both flags 2026-05-05 13:53 (deliberate safety measure pending regression resolution)
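
For readers unfamiliar with the pattern, the `false &&` short-circuit mentioned in the list above can be sketched as follows. The actual contents of config.mjs are not reproduced in this issue, so the `env` object and the right-hand expressions are hypothetical stand-ins; only the two flag names come from the ticket.

```javascript
// Illustrative only: `env` and the right-hand expressions are hypothetical;
// the real config.mjs lines 35 and 41 are not reproduced in this issue.
const env = {AUTO_DREAM: 'true', AUTO_GOLDEN_PATH: 'true'};

const config = {
    // `false && x` evaluates to false without ever evaluating x,
    // so both flags stay off no matter what the environment requests.
    autoDream     : false && env.AUTO_DREAM === 'true',
    autoGoldenPath: false && env.AUTO_GOLDEN_PATH === 'true'
};

console.log(config); // both flags are false
```

Re-enabling either flag means deleting the `false &&` prefix, which is exactly the kind of change the discipline says should be preceded by a backup.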

The Fix

Add a "Pre-Run Backup Discipline" section to learn/agentos/DreamPipeline.md. Cover:

  1. The backup primitive: npm run ai:backup — what it captures, where it stores, retention policy reference
  2. The discipline: ALWAYS run backup before:
    • Re-enabling autoDream / autoGoldenPath after a deliberate disable
    • Invoking runSandman manually for the first time after a substrate change
    • Running any DreamPipeline-adjacent script that mutates Memory Core state
  3. The why: DreamPipeline mutations are graph-write-heavy; a regression mid-run can produce corrupted graph state that's expensive to recover from without a backup. Backup-first is the cheap-insurance discipline.
  4. The recovery procedure: if a DreamMode run produces unexpected state, the canonical recovery is restore-from-most-recent-backup (procedure documented in #10129 / defragChromaDB.cleanOldBackups family)

Sibling addition: cross-reference from any wake-substrate scripts (runSandman.mjs, resumeHarness.mjs if it triggers Dream-adjacent work) to the backup discipline doc.
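
The ordering this section asks for (backup first, mutate second) could be sketched as a small wrapper. This is a minimal sketch, not the project's implementation: `runBackupFirst`, `runBackup`, `runTask`, and the example path are all hypothetical names introduced for illustration.

```javascript
// Sketch only. `runBackup` and `runTask` are hypothetical injected callables:
// runBackup() returns {ok: boolean, path?: string};
// runTask(backup) performs the Dream/Sandman work.
function runBackupFirst(runBackup, runTask) {
    const backup = runBackup();

    // Refuse to mutate Memory Core state unless a snapshot was actually taken.
    if (!backup || backup.ok !== true) {
        throw new Error('Backup did not succeed; not starting the Dream/Sandman task.');
    }

    return runTask(backup);
}

// Example with stand-in callables that only record the call order.
const calls = [];
const result = runBackupFirst(
    () => { calls.push('backup'); return {ok: true, path: '.neo-ai-data/backups/backup-example'}; },
    backup => { calls.push('task'); return `ran after ${backup.path}`; }
);

console.log(calls, result);
```

In a real script, `runBackup` could wrap `spawnSync('npm', ['run', 'ai:backup'], {stdio: 'inherit'})` from `node:child_process` and report `ok` when the returned `status` is 0. Injecting the callables keeps the ordering rule testable without touching the Memory Core.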

Acceptance Criteria

  • learn/agentos/DreamPipeline.md adds a "Pre-Run Backup Discipline" section
  • Section names the canonical primitive (npm run ai:backup)
  • Section enumerates the discipline triggers (3+ scenarios above)
  • Section names the recovery procedure (restore-from-most-recent-backup)
  • runSandman.mjs (or its harness invocation point) cross-references the discipline doc
  • If feasible: runSandman halts with operator-actionable message if no recent backup exists (e.g. < 24h old); operator override flag (--skip-backup-check) for explicit bypass
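
The optional gate in the last criterion might look roughly like this. It is a sketch under stated assumptions: `checkBackupGate` and the `{name, createdAt}` record shape are hypothetical, the 24-hour threshold is the example value from the criterion, and the caller is assumed to have already listed .neo-ai-data/backups/ and resolved each bundle's creation time.

```javascript
// Example threshold from the acceptance criteria ("< 24h old").
const MAX_BACKUP_AGE_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper. `backups` is an array of {name: string, createdAt: Date};
// `args` is a process.argv-style array of CLI arguments.
function checkBackupGate(backups, args, now = new Date()) {
    // Explicit operator bypass, as named in the ticket.
    if (args.includes('--skip-backup-check')) {
        return {proceed: true, reason: 'Backup check skipped by operator override.'};
    }

    // Pick the most recent backup, or null when the list is empty.
    const newest = backups.reduce(
        (latest, b) => (!latest || b.createdAt > latest.createdAt ? b : latest),
        null
    );

    if (!newest) {
        return {proceed: false, reason: 'No backup found. Run "npm run ai:backup" first.'};
    }

    const ageMs = now.getTime() - newest.createdAt.getTime();

    if (ageMs > MAX_BACKUP_AGE_MS) {
        const hours = Math.floor(ageMs / (60 * 60 * 1000));
        return {
            proceed: false,
            reason : `Newest backup (${newest.name}) is ${hours}h old. Run "npm run ai:backup" or pass --skip-backup-check.`
        };
    }

    return {proceed: true, reason: `Recent backup found: ${newest.name}.`};
}
```

A caller such as runSandman would exit with `reason` as the operator-actionable message whenever `proceed` is false. Returning a result object rather than throwing keeps the decision separate from how the script reports it.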

Out of Scope

  • Implementing a backup-required gate in runSandman.mjs beyond the discipline doc (mechanical-gate variant — file as scope-extension if discipline alone insufficient)
  • Defining retention policy beyond what's already in buildScripts/ai/backup.mjs (TODO comment in the file)
  • Backup automation / scheduled backups (operator-territory; this ticket is operational discipline)
  • Recovery procedure end-to-end documentation (separate scope; this ticket points at it but doesn't author it)

Avoided Traps

  • Mechanical-gate-first: would block legitimate brief test runs by operators who explicitly accept the risk. Discipline-doc with named override flag is consistent with pull-request-workflow precedent.
  • Bundling with DreamMode-regression bug ticket: different surface (operational discipline vs underlying-bug investigation). Sibling family but separate triggers.
  • Single-line mention: the discipline needs explicit triggers + why + recovery procedure to be operationally complete; one-line "backup first" tribal-knowledge-shape is what already exists.

Related

  • Operator hint (2026-05-05): "before running dreammode / sandman, to backup MC. we have scripts for it" — @tobiu, session 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7
  • Backup primitive: buildScripts/ai/backup.mjs (atomic-bundle, per #10129 Phase 3)
  • Adjacent observability: healthcheck features.dream block (sibling-filed this session; provides the visibility needed to enforce the discipline)
  • Underlying substrate: #10494 (DreamService regression — possible cause of the operator-disable that triggered this discipline being named)

Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7

Retrieval Hint: query_raw_memories(query="DreamPipeline backup-first discipline runSandman ai:backup pre-run safety operator hint 10494")

tobiu commented on May 7, 2026, 11:03 PM

explore: While working on DreamService code, doing frequent backups is fine. In the long run, daily backups should be sufficient, and Sandman could get triggered far more often (e.g. after each sunset).

  • 2026-05-09T15:17:04Z @neo-opus-4-7 cross-referenced by #11018
  • 2026-05-09T15:18:03Z @neo-opus-4-7 cross-referenced by #11019
  • 2026-05-09T16:42:34Z @neo-opus-4-7 cross-referenced by #11029
  • 2026-05-09T21:36:39Z @neo-opus-4-7 assigned to @neo-opus-4-7

@neo-opus-4-7 - 2026-05-09T21:41:06Z

Closing as superseded — verify-before-assert correction.

@tobiu surfaced (2026-05-09): manual backup was already done today, and the orchestrator architecture (post-#11041 / #11044 / #11051 M3.5 decomposition) should now own daily scheduled backups. Codifying manual operator discipline that the orchestrator-owned scheduled task supersedes is wrong-shape.

Empirical anchor (learn/agentos/v13-path.md:90, authored this session):

"Daily backup with rotation cap (PRIO 0 — non-negotiable) — orchestrator-owned scheduled task; 30-day rotation cap (one month coverage); backup-success precondition for any DreamMode/Sandman task spawn (#10780 discipline + post-#11018-retraction architectural correction; BackupService extraction lands as M4 per-task coordinator)"

The architectural direction is BackupService extraction as M4 per-task coordinator. The discipline-codification this ticket prescribed is obsolete under that architecture.
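
To make the quoted "30-day rotation cap" concrete, here is a minimal sketch of how a scheduled task could decide which bundles to prune. It assumes the cap is age-based (the quote could also be read as a count of 30 daily bundles), and `partitionBackups` plus the `{name, createdAt}` record shape are hypothetical; none of this is taken from the BackupService work itself.

```javascript
const ROTATION_CAP_DAYS = 30; // from the quoted v13-path.md line
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper. `backups` is an array of {name: string, createdAt: Date}.
// Returns the bundles inside the rotation window and those that fall outside it.
function partitionBackups(backups, now = new Date(), capDays = ROTATION_CAP_DAYS) {
    const cutoff = now.getTime() - capDays * DAY_MS;
    const keep   = [];
    const expire = [];

    for (const backup of backups) {
        // A bundle created exactly at the cutoff is still kept.
        (backup.createdAt.getTime() >= cutoff ? keep : expire).push(backup);
    }

    return {keep, expire};
}
```

Keeping the selection pure (no filesystem access) lets the scheduler log or dry-run the `expire` list before deleting anything.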

Self-acknowledgment: I started the docs work without verify-before-assert checking the current orchestrator-architecture state. @tobiu correctly caught the wrong-shape direction before I committed any code. Reverted all unstaged docs edits.

Closing as not-planned. Replaced by the M4 BackupService extraction work referenced in v13-path.md lines 90, 188, and 193.

— @neo-opus-4-7

  • 2026-05-09T21:41:07Z @neo-opus-4-7 closed this issue
  • 2026-05-09T21:41:10Z @neo-opus-4-7 unassigned from @neo-opus-4-7
  • 2026-05-09T21:44:21Z @neo-opus-4-7 cross-referenced by #11062
  • 2026-05-09T21:58:38Z @neo-opus-4-7 cross-referenced by #11065
  • 2026-05-09T22:16:26Z @neo-opus-4-7 cross-referenced by PR #11067