Context
Neo already has a canonical local backup primitive: npm run ai:backup invokes buildScripts/ai/backup.mjs, which writes an atomic bundle under .neo-ai-data/backups/backup-<ISO-timestamp>/. The existing script captures the Knowledge Base, Memory Core memories and summaries, the Memory Core graph, Concept Ontology JSONL, and RLAIF trajectories in one timestamped tree.
That covers the manual snapshot substrate. The remaining operational gap is automation: operators and agents can run the backup script, but the system does not yet guarantee a recent scheduled snapshot, expose the latest successful backup in healthcheck output, or document a restoration runbook for routine operations.
This ticket is preventive operational substrate / MX hardening. It turns the existing backup primitive into an observable daily safety loop without changing the backup bundle format.
The Problem
Manual backup discipline does not scale cleanly across a multi-agent runtime. A backup script can be correct and discoverable while still depending on human recall at exactly the moments when agents are focused on graph-heavy work, scheduled daemons, migrations, or deployment changes.
The current substrate also leaves three follow-up questions for operators:
- Is there a recent successful backup?
- How long are daily backups retained?
- Which documented steps restore from the latest known-good bundle?
Without a queryable backup.lastSuccessful signal, agents cannot cheaply validate backup freshness before starting work that depends on Memory Core or Knowledge Base persistence.
The Architectural Reality
- package.json:18 exposes npm run ai:backup.
- buildScripts/ai/backup.mjs:16-24 documents the atomic bundle shape for kb/, mc/, graph/, concepts/, and trajectories.
- buildScripts/ai/backup.mjs:35-45 and buildScripts/ai/backup.mjs:136-139 explicitly defer retention implementation.
- ai/mcp/server/memory-core/config.mjs:232 defines .neo-ai-data/backups as the Memory Core backup/export directory.
- ai/mcp/server/memory-core/services/HealthService.mjs:671-703 builds the Memory Core healthcheck payload, but there is currently no backup freshness block.
- ai/mcp/server/knowledge-base/services/HealthService.mjs:181-198 builds the Knowledge Base healthcheck payload, likewise without backup freshness observability.
- #10129 established the atomic backup bundle as the canonical manual snapshot substrate.
- #10780 documents backup-first operational discipline before DreamMode/Sandman work; this ticket is broader scheduled automation and observability, not another manual-discipline ticket.
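The bundle layout above lends itself to a small pre-restore check. The following is a minimal sketch, not the project's implementation: it assumes the caller has already listed the names under .neo-ai-data/backups, that every bundle name uses the same backup-<ISO-timestamp> format (so lexicographic order matches chronological order), and that kb, mc, graph, concepts, and trajectories all appear as top-level entries of a bundle. The function names are hypothetical.

```javascript
// Entry names come from the bundle shape cited in this ticket; treating all of
// them as top-level entries of a bundle is an assumption of this sketch.
const EXPECTED_ENTRIES = ['kb', 'mc', 'graph', 'concepts', 'trajectories'];

// Returns the newest bundle directory name, or null when none exist.
// Assumes uniformly formatted ISO timestamps, which sort chronologically
// when compared as plain strings.
function latestBundleName(dirNames) {
    const bundles = dirNames.filter(name => name.startsWith('backup-')).sort();
    return bundles.length > 0 ? bundles[bundles.length - 1] : null;
}

// Returns the expected entries that are absent from a bundle listing, so a
// runbook step can decide whether each one is optional-missing or a failure.
function missingEntries(entryNames) {
    return EXPECTED_ENTRIES.filter(entry => !entryNames.includes(entry));
}
```

This only finds the newest bundle by name and reports which entries are absent. Whether that bundle is known-good would still have to come from recorded backup status rather than from the directory listing.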
The Fix
Implement a daily automated snapshot pipeline that invokes the existing canonical backup script rather than duplicating backup logic.
The implementation may use either a scheduled GitHub Actions workflow or a daemon-side scheduled task, but it must preserve the existing substrate boundary:
- buildScripts/ai/backup.mjs remains the canonical snapshot orchestrator.
- Automation is responsible for schedule, retention, status recording, and operator-visible failure reporting.
- Healthcheck exposes a structured backup block, at minimum backup.lastSuccessful, so agents can verify freshness without scraping filesystem state.
- A restoration runbook documents how to identify and restore the latest known-good bundle.
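The status-recording responsibility listed above can be sketched as a pure update function that the scheduler would call after each run of the canonical script. This is an illustration under stated assumptions: apart from lastSuccessful, the field names, the attempt shape, and the function name recordAttempt are hypothetical, and where the resulting object is persisted is left open.

```javascript
// Hypothetical status shape: { lastAttempted, lastSuccessful, lastFailure }.
// `attempt` is an assumed input: { ok, finishedAt, error? } with ISO timestamps.
function recordAttempt(previous, attempt) {
    const status = {
        lastAttempted: attempt.finishedAt,
        // Carry forward earlier results so one failed run never erases
        // the record of the last good backup.
        lastSuccessful: previous?.lastSuccessful ?? null,
        lastFailure: previous?.lastFailure ?? null
    };

    if (attempt.ok) {
        status.lastSuccessful = attempt.finishedAt;
    } else {
        status.lastFailure = { at: attempt.finishedAt, message: attempt.error };
    }
    return status;
}
```

Keeping the latest failure alongside the latest success lets a reader compare the two timestamps and tell a currently failing schedule apart from one that has since recovered.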
Contract Ledger Matrix
| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Daily backup scheduler | #10129, this ticket | Runs the canonical npm run ai:backup path once per day and records status metadata for the attempt. | If the backup fails, no data stores are mutated; failure is recorded and surfaced to operators. | Scheduled workflow or daemon docs explain where the schedule lives. | Dry-run or integration evidence showing the scheduler invokes the canonical script. |
| .neo-ai-data/backups/backup-<ISO-timestamp>/ retention | buildScripts/ai/backup.mjs retention TODO | Applies a rolling retention policy, defaulting to 30 daily snapshots plus longer-lived weekly snapshots unless implementation justifies a different operator-friendly default. | Freshest successful backups are never deleted by a failed sweep; malformed folders are skipped and reported. | Inline retention docs plus operator runbook. | Unit coverage for retention selection and deletion boundaries. |
| healthcheck.backup.lastSuccessful | Memory Core / Knowledge Base healthcheck surfaces | Healthcheck returns the timestamp of the latest successful automated backup and enough status to distinguish fresh, stale, failed, and never-run states. | If metadata is missing, report never_run or equivalent without failing unrelated health dimensions. | Healthcheck schema docs / JSDoc updated. | Targeted healthcheck tests for fresh, stale, failed, and missing metadata cases. |
| Restoration runbook | #10129 bundle layout | Documents how to locate the latest known-good bundle and restore MC + KB + graph + concepts + trajectories consistently. | If a subsystem is absent in a fresh environment, the runbook distinguishes optional-missing from failed-restore. | learn/agentos/ or the nearest operator-facing deployment doc. | Manual evidence or script-level smoke test using a small fixture backup. |

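
The retention row can be made concrete with a selection function that only plans a sweep and leaves deletion to the caller. This is a sketch under assumptions, not a settled design: the input shape ({name, time}, with time already parsed by the caller or null for a malformed folder name), the weeklyKeep default of 8, the epoch-aligned 7-day week buckets, and the name planRetention are all hypothetical. Only the 30-daily default comes from this ticket.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Plans a retention sweep without touching the filesystem.
// Returns { keep, remove, skipped } as arrays of bundle names.
function planRetention(bundles, { dailyKeep = 30, weeklyKeep = 8 } = {}) {
    const isValid = b => b.time instanceof Date && !Number.isNaN(b.time.getTime());

    // Malformed folders are never deletion candidates; they are only reported.
    const skipped = bundles.filter(b => !isValid(b)).map(b => b.name);
    const valid = bundles.filter(isValid).sort((a, b) => b.time - a.time); // newest first

    // Daily tier: the newest `dailyKeep` bundles are always kept.
    const keep = valid.slice(0, dailyKeep).map(b => b.name);
    const remove = [];

    // Weekly tier: among older bundles, keep the newest one per 7-day bucket
    // until `weeklyKeep` buckets have been filled.
    const weeksSeen = new Set();
    for (const b of valid.slice(dailyKeep)) {
        const week = Math.floor(b.time.getTime() / WEEK_MS);
        if (!weeksSeen.has(week) && weeksSeen.size < weeklyKeep) {
            weeksSeen.add(week);
            keep.push(b.name);
        } else {
            remove.push(b.name);
        }
    }
    return { keep, remove, skipped };
}
```

Separating the plan from the deletion step keeps the selection logic unit-testable at its boundaries, and it lets a sweep abort before removing anything if the plan looks wrong.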
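Acceptance Criteria
- A daily scheduled job invokes the canonical npm run ai:backup / buildScripts/ai/backup.mjs path.
- Status metadata records lastAttempted, lastSuccessful, and failure detail for the latest failed attempt.
- Retention manages .neo-ai-data/backups/backup-* bundles with a documented default policy.
- Healthcheck exposes a backup.lastSuccessful field and a machine-readable backup status.
- A restoration runbook documents how to restore from a bundle produced by buildScripts/ai/backup.mjs.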
Out of Scope
- Changing the atomic backup bundle format established by #10129.
- Cloud upload, off-machine replication, or encrypted remote storage.
- Implementing destructive restore execution as an MCP tool. This ticket requires a runbook; restore tooling can be a follow-up.
- Defrag pre-nuke physical-copy snapshots. defragChromaDB.mjs remains a peer script with a separate purpose.
- Reactive postmortem framing. This is preventive substrate hardening.
Avoided Traps
- Reimplementing backup in the scheduler: the scheduler should orchestrate backup.mjs, not create a second backup code path.
- Retention without observability: deleting old bundles is only safe if operators can also see which backup last succeeded.
- Healthcheck as a hard dependency: backup freshness should be visible and actionable; it should not make unrelated read-only health dimensions fail by default.
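To illustrate the last point, the backup block can be computed by a function that always returns a classification and never throws, so missing or malformed metadata degrades to never_run instead of failing the surrounding healthcheck. This is a minimal sketch: the status values follow the fresh, stale, failed, and never-run states named in this ticket, while the input shape, the 36-hour freshness window, and the name buildBackupHealth are assumptions.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// `status` is an assumed metadata shape:
// { lastSuccessful: ISO string | null, lastFailure: { at: ISO string } | null }
function buildBackupHealth(status, now = new Date(), maxAgeHours = 36) {
    const lastSuccessful = status?.lastSuccessful ?? null;
    const successMs = lastSuccessful ? Date.parse(lastSuccessful) : NaN;
    const failureMs = status?.lastFailure ? Date.parse(status.lastFailure.at) : NaN;
    const hasSuccess = !Number.isNaN(successMs);
    const hasFailure = !Number.isNaN(failureMs);

    let state;
    if (!hasSuccess && !hasFailure) {
        state = 'never_run'; // missing or unreadable metadata
    } else if (hasFailure && (!hasSuccess || failureMs > successMs)) {
        state = 'failed'; // the most recent recorded outcome is a failure
    } else if (now.getTime() - successMs > maxAgeHours * HOUR_MS) {
        state = 'stale';
    } else {
        state = 'fresh';
    }
    return { status: state, lastSuccessful: hasSuccess ? lastSuccessful : null };
}
```

A window somewhat longer than 24 hours tolerates a late daily run without flapping between fresh and stale. The right value depends on where the schedule ends up living.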
Related
- #10129 — atomic timestamped backup bundle across persistent subsystems
- #10780 — backup-first operational discipline before DreamMode/Sandman invocation
- #10822 — config substrate cleanup epic context
Origin Session ID: ab6b2fad-6660-4148-a0c7-474c0131a6bf
Retrieval Hint: query_raw_memories(query="daily automated backup Memory Core snapshot lastSuccessful retention restoration runbook")