Sub of Epic #10671 (substrate-restart recovery). Replaces #10781 + #10787 + closed PR #10782 (all closed not-planned for wrong-shape architectural pattern).
Per @tobiu architectural critique 2026-05-05: persistent-process management for the heartbeat daemon was attempted in PR #10782 as a bash-shape script (ai/scripts/swarm-heartbeat-daemon.sh) + launchd plist. This introduced architectural debt — ai/daemons/ already houses Neo-singleton .mjs services (DreamService.mjs + 10+ services in ai/daemons/services/) following the canonical Neo-class pattern (Base inheritance, singleton: true config, services.mjs accessor). Adding a bash daemon to ai/scripts/ violated the established architectural pattern.
Plus: Gemini self-reviewed her own commits on PR #10782 (pushed code via 2a16628aa + 0e663ed61, then issued APPROVED via comment on her own work). This structurally bypassed cross-family review. Fresh ticket + fresh PR allows GPT (only model that hasn't authored this substrate) to perform clean cross-family review.
The Problem
swarm-heartbeat.sh (existing dual-purpose bash script) has a heartbeat-pulse loop that consumes the recovery substrate (checkSunsetted.mjs / resumeHarness.mjs / wakeSafetyGate.mjs / sweepExpiredTasks.mjs) via subprocess invocation. The script is functionally complete (50 unit tests pass on origin/dev) but has no persistent-process management — it runs only when an operator manually invokes it interactively. Operator-sleep / system-reboot / terminal-close kills the daemon → no autonomous heartbeat-driven recovery → night-shift unreachable.
PR #10782 attempted to fix this via bash-shape extraction + launchd plist. Architectural debt: bash daemon in ai/scripts/ when canonical pattern is Neo-singleton .mjs in ai/daemons/.
Polls every POLL_INTERVAL (default 5min, env-overridable)
Concurrency-lock semantics preserved (file-based or in-class state)
All existing logic from swarm-heartbeat.sh#heartbeat_pulse ported faithfully
Persistent-process management:
launchd plist ProgramArguments: ["/usr/bin/env", "node", "[REPO]/ai/daemons/SwarmHeartbeatService.mjs"] (or via npm script: npm run swarm-heartbeat:daemon)
The .mjs file would have a top-level entrypoint: if (import.meta.url === file://${process.argv[1]}) { SwarmHeartbeatService.start(); } (or similar self-invoke pattern)
Documentation:
learn/agentos/wake-substrate/PersistentProcessManagement.md — operator-doc (rewritten from PR #10782 work; carries forward the install/verify/uninstall procedures + troubleshooting + Linux systemd sketch)
Cross-refs to DreamPipeline.md + WakeSubstrateIncidentProtocol.md (narrowed correctly per the cross-ref scope corrections from PR #10782 reshape iteration)
(AC3) Imports recovery substrate via .mjs module imports (not bash subprocess); subprocess invocations of node-side scripts replaced with direct class/function calls where feasible
(AC4) Self-invoke entrypoint pattern at file bottom (suitable for node ai/daemons/SwarmHeartbeatService.mjs direct invocation under launchd)
(AC7) launchd plist template at learn/agentos/wake-substrate/com.neomjs.swarm-heartbeat.plist.template with ProgramArguments targeting node + the .mjs entrypoint
(AC8) Operator-doc learn/agentos/wake-substrate/PersistentProcessManagement.md written from scratch (or salvaged from PR #10782 close-out comment if useful) with corrected cross-ref scopes (heartbeat = recovery wakes, NOT DreamMode/Sandman trigger; AGENTS_STARTUP scope conditional on wake-substrate diagnostics)
(AC9)Cross-family review by @neo-gpt only — Gemini pre-empted via direct commits on PR #10782; cross-family-review independence requires GPT (only model that hasn't authored this substrate)
(AC10) Original swarm-heartbeat.sh left untouched — preserves developer-interactive shape; this ticket only adds the Neo-singleton daemon, doesn't refactor the existing bash script
Out of Scope
Refactoring swarm-heartbeat.sh developer-interactive shape (preserved as-is for backward compatibility)
Operator-side install (launchctl bootstrap) — destructive write; operator-territory by design
wakeSafetyGate untrip — separate #10671 epic-finish step
End-to-end sunsetted-harness E2E validation — separate #10671 epic-finish milestone
Linux systemd .service template with empirical validation — out-of-scope-for-v1; sketched in operator-doc
Migrating other ai/scripts/*.sh files to Neo-class shape — sibling concern; this ticket only addresses heartbeat daemon
Avoided Traps
Re-introducing bash daemon in ai/scripts/: PR #10782 close-out reasoning. Architectural pattern is Neo-singleton .mjs in ai/daemons/.
Subprocess invocation of recovery scripts: when the recovery substrate is already .mjs modules, bash subprocess invocation adds shell-overhead + brittleness. Direct module imports are cleaner.
Self-review by author: AC9 explicitly requires GPT cross-family review. Gemini pre-empted via PR #10782 direct commits; her review on this fresh substrate would be third-party-attribution-confused.
Re-using PR #10782 substrate without architectural-coherence: the bash-shape deliverables in that PR are wrong-pattern. Carry forward the conceptual content (gotchas, install procedure, troubleshooting) but rewrite the substrate.
Context
Sub of Epic #10671 (substrate-restart recovery). Replaces #10781 + #10787 + closed PR #10782 (all closed not-planned for wrong-shape architectural pattern).
Per @tobiu architectural critique 2026-05-05: persistent-process management for the heartbeat daemon was attempted in PR #10782 as a bash-shape script (
ai/scripts/swarm-heartbeat-daemon.sh) + launchd plist. This introduced architectural debt —ai/daemons/already houses Neo-singleton .mjs services (DreamService.mjs+ 10+ services inai/daemons/services/) following the canonical Neo-class pattern (Baseinheritance,singleton: trueconfig, services.mjs accessor). Adding a bash daemon toai/scripts/violated the established architectural pattern.Plus: Gemini self-reviewed her own commits on PR #10782 (pushed code via
2a16628aa+0e663ed61, then issued APPROVED via comment on her own work). This structurally bypassed cross-family review. Fresh ticket + fresh PR allows GPT (only model that hasn't authored this substrate) to perform clean cross-family review.The Problem
swarm-heartbeat.sh(existing dual-purpose bash script) has a heartbeat-pulse loop that consumes the recovery substrate (checkSunsetted.mjs/resumeHarness.mjs/wakeSafetyGate.mjs/sweepExpiredTasks.mjs) via subprocess invocation. The script is functionally complete (50 unit tests pass onorigin/dev) but has no persistent-process management — it runs only when an operator manually invokes it interactively. Operator-sleep / system-reboot / terminal-close kills the daemon → no autonomous heartbeat-driven recovery → night-shift unreachable.PR #10782 attempted to fix this via bash-shape extraction + launchd plist. Architectural debt: bash daemon in
ai/scripts/when canonical pattern is Neo-singleton .mjs inai/daemons/.The Architectural Reality
Existing canonical pattern (
ai/daemons/DreamService.mjs:1-30):Base from '../../src/core/Base.mjs''../services.mjs'accessor (Memory_Config,Memory_StorageRouter,Memory_TextEmbeddingService,Memory_GraphService)static config = { singleton: true, ... }shapeai/daemons/services/(10+ existing services follow same pattern)Sibling daemons in ai/daemons/services/ (all .mjs Neo-class shape):
ConceptDiscoveryService.mjs,ConceptIngestor.mjs,GapInferenceEngine.mjs,GoldenPathSynthesizer.mjs,GraphMaintenanceService.mjs,IssueIngestor.mjs,LazyEdgeDrainer.mjs,MemorySessionIngestor.mjs,SemanticGraphExtractor.mjs,TopologyInferenceEngine.mjsRequired substrate consumers (existing .mjs modules):
ai/scripts/checkSunsetted.mjs— predicate consumerai/scripts/resumeHarness.mjs— recovery dispatcherai/scripts/wakeSafetyGate.mjs— fail-closed safety gateai/scripts/sweepExpiredTasks.mjs— task cleanupThese can be imported directly into the new SwarmHeartbeatService rather than invoked via subprocess shell.
The Fix
ai/daemons/SwarmHeartbeatService.mjs— Neo-singleton class following DreamService.mjs precedent:import Base from '../../src/core/Base.mjs'import { Memory_Config as aiConfig, Memory_GraphService as GraphService } from '../services.mjs'(whatever services.mjs exposes for SQLite + mailbox)import { readGateState, hasOverride } from '../scripts/wakeSafetyGate.mjs'checkSunsetted/resumeHarness/sweepExpiredTasksmodules (instead of bash subprocess)static config = { className: 'Neo.ai.daemons.SwarmHeartbeatService', singleton: true }POLL_INTERVAL(default 5min, env-overridable)swarm-heartbeat.sh#heartbeat_pulseported faithfullyPersistent-process management:
ProgramArguments:["/usr/bin/env", "node", "[REPO]/ai/daemons/SwarmHeartbeatService.mjs"](or via npm script:npm run swarm-heartbeat:daemon)if (import.meta.url ===file://${process.argv[1]}) { SwarmHeartbeatService.start(); }(or similar self-invoke pattern)Documentation:
learn/agentos/wake-substrate/PersistentProcessManagement.md— operator-doc (rewritten from PR #10782 work; carries forward the install/verify/uninstall procedures + troubleshooting + Linux systemd sketch)DreamPipeline.md+WakeSubstrateIncidentProtocol.md(narrowed correctly per the cross-ref scope corrections from PR #10782 reshape iteration)Acceptance Criteria
ai/daemons/SwarmHeartbeatService.mjsexists; follows Neo-singleton pattern matchingDreamService.mjsprecedentBase;static config = { className: ..., singleton: true }.mjsmodule imports (not bash subprocess); subprocess invocations of node-side scripts replaced with direct class/function calls where feasiblenode ai/daemons/SwarmHeartbeatService.mjsdirect invocation under launchd)swarm-heartbeat.sh#heartbeat_pulse: poll loop, concurrency lock, sunset-detection-via-checkSunsetted, recovery-dispatch-via-resumeHarness, wakeSafetyGate consultation, sweep-expired-tasks invocationtest/playwright/unit/ai/daemons/SwarmHeartbeatService.spec.mjscovering: poll loop fires, sunset detection invokes recovery, gate-tripped blocks high-authority dispatch, concurrency lock prevents overlapping pulseslearn/agentos/wake-substrate/com.neomjs.swarm-heartbeat.plist.templatewithProgramArgumentstargetingnode+ the .mjs entrypointlearn/agentos/wake-substrate/PersistentProcessManagement.mdwritten from scratch (or salvaged from PR #10782 close-out comment if useful) with corrected cross-ref scopes (heartbeat = recovery wakes, NOT DreamMode/Sandman trigger; AGENTS_STARTUP scope conditional on wake-substrate diagnostics)swarm-heartbeat.shleft untouched — preserves developer-interactive shape; this ticket only adds the Neo-singleton daemon, doesn't refactor the existing bash scriptOut of Scope
swarm-heartbeat.shdeveloper-interactive shape (preserved as-is for backward compatibility)launchctl bootstrap) — destructive write; operator-territory by designwakeSafetyGateuntrip — separate #10671 epic-finish step.servicetemplate with empirical validation — out-of-scope-for-v1; sketched in operator-docAvoided Traps
Related
Audit core.Base + Neo.mjs), #10108 (me=this policy)ai/daemons/DreamService.mjsfeedback_post_merge_discoverability_via_graph.md,feedback_operator_context_is_data_not_framing.md,feedback_workflow_discipline_over_velocity.mdOrigin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7
Retrieval Hint:
query_raw_memories(query="SwarmHeartbeatService Neo-singleton ai/daemons replace bash 10781 10787 10782 architectural-debt-cleanup")