Context
Surfaced 2026-05-05 by @neo-gpt during architectural review of PR #10782 (persistent-process management for swarm-heartbeat). PR #10782 was reshaped because the plist template targeted swarm-heartbeat.sh directly — but the script is not a pure daemon. With no args, it sets AGENT_CMD="claude" and executes the Claude CLI as part of its wrapper loop.
Result: a launchd plist invoking bash swarm-heartbeat.sh would try to launch Claude CLI under launchd, not run a pure heartbeat daemon. Wrong-shape executable target.
The Problem
ai/scripts/swarm-heartbeat.sh is dual-purpose by current shape:
- Heartbeat-pulse subprocess (
heartbeat_pulse & at top) — the actual daemon-shape work: polls SQLite for unread messages, consults sunset detector, dispatches recovery via resumeHarness.mjs when warranted, consults wakeSafetyGate.mjs before high-authority actions
- Wrapper loop — manages an agent-CLI subprocess (
AGENT_CMD="claude" default with no args; the script's main loop runs the agent CLI, with the heartbeat as a backgrounded peer)
This makes the script convenient for a developer running it interactively (one command starts both the agent CLI and the heartbeat) but NOT suitable as a launchd daemon target — launchd would inherit the wrapper's AGENT_CMD execution and attempt to launch the agent CLI in a non-interactive launchd context.
The persistent-process management substrate (PR #10782 / #10781 lane) needs a daemon-only entrypoint that:
- Runs only the heartbeat-pulse poll loop
- Does NOT execute the agent-CLI wrapper
- Is suitable for launchd / systemd / equivalent process-manager invocation
The Architectural Reality
ai/scripts/swarm-heartbeat.sh — current dual-purpose script (heartbeat + agent-CLI wrapper)
ai/scripts/heartbeat_pulse (subprocess inside the script) — the actual daemon-shape work
ai/scripts/checkSunsetted.mjs, resumeHarness.mjs, wakeSafetyGate.mjs, sweepExpiredTasks.mjs — consumers chain
bridge-daemon.mjs — the Shape C delivery daemon (separate scope; runs persistently as PID 22447); does NOT need this entrypoint since it has its own startup mechanism
- ADR 0002 (
learn/agentos/decisions/0002-phase3-wake-substrate-standards-alignment.md) — codifies the bridge-daemon-vs-swarm-heartbeat split as intentional
session-sunset-workflow.md line 11 — explicit current contract: swarm-heartbeat.sh → checkSunsetted.mjs → resumeHarness.mjs is the recovery chain
idleOutNudge.mjs — split-of-concerns reference: heartbeat detects, bridge delivers
The Fix
Two equivalent shapes per @neo-gpt's recommendation:
Option A (smallest delta): add explicit --daemon-only (or --no-agent-cli) mode flag to swarm-heartbeat.sh. When the flag is set, skip the agent-CLI wrapper logic and run ONLY the heartbeat-pulse + recovery-dispatch poll loop. Plist + operator-doc point at bash swarm-heartbeat.sh --daemon-only.
Option B (cleaner separation): extract heartbeat-pulse + recovery-dispatch logic into a new file ai/scripts/swarm-heartbeat-daemon.sh (or swarm-heartbeat-daemon.mjs for the Node-orchestrator variant). The original swarm-heartbeat.sh stays as the developer-convenience interactive wrapper; the new file is the daemon-only target.
Recommendation: Option B (cleaner separation) — explicit two-file shape removes the dual-purpose ambiguity at the substrate level. Option A keeps backward compatibility but the dual-purpose design is the root of this issue; making it explicit-via-naming is the more durable fix. Sibling files (checkSunsetted.mjs, resumeHarness.mjs, etc.) are already separate-purpose; adding swarm-heartbeat-daemon.sh as a sibling matches the pattern.
If Option B is taken: the daemon entrypoint can also use a Node orchestrator (.mjs) instead of bash — gains type-safety + cross-platform parity — but that's scope-extension; v1 can match the existing bash style for minimal-delta.
Acceptance Criteria
Out of Scope
- bridge-daemon vs swarm-heartbeat consolidation (ADR 0002 names this as future direction; this ticket preserves current intentional split)
- Linux systemd
.service template (#10781 AC3 — out-of-scope-for-v1)
- Test isolation bug in
harnessLifecycle.spec.mjs (#10786 — separate ticket)
Avoided Traps
- Auto-collapsing into bridge-daemon: ADR 0002 names the split as intentional; current contract assigns sunset/recovery to swarm-heartbeat path and active wake delivery to bridge-daemon. Consolidation is a separate architectural decision, not this ticket's scope.
- Modifying swarm-heartbeat's existing developer-interactive shape: that's a separate concern; #10781's persistent-process need is the daemon-only path
- Drafting plist text without reading the script's full body: the verify-before-assert lapse that produced PR #10782's wrong-shape framing. Future plist work MUST trace the script's full execution path before drafting.
Related
- Blocking PR: #10782 (persistent-process management for swarm-heartbeat) — converted to draft pending this fix
- Parent epic: #10671 (substrate-restart recovery)
- Sibling test-isolation bug: #10786 (
harnessLifecycle.spec.mjs shared identity)
- ADR:
learn/agentos/decisions/0002-phase3-wake-substrate-standards-alignment.md — bridge-daemon-vs-swarm-heartbeat split
- Empirical anchor: @neo-gpt's architectural review 2026-05-05 (relayed via A2A MESSAGE:2a5f5f3d-4183-428a-b3ac-84a8faadef06)
Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7
Retrieval Hint: query_raw_memories(query="swarm-heartbeat daemon-only entrypoint dual-purpose agent-CLI launcher launchd plist 10782 10671")
Context
Surfaced 2026-05-05 by @neo-gpt during architectural review of PR #10782 (persistent-process management for swarm-heartbeat). PR #10782 was reshaped because the plist template targeted
swarm-heartbeat.shdirectly — but the script is not a pure daemon. With no args, it setsAGENT_CMD="claude"and executes the Claude CLI as part of its wrapper loop.Result: a launchd plist invoking
bash swarm-heartbeat.shwould try to launch Claude CLI under launchd, not run a pure heartbeat daemon. Wrong-shape executable target.The Problem
ai/scripts/swarm-heartbeat.shis dual-purpose by current shape:heartbeat_pulse &at top) — the actual daemon-shape work: polls SQLite for unread messages, consults sunset detector, dispatches recovery viaresumeHarness.mjswhen warranted, consultswakeSafetyGate.mjsbefore high-authority actionsAGENT_CMD="claude"default with no args; the script's main loop runs the agent CLI, with the heartbeat as a backgrounded peer)This makes the script convenient for a developer running it interactively (one command starts both the agent CLI and the heartbeat) but NOT suitable as a launchd daemon target — launchd would inherit the wrapper's
AGENT_CMDexecution and attempt to launch the agent CLI in a non-interactive launchd context.The persistent-process management substrate (PR #10782 / #10781 lane) needs a daemon-only entrypoint that:
The Architectural Reality
ai/scripts/swarm-heartbeat.sh— current dual-purpose script (heartbeat + agent-CLI wrapper)ai/scripts/heartbeat_pulse(subprocess inside the script) — the actual daemon-shape workai/scripts/checkSunsetted.mjs,resumeHarness.mjs,wakeSafetyGate.mjs,sweepExpiredTasks.mjs— consumers chainbridge-daemon.mjs— the Shape C delivery daemon (separate scope; runs persistently as PID 22447); does NOT need this entrypoint since it has its own startup mechanismlearn/agentos/decisions/0002-phase3-wake-substrate-standards-alignment.md) — codifies the bridge-daemon-vs-swarm-heartbeat split as intentionalsession-sunset-workflow.mdline 11 — explicit current contract:swarm-heartbeat.sh → checkSunsetted.mjs → resumeHarness.mjsis the recovery chainidleOutNudge.mjs— split-of-concerns reference: heartbeat detects, bridge deliversThe Fix
Two equivalent shapes per @neo-gpt's recommendation:
Option A (smallest delta): add explicit
--daemon-only(or--no-agent-cli) mode flag toswarm-heartbeat.sh. When the flag is set, skip the agent-CLI wrapper logic and run ONLY the heartbeat-pulse + recovery-dispatch poll loop. Plist + operator-doc point atbash swarm-heartbeat.sh --daemon-only.Option B (cleaner separation): extract heartbeat-pulse + recovery-dispatch logic into a new file
ai/scripts/swarm-heartbeat-daemon.sh(orswarm-heartbeat-daemon.mjsfor the Node-orchestrator variant). The originalswarm-heartbeat.shstays as the developer-convenience interactive wrapper; the new file is the daemon-only target.Recommendation: Option B (cleaner separation) — explicit two-file shape removes the dual-purpose ambiguity at the substrate level. Option A keeps backward compatibility but the dual-purpose design is the root of this issue; making it explicit-via-naming is the more durable fix. Sibling files (
checkSunsetted.mjs,resumeHarness.mjs, etc.) are already separate-purpose; addingswarm-heartbeat-daemon.shas a sibling matches the pattern.If Option B is taken: the daemon entrypoint can also use a Node orchestrator (
.mjs) instead of bash — gains type-safety + cross-platform parity — but that's scope-extension; v1 can match the existing bash style for minimal-delta.Acceptance Criteria
--daemon-onlyflag OR Option B separate file)bash swarm-heartbeat.sh) preserved unchanged for backward compatibilityPersistentProcessManagement.md) §3 install procedure updated to reference daemon-only entrypoint;timeout 10 bash ai/scripts/swarm-heartbeat.shmanual-test command replaced with daemon-only equivalent that doesn't depend ontimeout(not available on macOS by default)Out of Scope
.servicetemplate (#10781 AC3 — out-of-scope-for-v1)harnessLifecycle.spec.mjs(#10786 — separate ticket)Avoided Traps
Related
harnessLifecycle.spec.mjsshared identity)learn/agentos/decisions/0002-phase3-wake-substrate-standards-alignment.md— bridge-daemon-vs-swarm-heartbeat splitOrigin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7
Retrieval Hint:
query_raw_memories(query="swarm-heartbeat daemon-only entrypoint dual-purpose agent-CLI launcher launchd plist 10782 10671")