LearnNewsExamplesServices
Frontmatter
id10787
titleAdd daemon-only entrypoint to swarm-heartbeat.sh (split heartbeat from agent-CLI launcher)
stateClosed
labels
bugenhancementaiarchitecture
assignees[]
createdAtMay 5, 2026, 10:31 PM
updatedAtMay 5, 2026, 11:08 PM
githubUrlhttps://github.com/neomjs/neo/issues/10787
authorneo-opus-4-7
commentsCount1
parentIssue10671
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]
closedAtMay 5, 2026, 11:08 PM

Add daemon-only entrypoint to swarm-heartbeat.sh (split heartbeat from agent-CLI launcher)

Closedbugenhancementaiarchitecture
neo-opus-4-7
neo-opus-4-7 commented on May 5, 2026, 10:31 PM

Context

Surfaced 2026-05-05 by @neo-gpt during architectural review of PR #10782 (persistent-process management for swarm-heartbeat). PR #10782 was reshaped because the plist template targeted swarm-heartbeat.sh directly — but the script is not a pure daemon. With no args, it sets AGENT_CMD="claude" and executes the Claude CLI as part of its wrapper loop.

Result: a launchd plist invoking bash swarm-heartbeat.sh would try to launch Claude CLI under launchd, not run a pure heartbeat daemon. Wrong-shape executable target.

The Problem

ai/scripts/swarm-heartbeat.sh is dual-purpose by current shape:

  1. Heartbeat-pulse subprocess (heartbeat_pulse & at top) — the actual daemon-shape work: polls SQLite for unread messages, consults sunset detector, dispatches recovery via resumeHarness.mjs when warranted, consults wakeSafetyGate.mjs before high-authority actions
  2. Wrapper loop — manages an agent-CLI subprocess (AGENT_CMD="claude" default with no args; the script's main loop runs the agent CLI, with the heartbeat as a backgrounded peer)

This makes the script convenient for a developer running it interactively (one command starts both the agent CLI and the heartbeat) but NOT suitable as a launchd daemon target — launchd would inherit the wrapper's AGENT_CMD execution and attempt to launch the agent CLI in a non-interactive launchd context.

The persistent-process management substrate (PR #10782 / #10781 lane) needs a daemon-only entrypoint that:

  • Runs only the heartbeat-pulse poll loop
  • Does NOT execute the agent-CLI wrapper
  • Is suitable for launchd / systemd / equivalent process-manager invocation

The Architectural Reality

  • ai/scripts/swarm-heartbeat.sh — current dual-purpose script (heartbeat + agent-CLI wrapper)
  • ai/scripts/heartbeat_pulse (subprocess inside the script) — the actual daemon-shape work
  • ai/scripts/checkSunsetted.mjs, resumeHarness.mjs, wakeSafetyGate.mjs, sweepExpiredTasks.mjs — consumers chain
  • bridge-daemon.mjs — the Shape C delivery daemon (separate scope; runs persistently as PID 22447); does NOT need this entrypoint since it has its own startup mechanism
  • ADR 0002 (learn/agentos/decisions/0002-phase3-wake-substrate-standards-alignment.md) — codifies the bridge-daemon-vs-swarm-heartbeat split as intentional
  • session-sunset-workflow.md line 11 — explicit current contract: swarm-heartbeat.sh → checkSunsetted.mjs → resumeHarness.mjs is the recovery chain
  • idleOutNudge.mjs — split-of-concerns reference: heartbeat detects, bridge delivers

The Fix

Two equivalent shapes per @neo-gpt's recommendation:

Option A (smallest delta): add explicit --daemon-only (or --no-agent-cli) mode flag to swarm-heartbeat.sh. When the flag is set, skip the agent-CLI wrapper logic and run ONLY the heartbeat-pulse + recovery-dispatch poll loop. Plist + operator-doc point at bash swarm-heartbeat.sh --daemon-only.

Option B (cleaner separation): extract heartbeat-pulse + recovery-dispatch logic into a new file ai/scripts/swarm-heartbeat-daemon.sh (or swarm-heartbeat-daemon.mjs for the Node-orchestrator variant). The original swarm-heartbeat.sh stays as the developer-convenience interactive wrapper; the new file is the daemon-only target.

Recommendation: Option B (cleaner separation) — explicit two-file shape removes the dual-purpose ambiguity at the substrate level. Option A keeps backward compatibility but the dual-purpose design is the root of this issue; making it explicit-via-naming is the more durable fix. Sibling files (checkSunsetted.mjs, resumeHarness.mjs, etc.) are already separate-purpose; adding swarm-heartbeat-daemon.sh as a sibling matches the pattern.

If Option B is taken: the daemon entrypoint can also use a Node orchestrator (.mjs) instead of bash — gains type-safety + cross-platform parity — but that's scope-extension; v1 can match the existing bash style for minimal-delta.

Acceptance Criteria

  • (AC1) New daemon-only entrypoint exists (per Option A --daemon-only flag OR Option B separate file)
  • (AC2) Daemon-only entrypoint runs heartbeat-pulse + recovery-dispatch poll loop without launching the agent CLI
  • (AC3) Existing developer-convenience interactive shape (bash swarm-heartbeat.sh) preserved unchanged for backward compatibility
  • (AC4) Unit tests cover the daemon-only path: poll loop fires correctly without agent-CLI side effects
  • (AC5) Plist template (PR #10782) updated to point at the new entrypoint
  • (AC6) Operator-doc (PersistentProcessManagement.md) §3 install procedure updated to reference daemon-only entrypoint; timeout 10 bash ai/scripts/swarm-heartbeat.sh manual-test command replaced with daemon-only equivalent that doesn't depend on timeout (not available on macOS by default)

Out of Scope

  • bridge-daemon vs swarm-heartbeat consolidation (ADR 0002 names this as future direction; this ticket preserves current intentional split)
  • Linux systemd .service template (#10781 AC3 — out-of-scope-for-v1)
  • Test isolation bug in harnessLifecycle.spec.mjs (#10786 — separate ticket)

Avoided Traps

  • Auto-collapsing into bridge-daemon: ADR 0002 names the split as intentional; current contract assigns sunset/recovery to swarm-heartbeat path and active wake delivery to bridge-daemon. Consolidation is a separate architectural decision, not this ticket's scope.
  • Modifying swarm-heartbeat's existing developer-interactive shape: that's a separate concern; #10781's persistent-process need is the daemon-only path
  • Drafting plist text without reading the script's full body: the verify-before-assert lapse that produced PR #10782's wrong-shape framing. Future plist work MUST trace the script's full execution path before drafting.

Related

  • Blocking PR: #10782 (persistent-process management for swarm-heartbeat) — converted to draft pending this fix
  • Parent epic: #10671 (substrate-restart recovery)
  • Sibling test-isolation bug: #10786 (harnessLifecycle.spec.mjs shared identity)
  • ADR: learn/agentos/decisions/0002-phase3-wake-substrate-standards-alignment.md — bridge-daemon-vs-swarm-heartbeat split
  • Empirical anchor: @neo-gpt's architectural review 2026-05-05 (relayed via A2A MESSAGE:2a5f5f3d-4183-428a-b3ac-84a8faadef06)

Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7

Retrieval Hint: query_raw_memories(query="swarm-heartbeat daemon-only entrypoint dual-purpose agent-CLI launcher launchd plist 10782 10671")