Context
Operator observation on 2026-05-23 after PR #11772 merged: dev was pulled and the Orchestrator was restarted, but no wakeup messages have been observed for @neo-opus-4-7 or @neo-gpt.
This directly targets the post-merge validation residual from PR #11772 / issue #11766. PR #11772 folded the standalone swarm-heartbeat daemon into the Orchestrator as a local-only scheduled lane and explicitly left L4 runtime validation open:
- local maintainer checkout: confirm a running Orchestrator drives heartbeat pulses (.neo-ai-data/wake-daemon/heartbeat.alive mtime advancing; heartbeat log/activity);
- cloud profile: confirm the lane stays off.
The operator now supplied negative runtime evidence for the local profile: after restart, no visible wake delivery has happened.
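The local-versus-cloud expectation in the L4 checks can be pinned down as a small, testable resolver. The sketch below is hypothetical. The env names (NEO_ORCHESTRATOR_SWARM_HEARTBEAT_ENABLED, NEO_ORCHESTRATOR_SWARM_HEARTBEAT_INTERVAL_MS, NEO_AI_DEPLOYMENT_MODE) come from this issue. The function names are assumptions, and so is the precedence: an explicit flag wins, and otherwise the lane is off in cloud and on locally. The caller-supplied default interval is also an assumption. None of this is the actual logic in ai/scripts/orchestrator-daemon.mjs.

```javascript
// Hypothetical resolver: mirrors env names referenced in this issue,
// NOT the real ai/scripts/orchestrator-daemon.mjs implementation.
function parseBool(value) {
    if (value === undefined || value === '') return undefined;
    const v = String(value).trim().toLowerCase();
    if (['1', 'true', 'yes', 'on'].includes(v)) return true;
    if (['0', 'false', 'no', 'off'].includes(v)) return false;
    return undefined; // unrecognised values fall back to the default
}

function resolveHeartbeatLaneConfig(env, defaultIntervalMs) {
    const mode = String(env.NEO_AI_DEPLOYMENT_MODE || '').trim().toLowerCase();
    const cloud = mode === 'cloud';
    const explicit = parseBool(env.NEO_ORCHESTRATOR_SWARM_HEARTBEAT_ENABLED);
    // Assumed precedence: an explicit flag wins; otherwise the lane is local-only.
    const enabled = explicit !== undefined ? explicit : !cloud;

    // Non-numeric or non-positive overrides fall back to the caller's default.
    const parsed = Number.parseInt(env.NEO_ORCHESTRATOR_SWARM_HEARTBEAT_INTERVAL_MS, 10);
    const intervalMs = Number.isFinite(parsed) && parsed > 0 ? parsed : defaultIntervalMs;

    return { enabled, intervalMs, cloud };
}
```

Under these assumptions, resolveHeartbeatLaneConfig({ NEO_AI_DEPLOYMENT_MODE: 'cloud' }, 60000) yields enabled: false. Adding NEO_ORCHESTRATOR_SWARM_HEARTBEAT_ENABLED=true turns the lane back on for a forced test run.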
The Problem
The merge replaced the old standalone heartbeat process with an Orchestrator-owned lane. Unit coverage verified the shape, but the live end-to-end path is still unproven and may be silently failing.
The dangerous failure mode is not just "no wake". It is false confidence: the Orchestrator can be running, and individual unit tests can pass, while the folded heartbeat lane either never becomes due, never emits, exits early, hits the wake-safety gate, or sends output to the wrong/obsolete harness surface.
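One way to avoid that false confidence is to have every pulse record the last stage it reached and why it stopped. The sketch below is illustrative only. Its stage names would paraphrase the SwarmHeartbeatService#pulse() steps described in this issue. The runStagedPulse helper and the { proceed, reason } return shape are hypothetical, not the real service API.

```javascript
// Hypothetical stage tracer for a heartbeat pulse. Each stage's run() returns
// (or resolves to) { proceed: boolean, reason?: string }. Stage names and this
// return shape are illustrative, not the real SwarmHeartbeatService API.
async function runStagedPulse(stages) {
    const trace = [];
    for (const { name, run } of stages) {
        let result;
        try {
            result = await run();
        } catch (err) {
            // A thrown error is recorded with its message instead of vanishing.
            const reason = err instanceof Error ? err.message : String(err);
            trace.push({ stage: name, outcome: 'failed', reason });
            return { completed: false, stoppedAt: name, trace };
        }
        const proceed = Boolean(result && result.proceed);
        const reason = (result && result.reason) || null;
        trace.push({ stage: name, outcome: proceed ? 'passed' : 'skipped', reason });
        if (!proceed) {
            // Intentional early return (lock held, idle-out, safety gate, ...).
            return { completed: false, stoppedAt: name, trace };
        }
    }
    return { completed: true, stoppedAt: null, trace };
}
```

If the returned stoppedAt and trace were logged on every due run, a report could read "skipped at check-locks: lock held" instead of just "no wake".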
Architectural Reality
Verified surfaces:
- PR #11772 merged at 2026-05-22T17:28:26Z and resolves #11766.
- ai/daemons/Orchestrator.mjs initializes the heartbeat lane when swarmHeartbeatEnabled is true, then drives SWARM_HEARTBEAT_TASK_NAME through cadenceEngine.runIfDue and records task outcomes.
- ai/scripts/orchestrator-daemon.mjs resolves NEO_ORCHESTRATOR_SWARM_HEARTBEAT_INTERVAL_MS and NEO_ORCHESTRATOR_SWARM_HEARTBEAT_ENABLED from env/config.
- ai/daemons/SwarmHeartbeatService.mjs#pulse() touches heartbeat.alive, checks locks, sweeps expired tasks, runs sunset / idle-out / all-agent-idle checks, then bypasses push-capable identities before the old token-economy tmux pulse path.
Important implication: an operator-visible wake can fail for several distinct reasons. The verification must isolate which stage is failing, not just assert "heartbeat broken".
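A diagnosis helper can turn collected evidence into the first failing stage, so a report says "lane disabled" or "skipped by gate" rather than "heartbeat broken". Everything below is hypothetical. The evidence field names are invented for illustration. They would have to be filled from Orchestrator task state, heartbeat.alive mtime samples, and harness observations.

```javascript
// Hypothetical evidence shape (field names are illustrative only):
// {
//   laneEnabled: boolean,        // resolved swarmHeartbeatEnabled
//   invocations: number,         // swarm-heartbeat runs seen after restart
//   aliveAdvances: number,       // heartbeat.alive mtime advances observed
//   lastOutcome: 'completed' | 'failed' | 'skipped' | null,
//   lastReason: string | null,
//   expectedTargets: string[],   // e.g. ['@neo-opus-4-7', '@neo-gpt']
//   deliveredTargets: string[],
// }
// Stages are checked in pipeline order; the first failing one is reported.
function diagnoseHeartbeat(evidence) {
    if (!evidence.laneEnabled) {
        return { stage: 'lane-disabled', detail: 'swarm heartbeat lane is not enabled' };
    }
    if (evidence.invocations === 0) {
        return { stage: 'never-invoked', detail: 'task never became due or was never scheduled' };
    }
    if (evidence.lastOutcome === 'failed') {
        return { stage: 'task-failed', detail: evidence.lastReason || 'no reason recorded' };
    }
    if (evidence.aliveAdvances === 0) {
        return { stage: 'alive-not-advancing', detail: 'pulse ran but heartbeat.alive mtime did not move' };
    }
    if (evidence.lastOutcome === 'skipped') {
        // Possibly a correct skip (locks, idle-out, push-capable bypass); the reason says which.
        return { stage: 'skipped-by-gate', detail: evidence.lastReason || 'no reason recorded' };
    }
    const delivered = new Set(evidence.deliveredTargets);
    const missing = evidence.expectedTargets.filter((t) => !delivered.has(t));
    if (missing.length > 0) {
        return { stage: 'delivery-missing', detail: `no wake observed for ${missing.join(', ')}` };
    }
    return { stage: 'delivered', detail: 'all expected targets received a wake' };
}
```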
Duplicate Sweep
- KB ticket search surfaced #10396 / #10399 / #10312 as older generic heartbeat / swarm-stall substrate tickets, and #10931 as a stale-threshold observability bug. None is the post-#11772 folded-Orchestrator live validation gap.
- GitHub search for open SwarmHeartbeatService / Orchestrator / wake issues surfaced #11075, which is a config-constants cleanup, not a live wake-delivery verification issue.
- #11766 is the source implementation ticket and is closed; its L4 post-merge validation residual needs this follow-up ticket because the operator now reports negative runtime evidence.
The Fix
Build a focused verification + diagnosis pass for the folded heartbeat lane:
- Confirm the Orchestrator actually schedules swarm-heartbeat after restart.
- Confirm .neo-ai-data/wake-daemon/heartbeat.alive advances on each due pulse.
- Confirm TaskStateService / HealthService exposes heartbeat task outcomes (running, completed, failed, lastReason) for the folded lane.
- Force a short-interval local run using NEO_ORCHESTRATOR_SWARM_HEARTBEAT_INTERVAL_MS and an explicit NEO_ORCHESTRATOR_SWARM_HEARTBEAT_ENABLED=true, then capture whether the lane reaches each pulse() branch.
- Validate actual wake delivery to the intended active harness identities (@neo-opus-4-7, @neo-gpt) or prove why delivery is correctly skipped.
- Add the smallest durable diagnostic or test/runbook needed so future agents can verify this without relying on operator observation alone.
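The heartbeat.alive check could be scripted by polling the file's mtime and counting how often it moves forward across due pulses. This is a sketch built on Node's fs.statSync and timers/promises. The helper names, sample count and polling interval are assumptions. The path passed in should be the real .neo-ai-data/wake-daemon/heartbeat.alive of the checkout under test.

```javascript
const fs = require('node:fs');
const { setTimeout: sleep } = require('node:timers/promises');

// Count strictly increasing steps in a series of mtime samples (ms).
// null entries (file missing at sample time) are skipped.
function countMtimeAdvances(samples) {
    let advances = 0;
    let previous = null;
    for (const mtimeMs of samples) {
        if (mtimeMs === null) continue;
        if (previous !== null && mtimeMs > previous) advances += 1;
        previous = mtimeMs;
    }
    return advances;
}

// Return the file's mtime in ms, or null if it does not exist yet.
function readMtimeMs(filePath) {
    try {
        return fs.statSync(filePath).mtimeMs;
    } catch (err) {
        if (err.code === 'ENOENT') return null;
        throw err;
    }
}

// Take `count` samples, waiting pollMs between consecutive samples.
async function sampleMtimes(filePath, count, pollMs) {
    const samples = [];
    for (let i = 0; i < count; i += 1) {
        if (i > 0) await sleep(pollMs);
        samples.push(readMtimeMs(filePath));
    }
    return samples;
}
```

With NEO_ORCHESTRATOR_SWARM_HEARTBEAT_INTERVAL_MS set short, sampling over several intervals and requiring countMtimeAdvances(samples) >= 2 is one way to show the file advanced across at least two due pulses.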
Acceptance Criteria
- swarm-heartbeat is enabled, due, and invoked after restart.
- heartbeat.alive mtime advancement is measured across at least two due pulses.
- swarm-heartbeat is inspected and records success/failure reason rather than staying silent.
- NEO_AI_DEPLOYMENT_MODE=cloud leaves the local-only heartbeat lane off by default.
Out of Scope
- Rewriting the heartbeat architecture again before the live failure stage is isolated.
- Reopening the retired standalone swarm-heartbeat.sh / launchd path as the primary fix.
- Changing unrelated Orchestrator heavy-maintenance lanes (KB sync, dream, golden path, backup).
- Solving #11075 config-constant cleanup.
Avoided Traps
| Trap | Why rejected |
| --- | --- |
| Treating "Orchestrator process is running" as proof | The folded heartbeat lane can be disabled, not due, early-returning, or failing after Orchestrator start. |
| Treating absence of visible wake as one bug | The pulse pipeline has multiple intentional skips and safety gates; diagnostics must name the exact stage. |
| Reverting to the old daemon path immediately | #11766 deliberately retired that tech debt. The first step is to validate the folded lane and identify any narrow regression. |
| Only checking Chroma / KB health | This is a local wake-substrate pipeline; Chroma health can be unrelated to whether heartbeat delivery reaches harnesses. |
Related
- #11766: source issue, closed by PR #11772.
- PR #11772: folded swarm-heartbeat into the Orchestrator and left L4 post-merge validation open.
- #11730: v13 post-MVP residual workstreams parent.
- #10601 / #10399: older auto-wakeup / swarm-stall substrate history.
- #10931: prior wake liveness observability threshold issue; adjacent but not a duplicate.
Origin Session ID: d60db68f-8ff0-48a6-b168-237ca9dca2a0
Handoff Retrieval Hint: query_raw_memories("orchestrator swarm-heartbeat folded lane no wake after restart #11772 #11766")