Context
During the rotation to the lead position, the human commander recommended optimizing the auto-wakeup logic. The heartbeat concurrency lock is currently configured with a 30-minute time-to-live (TTL), set by HEARTBEAT_LOCK_TTL_SECONDS and DEFAULT_STALE_LOCK_MS. This window is too conservative for the current pace of swarm operations.
The Problem
If a session crashes while holding the heartbeat lock, the swarm heartbeat stays suppressed for a full 30 minutes until the lock is treated as stale, so the auto-wakeup logic cannot trigger the next task during that time. Given how quickly the agents work, a 30-minute stale-lock window causes unnecessary idle time.
The Architectural Reality
The heartbeat lock mechanism is implemented in two places:
ai/scripts/heartbeatLock.mjs (DEFAULT_STALE_LOCK_MS)
ai/scripts/swarm-heartbeat.sh (HEARTBEAT_LOCK_TTL_SECONDS)
The Fix
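Reduce the TTL from 30 minutes to 10 minutes in both the Node.js wrapper (heartbeatLock.mjs) and the shell script consumer (swarm-heartbeat.sh) so that the auto-wakeup logic recovers faster after a crashed session. Both values should change together so the two implementations agree on when a lock is stale.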
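A minimal sketch of how a TTL-based stale check could behave with the new value, assuming the lock records the epoch-millisecond time at which it was acquired. The function names isLockStale and canAcquire, the { acquiredAtMs } lock shape, and the ">=" boundary are hypothetical and not taken from heartbeatLock.mjs; only the constant name DEFAULT_STALE_LOCK_MS comes from this issue.

```javascript
// Hypothetical sketch, not the actual heartbeatLock.mjs code.
// Assumption: a lock is stored as { acquiredAtMs } (epoch milliseconds).
const DEFAULT_STALE_LOCK_MS = 10 * 60 * 1000; // proposed TTL: 10 minutes

// A lock at least as old as the TTL is treated as abandoned (e.g. its holder crashed).
function isLockStale(lock, nowMs, staleMs = DEFAULT_STALE_LOCK_MS) {
  return nowMs - lock.acquiredAtMs >= staleMs;
}

// A new heartbeat may run if no lock exists or the existing lock is stale.
function canAcquire(lock, nowMs, staleMs = DEFAULT_STALE_LOCK_MS) {
  return lock === null || isLockStale(lock, nowMs, staleMs);
}

// Example: a session crashed at t = 0 while holding the lock.
const crashedLock = { acquiredAtMs: 0 };
const minute = 60 * 1000;

console.log(canAcquire(crashedLock, 9 * minute));  // false: still within the TTL
console.log(canAcquire(crashedLock, 10 * minute)); // true: reclaimable after 10 minutes
console.log(canAcquire(crashedLock, 10 * minute, 30 * 60 * 1000)); // false under the old 30-minute TTL
```

The third call shows the idle window described above: with the old TTL, the same crashed lock would keep blocking the heartbeat for another 20 minutes.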
Acceptance Criteria
DEFAULT_STALE_LOCK_MS is reduced from 30 * 60 * 1000 to 10 * 60 * 1000.
HEARTBEAT_LOCK_TTL_SECONDS is reduced from 1800 to 600.
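The two acceptance criteria describe the same window in different units, so they can be cross-checked. A minimal sketch, assuming the values are hard-coded here for illustration; the OLD_/NEW_ constant names and the ttlsAgree helper are hypothetical, and a real check would need to read the values from ai/scripts/heartbeatLock.mjs and ai/scripts/swarm-heartbeat.sh, which is not shown.

```javascript
// Hypothetical sketch: values mirror the acceptance criteria, not parsed from the scripts.
const OLD_STALE_LOCK_MS = 30 * 60 * 1000; // 1,800,000 ms
const NEW_STALE_LOCK_MS = 10 * 60 * 1000; // 600,000 ms
const OLD_TTL_SECONDS = 1800;
const NEW_TTL_SECONDS = 600;

// The JS wrapper (milliseconds) and the shell script (seconds) must describe
// the same window, so milliseconds / 1000 must equal seconds.
function ttlsAgree(staleLockMs, ttlSeconds) {
  return staleLockMs / 1000 === ttlSeconds;
}

console.log(ttlsAgree(OLD_STALE_LOCK_MS, OLD_TTL_SECONDS)); // true
console.log(ttlsAgree(NEW_STALE_LOCK_MS, NEW_TTL_SECONDS)); // true
console.log(ttlsAgree(NEW_STALE_LOCK_MS, OLD_TTL_SECONDS)); // false: only one file updated
```

The last line is the failure mode this guards against: updating only one of the two files would leave the Node.js wrapper and the shell script disagreeing about when a lock is stale.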
Out of Scope
Any changes to the POLL_INTERVAL or bridge-daemon.mjs coalescing window.
Origin Session ID: c572ef58-93ac-4f71-9f32-5759fb8698ba