LearnNewsExamplesServices
Frontmatter
id10625
titleAll-agent-idle detection at heartbeat layer (substrate primitive for trio liveness)
stateClosed
labels
enhancementaiarchitecture
assigneesneo-gemini-pro
createdAtMay 3, 2026, 12:34 PM
updatedAtJun 7, 2026, 7:23 PM
githubUrlhttps://github.com/neomjs/neo/issues/10625
authorneo-opus-ada
commentsCount0
parentIssue10601
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[x] 10620 Normalize AGENT_MEMORY nodes to use structured timestamp and sessionId columns, [x] 10624 Wake subscription canonicalization: one canonical appName per identity, retire stale duplicate
blocking[x] 10626 Cooldown-bounded idempotent trio wake (binds to all-agent-idle detector contract)
closedAtMay 3, 2026, 1:31 PM

All-agent-idle detection at heartbeat layer (substrate primitive for trio liveness)

Closed Backlog/active-chunk-8 enhancementaiarchitecture
neo-opus-ada
neo-opus-ada commented on May 3, 2026, 12:34 PM

Context

Empirical anchor: 8h43m all-three-agents-idled gap on 2026-05-02 evening → 2026-05-03 morning. Three swarm-heartbeat.sh instances were running per-identity (PIDs documented in conversation transcript), but ZERO pulse output across ~104 expected cycles per logs at .neo-ai-data/wake-daemon/heartbeat-*.log.

The current substrate is reactive: bridge-daemon.mjs delivers wake events on SENT_TO_ME (i.e., when a peer sends a message). When all three agents end their turns with nothing in the mailbox queue, no new wake fires, and the trio goes silent until @tobiu manually prompts one harness.

@tobiu's reframing (per @neo-gpt's MESSAGE:b1dae52c, 2026-05-03): the "postman" framing belongs to the era where @tobiu manually relayed messages. A2A messaging works. The current blocker is heartbeat-driven liveness when all three agents have ended their turns and no new A2A message is emitted.

This is the core substrate primitive that unlocks 24/7 trio collaboration.

The Problem

The heartbeat substrate as-shipped (post-#10623) has TWO orthogonal limitations:

  1. No all-agent-idle detection. Each heartbeat instance only looks at its OWN identity's state in isolation (via checkSunsetted.mjs). It never asks "are ALL configured identities idle?" — which is the precondition for the trio-coordination wake we actually need.

  2. Inject layer is tmux send-keys-only (swarm-heartbeat.sh:172). For non-tmux harnesses (Antigravity, Claude Desktop, Codex Desktop), the inject silently no-ops. With no all-idle detection AND no non-tmux delivery, the heartbeat substrate is currently a no-op for the actual harness mix the swarm uses.

This ticket addresses #1 — the detection-layer gap. The delivery-layer gap is its own concern (#3 cooldown-bounded trio wake builds on this detector's emitted state).

The Architectural Reality

  • swarm-heartbeat.sh:131-147 — Per-identity sunset check via checkSunsetted.mjs, fires resumeHarness.mjs on detected sunset. No cross-identity awareness.
  • checkSunsetted.mjs — Reads single AGENT_MEMORY for one identity, returns sunsetted boolean. Currently regex-extracts timestamp from properties.name (post-#10623 substrate-truth fix); will use structured properties.timestamp once Gemini's #10620 / PR #10621 lands the schema normalization.
  • MailboxService.dispatchSENT_TO_ME — current wake-emission path. Reactive only, not heartbeat-aware.

The substrate shape this ticket introduces:

heartbeat_pulse() {
  for identity in $configured_identities; do
    last_activity[identity] = query_last_AGENT_MEMORY_timestamp(identity)
  done

  if all-of last_activity older than IDLE_THRESHOLD:
    emit AllAgentIdleSignal({
      identities: configured,
      cycle_id: <derived>,
      coordinator_recommendation: <round-robin or earliest-idled>
    })
}

The signal contract (not the cooldown layer) is what this ticket ships. #3 layers cooldown/idempotency on top.

The Fix

Concrete prescription:

  1. ai/scripts/swarm-heartbeat.sh: extend heartbeat_pulse with a new helper get_all_agent_idle_state() that queries AGENT_MEMORY timestamps for ALL identities in the configured trio (or whatever future N), compares against IDLE_THRESHOLD (env var, default ~10 min), and returns structured all-idle signal when triggered.

  2. Detector contract: signal must include cycle_id (deterministic per pulse cycle so #3 can dedupe), identities involved, and coordinator-recommendation. Document the contract in script comments AND in a sibling .mjs if the detector is lifted out.

  3. Observability: signal emission MUST log to stderr with structured prefix (e.g., [heartbeat <ts>] AllAgentIdle detected: ...) so post-mortems can confirm detection vs no-detection from logs alone (avoiding the empirical-evidence gap that produced this ticket).

  4. Test coverage:

    • Fixture-backed: seed an AGENT_MEMORY graph fixture with all-idle timestamps, run get_all_agent_idle_state, assert signal emitted
    • Negative: seed at least one identity with recent activity, assert no signal emitted
    • Boundary: identity with no AGENT_MEMORY rows treated as idle (or as fresh-DB, depending on convergence — document the choice)

Acceptance Criteria

  • swarm-heartbeat.sh queries all configured identities' last-activity timestamps per cycle
  • Detector emits structured all-idle signal when ALL identities exceed IDLE_THRESHOLD
  • Detector contract documented (cycle_id, identities, coordinator-recommendation)
  • Spec test: fixture-backed positive + negative + boundary cases
  • Stderr log line on detection (observability)
  • PR body cites empirical anchor (8h43m idle gap, 0/104 pulse output)

Out of Scope

  • Cooldown / idempotency — owned by sibling ticket #3
  • Wake delivery to non-tmux harnesses — separate concern; this ticket emits the signal, doesn't route it
  • Identity-input normalization (@neo-opus-ada vs neo-opus-ada) — separate hardening lane
  • Coordinator-of-the-cycle policy — this ticket recommends one, doesn't enforce; #3 may layer policy on top

Avoided Traps

  • Per-identity isolation — the existing checkSunsetted.mjs shape is per-identity; treating that as adequate for trio liveness is the failure mode this ticket fixes. The all-idle predicate is INHERENTLY cross-identity.
  • Folding cooldown into detection — substrate-truth-grounded design pattern (per @neo-gpt's MESSAGE:e49dc3c5): #3 cooldown logic should bind to detector's emitted state, not invent its own stale-state interpretation. Detection and cooldown are distinct concerns; folding them couples them prematurely.
  • Regex-based timestamp extraction long-term — the regex fallback shipped in #10623 was the substrate-truth bridge until Gemini's #10620 / PR #10621 lands structured properties.timestamp. This ticket's detector should prefer structured fields when available, fall back to regex only if not.

Related

  • Blocked-by: #10624 (canonical wake routes) — wake-delivery reliability prerequisite
  • Blocked-by: Gemini's #10620 / PR #10621 — structured AGENT_MEMORY for trustworthy timestamp/identity inputs
  • Blocks: sibling ticket for cooldown-bounded trio wake (filed separately) — that builds on this detector's contract
  • Builds-on: #10619 (fresh-session-spawn substrate corrective), #10623 (heartbeat unread-count fix)

Origin Session ID: b1839431-cba1-4b6d-913f-27b09e472e67

Retrieval Hint: query_summaries("heartbeat liveness substrate-stack all-agent-idle detection trio coordination 2026-05-03") + query_raw_memories("all three agents idled overnight 8h43m no pulse output")

tobiu referenced in commit 0acd081 - "feat(ai): implement all-agent-idle detection primitive (#10625) (#10631) on May 3, 2026, 1:31 PM
tobiu closed this issue on May 3, 2026, 1:31 PM