Frontmatter
id: 10674
title: In-flight boot/restart lock primitive (identity-uniqueness mutex)
state: Closed
labels: enhancement, ai, architecture
assignees: neo-opus-4-7
createdAt: May 4, 2026, 10:49 AM
updatedAt: May 9, 2026, 11:23 PM
githubUrl: https://github.com/neomjs/neo/issues/10674
author: neo-opus-4-7
commentsCount: 0
parentIssue: 10671
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 4, 2026, 1:49 PM

In-flight boot/restart lock primitive (identity-uniqueness mutex)

neo-opus-4-7 commented on May 4, 2026, 10:49 AM

Context

Sub-issue of #10671. The OS process boundary doesn't enforce identity uniqueness AT THE DATA LAYER — multiple harness instances can race on shared SQLite + mailbox during boot ramp. This lock IS the data-layer mutex.

The Problem

The current 600 s cooldown in resumeHarness.mjs:79-86 is time-based, not in-flight-aware. The boot ramp easily exceeds 600 s, so the cooldown can expire while the previous action is still booting. Result: parallel sessions race on inbox / mark_read / sessionId.
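To make the gap concrete, here is a minimal sketch of a purely time-based check. The function name and its parameters are hypothetical and are not taken from resumeHarness.mjs; only the 600 s figure comes from the text above.

```javascript
// Hypothetical time-based cooldown check (NOT the actual resumeHarness.mjs code).
const COOLDOWN_MS = 600 * 1000; // 600 s, per the issue text

function cooldownAllowsAction(lastActionMs, nowMs, cooldownMs = COOLDOWN_MS) {
    // Only elapsed time is considered. Whether the previous action's
    // boot ramp has finished is invisible to this check.
    return nowMs - lastActionMs >= cooldownMs;
}

// A restart fired at t = 0 whose boot ramp is still running at t = 700 s:
console.log(cooldownAllowsAction(0, 700 * 1000)); // true: a second session would be started
```

Because the check returns true while the first session is still booting, two sessions for the same identity can touch the shared state at once.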

The same primitive is needed for the idle-out path, to prevent A2A spam during active long-turn / rate-limit windows (per @neo-gpt's substrate-truth audit).

The Architectural Reality

The window from "lock-acquire" to "first add_memory observed" is the protected boot ramp. Lock files live alongside the cooldown files in .neo-ai-data/wake-daemon/.
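A rough sketch of how a lock file's location and contents could be derived. The helper names are hypothetical, the identity value is only an example, and JSON encoding of the record is an assumption; the directory, the file-name pattern, and the {timestamp, lockId} fields are the ones named in this issue.

```javascript
// Sketch only: helper names are hypothetical, not the merged implementation.
const LOCK_DIR = '.neo-ai-data/wake-daemon';
const MODES    = ['sunset_restart', 'idle_out_nudge'];

function lockFileName(mode, identity) {
    if (!MODES.includes(mode)) {
        throw new Error(`Unknown lock mode: ${mode}`);
    }
    return `${LOCK_DIR}/inflight-${mode}-${identity}.txt`;
}

function lockFileBody(lockId, nowMs) {
    // The issue specifies {timestamp, lockId}; serializing as JSON is an assumption.
    return JSON.stringify({timestamp: nowMs, lockId});
}

console.log(lockFileName('sunset_restart', 'neo-opus-4-7'));
// .neo-ai-data/wake-daemon/inflight-sunset_restart-neo-opus-4-7.txt
```

If the file is written with Node's `fs.writeFileSync(path, body, {flag: 'wx'})`, the write fails with `EEXIST` when the file already exists, which would make acquisition exclusive. Whether the actual implementation relies on that flag is not stated here.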

The Fix

  • resumeHarness.mjs writes inflight-{mode}-{identity}.txt with {timestamp, lockId} BEFORE invoking the action
  • mode ∈ {sunset_restart, idle_out_nudge}
  • The next heartbeat fire skips the action if the lock exists AND there is no AGENT_MEMORY newer than the lock timestamp for that identity
  • The first add_memory clears the lock implicitly (via checkSunsetted detection) OR explicitly via lock-clear-on-detect
  • Lock age > BOOT_TIMEOUT (15 min default) without fresh memory = abandoned action; allow the next interval to retry
  • After N (default 3) consecutive abandoned actions for any single identity, auto-trip the wake safety gate per the #10648 contract
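The per-heartbeat rules above can be sketched as one pure decision function. This is an illustration under stated assumptions, not the merged implementation: the function name, its parameters, and the four result labels are all hypothetical, and timestamps are assumed to be epoch milliseconds.

```javascript
// Sketch of the per-heartbeat lock decision; all names are hypothetical.
const BOOT_TIMEOUT_MS = 15 * 60 * 1000; // 15 min default, per the issue

/**
 * @param {{timestamp: number, lockId: string}|null} lock parsed lock file, or null if absent
 * @param {number|null} latestMemoryMs timestamp of the identity's newest AGENT_MEMORY, if any
 * @param {number} nowMs current time
 * @returns {'proceed'|'skip'|'clear'|'abandoned'}
 */
function evaluateLock(lock, latestMemoryMs, nowMs, bootTimeoutMs = BOOT_TIMEOUT_MS) {
    if (lock === null) {
        return 'proceed'; // no action in flight
    }
    if (latestMemoryMs !== null && latestMemoryMs > lock.timestamp) {
        return 'clear'; // fresh memory after the lock: boot ramp finished
    }
    if (nowMs - lock.timestamp > bootTimeoutMs) {
        return 'abandoned'; // lock too old with no fresh memory: retry may follow
    }
    return 'skip'; // still inside the protected boot ramp
}

const MIN        = 60 * 1000;
const sampleLock = {timestamp: 0, lockId: 'example'};

console.log(evaluateLock(null, null, 0));              // proceed
console.log(evaluateLock(sampleLock, null, 5 * MIN));  // skip
console.log(evaluateLock(sampleLock, MIN, 5 * MIN));   // clear
console.log(evaluateLock(sampleLock, null, 16 * MIN)); // abandoned
```

Keeping the decision free of file and database access makes each branch easy to unit test; the caller would read the lock file and query memory, then act on the returned label.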

Acceptance Criteria

  • Lock file written before sunset_restart AND idle_out_nudge actions
  • checkSunsetted (or its caller) considers lock state before recommending action
  • Spawned/nudged agent's first add_memory observed within BOOT_TIMEOUT clears lock cleanly
  • Lock age > BOOT_TIMEOUT without fresh memory marks action abandoned
  • N consecutive abandoned actions auto-trips wake safety gate
  • Unit + integration test coverage; explicit test for ESC-as-rejection scenario
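For the abandoned-action criterion, a minimal sketch of a per-identity counter follows. All names are hypothetical. Resetting the count on a clean lock clear is an inference from the word "consecutive", and the in-memory Map is an assumption for illustration: a daemon that starts fresh on every heartbeat would need to persist the count. How the wake safety gate is actually tripped is defined by #10648 and is not shown.

```javascript
// Sketch only: counts consecutive abandoned actions per identity.
const MAX_ABANDONED = 3; // N default, per the issue

function createAbandonTracker(maxAbandoned = MAX_ABANDONED) {
    const counts = new Map(); // identity -> consecutive abandoned actions

    return {
        // Returns true when the caller should trip the wake safety gate.
        recordAbandoned(identity) {
            const n = (counts.get(identity) ?? 0) + 1;
            counts.set(identity, n);
            return n >= maxAbandoned;
        },
        // A cleanly cleared lock breaks the streak for that identity.
        recordCleared(identity) {
            counts.delete(identity);
        }
    };
}

const tracker = createAbandonTracker();
console.log(tracker.recordAbandoned('agent-a')); // false
console.log(tracker.recordAbandoned('agent-a')); // false
console.log(tracker.recordAbandoned('agent-a')); // true
```

Counts are kept per identity, so abandoned actions for one identity do not push another identity toward the threshold.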

Out of Scope

  • BOOT_TIMEOUT tuning (start at 15 min; instrument later)
  • N threshold tuning (start at 3; instrument later)

Related

  • Parent: #10671
  • Adjacent: #10626 (existing time-based cooldown — supplemented, not removed), #10648 (wake safety gate auto-trip target)

Origin Session ID: cce1fea5-32ff-410c-b820-2e9a27b3cd51

tobiu closed this issue on May 4, 2026, 1:49 PM
tobiu referenced in commit 5386a06 - "feat(ai): implement substrate restart mutex and inflight lock (#10674) (#10683)" on May 4, 2026, 1:49 PM
tobiu referenced in commit 2156026 - "docs(ai): forensic record for 2026-05-03 runaway-spawn pattern (#10672) (#10688)" on May 4, 2026, 3:41 PM