Context
Sub-issue of #10671. The OS process boundary doesn't enforce identity uniqueness AT THE DATA LAYER — multiple harness instances can race on shared SQLite + mailbox during boot ramp. This lock IS the data-layer mutex.
The Problem
The current 600s cooldown in resumeHarness.mjs:79-86 is time-based, not in-flight-aware. Boot ramp easily exceeds 600s, so the cooldown can expire while the first session is still booting. Result: parallel sessions race on inbox / mark_read / sessionId.
The same primitive is needed for the idle-out path to prevent A2A spam during active long-turn / rate-limit windows (per @neo-gpt's substrate-truth audit).
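To make the failure mode concrete, here is a minimal sketch of a purely time-based check. The function name cooldownAllows and the 900s boot duration are illustrative assumptions, not taken from resumeHarness.mjs; only the 600s figure comes from the text above.

```javascript
// Hypothetical time-based cooldown check; only the 600s value comes from the issue text.
const COOLDOWN_MS = 600 * 1000;

// Returns true once enough wall-clock time has passed since the last action.
// It has no knowledge of whether the previous action is still booting.
function cooldownAllows(lastActionAtMs, nowMs) {
  return nowMs - lastActionAtMs >= COOLDOWN_MS;
}

// Assumed scenario: a restart fired at t=0 and its boot ramp takes 900s.
const restartFiredAt = 0;
const bootRampMs = 900 * 1000; // illustrative, not a measured value

// At t=600s the first session is still booting, yet the cooldown has expired.
const checkAt = 600 * 1000;
const stillBooting = checkAt < restartFiredAt + bootRampMs;
const secondRestartAllowed = cooldownAllows(restartFiredAt, checkAt);

console.log({ stillBooting, secondRestartAllowed });
```

Both flags come out true in this scenario: the check permits a second restart while the first is still in flight, which is the gap an in-flight lock is meant to close.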
The Architectural Reality
Window from "lock-acquire" to "first add_memory observed" is the protected boot ramp. Lock files live alongside cooldown files in .neo-ai-data/wake-daemon/.
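A minimal sketch of how such a lock file could be created, assuming the inflight-{mode}-{identity}.txt naming and {timestamp, lockId} payload this issue proposes. The function names, the JSON encoding of the payload, and the injected fs parameter are assumptions made for illustration; the issue does not specify them.

```javascript
// Directory and file-name pattern come from the issue text; function names are hypothetical.
const LOCK_DIR = '.neo-ai-data/wake-daemon';
const MODES = ['sunset_restart', 'idle_out_nudge'];

// Builds the lock file path for one mode and identity.
function lockPath(mode, identity) {
  if (!MODES.includes(mode)) throw new Error(`unknown mode: ${mode}`);
  return `${LOCK_DIR}/inflight-${mode}-${identity}.txt`;
}

// `fs` is injected so the sketch can be exercised without touching disk.
// Returns the lock record on success, or null if a lock file already exists.
// Assumption: the payload is stored as JSON.
function acquireLock(fs, mode, identity, lockId, nowMs) {
  const record = { timestamp: nowMs, lockId };
  try {
    // 'wx' creates the file exclusively and fails with EEXIST if it is already there,
    // so two racing callers cannot both believe they hold the lock.
    fs.writeFileSync(lockPath(mode, identity), JSON.stringify(record), { flag: 'wx' });
    return record;
  } catch (err) {
    if (err.code === 'EEXIST') return null;
    throw err;
  }
}
```

In real use the first argument would be Node's built-in fs module. The exclusive-create flag matters here because a plain write would let a second harness instance silently overwrite the first instance's lock.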
The Fix
- resumeHarness.mjs writes inflight-{mode}-{identity}.txt with {timestamp, lockId} BEFORE invoking action
  - mode ∈ {sunset_restart, idle_out_nudge}
- Next heartbeat fire skips action if lock exists AND no fresh AGENT_MEMORY post-lock-timestamp for that identity
- First add_memory clears lock implicitly (via checkSunsetted detection) OR explicitly via lock-clear-on-detect
- Lock age > BOOT_TIMEOUT (15 min default) without fresh memory = abandoned action; allow next interval to retry
- After N (default 3) consecutive abandoned actions for any single identity, auto-trip wake safety gate per #10648 contract
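The rules above can be sketched as a pure decision function plus a per-identity streak counter. All names here (evaluateLock, recordOutcome, the outcome strings) are hypothetical; only the 15 min and N = 3 defaults come from the issue. Resetting the streak when a boot completes is an assumption about what "consecutive" means.

```javascript
// Defaults are from the issue text; every name below is hypothetical.
const BOOT_TIMEOUT_MS = 15 * 60 * 1000;
const MAX_ABANDONED = 3;

// lock: { timestamp, lockId }, or null when no lock file exists.
// lastMemoryAtMs: timestamp of the newest AGENT_MEMORY for the identity, or null.
function evaluateLock(lock, lastMemoryAtMs, nowMs, bootTimeoutMs = BOOT_TIMEOUT_MS) {
  if (lock === null) return 'proceed'; // nothing in flight
  const freshMemory = lastMemoryAtMs !== null && lastMemoryAtMs > lock.timestamp;
  if (freshMemory) return 'clear'; // boot ramp finished; lock can be removed
  if (nowMs - lock.timestamp > bootTimeoutMs) return 'abandoned'; // allow a retry
  return 'skip'; // still inside the protected boot ramp
}

// Tracks consecutive abandoned actions per identity in a Map.
// Returns true when the wake safety gate should be tripped.
// Assumption: a completed boot ('clear') resets the streak.
function recordOutcome(streaks, identity, outcome, maxAbandoned = MAX_ABANDONED) {
  if (outcome === 'clear') streaks.set(identity, 0);
  if (outcome === 'abandoned') streaks.set(identity, (streaks.get(identity) ?? 0) + 1);
  return (streaks.get(identity) ?? 0) >= maxAbandoned;
}

// Usage: a lock taken at t=0 with no memory observed afterwards.
const demoLock = { timestamp: 0, lockId: 'demo' };
const demoStreaks = new Map();
const insideWindow = evaluateLock(demoLock, null, 5 * 60 * 1000);  // within the boot window
const pastTimeout = evaluateLock(demoLock, null, 16 * 60 * 1000);  // past BOOT_TIMEOUT
const tripGate = recordOutcome(demoStreaks, 'agent-a', pastTimeout);
console.log({ insideWindow, pastTimeout, tripGate });
```

Keeping the decision separate from file I/O means the heartbeat path can be unit-tested with plain timestamps, and how the gate is actually tripped stays with the #10648 contract rather than being decided here.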
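Acceptance Criteria
- Lock file written for both sunset_restart AND idle_out_nudge actions
- checkSunsetted (or its caller) considers lock state before recommending action
- First add_memory observed within BOOT_TIMEOUT clears lock cleanly
- Lock age > BOOT_TIMEOUT without fresh memory marks action abandoned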
Out of Scope
- BOOT_TIMEOUT tuning (start at 15 min; instrument later)
- N threshold tuning (start at 3; instrument later)
Related
- Parent: #10671
- Adjacent: #10626 (existing time-based cooldown — supplemented, not removed), #10648 (wake safety gate auto-trip target)
Origin Session ID: cce1fea5-32ff-410c-b820-2e9a27b3cd51