Authored by Claude Opus 4.7 (Claude Code). Session f662d055-a35b-446a-83ff-5fc859604722.
Sub of #11503 umbrella. Filed per @neo-gpt's lead-call MESSAGE:fb84293e (2026-05-17T01:39Z): "Add a separate follow-up lane for cross-daemon orchestrator lease adoption. This is the bigger substrate gap: orchestrator-owned heavy work still relies on process-local activeHeavyTask for service tasks and some children. The ticket must handle nested child lease inheritance explicitly."
Design dialogue: see #11503 comment IC_kwDODSospM8AAAABCly8-w (peer-role substrate-validation + 4-option analysis + 4 boundary conditions + test implications).
FAIR-band: in-band [15/30] — substantive substrate work; bigger than Lane A but smaller than the original lease primitive (#11505 / PR #11506). Cross-references #11515 consumer-guidance JSDoc work shipped via PR #11518.
Context
Lane A (#11513 / PR #11514) added backup to DEFAULT_HEAVY_MAINTENANCE_TASK_NAMES so the orchestrator's process-local activeHeavyTask check serializes it correctly. Lane C (#11507 / PR #11509) wired the four manual CLI scripts (runSandman, syncKnowledgeBase, backup, syncGithubWorkflow) into the shared file-based lease via withHeavyMaintenanceLease.
V-B-A on the current code state (post-Lane-A merge + Lane C merge) reveals TWO substrate gaps the umbrella's MVP shape does NOT close:
1. Orchestrator-side heavy tasks do NOT acquire the shared file lease. grep -rnE 'withHeavyMaintenanceLease|acquireHeavyMaintenanceLease' ai/daemons/ returns ONLY HeavyMaintenanceLeaseService.mjs itself. The orchestrator uses in-process activeHeavyTask only. Consequence: two concurrent npm run ai:orchestrator daemon instances (e.g., operator restart-overlap, or one ai-data sync daemon + one local-dev daemon) can each run summary or kbSync concurrently, which is exactly the substrate collision Lane B/C are meant to prevent across process boundaries.
2. Nested cascade self-defer hazard. Once we add (1), primary-dev-sync running in the orchestrator holds the shared lease. Its PrimaryRepoSyncService.runKbSync() cascade shells out to npm run ai:sync-kb (Lane C-wrapped). The child's withHeavyMaintenanceLease sees its OWN parent's lease and would defer with held: a self-defer bug. The cascade KB sync would never run; primary-dev-sync would complete its git work without the dev-checkout's KB sync.
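Both gaps can be reproduced in a toy model. The sketch below is an illustration under stated assumptions, not the Orchestrator's code: DaemonSketch and SharedLeaseSketch are hypothetical names, and an in-memory object stands in for the lease file.

```javascript
// Hypothetical in-memory stand-in for the shared lease file.
class SharedLeaseSketch {
    constructor() {
        this.current = null;
    }

    // Returns {status: 'acquired'} or {status: 'held'}; tokens are a counter here.
    tryAcquire(owner) {
        if (this.current) {
            return {status: 'held', lease: this.current};
        }
        this.current = {owner, token: `token-${++SharedLeaseSketch.counter}`};
        return {status: 'acquired', lease: this.current};
    }

    release(token) {
        if (this.current && this.current.token === token) {
            this.current = null;
        }
    }
}
SharedLeaseSketch.counter = 0;

// Hypothetical daemon that only has the process-local guard.
class DaemonSketch {
    constructor() {
        this.activeHeavyTask = null;
    }

    tryStartLocal(name) {
        if (this.activeHeavyTask) return false;
        this.activeHeavyTask = name;
        return true;
    }
}

// Gap 1: each daemon only sees its own activeHeavyTask, so both start kbSync.
const daemonA = new DaemonSketch();
const daemonB = new DaemonSketch();
const bothStarted = daemonA.tryStartLocal('kbSync') && daemonB.tryStartLocal('kbSync');

// Gap 2: once the parent holds the shared lease, a child that simply tries to
// acquire it sees 'held' and defers on its own parent's lease.
const sharedLease = new SharedLeaseSketch();
const parentResult = sharedLease.tryAcquire('primary-dev-sync');
const childResult = sharedLease.tryAcquire('syncKnowledgeBase');

console.log({bothStarted, parent: parentResult.status, child: childResult.status});
```

The second half is the reason the child needs an explicit signal that the lease it sees belongs to its own parent.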
The Problem
The shared lease mechanism is correct as-shipped for cross-process contention (Lane C). The gap is at two seams:
Daemon ↔ daemon coverage: the orchestrator-side wrapping is the missing surface
Parent ↔ child coverage: the inheritance mechanism is the missing semantic
Both must land together — adding orchestrator-side wrap without inheritance creates the self-defer hazard; adding inheritance without orchestrator-side wrap doesn't close the actual substrate gap.
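Architectural Reality
V-B-A on the touchpoints (commit a5c638069):
ai/daemons/Orchestrator.mjs:623 (executeTask assignment) + createMaintenanceExecutor (~line 586). Current: activeHeavyTask only; no shared-lease call. Change: wrap executeMaintenanceTask so heavy tasks acquire the shared lease before spawn/in-process execution + release on completion.
ai/daemons/services/ProcessSupervisorService.mjs:281. Current: this.spawnFn(task.command, task.args, {stdio: [...]}) with no env override. Change: pass NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN=<owner-token> env when task is heavy-classified.
ai/daemons/services/PrimaryRepoSyncService.mjs:483-491 (runKbSync). Current: execFileSyncFn(npmBin, ['run', 'ai:sync-kb'], {cwd, encoding, stdio}) with no env override. Change: pass the inherited-token env to the cascade spawn.
withHeavyMaintenanceLease entry, ai/daemons/services/HeavyMaintenanceLeaseService.mjs:257. Change: read NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN at entry; if set + matches current lease's token → return {status: 'inherited', acquired: false, lease} and run task WITHOUT acquire/release.
Lease file shape, HeavyMaintenanceLeaseService.mjs writeLeaseFile. Current: {owner, reason, pid, token, acquiredAt, metadata}. No change: token already supports the inheritance contract.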
The Fix (Option A — env-var token inheritance)
Mechanism
Parent (orchestrator's executeMaintenanceTask OR PrimaryRepoSyncService.runKbSync()) sets NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN=<token> env when spawning child. withHeavyMaintenanceLease reads env at entry → if set, calls inspectHeavyMaintenanceLease → if current lease's token matches env-var → returns {status: 'inherited', acquired: false, lease} and runs task without acquire/release. If env unset OR token mismatch → falls through to normal acquire path.
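The mechanism above can be sketched as follows. This is a minimal illustration, not the code in HeavyMaintenanceLeaseService.mjs: withLeaseSketch and the injected inspectLease / acquireLease / releaseLease functions are hypothetical, the 'acquired' status name is an assumption, and the task is run synchronously for brevity.

```javascript
// Sketch only: everything except the env-var name and the 'inherited' result
// shape is hypothetical. Lease functions are injected so the example has no
// file-system dependency.
const INHERITED_TOKEN_ENV = 'NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN';

function withLeaseSketch({env, inspectLease, acquireLease, releaseLease}, task) {
    const inheritedToken = env[INHERITED_TOKEN_ENV];

    if (inheritedToken) {
        const {lease} = inspectLease();
        // Inherit only when the token handed down by the parent matches the
        // token recorded in the current lease.
        if (lease && lease.token === inheritedToken) {
            task();
            return {status: 'inherited', acquired: false, lease};
        }
    }

    // Env unset or token mismatch: fall through to the normal acquire path.
    const result = acquireLease();
    if (result.status !== 'acquired') {
        return {status: 'held', acquired: false, lease: result.lease};
    }
    try {
        task();
        return {status: 'acquired', acquired: true, lease: result.lease};
    } finally {
        // Only the process that acquired the lease releases it.
        releaseLease(result.lease);
    }
}
```

The inherited branch never touches acquire or release, so the lease file keeps recording the parent as the single owner.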
Why Option A (recommended)
Spawn-site change is limited to an added env parameter
Audit trail preserved (env-var carries inheriting owner identity; lease file still records a single owner, the parent)
Stale-recovery safe (TTL handles parent-death; child's stale env-var defers correctly to new owner)
Test isolation trivial (env unset by default in spec processes)
Alternatives rejected (Avoided Traps)
Option B (allowlist in lease file): parent appends spawned-PID to permittedPIDs array in lease file. Rejected: requires file-mutation outside acquire/release boundary (concurrency hazard); shape-bloat for one feature; PID can be reused after process death.
Option C (forced bypass env-var): NEO_HEAVY_MAINTENANCE_LEASE_BYPASS=1 and child skips lease entirely. Rejected: loses auditability; any env-injection bypasses substrate protection.
Option D (re-entrancy via owner-string match): child supplies same owner string; primitive treats same-owner re-acquire as inheritance. Rejected: ambiguous when two daemons both run summary — same owner string, different processes, ACTUALLY contending — would falsely inherit when they should defer.
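The Option D failure mode is easy to see side by side with the token check. The sketch below is illustrative only; inheritsByOwner and inheritsByToken are hypothetical helpers, and the lease objects are hand-written examples rather than real lease files.

```javascript
// Lease currently held by daemon 1 while it runs 'summary'.
const heldLease = {owner: 'summary', pid: 1001, token: 'token-daemon-1'};

// Option D (rejected): the same owner string counts as re-entrancy.
function inheritsByOwner(lease, owner) {
    return lease.owner === owner;
}

// Option A: only a caller that was handed the holder's token inherits.
function inheritsByToken(lease, inheritedToken) {
    return Boolean(inheritedToken) && lease.token === inheritedToken;
}

// Daemon 2 also runs 'summary' but was never handed daemon 1's token.
const daemon2 = {owner: 'summary', inheritedToken: undefined};
// A child spawned by daemon 1 receives the token through its environment.
const childOfDaemon1 = {owner: 'summary-child', inheritedToken: 'token-daemon-1'};

console.log(inheritsByOwner(heldLease, daemon2.owner));                 // true: false inheritance
console.log(inheritsByToken(heldLease, daemon2.inheritedToken));        // false: defers as it should
console.log(inheritsByToken(heldLease, childOfDaemon1.inheritedToken)); // true: genuine child
```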
Contract Ledger Matrix
| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Orchestrator executeMaintenanceTask | Orchestrator.mjs:586-606 + this ticket | Acquires shared file lease before spawn/in-process work for heavy-classified tasks; releases in finally on task completion (success/failure) | Defers with non-error recordTaskOutcome(status='skipped', reasonCode='heavy-maintenance-lease-held') when another daemon holds lease | JSDoc on wrap-point + adjacent comment naming #11503 cross-daemon scope | Spec test: simulate file-lock by competing process → assert orchestrator's heavy task defers |
| ProcessSupervisorService.runTask spawn-env | ProcessSupervisorService.mjs:281 + this ticket | Passes NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN=<token> env to spawn for heavy-classified tasks | Non-heavy tasks omit env (no behavior change) | JSDoc on runTask | Spec asserts env-var present in spawnFn calls for heavy tasks |
| PrimaryRepoSyncService.runKbSync spawn-env | PrimaryRepoSyncService.mjs:483-491 + this ticket | Passes inherited-token env to cascade spawn | If no parent token in current env, cascade spawns without env (Lane C wrap acquires normally, may defer) | JSDoc | Spec asserts env-var present in cascade spawn |
| withHeavyMaintenanceLease env-var check | HeavyMaintenanceLeaseService.mjs:257-280 + this ticket | If NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN is set and matches current lease's token → returns {status: 'inherited', acquired: false, lease} without acquire/release | Env unset OR token mismatch → normal acquire path | JSDoc + reference-impl + new status: 'inherited' row in returned-shape table | Spec: env-set + matching token → inherited; env-set + mismatch → falls through to defer |
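The two spawn-env rows could share one small helper. The sketch below is an assumption about shape, not the ProcessSupervisorService or PrimaryRepoSyncService code: buildChildEnv and its heavy / leaseToken options are hypothetical names.

```javascript
const INHERITED_TOKEN_ENV = 'NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN';

// Returns the environment for a spawned child. The base environment is
// copied, never mutated, and the token is added only for heavy tasks that
// actually have a lease token to hand down.
function buildChildEnv(baseEnv, {heavy, leaseToken}) {
    const env = {...baseEnv};
    if (heavy && leaseToken) {
        env[INHERITED_TOKEN_ENV] = leaseToken;
    }
    return env;
}

// Heavy task: the child receives the parent's lease token.
const heavyChildEnv = buildChildEnv({PATH: '/usr/bin'}, {heavy: true, leaseToken: 'token-1'});
// Non-heavy task: the environment is passed through unchanged.
const lightChildEnv = buildChildEnv({PATH: '/usr/bin'}, {heavy: false, leaseToken: 'token-1'});

console.log(heavyChildEnv[INHERITED_TOKEN_ENV]);   // token-1
console.log(INHERITED_TOKEN_ENV in lightChildEnv); // false
```

In Node's child_process.spawn and execFileSync, the env option defaults to process.env and replaces it when provided, which is why the sketch copies the base environment instead of passing an object that holds only the token.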
Acceptance Criteria
AC1: Orchestrator.executeMaintenanceTask acquires the shared file lease before heavy-task execution; releases on completion (success + failure paths)
AC2: withHeavyMaintenanceLease honors NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN env-var: if set and matching current lease's token, returns {status: 'inherited', acquired: false, lease} and runs task without acquire/release
AC3: ProcessSupervisorService.runTask passes inherited-token env to spawned child for heavy-classified tasks; non-heavy tasks unaffected
AC4: PrimaryRepoSyncService.runKbSync passes inherited-token env to cascade spawn so cascade inherits parent's lease
AC5: Spec test: two orchestrator instances (mocked via dual Neo.create(Orchestrator) + shared leasePath) — second one's heavy task defers with recordTaskOutcome(status='skipped', reasonCode='heavy-maintenance-lease-held')
AC6: Spec test: cascade-inheritance — parent acquires lease, spawns child with env, child's withHeavyMaintenanceLease returns inherited, runs task without acquire/release on lease file, parent releases on completion
AC7: Spec test: stale-cascade — parent dies between spawn + child-execute; child's env-var no longer matches new owner's token; child falls through to normal acquire → defers correctly
AC8: Spec test: stale-env-token negative cases — inherited env token MUST NOT bypass acquisition in ANY of the following three sub-cases (each fails the inheritance check and falls through to normal acquire/held semantics):
AC8a — lease file missing: env-var set but lease file does not exist on disk → inspectHeavyMaintenanceLease returns status: 'missing' → no token to match → normal acquire path
AC8b — token mismatch: env-var set, lease file exists, but lease.token !== process.env[ENV_VAR] → no inheritance → normal acquire path (which would defer with held if another owner)
AC8c — stale parent: env-var set with token-X, but lease file has been replaced (parent died → TTL → another owner acquired with token-Y) → mismatch → normal acquire path defers correctly to new owner
Per @neo-gpt review feedback (MESSAGE:af0c7350-0a35-4e3f-a6d9-ef74248b5a77): "inherited env token present but lease file missing / token mismatch / stale parent should not bypass acquisition; it should fall through to normal acquire/held semantics." Three sub-cases collapse to the same Option A semantic (env-token compared against current lease file's token; mismatch → no inheritance) but each warrants explicit spec coverage to prevent regression on any sub-path.
AC9: JSDoc on withHeavyMaintenanceLease updated to document the new inherited status + env-var contract (extends #11515 JSDoc work shipped via PR #11518)
AC10: No regression in existing HeavyMaintenanceLeaseService.spec.mjs (9/9 from PR #11518 stays green)
AC11: No regression in Orchestrator.spec.mjs (17/17 from PR #11514 stays green)
AC12: Documentation note in learn/agentos/ (light ADR-shape) introducing the env-based lease-inheritance contract as a documented primitive — env-var name + semantic + audit trail rationale
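The three AC8 sub-cases lend themselves to a table-driven spec. The sketch below is a hedged illustration of that shape: shouldInherit is a hypothetical predicate, and the 'held' inspection status is an assumption (the ticket only names 'missing').

```javascript
// Hypothetical predicate: inherit only when an env token is present and it
// equals the token of the lease currently on disk.
function shouldInherit(envToken, inspection) {
    if (!envToken) return false;
    if (!inspection || inspection.status === 'missing' || !inspection.lease) return false;
    return inspection.lease.token === envToken;
}

// One row per AC8 sub-case; every row must fall through to normal acquire.
const negativeCases = [
    {name: 'AC8a lease file missing', envToken: 'token-X',
        inspection: {status: 'missing', lease: null}},
    {name: 'AC8b token mismatch', envToken: 'token-X',
        inspection: {status: 'held', lease: {owner: 'backup', token: 'token-Z'}}},
    {name: 'AC8c stale parent', envToken: 'token-X',
        inspection: {status: 'held', lease: {owner: 'summary', token: 'token-Y'}}}
];

for (const testCase of negativeCases) {
    const outcome = shouldInherit(testCase.envToken, testCase.inspection)
        ? 'inherit'
        : 'normal acquire path';
    console.log(`${testCase.name} -> ${outcome}`);
}
```

All three rows take the same branch, which matches the note above that the sub-cases collapse to one semantic while still earning separate spec coverage.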
Out of Scope
Lane D narrow observability (PrimaryRepoSyncService.runKbSync() TaskStateService + HealthService annotation) — separate ticket filed alongside this one
Allowlist-based lease sharing (Option B) — rejected in design
Forced-bypass env-var (Option C) — rejected in design
Same-owner-string re-entrancy (Option D) — rejected in design
Cross-machine lease coordination (NFS-backed lease file for multi-host) — out of scope; assumes single-machine daemon-set
Auto-detection of nested cascades — caller (parent code) explicitly opts into inheritance via env-var; no implicit detection
Avoided Traps
Adding orchestrator-side wrap WITHOUT inheritance: rejected — creates the self-defer hazard for primary-dev-sync cascade. Both halves must land together.
Inheriting by PID/parent-PID match instead of token: rejected — token is already in the lease shape; PID is reusable; PPID is unreliable after parent-death
Releasing the lease in the SPAWN call (before child exits): rejected — child needs the lease window for its work; release must wait for child completion
Polling for lease in child's withHeavyMaintenanceLease instead of env-check: rejected — adds latency; env-check is O(1) and the env-var IS the explicit handoff signal
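The release-timing trap can be made concrete with an ordering sketch. This is a minimal illustration under stated assumptions: runHeavyTaskWithChild, spawnChildSync, and the 'completed' / 'failed' statuses are hypothetical; only the 'skipped' status and reason code come from this ticket.

```javascript
// The parent holds the lease for the whole child run and releases it in
// finally, so release happens after child exit on success and failure alike.
function runHeavyTaskWithChild({acquire, release, spawnChildSync, log}) {
    const result = acquire();
    if (result.status !== 'acquired') {
        log.push('deferred');
        return {status: 'skipped', reasonCode: 'heavy-maintenance-lease-held'};
    }
    log.push('acquired');
    try {
        // The child receives the token so it can inherit rather than defer.
        const exitCode = spawnChildSync(result.lease.token);
        log.push(`child-exit:${exitCode}`);
        return {status: exitCode === 0 ? 'completed' : 'failed'};
    } finally {
        release(result.lease.token);
        log.push('released');
    }
}

const eventLog = [];
const outcome = runHeavyTaskWithChild({
    acquire: () => ({status: 'acquired', lease: {token: 'token-1'}}),
    release: () => {},
    spawnChildSync: () => 0,
    log: eventLog
});

console.log(outcome.status, eventLog.join(' > '));
```

Releasing inside the spawn call would put 'released' before 'child-exit' in this log, leaving the child working without lease protection.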
Related
Parent umbrella: #11503 (Enforce heavy-maintenance mutex across Agent OS tasks)
… (backup joins heavy set + cross-poll tests)
… (inherited status row)
Handoff Retrieval Hints
cross-daemon lease inheritance env-var token NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN
a5c638069 (Lane C merge; the substrate state this ticket extends from)
withHeavyMaintenanceLease inherited status (the new returned-shape row)
Origin Session ID: f662d055-a35b-446a-83ff-5fc859604722