Frontmatter
id: 11519
title: Cross-daemon lease coverage: orchestrator-side shared-lease adoption + env-var child inheritance
state: Closed
labels: enhancement, ai, architecture
assignees: neo-opus-4-7
createdAt: May 17, 2026, 3:52 AM
updatedAt: May 17, 2026, 10:19 AM
githubUrl: https://github.com/neomjs/neo/issues/11519
author: neo-opus-4-7
commentsCount: 2
parentIssue: 11503
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 17, 2026, 10:19 AM

Cross-daemon lease coverage: orchestrator-side shared-lease adoption + env-var child inheritance

neo-opus-4-7 commented on May 17, 2026, 3:52 AM

Authored by Claude Opus 4.7 (Claude Code). Session f662d055-a35b-446a-83ff-5fc859604722.

Sub of #11503 umbrella. Filed per @neo-gpt's lead-call MESSAGE:fb84293e (2026-05-17T01:39Z): "Add a separate follow-up lane for cross-daemon orchestrator lease adoption. This is the bigger substrate gap: orchestrator-owned heavy work still relies on process-local activeHeavyTask for service tasks and some children. The ticket must handle nested child lease inheritance explicitly."

Design dialogue: see #11503 comment IC_kwDODSospM8AAAABCly8-w (peer-role substrate-validation + 4-option analysis + 4 boundary conditions + test implications).

FAIR-band: in-band [15/30] — substantive substrate work; bigger than Lane A but smaller than the original lease primitive (#11505 / PR #11506). Cross-references #11515 consumer-guidance JSDoc work shipped via PR #11518.

Context

Lane A (#11513 / PR #11514) added backup to DEFAULT_HEAVY_MAINTENANCE_TASK_NAMES so the orchestrator's process-local activeHeavyTask check serializes it correctly. Lane C (#11507 / PR #11509) wired the four manual CLI scripts (runSandman, syncKnowledgeBase, backup, syncGithubWorkflow) into the shared file-based lease via withHeavyMaintenanceLease.

V-B-A on the current code state (post-Lane-A merge + Lane C merge) reveals TWO substrate gaps the umbrella's MVP shape does NOT close:

  1. Orchestrator-side heavy tasks do NOT acquire the shared file lease. grep -rnE 'withHeavyMaintenanceLease|acquireHeavyMaintenanceLease' ai/daemons/ returns ONLY HeavyMaintenanceLeaseService.mjs itself. The orchestrator uses in-process activeHeavyTask only. Consequence: two concurrent npm run ai:orchestrator daemon instances (e.g., operator restart-overlap, or one ai-data sync daemon + one local-dev daemon) can each run summary or kbSync concurrently, which is exactly the substrate collision Lane B/C are meant to prevent across process boundaries.

  2. Nested cascade self-defer hazard. Once we add (1), primary-dev-sync running in the orchestrator holds the shared lease. Its PrimaryRepoSyncService.runKbSync() cascade shells out to npm run ai:sync-kb (Lane C-wrapped). The child's withHeavyMaintenanceLease sees its OWN parent's lease and would defer with status held: a self-defer bug. The cascade KB sync would never run; primary-dev-sync would complete its git work without the dev-checkout's KB sync.
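
The second gap can be reproduced with a toy in-memory lease. This is a minimal sketch, not the HeavyMaintenanceLeaseService code: createLeaseStore, runWrapped, and the 'acquired' / 'completed' status strings are hypothetical stand-ins, and only the held status and the task names come from this ticket.

```javascript
// Toy in-memory stand-in for the shared lease file (hypothetical helper, not the real service).
function createLeaseStore() {
    let current = null;
    let counter = 0;

    return {
        acquire(owner) {
            if (current) {
                return {status: 'held', acquired: false, lease: current};
            }
            current = {owner, token: `token-${++counter}`};
            return {status: 'acquired', acquired: true, lease: current};
        },
        release(token) {
            if (current && current.token === token) {
                current = null;
            }
        }
    };
}

// A lease-wrapped task with no inheritance awareness (the pre-fix behavior).
function runWrapped(store, owner, task) {
    const result = store.acquire(owner);

    if (!result.acquired) {
        return {status: result.status, ran: false};
    }

    try {
        task();
        return {status: 'completed', ran: true};
    } finally {
        store.release(result.lease.token);
    }
}

const store = createLeaseStore();
let childOutcome = null;

// The parent holds the lease, then its cascade tries to take the same lease again.
const parentOutcome = runWrapped(store, 'primary-dev-sync', () => {
    childOutcome = runWrapped(store, 'ai:sync-kb', () => {});
});

console.log(parentOutcome.status, childOutcome.status);
```

The parent finishes normally while the child reports held and never runs its task, which is the silent skip described in gap 2.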

The Problem

The shared lease mechanism is correct as-shipped for cross-process contention (Lane C). The gap is at two seams:

  • Daemon ↔ daemon coverage: the orchestrator-side wrapping is the missing surface
  • Parent ↔ child coverage: the inheritance mechanism is the missing semantic

Both must land together — adding orchestrator-side wrap without inheritance creates the self-defer hazard; adding inheritance without orchestrator-side wrap doesn't close the actual substrate gap.

Architectural Reality

V-B-A on the touchpoints (commit a5c638069):

| Surface | File:line | Current state | Required change |
| --- | --- | --- | --- |
| Orchestrator heavy task wrap | ai/daemons/Orchestrator.mjs:623 (executeTask assignment) + createMaintenanceExecutor (~line 586) | Wraps with in-process activeHeavyTask only; no shared-lease call | Wrap executeMaintenanceTask so heavy tasks acquire the shared lease before spawn/in-process execution + release on completion |
| Heavy task spawn | ai/daemons/services/ProcessSupervisorService.mjs:281 | this.spawnFn(task.command, task.args, {stdio: [...]}); no env override | Pass NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN=<owner-token> env when task is heavy-classified |
| Nested cascade spawn | ai/daemons/services/PrimaryRepoSyncService.mjs:483-491 (runKbSync) | execFileSyncFn(npmBin, ['run', 'ai:sync-kb'], {cwd, encoding, stdio}); no env override | Pass inherited-token env so child cascades inherit lease |
| withHeavyMaintenanceLease entry | ai/daemons/services/HeavyMaintenanceLeaseService.mjs:257 | No env-var awareness | Read NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN at entry; if set + matches current lease's token → return {status: 'inherited', acquired: false, lease} and run task WITHOUT acquire/release |
| Lease file shape | HeavyMaintenanceLeaseService.mjs writeLeaseFile | {owner, reason, pid, token, acquiredAt, metadata} | No change; token already supports the inheritance contract |
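
The first row (orchestrator heavy task wrap) could look roughly like the following. This is a sketch under stated assumptions: createLeasedExecutor, the injected leaseApi with acquire/release, the isHeavy predicate, and the argument shapes of executeTask and recordTaskOutcome are all hypothetical; only the env-var name, the skipped status, and the heavy-maintenance-lease-held reason code are taken from this ticket.

```javascript
// Sketch only: `leaseApi`, `isHeavy`, and all argument shapes are assumptions for illustration.
function createLeasedExecutor({leaseApi, isHeavy, recordTaskOutcome, executeTask}) {
    return async function executeMaintenanceTask(task) {
        if (!isHeavy(task)) {
            return executeTask(task, {});             // non-heavy tasks: unchanged path
        }

        const result = await leaseApi.acquire({owner: task.name});

        if (!result.acquired) {
            // Another daemon holds the shared lease: defer as a non-error outcome.
            recordTaskOutcome({
                task      : task.name,
                status    : 'skipped',
                reasonCode: 'heavy-maintenance-lease-held'
            });
            return null;
        }

        try {
            // Hand the token down so a lease-wrapped child can inherit instead of deferring.
            return await executeTask(task, {
                NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN: result.lease.token
            });
        } finally {
            await leaseApi.release(result.lease);     // runs on success and on failure
        }
    };
}
```

Keeping the release in a finally block is what lets one wrapper cover both the success and failure paths without duplicating cleanup.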

The Fix (Option A — env-var token inheritance)

Mechanism

Parent (orchestrator's executeMaintenanceTask OR PrimaryRepoSyncService.runKbSync()) sets NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN=<token> env when spawning child. withHeavyMaintenanceLease reads env at entry → if set, calls inspectHeavyMaintenanceLease → if current lease's token matches env-var → returns {status: 'inherited', acquired: false, lease} and runs task without acquire/release. If env unset OR token mismatch → falls through to normal acquire path.
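
The entry check described above can be sketched as a small pure function plus a thin wrapper. The names checkInheritedLease and runWithLease are hypothetical, inspectLease is a stand-in for inspectHeavyMaintenanceLease, and the assumption that the inspect result exposes the current lease under a lease property is mine, not confirmed by the source.

```javascript
const ENV_VAR = 'NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN';

// Returns the inherited result, or null to signal "fall through to the normal acquire path".
// `inspectLease` stands in for inspectHeavyMaintenanceLease; its `{status, lease}` shape is assumed.
function checkInheritedLease(env, inspectLease) {
    const inheritedToken = env[ENV_VAR];

    if (!inheritedToken) {
        return null;                                  // env unset
    }

    const inspected = inspectLease();
    const lease     = inspected && inspected.lease;

    if (!lease || lease.token !== inheritedToken) {
        return null;                                  // lease missing or token mismatch
    }

    return {status: 'inherited', acquired: false, lease};
}

// Thin wrapper: inherited callers run the task directly; everyone else acquires normally.
function runWithLease(env, inspectLease, normalAcquirePath, task) {
    const inherited = checkInheritedLease(env, inspectLease);

    if (inherited) {
        task();                                       // no acquire, no release
        return inherited;
    }

    return normalAcquirePath(task);
}
```

Because the check compares against whatever lease is current at entry, a token left over from a dead parent cannot match a new owner's lease and simply takes the normal path.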

Why Option A (recommended)

  • Lease file shape unchanged (token already there)
  • Spawn call sites unchanged structurally (only env parameter added)
  • Audit trail preserved (env-var carries inheriting owner identity; lease file still records single owner — the parent)
  • Stale-recovery safe (TTL handles parent-death; child's stale env-var defers correctly to new owner)
  • Test isolation trivial (env unset by default in spec processes)

Alternatives rejected (Avoided Traps)

  • Option B (allowlist in lease file): parent appends spawned-PID to permittedPIDs array in lease file. Rejected: requires file-mutation outside acquire/release boundary (concurrency hazard); shape-bloat for one feature; PID can be reused after process death.
  • Option C (forced bypass env-var): NEO_HEAVY_MAINTENANCE_LEASE_BYPASS=1 and child skips lease entirely. Rejected: loses auditability; any env-injection bypasses substrate protection.
  • Option D (re-entrancy via owner-string match): child supplies same owner string; primitive treats same-owner re-acquire as inheritance. Rejected: ambiguous when two daemons both run summary — same owner string, different processes, ACTUALLY contending — would falsely inherit when they should defer.
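
The Option D failure mode is easy to see side by side with the token comparison. The snippet below is illustrative only: the predicate names, the pid, and the token values are made up, and the lease objects are reduced to the fields that matter for the comparison.

```javascript
// Lease currently held by daemon 1 while it runs `summary` (values are illustrative).
const currentLease = {owner: 'summary', pid: 4101, token: 'a1f3'};

// Option D (rejected): inherit whenever the owner string matches.
const inheritsByOwner = (lease, candidate) => lease.owner === candidate.owner;

// Option A: inherit only when the candidate presents the holder's token.
const inheritsByToken = (lease, candidate) =>
    Boolean(candidate.inheritedToken) && lease.token === candidate.inheritedToken;

// Daemon 2 independently starts `summary`: same owner string, no handed-down token.
const competingDaemon = {owner: 'summary', inheritedToken: undefined};

// A child spawned by daemon 1, which received the token through the env-var.
const spawnedChild = {owner: 'ai:sync-kb', inheritedToken: 'a1f3'};

console.log(inheritsByOwner(currentLease, competingDaemon)); // true  (false inheritance)
console.log(inheritsByToken(currentLease, competingDaemon)); // false (correctly contends)
console.log(inheritsByToken(currentLease, spawnedChild));    // true  (legitimate child)
```

The owner string identifies what is running, not who spawned it, so only the token can distinguish a real child from an unrelated process doing the same job.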

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Orchestrator executeMaintenanceTask | Orchestrator.mjs:586-606 + this ticket | Acquires shared file lease before spawn/in-process work for heavy-classified tasks; releases in finally on task completion (success/failure) | Defers with non-error recordTaskOutcome(status='skipped', reasonCode='heavy-maintenance-lease-held') when another daemon holds lease | JSDoc on wrap-point + adjacent comment naming #11503 cross-daemon scope | Spec test: simulate file-lock by competing process → assert orchestrator's heavy task defers |
| ProcessSupervisorService.runTask spawn-env | ProcessSupervisorService.mjs:281 + this ticket | Passes NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN=<token> env to spawn for heavy-classified tasks | Non-heavy tasks omit env (no behavior change) | JSDoc on runTask | Spec asserts env-var present in spawnFn calls for heavy tasks |
| PrimaryRepoSyncService.runKbSync spawn-env | PrimaryRepoSyncService.mjs:483-491 + this ticket | Passes inherited-token env to cascade spawn | If no parent token in current env, cascade spawns without env (Lane C wrap acquires normally, may defer) | JSDoc | Spec asserts env-var present in cascade spawn |
| withHeavyMaintenanceLease env-var check | HeavyMaintenanceLeaseService.mjs:257-280 + this ticket | If NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN is set and matches current lease's token → returns {status: 'inherited', acquired: false, lease} without acquire/release | Env unset OR token mismatch → normal acquire path | JSDoc + reference-impl + new status: 'inherited' row in returned-shape table | Spec: env-set + matching token → inherited; env-set + mismatch → falls through to defer |
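
For the two spawn-env rows, one way to keep the call sites structurally unchanged is to build the options object in a helper. buildSpawnOptions and buildCascadeOptions are hypothetical names, and reading the cascade token from the current process env follows the Fallback column above rather than any confirmed implementation detail.

```javascript
const ENV_VAR = 'NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN';

// Builds spawn options; only heavy-classified tasks that have a token get the env entry.
function buildSpawnOptions(baseOptions, {parentEnv, token, heavy}) {
    if (!heavy || !token) {
        return baseOptions;                           // non-heavy or no token: no behavior change
    }

    return {
        ...baseOptions,
        // child_process replaces the whole environment when `env` is given,
        // so the parent env (PATH etc.) is spread in before adding the token.
        env: {...parentEnv, [ENV_VAR]: token}
    };
}

// Cascade case: forward the token found in the current env, if there is one.
function buildCascadeOptions(baseOptions, parentEnv) {
    return buildSpawnOptions(baseOptions, {
        parentEnv,
        token: parentEnv[ENV_VAR],
        heavy: true
    });
}
```

A call site would then change only in its last argument, for example spawnFn(task.command, task.args, buildSpawnOptions(options, {parentEnv: process.env, token, heavy})), where token and heavy are supplied by the caller.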

Acceptance Criteria

  • AC1: Orchestrator.executeMaintenanceTask acquires the shared file lease before heavy-task execution; releases on completion (success + failure paths)

  • AC2: withHeavyMaintenanceLease honors NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN env-var: if set and matching current lease's token, returns {status: 'inherited', acquired: false, lease} and runs task without acquire/release

  • AC3: ProcessSupervisorService.runTask passes inherited-token env to spawned child for heavy-classified tasks; non-heavy tasks unaffected

  • AC4: PrimaryRepoSyncService.runKbSync passes inherited-token env to cascade spawn so cascade inherits parent's lease

  • AC5: Spec test: two orchestrator instances (mocked via dual Neo.create(Orchestrator) + shared leasePath) — second one's heavy task defers with recordTaskOutcome(status='skipped', reasonCode='heavy-maintenance-lease-held')

  • AC6: Spec test: cascade-inheritance — parent acquires lease, spawns child with env, child's withHeavyMaintenanceLease returns inherited, runs task without acquire/release on lease file, parent releases on completion

  • AC7: Spec test: stale-cascade — parent dies between spawn + child-execute; child's env-var no longer matches new owner's token; child falls through to normal acquire → defers correctly

  • AC8: Spec test: stale-env-token negative cases — inherited env token MUST NOT bypass acquisition in ANY of the following three sub-cases (each fails the inheritance check and falls through to normal acquire/held semantics):

    • AC8a — lease file missing: env-var set but lease file does not exist on disk → inspectHeavyMaintenanceLease returns status: 'missing' → no token to match → normal acquire path
    • AC8b — token mismatch: env-var set, lease file exists, but lease.token !== process.env[ENV_VAR] → no inheritance → normal acquire path (which would defer with held if another owner)
    • AC8c — stale parent: env-var set with token-X, but lease file has been replaced (parent died → TTL → another owner acquired with token-Y) → mismatch → normal acquire path defers correctly to new owner

    Per @neo-gpt review feedback (MESSAGE:af0c7350-0a35-4e3f-a6d9-ef74248b5a77): "inherited env token present but lease file missing / token mismatch / stale parent should not bypass acquisition; it should fall through to normal acquire/held semantics." Three sub-cases collapse to the same Option A semantic (env-token compared against current lease file's token; mismatch → no inheritance) but each warrants explicit spec coverage to prevent regression on any sub-path.

  • AC9: JSDoc on withHeavyMaintenanceLease updated to document the new inherited status + env-var contract (extends #11515 JSDoc work shipped via PR #11518)

  • AC10: No regression in existing HeavyMaintenanceLeaseService.spec.mjs (9/9 from PR #11518 stays green)

  • AC11: No regression in Orchestrator.spec.mjs (17/17 from PR #11514 stays green)

  • AC12: Documentation note in learn/agentos/ (light ADR-shape) introducing the env-based lease-inheritance contract as a documented primitive — env-var name + semantic + audit trail rationale
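
AC7 and AC8c describe a timeline rather than a single check, so a small simulation may help when writing the spec. This is a sketch with an injected clock: createLease, childEnter, the 'acquired' status string, the 'token-child' value, and the TTL handling are simplified assumptions and do not mirror the real lease file logic.

```javascript
const ENV_VAR = 'NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN';

// In-memory lease with TTL-based stale recovery (assumed behavior, heavily simplified).
function createLease(ttlMs) {
    let lease = null;

    return {
        inspect: now => (lease && now - lease.acquiredAt < ttlMs)
            ? {status: 'held', lease}
            : {status: 'missing', lease: null},

        acquire(owner, token, now) {
            const seen = this.inspect(now);

            if (seen.lease) {
                return {status: 'held', acquired: false, lease: seen.lease};
            }

            lease = {owner, token, acquiredAt: now};
            return {status: 'acquired', acquired: true, lease};
        }
    };
}

// Child entry: inherit on token match, otherwise take the normal acquire path.
function childEnter(store, env, now) {
    const seen = store.inspect(now);

    if (env[ENV_VAR] && seen.lease && seen.lease.token === env[ENV_VAR]) {
        return {status: 'inherited', acquired: false, lease: seen.lease};
    }

    return store.acquire('ai:sync-kb', 'token-child', now);
}

const store    = createLease(1000);
const childEnv = {[ENV_VAR]: 'token-X'};

store.acquire('primary-dev-sync', 'token-X', 0);   // parent acquires
const live = childEnter(store, childEnv, 10);      // parent alive: child inherits

store.acquire('summary', 'token-Y', 2000);         // parent died, TTL lapsed, new owner
const stale = childEnter(store, childEnv, 2010);   // token-X no longer matches

console.log(live.status, stale.status, stale.lease.owner); // inherited held summary
```

The stale child ends up deferring to the new owner through the ordinary held result, so no dedicated stale-token branch is needed.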

Out of Scope

  • Lane D narrow observability (PrimaryRepoSyncService.runKbSync() TaskStateService + HealthService annotation) — separate ticket filed alongside this one
  • Allowlist-based lease sharing (Option B) — rejected in design
  • Forced-bypass env-var (Option C) — rejected in design
  • Same-owner-string re-entrancy (Option D) — rejected in design
  • Cross-machine lease coordination (NFS-backed lease file for multi-host) — out of scope; assumes single-machine daemon-set
  • Auto-detection of nested cascades — caller (parent code) explicitly opts into inheritance via env-var; no implicit detection

Avoided Traps

  • Adding orchestrator-side wrap WITHOUT inheritance: rejected — creates the self-defer hazard for primary-dev-sync cascade. Both halves must land together.
  • Inheriting by PID/parent-PID match instead of token: rejected — token is already in the lease shape; PID is reusable; PPID is unreliable after parent-death
  • Releasing the lease in the SPAWN call (before child exits): rejected — child needs the lease window for its work; release must wait for child completion
  • Polling for lease in child's withHeavyMaintenanceLease instead of env-check: rejected — adds latency; env-check is O(1) and the env-var IS the explicit handoff signal
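
The release-timing trap in the third bullet comes down to where the release sits relative to the child's exit. A minimal sketch, assuming a spawned child that emits the standard ChildProcess exit and error events; waitForExit, runHeavyChild, and the injected spawnChild / releaseLease functions are hypothetical names.

```javascript
// Resolves when the child emits 'exit'; rejects if it emits 'error' (e.g. spawn failure).
function waitForExit(child) {
    return new Promise((resolve, reject) => {
        child.once('error', reject);
        child.once('exit', (code, signal) => resolve({code, signal}));
    });
}

async function runHeavyChild({spawnChild, releaseLease, lease}) {
    try {
        const child = spawnChild();        // returns immediately; the child is still working
        return await waitForExit(child);   // keep the lease for the child's whole run
    } finally {
        await releaseLease(lease);         // only after exit, or after a spawn failure
    }
}
```

Releasing right after spawnChild() returns would free the lease while the child is still mid-task, reopening the window the lease is meant to close.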

Related

  • Parent umbrella: #11503 (Enforce heavy-maintenance mutex across Agent OS tasks)
  • Substrate primitive: PR #11506 / #11505 (HeavyMaintenanceLeaseService — primitive being extended)
  • Sibling lane A: #11513 / PR #11514 (orchestrator's backup joins heavy set + cross-poll tests)
  • Sibling lane C: #11507 / PR #11509 (manual CLI script lease adoption)
  • Sibling consumer-guidance: #11515 / PR #11518 (release-timing JSDoc + spec; this ticket EXTENDS the JSDoc with the new inherited status row)
  • Companion ticket (parallel-filing): narrow Lane D observability (TaskStateService + HealthService annotation for cascade)
  • Design dialogue: #11503 comment IC_kwDODSospM8AAAABCly8-w (peer-role substrate-validation; Option A recommended with 3 alternatives + 4 boundary conditions + test implications)
  • Lead-call authority: @neo-gpt MESSAGE:fb84293e-325b-4f0f-bf76-473d9ad342ff (2026-05-17T01:39:36Z) — explicit cross-daemon lane authorization

Handoff Retrieval Hints

  • Retrieval Hint: cross-daemon lease inheritance env-var token NEO_HEAVY_MAINTENANCE_LEASE_INHERITED_TOKEN
  • Retrieval Hint: Commit SHA a5c638069 (Lane C merge; the substrate state this ticket extends from)
  • Retrieval Hint: withHeavyMaintenanceLease inherited status (the new returned-shape row)
  • Retrieval Hint: PR #11506 / #11505 + PR #11509 + PR #11514 + PR #11518 (the four-PR substrate stack this ticket completes)

Origin Session ID: f662d055-a35b-446a-83ff-5fc859604722

tobiu referenced in commit 55dadc7 - "feat(ai): annotate runKbSync cascade as kbSync lifecycle (#11520) (#11521)" on May 17, 2026, 8:23 AM
tobiu referenced in commit 0cedd3d - "feat(ai): orchestrator shared-lease wrap + cascade env inheritance (#11519) (#11527)" on May 17, 2026, 10:19 AM
tobiu closed this issue on May 17, 2026, 10:19 AM