Resolves Lane C of #11503 umbrella. Builds on the lease primitive shipped by PR #11506 (#11505).
FAIR-band: in-band [10/30] — prio-0 per operator elevation 2026-05-16T22:55Z; closes the gh-CLI bypass surface at the maintenance-script layer (sibling to PR #11502 closing the same pattern at the pr-review surface).
Premise
PR #11506 / #11505 shipped HeavyMaintenanceLeaseService with a withHeavyMaintenanceLease(task, {owner, ...}) wrapper. Lane C wires the manual CLI scripts into that lease so they cannot collide with orchestrator-owned heavy work or with each other.
Empirical anchor (today's wedge at 19:27Z): the orchestrator's kbSync subprocess wedged. Had the operator triggered a manual npm run ai:sync-kb at that moment, the two runs would have collided in parallel. The substrate gap is real, and it is exactly the failure class Lane C closes.
Scope (per #11503 ACs 4-7 and lead-sync from @neo-gpt MESSAGE:20f48f73)
| Script | AC | Lease owner string |
|---|---|---|
| buildScripts/ai/runSandman.mjs | AC4 | sandman |
| buildScripts/ai/syncKnowledgeBase.mjs | AC5 | kbSync |
| buildScripts/ai/backup.mjs | AC6 | backup |
| buildScripts/ai/syncGithubWorkflow.mjs | AC7 | syncGithubWorkflow (whole-run guarded per GPT's call; Stage 1/Stage 2 split deferred) |
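One way to keep the owner strings from drifting between scripts is a single lookup keyed by file name. This is only a sketch: the LEASE_OWNERS constant and the leaseOwnerFor helper are hypothetical names that do not exist in the codebase, and only the four script/owner pairs are taken from the scope table.

```javascript
// Owner strings from the Scope table, keyed by script file name.
// LEASE_OWNERS and leaseOwnerFor are hypothetical, for illustration only.
const LEASE_OWNERS = new Map([
    ['runSandman.mjs',         'sandman'],
    ['syncKnowledgeBase.mjs',  'kbSync'],
    ['backup.mjs',             'backup'],
    ['syncGithubWorkflow.mjs', 'syncGithubWorkflow']
]);

// Resolve the lease owner for a script path such as
// 'buildScripts/ai/backup.mjs'. Throws for scripts outside the
// heavy-maintenance set so a typo cannot silently skip the lease.
function leaseOwnerFor(scriptPath) {
    const fileName = scriptPath.split('/').pop();
    const owner    = LEASE_OWNERS.get(fileName);

    if (owner === undefined) {
        throw new Error(`No heavy-maintenance lease owner registered for '${fileName}'`);
    }

    return owner;
}
```

Inlining the string literal in each script, as the prescription does, works equally well for four scripts; a shared map mainly helps if the set grows.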
Prescription
For each script, wrap the existing main() (or equivalent top-level async work) with withHeavyMaintenanceLease:
import {withHeavyMaintenanceLease} from '../../ai/services/github-workflow/...'; // path TBD per services.mjs barrel

const result = await withHeavyMaintenanceLease(
    async (lease) => {
        // ... existing script body ...
    },
    {owner: 'sandman', reason: 'manual-cli'}
);

// Another heavy-maintenance task holds the lease: log the holder and defer.
if (result.status === 'held') {
    console.log(`⏸️ Deferred: heavy-maintenance lease held by '${result.lease.owner}' (reason='${result.lease.reason}', pid=${result.lease.pid}, acquiredAt=${result.lease.acquiredAt}).`);
    console.log(` This script will not run while another heavy-maintenance task is active.`);
    process.exit(0); // deferral is a non-error exit
}
result.status === 'held' is a non-error exit (matches GPT's AC9 "deferred due to active heavy-maintenance lease" log shape).
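The held path above can be exercised without a real lease by injecting the wrapper. The following is a minimal sketch under stated assumptions, not the shipped implementation: runManualCli, its withLease and log parameters, the two stub wrappers, and the {status: 'completed'} success shape are all hypothetical. Only the 'held' status and the lease.owner / reason / pid / acquiredAt fields come from the snippet above.

```javascript
// Hypothetical helper: runs `main` under the lease and reports how the
// script should exit. `withLease` stands in for withHeavyMaintenanceLease.
async function runManualCli({withLease, owner, main, log = console.log}) {
    const result = await withLease(main, {owner, reason: 'manual-cli'});

    if (result.status === 'held') {
        const {lease} = result;

        log(`Deferred: heavy-maintenance lease held by '${lease.owner}' ` +
            `(reason='${lease.reason}', pid=${lease.pid}, acquiredAt=${lease.acquiredAt}).`);

        return {exitCode: 0, deferred: true}; // deferral is a non-error exit
    }

    return {exitCode: 0, deferred: false};
}

// Stub wrappers for illustration only; the values are made up.
const alwaysHeld = async () => ({
    status: 'held',
    lease : {owner: 'kbSync', reason: 'example', pid: 4242, acquiredAt: '2026-05-16T19:27:00Z'}
});

const alwaysFree = async (task, {owner}) => {
    await task({owner});
    return {status: 'completed'}; // assumed success shape
};
```

A script entry point could then end with something like `process.exit((await runManualCli({...})).exitCode)`, which keeps the exit decision in one place.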
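Acceptance Criteria
- The four scripts (runSandman.mjs, syncKnowledgeBase.mjs, backup.mjs, syncGithubWorkflow.mjs) wrap their main body with withHeavyMaintenanceLease
- ... withHeavyMaintenanceLease finally block)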
Avoided Traps
- Splitting syncGithubWorkflow into Stage 1 (GitHub IO, light) vs Stage 2 (Chroma writes, heavy): rejected per GPT's lead-sync (MESSAGE:20f48f73). Whole-run guard for v1; Stage-split refinement is a separate future ticket if lease hold-time becomes empirical pain.
- Adding lease semantics to npm run ai:check-substrate-size / lint-skill-manifest / check-retired-primitives: rejected. These are pure static-validation scripts (no Chroma/SQLite/LLM writes) and are not in the heavy-maintenance set.
- Building a script-specific lock primitive in each script: rejected. That is exactly the "per-script private locks" trap #11503 explicitly named. The shared lease primitive (PR #11506) is the substrate.
- Blocking the script on held-lease (wait-loop): rejected. withHeavyMaintenanceLease returns status: 'held' immediately; scripts should defer-and-exit, not wait. Operator can re-invoke later. Matches non-error deferral semantics from AC9.
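The defer-and-exit semantics in the last bullet can be modelled with a small in-memory stand-in. This is a sketch for reasoning about behavior only: createInMemoryLease, the 'completed' status, and the demo function are hypothetical, and a single-process variable says nothing about how the shipped primitive persists its lease across processes.

```javascript
// In-memory stand-in for the shared lease; illustration only.
function createInMemoryLease() {
    let current = null;

    return async function withLease(task, {owner, reason}) {
        if (current) {
            // No wait loop: report the current holder immediately.
            return {status: 'held', lease: current};
        }

        current = {owner, reason, pid: process.pid, acquiredAt: new Date().toISOString()};

        try {
            const value = await task(current);
            return {status: 'completed', value}; // assumed success shape
        } finally {
            current = null; // released even when the task throws
        }
    };
}

// Two overlapping invocations: the second is deferred, not queued.
async function demo() {
    const withLease = createInMemoryLease();
    let release;
    const gate = new Promise(resolve => { release = resolve; });

    const first  = withLease(() => gate, {owner: 'kbSync', reason: 'manual-cli'});
    const second = await withLease(async () => {}, {owner: 'backup', reason: 'manual-cli'});

    release();   // let the first task finish
    await first; // its finally block frees the lease

    const third = await withLease(async () => {}, {owner: 'backup', reason: 'manual-cli'});

    return {second, third}; // second.status === 'held', third.status === 'completed'
}
```

Because the check and the assignment to `current` happen before the first `await`, the stand-in cannot hand the lease to two callers within one process; a real cross-process lease needs an equivalent atomic acquire.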
Related
- Parent umbrella: #11503 (Enforce heavy-maintenance mutex across Agent OS tasks)
- Substrate dependency: PR #11506 / #11505 (shared lease primitive)
- Conceptual sibling at different surface: PR #11502 / #11495 (CI lint for gh-pr-review CLI bypass) — same Helpful-Assistant CLI-bypass failure pattern, different substrate surface
- Empirical anchor: today's wedge cascade at 19:27Z (the manual kb-sync + orchestrator kb-sync collision that the lease primitive and this Lane C close together)
Out of Scope
- Lane A: adding backup to DEFAULT_HEAVY_MAINTENANCE_TASK_NAMES in Orchestrator.mjs (folded by GPT into Lane B per their MESSAGE:20f48f73; will be handled separately if not part of PR #11506)
- Lane D: PrimaryRepoSyncService.runKbSync() nested cascade lease wiring (GPT considering during Lane B API design)
- Lane E: observability/stale-adoption surfacing (separate ticket if needed)