Context
ai/daemons/Orchestrator.mjs (682 LOC, shipped via PR #11016 sub of #11009) currently mixes coordination with concrete business logic across multiple responsibilities. Audit triggered by operator architectural challenge 2026-05-09: "i still challenge the orchestrator architecture and if it does too much."
Update 2026-05-09 (post-Discussion #11025 graduation): Body refreshed to reflect 3-voice cross-family convergence. Sub-extraction order INVERTED per @neo-gpt's leakage-prevention reasoning (TaskStateService first, not ProcessSupervisorService); CadenceEngine boundary tightened to pure-trigger-builder (NOT execute-runner) per OQ8 resolution. Original prescription preserved in Discussion #11025's archaeological record.
Empirical responsibility map (verified line-by-line read):
| Lines | Current responsibility | Verdict |
|---|---|---|
| 1-138 (helpers) + 138-682 (class) | Total: 682 LOC | Class doing too much |
| ~150 of those | Core: lifecycle (start/stop/poll), configure, sweep dispatcher (runMaintenanceCycle/runTaskCycle), static config + collaborator declarations | Stay |
| ~200 lines | runTask (spawn + child.on lifecycle + PID-file write + state-mutation hooks) + recoverTask/recoverTasks/watchRecoveredTask/clearRecoveredTask + getTaskPidFile + processCommand | Extract → ProcessSupervisorService |
| ~60 lines | readState/writeState/createInitialTaskState (on-disk task-state envelope + JSON persistence) | Extract → TaskStateService |
| ~30 lines + per-task wiring | shouldRunIntervalTask/parseInterval + the boilerplate runSummaryCycle/runKbSyncCycle pattern (would repeat for every new task) | Extract → CadenceEngine |
| ~15 lines | writeLog (file-append + console mirror) | Out of M3.5 scope per OQ4; keep inline / pass adapter |
Existing extractions that prove this is the right substrate pattern:
- ai/services/memory-core/HealthService.mjs: outcome recording
- ai/daemons/services/SummarizationCoordinatorService.mjs: getDueTask({...}) (sunset-handover + interval-sweep trigger logic for summary lane); already consumed via Orchestrator's summarizationCoordinator_ reactive config at line 233
- ai/scripts/bridge-daemon-queries.mjs: DB init + handover queries
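To make the coordinator precedent concrete, here is a minimal sketch of a coordinator whose getDueTask decides what work is due and returns a plain trigger object. It is not the real SummarizationCoordinatorService: the class name, the `pendingHandoverId` input, and the trigger fields (`taskName`, `reason`) are assumptions made for illustration, and it uses a plain class rather than a Neo class.

```javascript
// Hypothetical coordinator: it only decides WHAT is due and returns data.
// It never spawns a process and never mutates task state.
class SummaryCoordinatorSketch {
    getDueTask({now, lastRunAt, intervalMs, pendingHandoverId = null}) {
        // Priority path (assumed input): a pending handover wins over the sweep.
        if (pendingHandoverId !== null) {
            return {taskName: 'summary', reason: `handover:${pendingHandoverId}`};
        }

        // Interval sweep: due when the task never ran or the interval elapsed.
        const due = lastRunAt == null || now - lastRunAt >= intervalMs;

        return due ? {taskName: 'summary', reason: 'interval-sweep'} : null;
    }
}

const coordinator = new SummaryCoordinatorSketch();

// A pending handover triggers even though the interval has not elapsed.
const priority = coordinator.getDueTask({now: 5000, lastRunAt: 4000, intervalMs: 60000, pendingHandoverId: 'h-1'});
// No handover, interval elapsed: the sweep trigger is returned.
const sweep    = coordinator.getDueTask({now: 70000, lastRunAt: 4000, intervalMs: 60000});
// No handover, interval not elapsed: nothing is due.
const idle     = coordinator.getDueTask({now: 5000, lastRunAt: 4000, intervalMs: 60000});
```

Because the coordinator returns a value instead of running anything, the caller stays free to decide how and when the trigger is executed.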
Architectural target (mirroring v13-path.md M2 "Common Base Server Class" pattern at the daemon tier):
Orchestrator shrinks to ~150 LOC of pure coordination:
- Hold collaborators: processSupervisor_, taskState_, cadenceEngine_, healthService_, per-task coordinators
- Run poll loop. For each lane: per-task coordinator returns trigger via getDueTask({...}) (or via cadenceEngine.getIntervalTrigger({...}) primitive composition); orchestrator passes trigger to processSupervisor.runTask(trigger)
- That's it
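The coordination-only shape described above could look roughly like this. It is a minimal sketch with plain objects rather than Neo classes and reactive configs; the `lanes` array, the `pollOnce` method, and the trigger fields are assumptions for illustration. Only getDueTask and runTask are names taken from this ticket.

```javascript
// Hypothetical slim orchestrator: holds collaborators and wires them, nothing more.
class SlimOrchestratorSketch {
    constructor({processSupervisor, lanes}) {
        this.processSupervisor = processSupervisor; // executes triggers
        this.lanes             = lanes;             // per-task coordinators (assumed shape)
    }

    // One pass of the poll loop. Returns the triggers handed to the supervisor.
    pollOnce(now = Date.now()) {
        const dispatched = [];

        for (const lane of this.lanes) {
            // The coordinator decides what is due; null means "nothing to do".
            const trigger = lane.getDueTask({now});

            if (trigger) {
                this.processSupervisor.runTask(trigger);
                dispatched.push(trigger);
            }
        }

        return dispatched;
    }
}

// Stand-in collaborators for the example (not the real services).
const executed       = [];
const fakeSupervisor = {runTask: trigger => executed.push(trigger.taskName)};
const summaryLane    = {getDueTask: () => ({taskName: 'summary', reason: 'interval-sweep'})};
const idleLane       = {getDueTask: () => null};

const orchestrator = new SlimOrchestratorSketch({
    processSupervisor: fakeSupervisor,
    lanes            : [summaryLane, idleLane]
});

const dispatched = orchestrator.pollOnce(1000);
```

In this shape the orchestrator contains no due-check logic and no spawn logic, so adding a lane means adding a coordinator to the list rather than adding methods to the class.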
This is a NEW milestone in the v13 path: M3.5 (Orchestrator decomposition), sequenced between current M3 (orchestrator skeleton, complete via #11009/#11016) and M4 (migrate decomposed daemon services to Orchestrator). Adding M3.5 first prevents M4's per-coordinator-service additions (DreamCoordinator, SandmanCoordinator, BackupService, GoldenPathCoordinator, GraphMaintenanceCoordinator) from compounding the existing fat-class problem.
Sub-tasks (file lazily as work picks them up)
Extraction order INVERTED per @neo-gpt's leakage-prevention reasoning at https://github.com/orgs/neomjs/discussions/11025#discussioncomment-16863204: if ProcessSupervisorService extracts first against raw mutable state, the new service inherits the exact pattern we're trying to remove. Locking TaskStateService mutation API first means ProcessSupervisor consumes a clean API from day one.
| # | Sub-extraction | Foundational order |
|---|---|---|
| Sub-1 | TaskStateService (ai/daemons/services/TaskStateService.mjs): owns mutation API (markStarted, markSkipped, markCompleted, markFailed, adoptRunning, clearRecovered, getLastRunAt) + on-disk persistence (readState/writeState/createInitialTaskState). Locks state-mutation boundary BEFORE larger extraction inherits it. | First (foundational; OQ2 inversion) |
| Sub-2 | ProcessSupervisorService (ai/daemons/services/ProcessSupervisorService.mjs): owns runTask + recoverTask family + PID-file lifecycle + close/error wiring. Consumes TaskStateService API for state changes; does NOT mutate raw state directly. | After Sub-1 |
| Sub-3 | CadenceEngine (ai/daemons/services/CadenceEngine.mjs): pure trigger-builder exposing getIntervalTrigger({taskName, now, lastRunAt, intervalMs, reasonPrefix}). NOT execute-runner; per-task coordinators compose this primitive OR bypass for non-interval triggers (e.g., sunset-handover-priority pattern in SummarizationCoordinatorService). | After Sub-1 + Sub-2 |
| Sub-4 | Orchestrator slim-down PR: wire all three collaborators via reactive config, drop extracted methods, verify no behavior regression via characterization-then-extract test pattern. | After Sub-1 + Sub-2 + Sub-3 |
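The Sub-1 / Sub-2 boundary can be illustrated with a minimal sketch. The method names markStarted, markCompleted, markFailed and getLastRunAt come from this ticket; everything else is an assumption: the in-memory Map, the state fields, the injected `persist` and `clock` functions, and the `superviseSketch` helper standing in for the supervisor. It uses plain classes rather than Neo classes.

```javascript
// Hypothetical in-memory stand-in for TaskStateService; field names are assumptions.
class TaskStateSketch {
    #tasks = new Map();

    constructor({persist = () => {}} = {}) {
        this.persist = persist; // stand-in for the on-disk writeState
    }

    // The single place where state is changed and persisted.
    #update(taskName, patch) {
        const next = {...this.#tasks.get(taskName), ...patch};

        this.#tasks.set(taskName, next);
        this.persist(taskName, next);
        return next;
    }

    markStarted(taskName, {pid, now})  { return this.#update(taskName, {status: 'running', pid, startedAt: now}); }
    markCompleted(taskName, {now})     { return this.#update(taskName, {status: 'completed', pid: null, lastRunAt: now}); }
    markFailed(taskName, {now, error}) { return this.#update(taskName, {status: 'failed', pid: null, lastRunAt: now, error}); }

    getLastRunAt(taskName) { return this.#tasks.get(taskName)?.lastRunAt ?? null; }
}

// Hypothetical supervisor-side consumer: it reports lifecycle events through the
// API and has no access to the underlying Map.
function superviseSketch({taskState, taskName, pid, run, clock}) {
    taskState.markStarted(taskName, {pid, now: clock()});

    try {
        run();
        taskState.markCompleted(taskName, {now: clock()});
    } catch (error) {
        taskState.markFailed(taskName, {now: clock(), error: error.message});
    }
}

const writes    = [];
const taskState = new TaskStateSketch({persist: (name, state) => writes.push([name, state.status])});

let tick = 0;
const clock = () => ++tick * 1000; // deterministic clock for the example

superviseSketch({taskState, taskName: 'kb-sync', pid: 4242, run: () => {}, clock});
superviseSketch({taskState, taskName: 'summary', pid: 4243, run: () => { throw new Error('boom'); }, clock});
```

Since the Map is private, the only way for a later extraction to change task state is through the named methods, which is the leakage-prevention property the inverted order is meant to secure.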
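Acceptance Criteria (epic-level; fires when sub-tickets close)
- TaskStateService Neo class extracted; mutation API locked; Orchestrator + downstream services consume via taskState_ reactive collaborator (NOT raw state mutation)
- ProcessSupervisorService Neo class extracted; subprocess spawn + lifecycle + PID-file recovery owned by service; consumes TaskStateService API for state changes; does NOT decide whether a task is due
- CadenceEngine Neo class extracted; pure trigger-builder shape (returns trigger object, does not execute); per-task coordinators decide "what work is due"; supervisor executes; orchestrator wires
- Orchestrator.mjs shrinks to ~150 LOC; only coordination logic remains
- Orchestrator.spec.mjs preserved during extraction; new service-level unit specs land alongside extracted services
- learn/agentos/v13-path.md updated to add M3.5 milestone between M3 and M4 (picked up by #11019)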
Avoided Traps
- ❌ Pile more tasks onto current Orchestrator shape: every new "add task X to Orchestrator" makes the fat-class problem worse; decompose first
- ❌ Big-bang single-PR rewrite: sub-tickets per extraction allow incremental review + cross-family verification per-extraction
- ❌ Skip the precedent audit: SummarizationCoordinatorService is the exemplar pattern; new extractions mirror it (Neo class with focused method surface, reactive collaborator-injection in Orchestrator)
- ❌ Sub-task locations without precedent verification: ai/daemons/services/ already hosts SummarizationCoordinatorService + GraphMaintenanceService + GoldenPathSynthesizer + ConceptIngestor + ConceptDiscoveryService + GapInferenceEngine + IssueIngestor + LazyEdgeDrainer + MemorySessionIngestor + SemanticGraphExtractor + TopologyInferenceEngine; that's the canonical home for daemon-tier services
- ❌ Extract ProcessSupervisor first: bakes in raw-state-mutation responsibility we're trying to remove (OQ2 inversion blocks this; TaskStateService extracts first to lock mutation API)
- ❌ CadenceEngine as execute-runner: runIfDue(taskName, dueCheckFn, executeFn) shape would make CadenceEngine a mini-Orchestrator owning orchestration flow; pure getIntervalTrigger({...}) primitive preserves the SummarizationCoordinatorService precedent (coordinator decides "what work is due"; supervisor executes; orchestrator wires)
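A minimal sketch of the pure trigger-builder shape follows. The parameter list matches the getIntervalTrigger({taskName, now, lastRunAt, intervalMs, reasonPrefix}) signature named in this ticket; the return shape, the default `reasonPrefix`, and the "never ran means due" rule are assumptions of this sketch. It is written as a standalone function rather than a method on a Neo class.

```javascript
// Hypothetical pure trigger-builder. It inspects timestamps and returns a value;
// it takes no executeFn and never spawns, schedules, or mutates anything.
function getIntervalTrigger({taskName, now, lastRunAt, intervalMs, reasonPrefix = 'interval'}) {
    // A task that has never run is treated as due (an assumption of this sketch).
    const elapsedMs = lastRunAt == null ? Infinity : now - lastRunAt;

    if (elapsedMs < intervalMs) {
        return null; // not due yet
    }

    return {
        taskName,
        reason     : `${reasonPrefix}:${taskName}`,
        triggeredAt: now
    };
}

const hour = 60 * 60 * 1000;

// Last run 30 minutes ago with a one-hour interval: not due.
const notDue = getIntervalTrigger({taskName: 'kb-sync', now: 10 * hour, lastRunAt: 9.5 * hour, intervalMs: hour});
// Last run two hours ago: due, with a caller-supplied reason prefix.
const due = getIntervalTrigger({taskName: 'kb-sync', now: 10 * hour, lastRunAt: 8 * hour, intervalMs: hour, reasonPrefix: 'sweep'});
// Never ran: due under this sketch's assumption.
const firstRun = getIntervalTrigger({taskName: 'kb-sync', now: 10 * hour, lastRunAt: null, intervalMs: hour});
```

Unlike the rejected runIfDue(taskName, dueCheckFn, executeFn) shape, nothing here receives a function to run, so the decision about execution stays with the coordinator and the supervisor.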
Provenance
- Operator architectural challenge 2026-05-09: "an orchestrator orchestrates ... i still challenge the orchestrator architecture and if it does too much"
- Triggered by my flawed framing of #11018 (closed as not planned 2026-05-09)
- Empirical anchor: wc -l ai/daemons/Orchestrator.mjs = 682; responsibility map verified via line-by-line read
- Substrate precedent: SummarizationCoordinatorService.mjs (already proves the pattern works in this codebase)
- v13-path.md M2 "Common Base Server Class" architectural pattern applied at daemon tier
- Discussion #11025: 3-voice cross-family convergence on all 8 OQs (Opus + Gemini + GPT):
  - OQ2 (extraction order): @neo-gpt proposed TaskStateService-first inversion with leakage-prevention reasoning; @neo-opus-4-7 leaned in with reasoning at https://github.com/neomjs/neo/discussions/11025#discussioncomment-16863319; @neo-gemini-3-1-pro aligned 16:50Z
  - OQ8 (CadenceEngine boundary): @neo-gpt's pure-trigger-builder shape preserves coordinator-decides/supervisor-executes/orchestrator-wires precedent
  - OQ3 (state ownership): all three voices aligned on ProcessSupervisor-uses-TaskStateService-API (no raw state mutation)
- Cross-family review pattern empirically validated: without GPT's inversion, ProcessSupervisor would have inherited raw-state-mutation responsibility we're trying to remove
Self-Identification: @neo-opus-4-7 (Claude Opus 4.7, Claude Code) — chief-architect lane, post-#11018-retraction corrective round; refreshed post-Discussion-#11025 graduation 2026-05-09.