Context
Operator observation on 2026-05-16 showed `npm run ai:orchestrator` running a large Knowledge Base sync and logging two separate noise classes:
- repeated `Skipping knowledge base sync; task already running (PID: 2082).` lines every poll while the existing sync child was still active
- normal KB embedding progress emitted through the daemon as `[ERROR] [ProcessSupervisor] knowledge base sync stderr: [LOG] Processed and embedded batch 83 of 237`
The embedding model/vector-dimension question is already closed separately: qwen3-8b @ 4096 dimensions is confirmed by healthcheck, direct endpoint smoke, and Chroma sample inspection. This ticket is only about daemon observability quality while that sync runs.
The Problem
ProcessSupervisorService currently treats all child stderr data as daemon-level ERROR, even when the child is using stderr as its normal log channel. That is correct enough for MCP/harness transport visibility, but it is wrong at the daemon supervisor boundary: an informational child line should not become an operator-facing daemon error.
The repeated already-running skip line is also low-value noise during long-running child tasks. It confirms the singleton guard is working, but printing the same message every orchestrator poll obscures real progress and real failures.
The Architectural Reality
- `ai/daemons/services/ProcessSupervisorService.mjs:195` detects a task already running and logs an INFO skip on every invocation.
- `ai/daemons/services/ProcessSupervisorService.mjs:207` spawns child tasks with stderr piped.
- `ai/daemons/services/ProcessSupervisorService.mjs:209` forwards every stderr chunk as `writeLog('ERROR', ...)` regardless of child log prefix.
- `ai/daemons/TaskDefinitions.mjs:72` defines `kbSync` as a child process labeled `knowledge base sync`, so `buildScripts/ai/syncKnowledgeBase.mjs` progress lines flow through this supervisor path.
- `ai/daemons/Orchestrator.mjs:295` uses the orchestrator poll cadence, so already-running state is checked repeatedly while a long child task is active.
Existing adjacent tickets are not equivalent:
- #10576 / #10579 handled durable KB sync logging and are closed.
- #10088 is broader post-merge KB sync automation.
- #9919 / #9920 cover daemonization and re-queueing.
- #11075 is deferred config cleanup.
The Fix
Add a narrow ProcessSupervisorService logging policy for child task output:
- Classify child stderr lines by explicit child log prefix before forwarding to daemon logs. Examples: `[LOG]` and `[INFO]` should map to daemon INFO; `[WARN]` to WARN; `[ERROR]` or unclassified stderr can remain ERROR unless a stronger local convention exists.
- Preserve stderr capture. The child process should continue piping stderr so MCP/server/script log channels remain observable.
- Throttle or dedupe the already-running skip line per task/PID/reason window so a long KB sync does not emit identical skip messages every poll.
- Keep failure semantics intact: non-zero exits, spawn failures, and real child error lines still surface as ERROR and still record failed task outcomes.
- Add focused unit coverage in `test/playwright/unit/ai/daemons/services/ProcessSupervisorService.spec.mjs` for classification and skip dedupe/throttle.
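The prefix classification described in the list above could look roughly like the following. This is a minimal sketch, not the existing `ProcessSupervisorService` code: `classifyChildLine`, `createLineSplitter`, and the `[WARN] slow batch` sample line are hypothetical names and data introduced for illustration. It also assumes that stderr arrives in arbitrary chunks, so chunks are split into complete lines before a severity is chosen.

```javascript
// Hypothetical mapping from child log prefix to daemon severity (per this ticket's proposal).
const PREFIX_TO_LEVEL = new Map([
    ['LOG',   'INFO'],
    ['INFO',  'INFO'],
    ['WARN',  'WARN'],
    ['ERROR', 'ERROR']
]);

// Matches a leading "[PREFIX]" tag, e.g. "[LOG] Processed and embedded batch 83 of 237".
const PREFIX_PATTERN = /^\s*\[([A-Z]+)\]/;

/**
 * Maps one child stderr line to a daemon severity.
 * Missing or unknown prefixes fall back to ERROR.
 */
function classifyChildLine(line) {
    const match = PREFIX_PATTERN.exec(line);
    const level = match ? PREFIX_TO_LEVEL.get(match[1]) : undefined;
    return level ?? 'ERROR';
}

/**
 * Returns a stateful function that turns stderr chunks into complete lines,
 * holding back a trailing partial line until its newline arrives.
 */
function createLineSplitter(onLine) {
    let buffered = '';
    return chunk => {
        buffered += chunk.toString();
        const lines = buffered.split(/\r?\n/);
        buffered = lines.pop(); // the last element is an incomplete line or ''
        for (const line of lines) {
            if (line.trim() !== '') onLine(line);
        }
    };
}

// Usage: the second line is split across two chunks and rejoined before classification.
const seen = [];
const feed = createLineSplitter(line => seen.push([classifyChildLine(line), line]));
feed('[LOG] Processed and embedded batch 83 of 237\n[WA');
feed('RN] slow batch\nplain stderr text\n');
// seen now holds one [severity, line] pair per complete line: INFO, WARN, ERROR
```

In a real integration, the `onLine` callback would call the supervisor's existing log writer with the classified level instead of pushing into an array. Stderr stays piped either way.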
Contract Ledger Matrix
| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| `ProcessSupervisorService` child stderr forwarding | This ticket + operator 2026-05-16 log sample | Map explicit child log prefixes to daemon severity before writing supervisor logs | Unclassified stderr remains ERROR | JSDoc on helper methods if non-obvious | Unit tests for `[LOG]`, `[INFO]`, `[WARN]`, `[ERROR]`, and unclassified stderr |
| Already-running skip logging | This ticket + `ProcessSupervisorService.runTask()` current behavior | Emit the first skip for a task/PID/reason and suppress duplicate skip log spam while the same child remains active | Continue recording skipped outcomes for health visibility | Inline comment only if the state key is not obvious | Unit test repeated `runTask()` calls with same running state |
| Failure semantics | Existing `ProcessSupervisorService` child lifecycle behavior | Spawn failure, child error event, non-zero close, and explicit `[ERROR]` child lines remain ERROR | Existing task outcome recording stays authoritative | Existing method JSDoc | Existing tests plus new severity test |
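The "Already-running skip logging" row can be sketched as a small gate keyed on task, PID, and reason. This is one possible shape, offered as an illustration only: `SkipLogGate`, its method names, and the `'already-running'` reason string are hypothetical and do not come from the current code. The gate decides only whether the log line is written; recording the skipped outcome would stay outside it and happen on every call.

```javascript
/**
 * Remembers the last skip that was logged per task so identical skips stay quiet.
 * The key shape (PID + reason, stored per task label) is an assumption based on the ticket wording.
 */
class SkipLogGate {
    constructor() {
        this.lastKeyByTask = new Map();
    }

    /** Returns true only the first time a task/PID/reason combination is seen in a row. */
    shouldLog(taskLabel, pid, reason) {
        const key = `${pid}:${reason}`;
        if (this.lastKeyByTask.get(taskLabel) === key) {
            return false;
        }
        this.lastKeyByTask.set(taskLabel, key);
        return true;
    }

    /** Call when the child exits so the first skip of a later run is logged again. */
    clear(taskLabel) {
        this.lastKeyByTask.delete(taskLabel);
    }
}

// Usage: three polls against the same running child, then a new child with a different PID.
const gate = new SkipLogGate();
const results = [
    gate.shouldLog('knowledge base sync', 2082, 'already-running'), // first skip is logged
    gate.shouldLog('knowledge base sync', 2082, 'already-running'), // duplicate, suppressed
    gate.shouldLog('knowledge base sync', 2082, 'already-running'), // duplicate, suppressed
    gate.shouldLog('knowledge base sync', 3100, 'already-running')  // new PID, logged again
];
```

A time-window throttle would be an alternative if a periodic reminder is wanted during very long syncs. The pure dedupe shown here is the simpler variant.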
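Acceptance Criteria
- Child stderr lines prefixed with `[LOG]` or `[INFO]` are written as daemon INFO, not ERROR.
- Child stderr lines prefixed with `[WARN]` are written as daemon WARN.
- Child stderr lines prefixed with `[ERROR]` and unclassified stderr continue to be written as daemon ERROR.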
Out of Scope
- Changing the embedding provider, vector dimensions, or KB content sync semantics.
- Removing stderr from MCP servers or CLI scripts.
- Reworking the full logging framework or introducing a new structured logger.
- Changing orchestrator poll cadence or KB sync interval defaults.
- Implementing post-merge auto-sync; that belongs to existing automation tickets.
Related
- #10576 — KB sync observability: tee MCP child logger output to `.neo-ai-data/logs/` for tail-able progress
- #10579 — KB Sync: Implement stderr observability (Shape A)
- #10088 — Automate post-merge knowledge-base sync trigger
- #9919 — Implement fs.watch Daemonization for Autonomous Orchestrator
- #9920 — Agent Error Recovery & Re-Queueing for Orchestrator
- #11075 — Migrate orchestrator magic numbers + retention constants to config
Origin Session ID: 6ec143cb-2e5b-4964-94d6-eb28cb25bde2
Handoff Retrieval Hint: query_raw_memories(query="orchestrator ProcessSupervisor knowledge base sync stderr LOG batch task already running PID skip noise child stderr severity classification")