Frontmatter
id: 11017
title: Orchestrator task: primary-checkout `dev` auto-sync + KB cascade
state: Closed
labels: enhancement, ai, architecture, model-experience
assignees: neo-gpt
createdAt: May 9, 2026, 3:07 PM
updatedAt: May 12, 2026, 4:08 AM
githubUrl: https://github.com/neomjs/neo/issues/11017
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: [x] 11009 Move Orchestrator logic into Neo daemon class
blocking: []
closedAt: May 10, 2026, 7:04 PM

Orchestrator task: primary-checkout dev auto-sync + KB cascade

neo-opus-4-7 commented on May 9, 2026, 3:07 PM

Context

Filed 2026-05-09 from a daemon-coordination scope-capture pass with @tobiu (deferred from a Discussion path because daemon-architecture work doesn't need ideation-sandbox bandwidth — the substrate-truth is clear and peers will engage at PR-review time). This is Shape A from the brainstorm; Shape B (interim discipline-only sunset probe) shipped via #11013 → PR #11015.

Closes the mid-session staleness gap that Shape B couldn't reach. The operator's primary checkout (/path/to/neo/) hosts the canonical Agent OS daemon stack: orchestrator-daemon (post-#11009 Neo class) + sibling bridge-daemon (wake delivery) + DreamService (ingestion) + KB sync pipeline. All read pre-merge code while primary's dev lags origin/dev. The agent's UX target: after every PR merge (and every gh-workflow MCP sync_all chore commit, and every hourly data-sync-pipeline.yml push), the KB and downstream daemons should reflect the new state within minutes.

Architectural fit: orchestrator-daemon is the canonical learn/agentos/v13-path.md M3 home for "Scheduled Agent OS maintenance triggers". primary-dev-sync is exactly that class of trigger — periodic, observable, failure-isolated. Adding it as one more registered task post-#11009's class extraction is a clean substrate fit.

The Problem

Even with Shape B's sunset probe shipping (#11013), the discipline-only path has three gaps:

  1. Mid-session staleness: sunset fires at session end; if a PR lands mid-session and primary stays stale until next sunset, all daemons read pre-merge code in the interim.
  2. Operator-action dependency: Shape B emits a warning; operator must remember to run git -C $PRIMARY_ROOT pull origin dev. The friction-to-gold target is zero-operator-action.
  3. KB freshness gap: even when operator pulls, KB doesn't auto-refresh. npm run ai:sync-kb must be triggered separately.

The Shape A target: orchestrator-daemon registers a primary-dev-sync periodic task that detects origin/dev advance, FF-pulls primary, and cascades the KB sync — closing all three gaps in one task.

The Architectural Reality

Daemon ownership boundary (per learn/agentos/v13-path.md M3 + #11008 + #11009):

  • orchestrator-daemon = canonical scheduled-maintenance daemon (post-#11009 Neo class with task scheduling)
  • bridge-daemon = wake delivery only (per its JSDoc — out of scope for pull discipline)
  • DreamService = ingestion (consumed by orchestrator)

Existing substrate this task leverages:

  • Orchestrator.mjs Neo class (created by #11009) — task registration table
  • HealthService.recordTaskOutcome(taskName, status, details=null) (added by #11009) — task observability
  • npm run ai:sync-kb → buildScripts/ai/syncKnowledgeBase.mjs → KB_DatabaseService.syncDatabase(): content-hash-aware delta-effective sync (only re-embeds changed/new docs)
  • git rev-parse --git-common-dir — resolves primary checkout path from any worktree
  • resources/content/.sync-metadata.json — gh-workflow MCP sync_all authoritative-write target (the file the "accept-theirs" Layer 2 rule applies to)
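
As an illustration of the path-resolution item above, the following sketch derives the primary checkout root from the output of `git rev-parse --git-common-dir`. It assumes a non-bare repository whose common directory is `<primary>/.git`. The function name `primaryRootFromCommonDir` is hypothetical and does not come from the ticket. Git can print the common dir as a relative path (for example `.git` when invoked at the primary root), so the output is resolved against the directory the command ran in.

```javascript
import path from 'node:path';

/**
 * Hypothetical helper (not from the ticket): derive the primary checkout root
 * from the stdout of `git rev-parse --git-common-dir`.
 * Assumes a non-bare repo, i.e. the common dir is `<primary>/.git`.
 *
 * @param {String} cwd          Directory the git command was run in
 * @param {String} commonDirOut Raw stdout of the git command
 * @returns {String} Absolute path of the primary checkout
 */
function primaryRootFromCommonDir(cwd, commonDirOut) {
    // Resolve against cwd because git may print a relative path such as ".git"
    const commonDir = path.resolve(cwd, commonDirOut.trim());

    // The primary checkout is the parent directory of its .git directory
    return path.dirname(commonDir);
}

// Invoked at the primary checkout root, git prints ".git"
console.log(primaryRootFromCommonDir('/path/to/neo', '.git\n'));

// Invoked in a linked worktree, git typically prints the primary's .git
// directory as an absolute path
console.log(primaryRootFromCommonDir('/path/to/wt', '/path/to/neo/.git\n'));
```

Keeping the parsing separate from the process spawn makes both invocation paths (primary and worktree) testable without a real repository.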

Trigger choice — polling on origin-advance, not PR-merge webhook: unified git fetch origin dev advance-detection catches PR merges + data-sync-pipeline.yml pushes (with [skip ci]) + gh-workflow sync_all chore commits. PR-merge-webhook is too narrow; advance-detection is the substrate-grounded trigger. Webhook can be a future enhancement if sub-second freshness becomes valuable; 5-10 min polling is sufficient for the "post-merge KB freshness" UX target.

File surface:

  • New: ai/daemons/services/PrimaryRepoSyncService.mjs (Neo class, sibling pattern with ai/daemons/services/SummarizationCoordinatorService.mjs from #11009; structural-pre-flight Stage 1 fast-path applies per #11010)
  • Modified: Orchestrator.mjs (registration row in task table)
  • Modified: package.json (env var documentation only — no new script entries needed)
  • Modified: learn/agentos/v13-path.md (M3 task-list update reflecting new task)

The Fix

Polling shape

Loop every NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_INTERVAL_MS (default 600000 = 10 min):
    1. Verify primary on `dev` branch (skip-with-log if not)
    2. Verify no in-flight sync (skip-with-log if running; singleton lock)
    3. git -C $PRIMARY_ROOT fetch origin dev --quiet
    4. BEHIND = git rev-list --count dev..origin/dev
    5. If BEHIND == 0: skip cycle (1s no-op typical)
    6. If BEHIND > 0:
       Layer 1 — git pull --ff-only origin dev
         Success → cascade KB sync, recordTaskOutcome(success, {pulled: BEHIND})
         Failure → drop to Layer 2
       Layer 2 — narrow theirs-rule for resources/content/.sync-metadata.json
         git checkout -- resources/content/.sync-metadata.json
         git pull --ff-only origin dev
         Success → cascade KB sync, recordTaskOutcome(success, {pulled: BEHIND, resolved: 'meta-sync'})
         Failure → drop to Layer 3
       Layer 3 — Halt + operator-visible warning
         emit_operator_warning("primary-dev-sync skipped: non-FF divergence on non-meta-sync files")
         recordTaskOutcome(skipped, {reason: 'non-FF-divergence', behind: BEHIND, files: $diverged})
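
The ladder above can be sketched in JavaScript roughly as follows. This is a minimal sketch under stated assumptions, not the shipped service: `runPrimaryDevSyncCycle` and its injected dependencies (`git`, `syncKb`, `record`, `warn`, `state`) are hypothetical stand-ins for the real git invocation, the KB cascade, `HealthService.recordTaskOutcome(...)`, and the operator warning. The lock check runs before the branch check here so that one `try/finally` covers the whole cycle.

```javascript
// Hypothetical sketch: `git`, `syncKb`, `record` and `warn` are injected
// stand-ins, not the real Orchestrator / HealthService surfaces.
const META_SYNC_FILE = 'resources/content/.sync-metadata.json';

/**
 * @param {Object}   deps
 * @param {Function} deps.git    async (args) => {ok: Boolean, stdout: String}
 * @param {Function} deps.syncKb async () => void, runs the KB sync cascade
 * @param {Function} deps.record (status, details) => void
 * @param {Function} deps.warn   (message) => void
 * @param {Object}   deps.state  {running: Boolean}, shared across cycles
 * @returns {Promise<String>} Short label describing what the cycle did
 */
async function runPrimaryDevSyncCycle({git, syncKb, record, warn, state}) {
    // Step 2 (singleton lock) is checked first so try/finally covers the cycle
    if (state.running) return 'skipped:in-flight';
    state.running = true;

    try {
        // Step 1: only act when primary has `dev` checked out
        const branch = (await git(['rev-parse', '--abbrev-ref', 'HEAD'])).stdout.trim();
        if (branch !== 'dev') return 'skipped:not-dev';

        // Steps 3-5: fetch, then count the commits primary is behind origin/dev
        await git(['fetch', 'origin', 'dev', '--quiet']);
        const count  = await git(['rev-list', '--count', 'dev..origin/dev']);
        const behind = Number(count.stdout.trim());
        if (behind === 0) return 'noop';

        // Layer 1: plain fast-forward pull
        if ((await git(['pull', '--ff-only', 'origin', 'dev'])).ok) {
            await syncKb();
            record('success', {pulled: behind});
            return 'success';
        }

        // Layer 2: discard local edits to the meta-sync file only, then retry
        await git(['checkout', '--', META_SYNC_FILE]);
        if ((await git(['pull', '--ff-only', 'origin', 'dev'])).ok) {
            await syncKb();
            record('success', {pulled: behind, resolved: 'meta-sync'});
            return 'success:meta-sync';
        }

        // Layer 3: halt and surface the divergence to the operator
        warn('primary-dev-sync skipped: non-FF divergence on non-meta-sync files');
        record('skipped', {reason: 'non-FF-divergence', behind});
        return 'skipped:non-ff';
    } finally {
        // Release the lock on every exit path, including thrown errors
        state.running = false;
    }
}
```

Injecting `git` as a function keeps each layer unit-testable with a fake that scripts the pull results, which lines up with the per-layer tests the acceptance criteria ask for.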

KB sync cascade (after successful Layer 1 or Layer 2 pull)

( cd $PRIMARY_ROOT && npm run ai:sync-kb )
# Single-run; if next interval fires while sync is running, the singleton lock
# from step 2 prevents overlap. KB sync at the service layer is content-hash-aware
# (no --since flag needed); only re-embeds changed/new docs.
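
To make "content-hash-aware" concrete, here is a small illustration of how a delta sync can decide which docs need re-embedding. It is a sketch of the general technique only and makes no claim about how `KB_DatabaseService.syncDatabase()` is implemented. The function name `selectChangedDocs`, the doc shape, and the choice of SHA-256 are all assumptions.

```javascript
import {createHash} from 'node:crypto';

/**
 * Illustration only, not the KB_DatabaseService implementation: pick the
 * docs whose content hash differs from the hash stored at the last sync.
 *
 * @param {Array<{id: String, content: String}>} docs
 * @param {Map<String, String>} storedHashes id => sha256 hex from the last sync
 * @returns {Array<{id: String, hash: String}>} Docs that would need re-embedding
 */
function selectChangedDocs(docs, storedHashes) {
    const changed = [];

    for (const doc of docs) {
        const hash = createHash('sha256').update(doc.content).digest('hex');

        // A new doc has no stored hash; a changed doc has a different one
        if (storedHashes.get(doc.id) !== hash) {
            changed.push({id: doc.id, hash});
        }
    }

    return changed;
}
```

When nothing changed since the last sync, the returned list is empty, which is why a cycle with no new commits can stay cheap.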

Configuration

| Env var | Default | Purpose |
| --- | --- | --- |
| NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_INTERVAL_MS | 600000 (10 min) | Polling cadence |
| NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_ENABLED | true | Operator one-shot disable for debugging |

10-min default chosen because:

  • KB delta-sync at service-layer is content-hash-aware → most cycles are 1s no-ops
  • Cadence captures PR-merge UX target (KB fresh within 10 min of merge)
  • Polling less often than every 60s reduces background-process noise
  • Configurable for operators who want sub-minute freshness
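
One possible way to read the two env vars with their documented defaults is sketched below. The helper name `readPrimaryDevSyncConfig` is hypothetical, and the parsing rules are assumptions the ticket does not specify: invalid or non-positive intervals fall back to 600000, and only the literal string `false` disables the task.

```javascript
/**
 * Hypothetical helper (not from the ticket): read the two env vars with
 * their documented defaults.
 *
 * @param {Object} env Defaults to process.env; injectable for tests
 * @returns {{intervalMs: Number, enabled: Boolean}}
 */
function readPrimaryDevSyncConfig(env = process.env) {
    const rawInterval = Number(env.NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_INTERVAL_MS);
    const rawEnabled  = env.NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_ENABLED;

    return {
        // Assumption: fall back to 10 min when unset, non-numeric, or non-positive
        intervalMs: Number.isFinite(rawInterval) && rawInterval > 0 ? rawInterval : 600000,
        // Assumption: only the literal string "false" disables the task
        enabled: rawEnabled !== 'false'
    };
}

// An operator who wants one-minute freshness overrides only the interval
console.log(readPrimaryDevSyncConfig({NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_INTERVAL_MS: '60000'}));
```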

Sunset hook (closes Shape B's discipline path)

Once Shape A ships and verifies (3 successful primary-dev-sync cycles in the wild):

  • The conditional staleness-probe in .agents/skills/session-sunset/references/session-sunset-workflow.md (added by #11013/PR #11015) retires
  • One-line skill-cleanup PR closes the loop
  • AC includes filing the cleanup PR alongside Shape A's PR (or as a tiny follow-up)

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| ai/daemons/services/PrimaryRepoSyncService.mjs | This ticket + v13-path.md M3 | Neo class implementing the polling loop + 3-layer pull + KB cascade | None (singleton service) | JSDoc + v13-path.md M3 task list update | Unit tests verify each layer (FF success, theirs-resolve, halt warning) |
| Orchestrator.mjs task registration | Post-#11009 Orchestrator.mjs | One row added to task table; respects existing recordTaskOutcome envelope | If Orchestrator class instantiation fails, daemon already halts (per #11009 contract) | Orchestrator JSDoc | Unit test verifies task registers + fires |
| npm run ai:sync-kb cascade | Existing KB_DatabaseService.syncDatabase() contract | Triggered post-pull from primary checkout context | Manual npm run ai:sync-kb remains operator escape hatch | learn/agentos/MemoryCore.md if affected | Empirical: PR merges into dev → within 10 min KB returns the new content |
| resources/content/.sync-metadata.json Layer 2 resolution | This ticket + operator framing | When pull conflicts ONLY on this file, accept origin's version | Layer 3 halt if other files conflict | JSDoc on PrimaryRepoSyncService | Unit test simulates meta-sync conflict + verifies theirs-resolve |
| Operator-visible Layer 3 warning surface | This ticket | Emit to orchestrator log + (future) bridge-daemon notification | None (operator action required) | JSDoc | Test verifies warning fires when non-meta-sync files diverge |

Acceptance Criteria

  • AC1: ai/daemons/services/PrimaryRepoSyncService.mjs created as a Neo class. JSDoc covers class + load-bearing methods. structural-pre-flight Pre-Flight statement emitted in commit message body or PR comment per #11010 discipline.
  • AC2: Service registered as a task in Orchestrator.mjs with the primary-dev-sync name; task fires on the configured interval.
  • AC3: Path-detection mechanism — git rev-parse --git-common-dir from daemon's __dirname resolves primary-checkout path correctly when daemon runs from primary AND when daemon runs from a worktree (test both invocation paths).
  • AC4: Branch verification — task skips with debug log if primary's HEAD is not dev.
  • AC5: Singleton lock — overlapping sync invocations don't fire (test: trigger second invocation while first is running, verify second is skipped with log).
  • AC6: Layer 1 (FF pull) success path triggers KB sync cascade and records success outcome via HealthService.recordTaskOutcome(...).
  • AC7: Layer 2 (narrow theirs-resolve on resources/content/.sync-metadata.json) success path triggers KB sync cascade and records success outcome with details.resolved = 'meta-sync'.
  • AC8: Layer 3 (halt + warning) on non-FF non-meta-sync divergence: no further pull attempted after Layers 1 and 2 fail, no KB sync triggered, warning emitted, outcome recorded as skipped with details.reason = 'non-FF-divergence'.
  • AC9: KB sync cascade runs from primary checkout (cwd = $PRIMARY_ROOT); content-hash-aware service-layer delta-effective behavior verified (cycle on a fresh primary takes <2s; cycle with new commits takes longer per delta).
  • AC10: Configurable via NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_INTERVAL_MS (default 600000) and NEO_ORCHESTRATOR_PRIMARY_DEV_SYNC_ENABLED (default true).
  • AC11: Documentation updates: learn/agentos/v13-path.md M3 task list updated to reflect the new task; learn/agentos/DeploymentCookbook.md env var table extended with the two new env vars (per the precedent #10969 / PR #11000 established).
  • AC12: Unit tests cover: each of the 3 layers, the singleton lock, the path-detection (worktree + primary), the branch-verification skip, the env-var overrides, and the KB cascade trigger.
  • AC13: Sunset-discipline retirement — once Shape A verifies in production (3 successful cycles), the conditional staleness-probe in .agents/skills/session-sunset/references/session-sunset-workflow.md (added by #11013 / PR #11015) retires. Cleanup PR can land alongside Shape A's PR OR as a tiny follow-up — operator's preference.
  • AC14: No new file added to ai/scripts/ for this work — all logic lives in ai/daemons/services/PrimaryRepoSyncService.mjs per the directory-CHOICE discipline #10449 / #11009 established.

Out of Scope

  • PR-merge webhook listener: future enhancement if sub-second freshness becomes valuable. Post-merge UX is solved by the 10-min polling cadence; a webhook adds HTTP-listener infrastructure (port management, GitHub webhook config, secret rotation) that 10-min polling avoids.
  • Mid-PR-cycle staleness detection — this task triggers post-PR-merge naturally via origin-advance polling. Pre-merge state divergence (e.g., a dev branch diverging during a long-running PR review) is operator's git discipline, not orchestrator's.
  • Selective subsystem KB sync — the cascade triggers full npm run ai:sync-kb (delta-effective at service layer). If selective re-embedding becomes a perf concern, separate ticket extends KB_DatabaseService.syncDatabase() with a path-prefix scope filter.
  • Cross-worktree primary-detection — the daemon assumes it runs from primary OR from one of primary's worktrees (since git rev-parse --git-common-dir works from both). Daemons running from external clones (e.g., a forked repo's clone) are not addressed.
  • Bridge-daemon scope expansion — bridge-daemon stays focused on wake delivery only per its JSDoc. Pull discipline lives in orchestrator-daemon, not bridge-daemon.
  • Conflict resolution beyond the Layer 1/2/3 ladder — Layer 3's halt-with-warning is the safe-rule terminal state. Auto-merge or auto-rebase beyond Layer 2 risks data-loss; operator action is the right escalation.
  • Reverting Shape B's sunset probe in the same PR — explicitly left as a separate cleanup-PR (AC13) so verification of Shape A is empirically observable before retirement.

Avoided Traps

  • PR-merge webhook as primary trigger (rejected): too narrow. Origin-advance polling catches PR merges AND data-sync-pipeline pushes AND gh-workflow sync_all commits in one unified mechanism.
  • Layer 2 "accept theirs" applied broadly (rejected): would silently destroy legitimate local divergences elsewhere. Narrow rule on resources/content/.sync-metadata.json only is substrate-truth-correct (gh-workflow sync_all is the authoritative writer of that file).
  • Auto-merge or auto-rebase as Layer 3 (rejected): conflict-resolution beyond Layer 2 is operator territory. Daemon halts + warns rather than guess.
  • Cron-based polling outside Orchestrator (rejected): violates v13-path.md M3 + the existing #11009 architectural posture. Orchestrator-daemon is the canonical home.
  • Bundling Shape B retirement into Shape A's PR (rejected per AC13): empirical verification of Shape A first, then retire Shape B. Sequencing matters for substrate-truth durability.
  • Adding new logic to ai/scripts/ (rejected per AC14): violates the directory-CHOICE discipline that #10449 / #11009 established. All daemon-coordination logic lives in ai/daemons/.
  • Webhook + polling hybrid as MVP (rejected): adds complexity in pursuit of sub-second freshness, while 10-min polling already satisfies the post-merge UX target.

Related

  • Prerequisite (blocks-by): #11009 — Move Orchestrator logic into Neo daemon class. Blocks because Shape A registers as a task in the post-#11009 Orchestrator.mjs Neo class + uses the post-#11009 HealthService.recordTaskOutcome(...) surface.
  • Sibling Shape B: #11013 — sunset-time discipline-only path (interim). Shape A retires Shape B's probe per AC13.
  • Architectural source: learn/agentos/v13-path.md §M3 (Orchestrator daemon as canonical scheduled-maintenance home).
  • Structural-pre-flight discipline: #10449 → PR #11010 (skill that gates new .mjs directory choice; AC1 explicitly fires for PrimaryRepoSyncService.mjs).
  • KB delta-effective substrate: buildScripts/ai/syncKnowledgeBase.mjs + KB_DatabaseService.syncDatabase() content-hash-aware sync.
  • Pipeline reconciliation context: .github/workflows/data-sync-pipeline.yml (hourly [skip ci] push to dev that this task naturally captures via origin-advance detection).
  • Meta-sync authoritative-writer: gh-workflow MCP sync_all action, writing resources/content/.sync-metadata.json.

Origin Session ID: c2912891-b459-4a03-b2af-154d5e264df1

Retrieval Hint: query_raw_memories(query="orchestrator daemon primary-dev-sync KB cascade auto-pull #11009 dependency Shape A polling 10 minute interval origin-advance fetch FF-pull theirs-resolve meta-sync layer 1 2 3 conflict resolution")

tobiu referenced in commit 7fa2c7d - "feat(ai): add primary dev sync orchestrator task (#11017) (#11130)" on May 10, 2026, 7:04 PM
tobiu closed this issue on May 10, 2026, 7:04 PM