Frontmatter
id: 10432
title: bootstrapWorktree --link-data: granular per-subdir symlinking
state: Closed
labels: enhancement, ai, architecture
assignees: neo-opus-4-7
createdAt: Apr 27, 2026, 6:05 PM
updatedAt: Apr 27, 2026, 6:15 PM
githubUrl: https://github.com/neomjs/neo/issues/10432
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: Apr 27, 2026, 6:15 PM

bootstrapWorktree --link-data: granular per-subdir symlinking

neo-opus-4-7 commented on Apr 27, 2026, 6:05 PM

Context

Follow-up to #10224 (closed), which established the coarse-grained .neo-ai-data/ symlink as the cross-worktree data-unification primitive, and to #10424, the cross-process coherence gap diagnosed empirically during the 2026-04-27 session arc.

bootstrapWorktree.mjs --link-data currently symlinks the entire .neo-ai-data/ directory as a single unit. With --force, it first removes the existing directory recursively (fs.rm) and then creates the symlink. This works for a fresh worktree but breaks for any worktree that:

  1. Has the git-tracked .neo-ai-data/concepts/ directory present (every worktree, since it's tracked) → the symlink hides the worktree's own tracked files behind canonical's
  2. Has been previously fixed via unlink + git checkout to restore concepts/ → the data-link is now broken (regular dirs for sqlite/, chroma/ instead of symlinks), causing cross-process coherence drift between MCP servers and the bridge daemon

This session-arc was the empirical anchor: my MCP server's list_messages repeatedly missed messages from @neo-gemini-3-1-pro that raw SQL on canonical confirmed existed; bridge daemon delivered to phantom WAKE_SUB IDs that no MCP-server view contained.

The Problem

The .gitignore boundary inside .neo-ai-data/:

.neo-ai-data
!.neo-ai-data/concepts/

Everything under .neo-ai-data/ is gitignored except concepts/, which is tracked. The bootstrap's all-or-nothing symlink cannot honor this distinction:

  • All-symlink → tracked concepts/ becomes invisible (replaced by canonical's view)
  • All-regular-dir (after the manual unlink + git checkout reversal) → gitignored substrate-data subdirs (sqlite/, chroma/, wake-daemon/, etc.) get isolated per-worktree, breaking cross-process coherence

The empirical pattern observed this session-arc: 11 worktrees on disk, only 1 symlinked correctly (peaceful-pare-9cbfcb); 10 had regular dirs and 0 (or stale) WAKE_SUBSCRIPTION nodes in their isolated DBs. MCP servers in those worktrees couldn't see canonical's live state.
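
An audit like the one above can be reproduced by lstat-ing each substrate-data subdir of a worktree and classifying what is found. The following is a minimal sketch, not code from the repository: `classifyDataSubdirs` and `classify` are hypothetical helpers, and the subdir names are the ones listed in this issue.

```javascript
import fs from 'node:fs/promises';
import path from 'node:path';

// Subdir names taken from this issue; both helpers below are hypothetical.
const SUBDIRS = ['sqlite', 'chroma', 'wake-daemon', 'backups', 'datasets', 'neo-sqlite'];

// Classifies one path without following symlinks.
async function classify(target) {
    try {
        const stat = await fs.lstat(target);
        if (stat.isSymbolicLink()) return 'symlink';
        return stat.isDirectory() ? 'regular-dir' : 'other';
    } catch (err) {
        if (err.code === 'ENOENT') return 'missing';
        throw err;
    }
}

// Reports the state of .neo-ai-data/ itself plus each listed subdir.
export async function classifyDataSubdirs(worktreeRoot, subdirs = SUBDIRS) {
    const dataDir = path.join(worktreeRoot, '.neo-ai-data');
    const report  = {'.neo-ai-data': await classify(dataDir)};

    for (const name of subdirs) {
        report[name] = await classify(path.join(dataDir, name));
    }
    return report;
}
```

If the parent entry is reported as 'symlink' (the coarse-grained case), the subdir entries describe canonical's directories, because the path is resolved through the parent link.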

The Architectural Reality

  • ai/scripts/bootstrapWorktree.mjs — the canonical worktree-init substrate. Already has BOOTSTRAP_CONFIGS (the explicit allowlist for config.mjs files) — the same shape applies cleanly to data subdirs.
  • symlinkDataDir({dir = '.neo-ai-data'}) — current shape: single-dir all-or-nothing symlink with force clobber.
  • test/playwright/unit/ai/scripts/bootstrapWorktree.spec.mjs — existing test surface; the granular update needs symmetric coverage.
  • Canonical .neo-ai-data/ substrate-data subdirs (gitignored): sqlite/, chroma/, wake-daemon/, backups/, datasets/, neo-sqlite/. Plus tracked-only: concepts/. Plus a top-level memory-core.sqlite placeholder file (empty, harmless).

Symlinking wake-daemon/ is critical for #10423's PID-lock singleton enforcement to span worktrees. Currently each worktree has its own wake-daemon/bridge-daemon.pid, so daemons spawned from different worktrees can't see each other's locks. The same logic applies to bridge.log (the #10425 persistent log substrate).
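
Whether two worktrees would contend for the same PID file can be checked by resolving their wake-daemon/ paths and comparing the results. This is a rough sketch under the assumption that a shared lock simply means a shared directory on disk; `sharesWakeDaemonDir` is a hypothetical helper and says nothing about how #10423 implements the lock itself.

```javascript
import fs from 'node:fs/promises';
import path from 'node:path';

// Hypothetical helper: true when both worktrees resolve
// .neo-ai-data/wake-daemon to the same physical directory.
export async function sharesWakeDaemonDir(worktreeA, worktreeB) {
    // realpath follows symlinks, so a linked worktree resolves to canonical's dir
    const resolve = root => fs.realpath(path.join(root, '.neo-ai-data', 'wake-daemon'));
    const [a, b]  = await Promise.all([resolve(worktreeA), resolve(worktreeB)]);

    return a === b;
}
```

The call rejects with ENOENT if either worktree has no wake-daemon/ entry at all, which is a useful signal in its own right.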

The Fix

Replace the single-dir symlinkDataDir with a granular per-subdir version:

export const DATA_SUBDIRS_TO_LINK = [
    'sqlite',       // Memory Core graph DB
    'chroma',       // vector DBs (KB + memory-core)
    'wake-daemon',  // PID-lock + bridge.log + lastSyncId — unifies singleton across worktrees (#10423/#10425)
    'backups',      // JSONL backups
    'datasets',     // canonical CSVs
    'neo-sqlite'    // legacy DB (still 329MB / referenced)
];

export async function symlinkDataDir({
    mainCheckout, projectRoot,
    subdirs = DATA_SUBDIRS_TO_LINK,
    force = false, log = console.log
}) {
    // Ensure .neo-ai-data/ exists as a regular dir; never symlink the parent.
    // For each subdir in the allowlist:
    //   - lstat dst; if symlink → 'already-linked' (idempotent)
    //   - if non-symlink dir + canonical lacks subdir → 'skip-no-source'
    //   - if non-symlink dir + force=false → throw (data-loss guard preserved)
    //   - if non-symlink dir + force=true → recursive rm → symlink
    //   - else (no dst) → mkdir parent → symlink
    // concepts/ is never in the allowlist → never touched.
    // Returns per-subdir result map: { sqlite: 'linked', chroma: 'already-linked', ... }
}
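
The decision table in the comments above could be filled in roughly as follows. This is a sketch under stated assumptions, not the merged implementation: `lstatOrNull` is a hypothetical helper, the rejection of a legacy whole-directory symlink is an added safety assumption, and a missing source is treated as 'skip-no-source' even when no destination exists.

```javascript
import fs from 'node:fs/promises';
import path from 'node:path';

export const DATA_SUBDIRS_TO_LINK = ['sqlite', 'chroma', 'wake-daemon', 'backups', 'datasets', 'neo-sqlite'];

// Hypothetical helper: lstat info for a path, or null when nothing exists there.
async function lstatOrNull(target) {
    try {
        return await fs.lstat(target);
    } catch (err) {
        if (err.code === 'ENOENT') return null;
        throw err;
    }
}

export async function symlinkDataDir({
    mainCheckout, projectRoot,
    subdirs = DATA_SUBDIRS_TO_LINK,
    force = false, log = console.log
}) {
    // Absolute source paths keep the link targets valid from any working directory.
    const srcRoot = path.resolve(mainCheckout, '.neo-ai-data');
    const dstRoot = path.resolve(projectRoot, '.neo-ai-data');
    const results = {};

    // Assumption: a parent that is still a whole-dir symlink is rejected, because
    // removing "its" subdirs with force would delete canonical data.
    if ((await lstatOrNull(dstRoot))?.isSymbolicLink()) {
        throw new Error(`${dstRoot} is a symlink; unlink it before per-subdir linking`);
    }
    await fs.mkdir(dstRoot, {recursive: true});

    for (const name of subdirs) {
        const src     = path.join(srcRoot, name);
        const dst     = path.join(dstRoot, name);
        const dstStat = await lstatOrNull(dst);

        if (dstStat?.isSymbolicLink()) {
            results[name] = 'already-linked';   // idempotent re-run
        } else if (!(await lstatOrNull(src))) {
            results[name] = 'skip-no-source';   // never create a dangling link
        } else {
            if (dstStat) {
                if (!force) {
                    throw new Error(`${dst} exists and is not a symlink; pass force to replace it`);
                }
                await fs.rm(dst, {recursive: true, force: true});
            }
            await fs.symlink(src, dst, 'dir');
            results[name] = 'linked';
        }
        log(`${name}: ${results[name]}`);
    }
    return results;
}
```

Because only names from `subdirs` are ever joined onto the destination path, concepts/ and any other unlisted entry are left alone even with force set. Note that this sketch throws on the first guarded subdir, so entries earlier in the list may already have been linked when the error surfaces.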

Touched surface:

  • ai/scripts/bootstrapWorktree.mjs — function signature change + new DATA_SUBDIRS_TO_LINK export + per-subdir loop
  • test/playwright/unit/ai/scripts/bootstrapWorktree.spec.mjs — extend existing tests, add cases for: idempotent re-link per-subdir, refuse-clobber-without-force per-subdir, force-clobber per-subdir, skip-no-source-subdir, concepts/ never touched

Acceptance Criteria

  • symlinkDataDir accepts subdirs allowlist (defaults to DATA_SUBDIRS_TO_LINK)
  • concepts/ is NEVER in the default allowlist (cannot be clobbered by --force accidentally)
  • Idempotent per-subdir: existing symlink → skip; existing real-dir → guard or clobber per force flag
  • --force clobbers only listed subdirs, never concepts/ or other unlisted paths
  • Returns per-subdir result map (not single string) for granular diagnostic visibility
  • Test coverage: idempotent re-link, force-clobber, refuse-clobber, skip-no-source, concepts/-untouched
  • CLI mode (node ai/scripts/bootstrapWorktree.mjs --link-data) prints per-subdir result lines
  • JSDoc updated to reflect granular semantics + the gitignore-boundary rationale
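
For the CLI criterion, the per-subdir result map could be turned into one output line per entry. The exact output format is not specified in this issue, so the `[link-data]` prefix and the `formatLinkResults` name below are purely illustrative assumptions.

```javascript
// Hypothetical formatter; the real CLI output format is not defined in this issue.
export function formatLinkResults(results) {
    // Pad names to the longest key so the status column lines up
    const width = Math.max(0, ...Object.keys(results).map(name => name.length));

    return Object.entries(results)
        .map(([name, status]) => `[link-data] ${name.padEnd(width)}  ${status}`);
}
```

A CLI entry point could then print the result with `formatLinkResults(results).forEach(line => console.log(line))`.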

Out of Scope

  • Auto-discovery of subdirs via git check-ignore (deferred — hardcoded allowlist is more conservative and the substrate-data subdir set is small + stable; revisit if churn warrants)
  • Migration script for existing broken worktrees (manual rm -rf + ln -s is one-shot; no need for tooling)
  • Changes to the parent .neo-ai-data/ directory itself (it's left as a regular dir; only its substrate-data children are symlinked)
  • Changes to concepts/ synchronization across worktrees (it's git-tracked; works through normal git mechanics)
  • Bridge-daemon-side changes to handle non-symlinked deployments (out of scope; this fix addresses the substrate, not the consumer)

Avoided Traps

  • Trap: Auto-discover via git check-ignore per-subdir at runtime. Avoided: Adds dependency on git invocation in a path that should be fast and deterministic; hardcoded allowlist is more transparent and a small list is fine.
  • Trap: Symlink the parent .neo-ai-data/ and special-case concepts/ afterward. Avoided: Leaves a window where concepts/ is wrong; granular per-subdir symlinking is cleaner.
  • Trap: Discriminate based on git-tracked status programmatically (e.g., git ls-files filter). Avoided: Coupling the bootstrap to git-tracking introspection adds overhead; the gitignore boundary is stable enough that an explicit allowlist captures the right shape with one source of truth.

Related

  • Closed predecessor: #10224 (coarse-grained symlink that this refines)
  • Related substrate bug: #10424 (cross-process coherence gap that the granular fix unblocks)
  • Singleton enforcement that depends on wake-daemon/ symlinking: #10423
  • Persistent log substrate that depends on wake-daemon/ symlinking: #10425
  • Bootstrap script @see chain: #10095, #10176

Origin Session ID: 594d8e82-3c69-4d66-8038-fcba9a96efa7

Retrieval Hint: "bootstrapWorktree granular subdir symlink concepts gitignore boundary"

tobiu referenced in commit ee24826 - "feat(bootstrap): granular per-subdir symlinking in --link-data (#10432) (#10433)" on Apr 27, 2026, 6:15 PM
tobiu closed this issue on Apr 27, 2026, 6:15 PM
tobiu referenced in commit 63ccfd5 - "feat(bootstrap): support independent-clone canonical via --canonical-root (#10435) (#10436)" on Apr 27, 2026, 8:09 PM
tobiu referenced in commit 50dbcfa - "docs(agentos): codify SKILL.md anatomy and authoring contract as ADR 0008 (#11427) (#11428)" on May 16, 2026, 12:05 PM