Frontmatter
id: 11451
title: Clean-slate migration: Delete existing substrate + run sync_all (ADR 0004 Phase 1 Task 10)
state: Closed
labels: enhancement, ai, architecture
assignees: neo-opus-4-7
createdAt: May 16, 2026, 12:53 PM
updatedAt: May 16, 2026, 4:51 PM
githubUrl: https://github.com/neomjs/neo/issues/11451
author: neo-gemini-3-1-pro
commentsCount: 0
parentIssue: 11372
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 16, 2026, 4:51 PM

Clean-slate migration: Delete existing substrate + run sync_all (ADR 0004 Phase 1 Task 10)

Closed | enhancement, ai, architecture
neo-gemini-3-1-pro commented on May 16, 2026, 12:53 PM

Context

This ticket implements Phase 1 Task 10 of Epic #11372: the clean-slate migration of the resources/content/ substrate to the Universal Ordinal-100 Content Architecture (ADR 0004). The producer and syncer code rewires (Phase 1 Tasks 1-9 via #11381 / #11403 / #11390 / #11407 / #11392 / #11409 / #11387) have all merged; this task delivers the data-side migration + clean-slate exhaustive emission contract in the syncer code.

The Problem

The legacy resources/content/ substrate uses retired GH-ID-stream <NNN>xx/ chunking + flat discussions + pre-Option-G release-notes shape. ADR 0004 anchors the solution on a single chunk-N/ universal ordinal primitive + _index.json map. The code changes are complete on origin/dev, but the on-disk substrate is still in the mixed/transitional shape that the syncers cannot incrementally migrate without dropping the cache.
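
To make the target shape concrete, here is a minimal sketch of how an ordinal-based chunk layout could be derived. It is not the real contentPath.mjs API. The chunk size of 100, the 1-based chunk numbering, the `<id>.md` file naming, and the `chunkFolder` / `buildIndex` helpers are all assumptions made for illustration.

```javascript
// Hypothetical sketch, NOT the actual contentPath.mjs implementation.
// Assumption: "ordinal-100" means each chunk-N/ folder holds up to 100 items.
const CHUNK_SIZE = 100;

// Assumption: chunks are numbered from 1 and filled in ordinal (position) order,
// independent of the GitHub ID of the item.
function chunkFolder(ordinal) {
    return `chunk-${Math.floor(ordinal / CHUNK_SIZE) + 1}`;
}

// Builds an id -> relative path map, similar in spirit to an _index.json.
// `ids` must already be in ordinal order; the `<id>.md` file name is assumed.
function buildIndex(type, ids) {
    const index = {};
    ids.forEach((id, ordinal) => {
        index[id] = `${type}/${chunkFolder(ordinal)}/${id}.md`;
    });
    return index;
}
```

Under these assumptions, the first 100 items of a content type land in chunk-1/, the next 100 in chunk-2/, and a lookup by ID goes through the index map rather than through the folder name.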

The Architectural Reality (updated for PR #11461 Cycle 2)

The operator-approved migration boundary is two-phase:

  1. PR-merge phase (this ticket; landed via PR #11461 + fixup commit): commit the mass deletion + syncer-cap removal atomically. After merge, resources/content/ contains only concepts/ + sandman_handoff.md symlink; .sync-metadata.json + _index.json are absent.
  2. Operator-side post-merge phase (operator runs sync_all / npm run ai:orchestrator locally): the now-uncapped syncers rehydrate the full corpus from GitHub source-of-truth, emitting ordinal-100 chunk shape natively per contentPath.mjs.

This split is what makes the "no migration scripts" framing per ADR 0004 §3.6 mechanically true — the syncer logic IS the migration engine, executed by the operator post-merge.

The Fix

Per ADR 0004 §3.6 (clean-slate purge) + §1.3 (regeneratable-cache strategic principle):

  1. Delete resources/content/{issues, pulls, discussions, release-notes, archive}/* + .sync-metadata.json + _index.json in a single atomic PR commit.
  2. Remove the hard 200-cap from PullRequestSyncer + DiscussionSyncer so clean-slate emission paginates exhaustively via GraphQL cursor.
  3. Change IssueSyncer clean-slate since: fallback to a pre-Neo date so pre-2025 dormant issues re-emit on first sync.
  4. Bump maxIssues (10000 → 20000) for defensive headroom — Neo currently has 8,502 issues + 2,816 PRs (V-B-A'd 2026-05-16 via GitHub search per @neo-gpt PR #11461 Cycle 2 review PRR_kwDODSospM8AAAABAIZAdg); ~135% headroom on issues with room for growth.
  5. Operator runs sync_all post-merge; substrate rehydrates at correct shape.
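
Step 2 above can be sketched as an uncapped cursor loop. This is an illustration under stated assumptions, not the actual PullRequestSyncer or DiscussionSyncer code: `fetchAll` and the `fetchPage` callback are hypothetical names. The `pageInfo { hasNextPage, endCursor }` shape is the one GitHub GraphQL connections return.

```javascript
// Hypothetical sketch of exhaustive cursor pagination (not the real syncer code).
// fetchPage is an assumed callback: (cursor) => Promise<{nodes, pageInfo}>,
// where pageInfo mirrors GitHub GraphQL's { hasNextPage, endCursor }.
async function fetchAll(fetchPage) {
    const items = [];
    let cursor = null; // null requests the first page

    // No item cap: the loop ends only when the API reports no further page.
    while (true) {
        const {nodes, pageInfo} = await fetchPage(cursor);
        items.push(...nodes);

        if (!pageInfo.hasNextPage) {
            break;
        }
        cursor = pageInfo.endCursor;
    }

    return items;
}
```

With a hard 200-item cap, a loop like this would stop after two pages of 100. Removing the cap lets a clean-slate run walk every page until `hasNextPage` is false.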

Acceptance Criteria (PR-merge phase)

  • Legacy resources/content/{issues, pulls, discussions, release-notes, archive}/* is deleted (6,673 files).
  • resources/content/.sync-metadata.json is deleted.
  • resources/content/_index.json is deleted.
  • resources/content/concepts/ is preserved (separate substrate, out of ADR 0004 scope).
  • resources/content/sandman_handoff.md symlink is preserved (separate substrate).
  • PullRequestSyncer 200-cap removed; clean-slate exhaustive emission via GraphQL cursor pagination.
  • DiscussionSyncer 200-cap removed; same.
  • IssueSyncer clean-slate since: fallback set to '2017-01-01T00:00:00Z' (pre-Neo) so GraphQL fetches full repo history.
  • maxIssues config bumped 10000 → 20000 for defensive headroom.
  • Github-workflow unit specs verified passing (use temp dirs; cap-removal doesn't affect them).
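
The since: fallback and the headroom figure can be sketched as follows. Only the date string and the two counts come from this ticket. The `resolveSince` and `headroomPercent` helpers and the `lastSync` field name are assumptions for illustration, not the IssueSyncer's actual code.

```javascript
// Date value taken from the acceptance criteria above (pre-Neo).
const PRE_NEO_SINCE = '2017-01-01T00:00:00Z';

// Hypothetical helper: `metadata` stands in for the parsed .sync-metadata.json,
// and `lastSync` is an assumed field name. On a clean slate the file is absent,
// so the fallback applies and the full repo history is requested.
function resolveSince(metadata) {
    return metadata?.lastSync ?? PRE_NEO_SINCE;
}

// Headroom of a cap over the current item count, as a rounded percentage.
function headroomPercent(cap, count) {
    return Math.round(((cap - count) / count) * 100);
}
```

For example, `resolveSince(undefined)` returns `'2017-01-01T00:00:00Z'`, and `headroomPercent(20000, 8502)` returns 135, which matches the ~135% headroom quoted for the maxIssues bump.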

Residual / Post-Merge Validation (operator-side, NOT pre-merge ACs)

  • After merge, operator runs sync_all (e.g., npm run ai:orchestrator or sync_all MCP) in main checkout.
  • Verify resources/content/ repopulates with ordinal-100 chunk shape — no <NNN>xx/ GH-ID-stream folders survive.
  • Verify _index.json regenerates correctly post-sync.
  • Verify all expected content types repopulate: issues/chunk-*/, pulls/chunk-*/, discussions/chunk-*/, release-notes/chunk-*/, archive/{type}/v<X>/chunk-*/.
  • Verify discussion + PR counts post-sync match expected (e.g., 948 tracked pulls → 948 emitted post-sync, not 200).
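
The "no <NNN>xx/ folders survive" check above could be automated with a small helper like the one below. It is a sketch, not part of the repo: `findLegacyFolders` is a hypothetical name, and the regex assumes legacy folders are named as one or more digits followed by a literal `xx`.

```javascript
// Assumption: retired GH-ID-stream folders look like "114xx" (digits + "xx").
const LEGACY_FOLDER = /^\d+xx$/;

// Takes relative paths under resources/content/ (forward-slash separated)
// and returns the distinct legacy folder paths found, sorted.
function findLegacyFolders(relativePaths) {
    const offenders = new Set();

    for (const path of relativePaths) {
        const segments = path.split('/');

        segments.forEach((segment, i) => {
            if (LEGACY_FOLDER.test(segment)) {
                // Record the path up to and including the legacy folder.
                offenders.add(segments.slice(0, i + 1).join('/'));
            }
        });
    }

    return [...offenders].sort();
}
```

An empty result after sync_all would indicate that only chunk-*/ folders were emitted. The path list can come from any recursive directory listing of resources/content/.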

Out of Scope

  • Code changes to LocalFileService, path primitives, or other downstream consumers (completed in earlier Phase 1 tickets).
  • Phase 2 SEO / Portal app rewires.
  • archiveVersion carry-forward retirement (deferred follow-up; clean-cut + syncer fixes make it mechanically redundant but explicit removal is a separate cleanup).
  • Regression tests for clean-slate exhaustive emission with multi-page GraphQL mock fixtures (deferred follow-up — substantial enough to warrant its own scope).

Avoided Traps

  • Attempting to preserve git history with complex file-move scripts. Rejected by operator: "delete it all => clean slate". We use the sync pipeline as the migration engine.
  • Committing recreated content in the migration PR. The ACs were initially structured this way but updated 2026-05-16 per operator-approved Cycle 2 boundary: PR commits deletion + syncer-cap removal atomically; rehydration runs post-merge via operator-side sync_all.
  • Bundling speculative-support fixes. archiveVersion retirement is correct but mechanically post-merge-redundant via clean-cut; deferred to follow-up rather than scope-creeping this PR.

Related

  • Epic: #11372
  • Authority: ADR 0004 (0004-github-content-architecture.md) — §1.3 regeneratable-cache + §2.4 sealed-chunk via prevent-reopen.yml + §3.6 clean-slate purge + §9 item 10
  • Implementation PR: #11461
  • Sister Phase 1 PRs (merged): #11381 (contentPath.mjs) / #11403 (Lane B syncers) / #11390 (LocalFileService) / #11407 (ReleaseNotesSyncer) / #11392 (Lane C consumer) / #11409 (stale-ref cleanup) / #11387 (config audit)
  • Superseded this session: #11364 + PR #11458 (narrow archiveVersion retirement) — closed Drop+Supersede; substantive work folded into #11451's broader scope here.

Origin Session ID: 188acb85-b41e-435c-94ee-0cc9944d4c97
Body updated 2026-05-16 per operator-approved PR #11461 Cycle 2 boundary (GPT review PRR_kwDODSospM8AAAABAIXzdA).
Retrieval Hint: "clean-slate migration ADR 0004 Task 10 syncer-cap exhaustive emission"

tobiu closed this issue on May 16, 2026, 4:51 PM