Context
This ticket implements Phase 1 Task 10 of Epic #11372: the clean-slate migration of the resources/content/ substrate to the Universal Ordinal-100 Content Architecture (ADR 0004). The producer and syncer code rewires (Phase 1 Tasks 1-9 via #11381 / #11403 / #11390 / #11407 / #11392 / #11409 / #11387) have all merged; this task delivers the data-side migration + clean-slate exhaustive emission contract in the syncer code.
The Problem
The legacy resources/content/ substrate uses retired GH-ID-stream <NNN>xx/ chunking + flat discussions + pre-Option-G release-notes shape. ADR 0004 anchors the solution on a single chunk-N/ universal ordinal primitive + _index.json map. The code changes are complete on origin/dev, but the on-disk substrate is still in the mixed/transitional shape that the syncers cannot incrementally migrate without dropping the cache.
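For orientation, the chunk-N/ + _index.json idea can be sketched as below. This is only an illustration, not the contentPath.mjs implementation: the chunk size of 100 is inferred from the "Ordinal-100" name, and the zero-based ordinals, the chunk numbering, and the shape of the index object are all assumptions. chunkForOrdinal and buildIndex are hypothetical names.

```javascript
// ASSUMPTION: 100 items per chunk, inferred from the "Ordinal-100" name.
const CHUNK_SIZE = 100;

// ASSUMPTION: ordinals are zero-based positions in a stable ordering,
// and chunk folders are numbered from 0.
function chunkForOrdinal(ordinal) {
    return `chunk-${Math.floor(ordinal / CHUNK_SIZE)}`;
}

// Builds a hypothetical id -> chunk folder map, standing in for _index.json.
function buildIndex(ids) {
    const index = {};
    ids.forEach((id, ordinal) => {
        index[id] = chunkForOrdinal(ordinal);
    });
    return index;
}
```

The point of the sketch is that the folder depends on an item's position rather than on its GitHub ID, which is why a lookup map is needed alongside the chunks.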
The Architectural Reality (updated for PR #11461 Cycle 2)
The operator-approved migration boundary is two-phase:
- PR-merge phase (this ticket; landed via PR #11461 + fixup commit): commit the mass deletion + syncer-cap removal atomically. After merge, resources/content/ contains only concepts/ + the sandman_handoff.md symlink; .sync-metadata.json + _index.json are absent.
- Operator-side post-merge phase (operator runs sync_all / npm run ai:orchestrator locally): the now-uncapped syncers rehydrate the full corpus from GitHub source-of-truth, emitting ordinal-100 chunk shape natively per contentPath.mjs.
This split is what makes the "no migration scripts" framing per ADR 0004 §3.6 mechanically true — the syncer logic IS the migration engine, executed by the operator post-merge.
The Fix
Per ADR 0004 §3.6 (clean-slate purge) + §1.3 (regeneratable-cache strategic principle):
- Delete resources/content/{issues, pulls, discussions, release-notes, archive}/* + .sync-metadata.json + _index.json in a single atomic PR commit.
- Remove the hard 200-cap from PullRequestSyncer + DiscussionSyncer so clean-slate emission paginates exhaustively via GraphQL cursor.
- Change IssueSyncer clean-slate since: fallback to a pre-Neo date so pre-2025 dormant issues re-emit on first sync.
- Bump maxIssues (10000 → 20000) for defensive headroom: Neo currently has 8,502 issues + 2,816 PRs (V-B-A'd 2026-05-16 via GitHub search per @neo-gpt PR #11461 Cycle 2 review PRR_kwDODSospM8AAAABAIZAdg); ~135% headroom on issues with room for growth.
- Operator runs sync_all post-merge; substrate rehydrates at correct shape.
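The uncapped emission in the second bullet can be sketched as a cursor loop that stops only when the server reports no further pages. This is a minimal sketch, not the PullRequestSyncer / DiscussionSyncer code: fetchAllNodes, fetchPage, and createFakeConnection are hypothetical names, and the in-memory connection stands in for a real GraphQL call. Only pageInfo.hasNextPage and pageInfo.endCursor follow the GitHub GraphQL connection shape.

```javascript
// Hypothetical helper: collects every node from a cursor-paginated connection.
// fetchPage(cursor) is an assumed callback, not the syncers' real API.
async function fetchAllNodes(fetchPage) {
    const nodes = [];
    let cursor = null;

    // No item cap: keep requesting pages until hasNextPage is false.
    while (true) {
        const {nodes: pageNodes, pageInfo} = await fetchPage(cursor);
        nodes.push(...pageNodes);

        if (!pageInfo.hasNextPage) {
            break;
        }
        cursor = pageInfo.endCursor;
    }

    return nodes;
}

// In-memory stand-in for a paginated GraphQL connection (illustration only).
// The cursor is simply the stringified offset of the next item.
function createFakeConnection(totalItems, pageSize) {
    return async function fetchPage(afterCursor) {
        const start = afterCursor === null ? 0 : Number(afterCursor);
        const end = Math.min(start + pageSize, totalItems);
        const nodes = [];

        for (let i = start; i < end; i++) {
            nodes.push({number: i + 1});
        }

        return {
            nodes,
            pageInfo: {hasNextPage: end < totalItems, endCursor: String(end)}
        };
    };
}
```

Calling fetchAllNodes(createFakeConnection(450, 100)) walks five pages and returns all 450 items, where a loop bounded by the old 200-cap would have stopped after the second page.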
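Acceptance Criteria (PR-merge phase)
- resources/content/{issues, pulls, discussions, release-notes, archive}/* is deleted (6,673 files).
- resources/content/.sync-metadata.json is deleted.
- resources/content/_index.json is deleted.
- resources/content/concepts/ is preserved (separate substrate, out of ADR 0004 scope).
- resources/content/sandman_handoff.md symlink is preserved (separate substrate).
- PullRequestSyncer 200-cap removed; clean-slate exhaustive emission via GraphQL cursor pagination.
- DiscussionSyncer 200-cap removed; same.
- IssueSyncer clean-slate since: fallback set to '2017-01-01T00:00:00Z' (pre-Neo) so GraphQL fetches full repo history.
- maxIssues config bumped 10000 → 20000 for defensive headroom.
Residual / Post-Merge Validation (operator-side, NOT pre-merge ACs)
- Operator runs sync_all (e.g., npm run ai:orchestrator or sync_all MCP) in main checkout.
- resources/content/ repopulates with ordinal-100 chunk shape; no <NNN>xx/ GH-ID-stream folders survive.
- _index.json regenerates correctly post-sync.
- issues/chunk-*/, pulls/chunk-*/, discussions/chunk-*/, release-notes/chunk-*/, archive/{type}/v<X>/chunk-*/.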
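One way an operator might check after rehydration that no legacy <NNN>xx/ folders survive is to classify the folder names under each content type. This is a sketch under assumptions: both regular expressions are inferred from the folder shapes named in this ticket rather than taken from contentPath.mjs, "113xx" is an invented example name, and classifyFolders is a hypothetical helper.

```javascript
// ASSUMED patterns, inferred from the ticket text:
const LEGACY_BUCKET = /^\d+xx$/;     // retired GH-ID-stream shape, e.g. "113xx"
const ORDINAL_CHUNK = /^chunk-\d+$/; // ordinal-100 shape, e.g. "chunk-7"

// Sorts folder names into legacy, ordinal, and everything else.
function classifyFolders(folderNames) {
    const result = {legacy: [], ordinal: [], other: []};

    for (const name of folderNames) {
        if (LEGACY_BUCKET.test(name)) {
            result.legacy.push(name);
        } else if (ORDINAL_CHUNK.test(name)) {
            result.ordinal.push(name);
        } else {
            result.other.push(name);
        }
    }

    return result;
}

// A clean rehydration would leave the legacy list empty.
const report = classifyFolders(['chunk-0', 'chunk-1', '113xx']);
console.log(report.legacy.length === 0 ? 'clean' : `legacy folders: ${report.legacy.join(', ')}`);
```

The folder names could come from fs.readdirSync(dir, {withFileTypes: true}) filtered to directories; the helper takes a plain array so it can be exercised without touching the disk.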
Out of Scope
- Code changes to LocalFileService, path primitives, or other downstream consumers (completed in earlier Phase 1 tickets).
- Phase 2 SEO / Portal app rewires.
- archiveVersion carry-forward retirement (deferred follow-up; clean-cut + syncer fixes make it mechanically redundant but explicit removal is a separate cleanup).
- Regression tests for clean-slate exhaustive emission with multi-page GraphQL mock fixtures (deferred follow-up; substantial enough to warrant its own scope).
Avoided Traps
- Attempting to preserve git history with complex file-move scripts. Rejected by operator: "delete it all => clean slate". We use the sync pipeline as the migration engine.
- Committing recreated content in the migration PR. The ACs were initially structured this way but updated 2026-05-16 per operator-approved Cycle 2 boundary: PR commits deletion + syncer-cap removal atomically; rehydration runs post-merge via operator-side sync_all.
- Bundling speculative-support fixes. archiveVersion retirement is correct but mechanically post-merge-redundant via clean-cut; deferred to follow-up rather than scope-creeping this PR.
Related
- Epic: #11372
- Authority: ADR 0004 (0004-github-content-architecture.md), §1.3 regeneratable-cache + §2.4 sealed-chunk via prevent-reopen.yml + §3.6 clean-slate purge + §9 item 10
- Implementation PR: #11461
- Sister Phase 1 PRs (merged): #11381 (contentPath.mjs) / #11403 (Lane B syncers) / #11390 (LocalFileService) / #11407 (ReleaseNotesSyncer) / #11392 (Lane C consumer) / #11409 (stale-ref cleanup) / #11387 (config audit)
- Superseded this session: #11364 + PR #11458 (narrow archiveVersion retirement), closed Drop+Supersede; substantive work folded into #11451's broader scope here.
Origin Session ID: 188acb85-b41e-435c-94ee-0cc9944d4c97
Body updated 2026-05-16 per operator-approved PR #11461 Cycle 2 boundary (GPT review PRR_kwDODSospM8AAAABAIXzdA).
Retrieval Hint: "clean-slate migration ADR 0004 Task 10 syncer-cap exhaustive emission"