Frontmatter
id: 11291
title: Migrate PR archives to versioned archive/pulls paths
state: Closed
labels: enhancement, ai, architecture, build
assignees: neo-gpt
createdAt: May 13, 2026, 9:44 AM
updatedAt: May 13, 2026, 3:14 PM
githubUrl: https://github.com/neomjs/neo/issues/11291
author: neo-gpt
commentsCount: 0
parentIssue: 11187
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 13, 2026, 3:14 PM

Migrate PR archives to versioned archive/pulls paths

neo-gpt commented on May 13, 2026, 9:44 AM

Context

This is the missing AC8 companion ticket for Epic #11187: reshape the existing PR archive corpus from legacy resources/content/pr-archive/ into the new single-root archive substrate at resources/content/archive/pulls/vN.M.K/{flat|chunk-N}/.

The gap surfaced after Gemini created the #11187 fanout (#11284-#11288). The fanout covered active discussions, read-path fallback, CI/build scripts, consumer cleanup, and validation/docs, but did not create a dedicated ticket for Epic #11187 AC8:

Reshape pr-archive/XXxx/ to archive/pulls/v*/{flat|chunk-N}/ via release-mapping.

PR #11282 already shipped the MetadataManager prerequisite that preserves pull-request mergedAt, milestone, and archiveVersion, which are required inputs for deterministic version inference.

The Problem

The issue archive has version folders, but the PR archive is currently ID-range oriented and lacks release-version grouping. Moving to the single-root archive shape requires mapping existing archived PR files to release versions before moving them under archive/pulls/.

Without a dedicated AC8 ticket, the migration can be accidentally folded into broad consumer cleanup or release-script work. That would blur the hardest part of the PR archive migration: deterministic version inference and dry-run evidence for existing historical files.

The Architectural Reality

Current and target surfaces:

  • Current legacy corpus: resources/content/pr-archive/ with PR ID-range layout.
  • Target corpus: resources/content/archive/pulls/vN.M.K/{flat|chunk-N}/pr-NNNN.md.
  • Existing active PR corpus: resources/content/pulls/pr-XXxx/pr-NNNN.md remains active-tier ID-range per Epic #11187.
  • Metadata prerequisite: ai/services/github-workflow/sync/MetadataManager.mjs now preserves mergedAt, milestone, and archiveVersion for pull metadata via PR #11282.
  • Archive path primitive: ai/services/github-workflow/shared/archivePath.mjs owns lazy 100-item archive chunking semantics.
  • Release/archive consumers include buildScripts/release/publish.mjs, buildScripts/release/analyzeClosedSinceRelease.mjs, .github/workflows/data-sync-pipeline.yml, and downstream docs/indexing scripts handled by sibling tickets.
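
The target layout above can be illustrated with a minimal sketch of path planning under the lazy 100-item chunking rule. This is not the API of ai/services/github-workflow/shared/archivePath.mjs; the function name `planVersionPaths`, the 1-based chunk numbering, the ordering by PR number, the unpadded file name, and the treatment of `flat` as a literal directory name are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the real archivePath.mjs contract.
const ARCHIVE_ROOT = 'resources/content/archive/pulls';
const CHUNK_SIZE   = 100;

/**
 * Plans target paths for every PR archived under one release version.
 * @param {String}   version   e.g. 'v1.2.3'
 * @param {Number[]} prNumbers all PR numbers mapped to that version
 * @returns {Map<Number, String>} PR number -> target path
 */
function planVersionPaths(version, prNumbers) {
    // Deduplicate and sort so the plan is deterministic across runs
    const sorted = [...new Set(prNumbers)].sort((a, b) => a - b);
    const paths  = new Map();

    sorted.forEach((prNumber, index) => {
        // Lazy chunking: stay flat up to 100 files, chunk only beyond that
        const bucket = sorted.length <= CHUNK_SIZE
            ? 'flat'
            : `chunk-${Math.floor(index / CHUNK_SIZE) + 1}`; // assumed 1-based

        paths.set(prNumber, `${ARCHIVE_ROOT}/${version}/${bucket}/pr-${prNumber}.md`);
    });

    return paths;
}
```

Planning a whole version at once, rather than file by file, matters here because the flat-versus-chunk decision depends on the final count for that version.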

Duplicate sweep notes:

  • #11117 is closed and represents the pre-#11187 PR archive/chunking shape; it targeted pr-archive/ and is superseded by the single-root archive/pulls/ substrate.
  • #11286 covers CI pipelines and release scripts, not the historical PR archive corpus migration and version inference itself.
  • #11287 covers consumer cleanup, not the data migration plan.
  • No open ticket found for AC8's PR archive version-mapping migration.

The Fix

Implement a deterministic PR archive migration plan for existing pr-archive files.

The implementation should:

  • Infer the target release version for each archived PR using the best available metadata in deterministic order.
  • Prefer explicit archiveVersion when present.
  • Fall back to milestone/release metadata when it is structurally reliable.
  • Fall back to mergedAt against release-cut dates when needed.
  • Emit an anomaly report for PRs that cannot be mapped without ambiguity.
  • Support a dry-run/report mode before moving files.
  • Move files only after the report is reviewable and deterministic.
  • Use the Epic #11187 lazy archive chunking contract: flat when a version folder has ≤100 PRs, chunk-N when it exceeds 100.
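
The inference order in the list above could be sketched roughly as follows. The record shapes are assumptions, not the actual MetadataManager output: `milestone` is treated as a plain string title, `releases` is a hypothetical list of `{version, cutAt}` entries, a milestone counts as "structurally reliable" only when it parses as a version and matches a known release, and a PR merged between two cuts is assigned to the first release cut at or after its merge time.

```javascript
// Hypothetical sketch of the inference order; field shapes are assumptions.
const VERSION_RE = /^v?(\d+\.\d+\.\d+)$/;

function normalizeVersion(value) {
    const match = typeof value === 'string' ? VERSION_RE.exec(value.trim()) : null;
    return match ? `v${match[1]}` : null;
}

/**
 * @param {Object}   pr       {archiveVersion?, milestone?, mergedAt?}
 * @param {Object[]} releases [{version: 'v1.2.3', cutAt: ISO date string}], any order
 * @returns {{version: String|null, source: String, reason?: String}}
 */
function inferArchiveVersion(pr, releases) {
    // 1. An explicit archiveVersion wins
    const explicit = normalizeVersion(pr.archiveVersion);
    if (explicit) return {version: explicit, source: 'archiveVersion'};

    // 2. The milestone is used only when it names a known release
    const milestone = normalizeVersion(pr.milestone);
    if (milestone && releases.some(r => r.version === milestone)) {
        return {version: milestone, source: 'milestone'};
    }

    // 3. mergedAt against release-cut dates
    const mergedAt = Date.parse(pr.mergedAt);
    if (!Number.isNaN(mergedAt)) {
        const sorted = [...releases].sort((a, b) => Date.parse(a.cutAt) - Date.parse(b.cutAt));
        const target = sorted.find(r => Date.parse(r.cutAt) >= mergedAt);
        if (target) return {version: target.version, source: 'mergedAt'};
        return {version: null, source: 'anomaly', reason: 'mergedAt is after the latest release cut'};
    }

    // 4. Nothing usable: report instead of guessing
    return {version: null, source: 'anomaly', reason: 'no usable archiveVersion, milestone, or mergedAt'};
}
```

The sketch returns the inference source alongside the version so a report can show why each PR landed where it did. It does not detect conflicts between fields, for example an archiveVersion that disagrees with the milestone; a real migration would need to decide whether that case is an anomaly.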

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| resources/content/archive/pulls/vN.M.K/ | Epic #11187 AC8 | Existing archived PR files grouped by release version under single archive root | Ambiguous PRs stay reported, not silently guessed | Migration report plus docs reference in #11288 | Dry-run output + fixture tests |
| PR version inference order | This ticket + PR #11282 metadata retention | archiveVersion first, then reliable milestone/release metadata, then mergedAt release-cut mapping | Emit anomaly for unmappable or conflicting data | Inline JSDoc / migration README if script added | Fixture matrix covering all branches |
| Legacy pr-archive/ | #11117 superseded by #11187 | Drained or explicitly left as compatibility-only after migration | Read-path fallback from #11285 during transition | Consumer cleanup in #11287 | Post-migration count diff |

Acceptance Criteria

  • Migration logic maps legacy resources/content/pr-archive/**/pr-NNNN.md files into resources/content/archive/pulls/vN.M.K/{flat|chunk-N}/.
  • Version inference is deterministic and documented in code or migration output: archiveVersion → reliable milestone/release metadata → mergedAt against release-cut dates → anomaly report.
  • Dry-run/report mode lists planned moves, inferred version, inference source, and anomalies before writing files.
  • Fixture or targeted test coverage exercises explicit archiveVersion, milestone-based inference, mergedAt fallback, and unmappable PR anomaly cases.
  • Migration preserves active-tier PR layout under resources/content/pulls/pr-XXxx/.
  • Migration uses the Epic #11187 archive chunking contract: flat ≤100 files per version, chunk-N for >100.
  • Post-run validation reports legacy count, moved count, anomaly count, and target count.
  • PR body cites the dry-run output and explains any anomalies left for human review.
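
The dry-run and post-run validation criteria above might be wired together along these lines. This is a sketch under stated assumptions: the entry shape (`file`, `version`, `source`, `reason`) and both function names are hypothetical, and the validation assumes the target tree held no PR files before the migration ran.

```javascript
/**
 * Splits inferred entries into planned moves and anomalies without touching disk.
 * @param {Object[]} entries [{file, version, source, reason?}] (hypothetical shape)
 */
function buildDryRunReport(entries) {
    const moves     = [];
    const anomalies = [];

    for (const entry of entries) {
        // An entry without an inferred version is reported, never moved
        (entry.version ? moves : anomalies).push(entry);
    }

    return {
        moves,
        anomalies,
        summary: {
            legacyCount : entries.length,
            plannedMoves: moves.length,
            anomalyCount: anomalies.length
        }
    };
}

/**
 * Compares the report against the file count found under the target root.
 * @param {Object} report      result of buildDryRunReport()
 * @param {Number} targetCount files counted under archive/pulls/ after the run
 */
function validatePostRun(report, targetCount) {
    const {legacyCount, plannedMoves, anomalyCount} = report.summary;

    return {
        legacyCount,
        movedCount: plannedMoves,
        anomalyCount,
        targetCount,
        // Every legacy file is either moved or reported, and every move landed
        ok: plannedMoves + anomalyCount === legacyCount && targetCount === plannedMoves
    };
}
```

Keeping the report a pure function of the inferred entries means the same input always produces the same plan, which is what makes the output citable in a PR body.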

Out of Scope

  • Active pulls/ layout changes.
  • Issue archive migration; that is Epic #11187 AC7 / sibling lane.
  • Discussion archive creation; that is Epic #11187 AC9 / sibling lane.
  • Broad consumer cleanup beyond what is needed to execute and verify the PR archive migration; consumer cleanup belongs to #11287.
  • Release pipeline rewrites beyond migration support; release scripts belong to #11286.

Avoided Traps

  • Folding AC8 into consumer cleanup. Rejected because version inference and data movement require their own evidence trail.
  • Guessing release versions silently. Rejected; ambiguous PRs must surface as anomalies.
  • Using raw ID range as the target archive shape. Rejected because Epic #11187 selected versioned archive folders with lazy chunking for archive-tier content.
  • Moving files without dry-run evidence. Rejected because an archive corpus migration has a large blast radius and is prone to generated sync noise.

Related

  • Epic #11187 AC8
  • #11117 — superseded pre-#11187 PR archive/chunking shape
  • #11282 / #11281 — MetadataManager preserved PR fields required for inference
  • #11285 — read-path fallback during transition
  • #11286 — CI and release-script archive substrate updates
  • #11287 — consumer cleanup
  • #11288 — validation/docs/anomaly hooks
  • resources/content/pr-archive/
  • resources/content/archive/pulls/
  • ai/services/github-workflow/shared/archivePath.mjs

Origin Session ID: d6d89930-f408-42a0-b60e-ec4487a8cc46

Handoff Retrieval Hints:

  • query_raw_memories(query="AC8 pr-archive version mapping archive pulls migration archiveVersion mergedAt milestone")
  • ask_knowledge_base(query="PR archive migration archive/pulls release mapping archiveVersion mergedAt")
  • GitHub evidence: Epic #11187 AC8 and PR #11282 metadata-retention merge
tobiu referenced in commit ca422cc - "feat(github-workflow): migrate PR archives to archive root (#11291) (#11301)" on May 13, 2026, 3:14 PM
tobiu closed this issue on May 13, 2026, 3:14 PM
tobiu referenced in commit 559c73d - "fix(github-workflow): substrate cleanup — legacy archives + 195-PR collision + syncer fallback + pr- prefix (#11360) (#11362)" on May 14, 2026, 5:40 PM