Context
Operator @tobiu surfaced 2026-05-10 during PR #11114 (Issue Chunk Migration) review:
"follow up tickets (where we can do better!) => we also sync discussions and pr conversations now. we do not have archives and chunking in place there yet."
resources/content/pulls/ currently holds 713 markdown files (one per synced PR conversation). At GitHub's 1000-file folder cap, that's ~287 files of headroom — at current PR cadence (this session alone added 8+ PRs in ~2 hours), the cap will be reached within weeks. Sibling directory discussions/ (64 files) is a preventive case filed separately.
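The "within weeks" estimate can be sanity-checked with a rough calculation. This is a minimal sketch: the function name and the per-day rates are hypothetical placeholders, not measured figures. Only the 713 count and the 1000-file cap come from this ticket.

```javascript
// Hypothetical helper: whole days until a folder reaches the cap at a steady rate.
function estimateDaysUntilCap(currentCount, newFilesPerDay, cap = 1000) {
    if (newFilesPerDay <= 0) {
        return Infinity; // no growth, the cap is never reached
    }

    const headroom = Math.max(cap - currentCount, 0);
    return Math.ceil(headroom / newFilesPerDay);
}

// ASSUMED sustained rates (not measured), for illustration only
console.log(estimateDaysUntilCap(713, 10)); // 29
console.log(estimateDaysUntilCap(713, 20)); // 15
```

Any plausible sustained rate between those two assumed values lands in the two-to-four-week range.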
The Problem
When resources/content/pulls/ exceeds 1000 files, GitHub's web UI folder tree truncates display — files beyond the 1000th become invisible / inaccessible via web UI. Same friction class as #11113 fixed for resources/content/issues/. Without chunking + archive separation, the substrate degrades the GitHub UX for the swarm + future contributors browsing PR-content.
Additionally: there is no pr-archive/ separation today (only issue-archive/). Active PRs and merged-PR archives co-mingle under pulls/. As Neo grows, the archive substrate should mirror the issue substrate.
The Architectural Reality
ai/services/github-workflow/sync/PullRequestSyncer.mjs determines local file paths (currently flat resources/content/pulls/pr-NNNN.md). Build-script consumers read this directory for analysis (e.g., buildScripts/release/analyzeClosedSinceRelease.mjs).
Per #11113 substrate (post-#11114 merge): XXxx-style chunking convention for issues. The substrate-correct shape is to inherit the chunking convention from #11113 rather than introduce a parallel-but-different shape — single substrate primitive across all GH-content syncs (issues, PRs, discussions).
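For illustration, a chunked path computation could look roughly like the sketch below. It assumes XXxx means "hundreds bucket plus a literal xx" (so at most 100 files per folder) and that file names keep the unpadded pr-<number>.md form. Both are assumptions; the real convention is whatever #11113 / PR #11114 defines. getChunkName and the standalone getPrPath are hypothetical stand-ins, not the actual syncer code.

```javascript
// ASSUMPTION: "XXxx" = hundreds bucket + literal "xx" (max 100 files per folder).
// The authoritative convention is the one documented by PR #11114.
function getChunkName(prNumber) {
    if (!Number.isInteger(prNumber) || prNumber < 1) {
        throw new RangeError(`Invalid PR number: ${prNumber}`);
    }

    const bucket = Math.floor(prNumber / 100);
    return `${String(bucket).padStart(2, '0')}xx`;
}

// Hypothetical stand-in for a private #getPrPath method on the syncer.
// rootDir can point at the active folder or at an archive mirror.
function getPrPath(prNumber, rootDir = 'resources/content/pulls') {
    return `${rootDir}/${getChunkName(prNumber)}/pr-${prNumber}.md`;
}

console.log(getPrPath(713));   // resources/content/pulls/07xx/pr-713.md
console.log(getPrPath(11114)); // resources/content/pulls/111xx/pr-11114.md
```

Keeping the bucket logic in one small function makes it easy to swap once the naming-convention decision lands.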
The Fix
Apply the same chunking + archive pattern that #11114 established for issues:
- Subdivide resources/content/pulls/ into chunked sub-folders using whatever convention #11114 cycle-3 documents (currently XXxx per PR #11114; pending operator-is-key defense by Gemini).
- Create resources/content/pr-archive/ mirror with same chunked structure for closed/merged PRs.
- Update PullRequestSyncer.mjs to dynamically compute + write to the correct chunked path based on PR number (#getPrPath mirror of #getIssuePath).
- Update consumers for recursive readdir parsing (mirror the 5 consumers updated in PR #11114: IssueIngestor → PRIngestor; TicketSource → PRSource if/where applicable; tickets.mjs index-builder → PR equivalents; analyzeClosedSinceRelease; SEO generate.mjs).
- Migration script to organize existing 713 files into chunked structure.
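The consumer-side change in the list above amounts to replacing a flat directory listing with a recursive walk. A minimal sketch follows, assuming files keep the pr-<number>.md name; collectPrFiles is a hypothetical helper, not an existing function in any of the consumers named above.

```javascript
// Hypothetical helper: recursively collect synced PR files below a root folder.
// Dynamic imports keep the sketch loadable from both ESM (.mjs) and CommonJS files.
async function collectPrFiles(rootDir) {
    const {readdir} = await import('node:fs/promises');
    const path      = await import('node:path');
    const entries   = await readdir(rootDir, {withFileTypes: true});
    const files     = [];

    for (const entry of entries) {
        const fullPath = path.join(rootDir, entry.name);

        if (entry.isDirectory()) {
            // Descend into chunk folders (and any deeper nesting)
            files.push(...await collectPrFiles(fullPath));
        } else if (/^pr-\d+\.md$/.test(entry.name)) {
            // Only pick up synced PR files, skip anything else in the tree
            files.push(fullPath);
        }
    }

    return files.sort();
}
```

Because the walk does not care how chunk folders are named, a consumer written this way keeps working whichever naming convention is finally chosen.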
Per #11116 friction-gold lesson (separate code-change from data-migration commits): consider splitting this into 2 sub-tickets:
- Sub-A: code-side chunking implementation in PullRequestSyncer.mjs + consumer updates (small, reviewable in isolation)
- Sub-B: data-migration of 713 existing files into the chunked structure (mechanical, separate review pass)
Sequencing: Sub-A lands first (enables new shape); Sub-B after (populates the new shape).
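For Sub-B, a dry-run planner keeps the data migration mechanical and easy to review before any file moves. The sketch below only computes a move list and never touches the disk. planMigration is a hypothetical name, and the bucket rule repeats the assumed hundreds-based reading of XXxx rather than a confirmed convention.

```javascript
// Hypothetical planner for Sub-B: maps flat file names to chunked target paths.
// It only returns a plan; performing the moves is left to the migration script.
function planMigration(fileNames, rootDir = 'resources/content/pulls') {
    const moves = [];

    for (const name of fileNames) {
        const match = /^pr-(\d+)\.md$/.exec(name);

        if (!match) {
            continue; // leave unrelated files where they are
        }

        // ASSUMED reading of XXxx: hundreds bucket + literal "xx"
        const bucket = Math.floor(Number(match[1]) / 100);
        const chunk  = `${String(bucket).padStart(2, '0')}xx`;

        moves.push({from: `${rootDir}/${name}`, to: `${rootDir}/${chunk}/${name}`});
    }

    return moves;
}

console.log(planMigration(['pr-713.md', 'notes.txt', 'pr-11114.md']));
```

Reviewing the printed plan first fits the Sub-A / Sub-B split: the code change and the bulk move stay separately inspectable.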
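Acceptance Criteria
- PullRequestSyncer.mjs writes new PRs to chunked sub-folder per chosen convention
- resources/content/pr-archive/ exists with same chunked structure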
Out of Scope
- Naming-convention re-decision — defer to #11114 cycle-3 outcome; this ticket inherits whatever convention lands there. Don't introduce a parallel divergent shape.
- Generalized GH-content-chunking utility — could be extracted as shared substrate after issues + PRs + discussions are all chunked individually. Premature abstraction now.
- Discussion chunking — separate sibling ticket (filed parallel to this one).
- Active vs archive lifecycle policy — currently merged/closed PRs stay in pulls/; archive-separation is a UX improvement, not a functional contract change. Define the policy in a separate ticket if scope requires.
Avoided Traps / Gold Standards Rejected
Decision Matrix
Mirror #11113/#11114 chunking + archive pattern directly (Selected): Reuses established substrate primitives + naming convention. Lowest implementation cost. Aligned with single-substrate-primitive-across-content-types shape.
Introduce a different chunking convention specific to PRs: Rejected. Divergent conventions across content types (issues vs PRs vs discussions) compound cognitive load + tooling complexity. Single substrate primitive is the substrate-correct shape.
Defer until cap is actually hit: Rejected. 713 → 1000 at current cadence is weeks. Preventive substrate-evolution is cheaper than reactive (no broken-UI period for swarm).
Generalize as a single GH-content-chunking utility before fixing PRs: Rejected as premature abstraction. Pattern is established by #11114 post-merge; abstract after we have 3+ instances (issues + PRs + discussions). Extract substrate when the friction of duplication exceeds the friction of the abstraction itself.
- Trap: Treating pulls/ chunking as independent of issues/ chunking. Rejection: Same friction class, same cap, same consumer pattern (sync writer + recursive readdir consumers). Substrate-correct shape is single primitive applied across content types.
Related
- #11113 (canonical substrate ticket for issue chunking + archive)
- PR #11114 (in flight; #11113 implementation; cycle-2 naming-convention defense pending)
- #11116 (sister substrate-evolution: code-vs-data-migration commit-shape discipline; informs sub-ticket split shape for this ticket)
- Operator framing 2026-05-10 (this session, A2A-only; [paraphrase] for peer corroboration: "we also sync discussions and pr conversations now. we do not have archives and chunking in place there yet.")
- Sibling ticket: discussion chunking (filed in parallel)
- Substrate-quality observation worth [RETROSPECTIVE]: GH-content sync substrate should evolve as a SHARED primitive once 3+ content types (issues + PRs + discussions) hit the chunking pattern. Not in this ticket's scope; flag as future generalization candidate.
Origin Session ID: c2912891-b459-4a03-b2af-154d5e264df1
Retrieval Hint: "PR conversations chunking", "resources/content/pulls 1k cap", "PullRequestSyncer chunked path"