Frontmatter
id: 11117
title: Chunk resources/content/pulls/ to bypass GitHub's 1000-file folder cap (sibling to #11113)
state: Closed
labels: enhancement, ai, model-experience
assignees: neo-gemini-3-1-pro
createdAt: May 10, 2026, 4:23 PM
updatedAt: May 10, 2026, 6:17 PM
githubUrl: https://github.com/neomjs/neo/issues/11117
author: neo-opus-4-7
commentsCount: 0
parentIssue: 11120
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 10, 2026, 6:17 PM

Chunk resources/content/pulls/ to bypass GitHub's 1000-file folder cap (sibling to #11113)

neo-opus-4-7 commented on May 10, 2026, 4:23 PM

Context

Operator @tobiu raised this on 2026-05-10 while reviewing PR #11114 (Issue Chunk Migration):

"follow up tickets (where we can do better!) => we also sync discussions and pr conversations now. we do not have archives and chunking in place there yet."

resources/content/pulls/ currently holds 713 markdown files, one per synced PR conversation. Against GitHub's 1000-file folder cap, that leaves 287 files of headroom. At the current PR cadence (this session alone added 8+ PRs in about 2 hours), the cap will be reached within weeks. The sibling directory discussions/ (64 files) is a preventive case and is filed separately.

The Problem

When resources/content/pulls/ exceeds 1000 files, the GitHub web UI truncates the folder listing. Files beyond the 1000th are omitted and cannot be reached by browsing. This is the same friction class that #11113 fixed for resources/content/issues/. Without chunking and archive separation, the substrate degrades the GitHub browsing experience for the swarm and for future contributors reading PR content.

Additionally, there is no pr-archive/ separation today (only issue-archive/ exists). Active PRs and merged-PR archives are co-mingled under pulls/. As Neo grows, the PR archive layout should mirror the issue layout.

The Architectural Reality

ai/services/github-workflow/sync/PullRequestSyncer.mjs determines local file paths (currently flat resources/content/pulls/pr-NNNN.md). Build-script consumers read this directory for analysis (e.g., buildScripts/release/analyzeClosedSinceRelease.mjs).

Per the #11113 substrate (after the #11114 merge), issues use an XXxx-style chunking convention. The substrate-correct shape is to inherit that convention from #11113 rather than introduce a parallel but different one. That gives a single substrate primitive across all GitHub-content syncs (issues, PRs, discussions).
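The sketch below makes the chunking convention concrete with a path helper in the spirit of #getPrPath. It rests on two assumptions. First, it reads XXxx as "drop the last two digits of the number and append xx", so PR 11117 lands in 111xx/. Second, it routes closed or merged PRs to pr-archive/. Both the bucket rule and the `getPrChunk`/`getPrPath` signatures are hypothetical, not the code that landed in PullRequestSyncer.mjs.

```javascript
// Hypothetical sketch: the "xx" bucket rule is an ASSUMED reading of the
// XXxx convention; substitute whatever #11114 cycle-3 actually documents.
const CONTENT_ROOT = 'resources/content';

// Bucket a PR number by replacing its last two digits with "xx"
// (e.g. 11117 -> "111xx"), so each chunk folder holds at most 100 PRs.
function getPrChunk(prNumber) {
    if (!Number.isInteger(prNumber) || prNumber < 1) {
        throw new RangeError(`Invalid PR number: ${prNumber}`);
    }
    return `${Math.floor(prNumber / 100)}xx`;
}

// Hypothetical mirror of #getIssuePath: archived (closed/merged) PRs go to
// pr-archive/, everything else to pulls/, both using the same chunk folder.
function getPrPath(prNumber, { archived = false } = {}) {
    const folder = archived ? 'pr-archive' : 'pulls';
    return `${CONTENT_ROOT}/${folder}/${getPrChunk(prNumber)}/pr-${prNumber}.md`;
}

console.log(getPrPath(11117));                     // resources/content/pulls/111xx/pr-11117.md
console.log(getPrPath(11117, { archived: true })); // resources/content/pr-archive/111xx/pr-11117.md
```

Under this reading, no chunk folder can approach the 1000-file cap. The folder for any PR is also derivable from its number alone, so neither the writer nor the readers need an index to locate a file.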

The Fix

Apply the same chunking + archive pattern that #11114 established for issues:

  1. Subdivide resources/content/pulls/ into chunked sub-folders using whatever convention #11114 cycle-3 documents (currently XXxx per PR #11114; pending operator-is-key defense by Gemini).
  2. Create resources/content/pr-archive/ mirror with same chunked structure for closed/merged PRs.
  3. Update PullRequestSyncer.mjs to dynamically compute + write to the correct chunked path based on PR number (#getPrPath mirror of #getIssuePath).
  4. Update consumers for recursive readdir parsing (mirror the 5 consumers updated in PR #11114: IssueIngestor → PRIngestor; TicketSource → PRSource if/where applicable; tickets.mjs index-builder → PR equivalents; analyzeClosedSinceRelease; SEO generate.mjs).
  5. Write a migration script that moves the existing 713 files into the chunked structure.
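For most consumers, step 4 amounts to replacing a flat directory read with a recursive walk. The sketch below uses a hypothetical `collectPrFiles` helper, not one of the existing consumers. It descends into chunk folders and keeps only pr-NNNN.md files. The directory lister is injected so the example runs without a real checkout, and the in-memory tree is illustrative data only.

```javascript
// Hypothetical consumer helper (not an existing Neo API): recursively
// collects pr-NNNN.md files under a chunked directory such as pulls/.
// `listDir(dir)` must return Dirent-like entries exposing `name` and
// `isDirectory()`; in Node, `dir => fs.readdirSync(dir, { withFileTypes: true })`
// satisfies that contract.
function collectPrFiles(root, listDir) {
    const found = [];
    const stack = [root];
    while (stack.length > 0) {
        const dir = stack.pop();
        for (const entry of listDir(dir)) {
            const full = `${dir}/${entry.name}`;
            if (entry.isDirectory()) {
                stack.push(full); // descend into chunk folders such as 111xx/
            } else if (/^pr-\d+\.md$/.test(entry.name)) {
                found.push(full); // keep only PR markdown files
            }
        }
    }
    return found.sort(); // lexicographic order, independent of traversal order
}

// In-memory stand-in for the file system (illustrative data only).
function fakeListDir(tree) {
    return dir => (tree[dir] || []).map(([name, isDir]) => ({
        name,
        isDirectory: () => isDir
    }));
}

const demoTree = {
    'resources/content/pulls': [['110xx', true], ['111xx', true], ['README.md', false]],
    'resources/content/pulls/110xx': [['pr-11099.md', false]],
    'resources/content/pulls/111xx': [['pr-11117.md', false], ['pr-11114.md', false]]
};

console.log(collectPrFiles('resources/content/pulls', fakeListDir(demoTree)));
```

A consumer that also needs archived PRs would run the same walker over pr-archive/. Recent Node.js releases also accept a `recursive: true` option on `fs.readdirSync`. The explicit walk here simply keeps the file-name filter in one visible place.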

Per the #11116 friction-gold lesson (keep code changes and data migrations in separate commits), consider splitting this into 2 sub-tickets:

  • Sub-A: code-side chunking implementation in PullRequestSyncer.mjs + consumer updates (small, reviewable in isolation)
  • Sub-B: data-migration of 713 existing files into the chunked structure (mechanical, separate review pass)

Sequencing: Sub-A lands first (enables new shape); Sub-B after (populates the new shape).
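For Sub-B, a dry-run planner keeps the data migration mechanical. It maps each flat file to its chunked destination without touching disk. This is a sketch under an assumed bucket rule (drop the last two digits, append xx). `planPrMigration` and its `isArchived` predicate are hypothetical names. How archive status is determined, for example from each file's frontmatter state, is left open because the ticket defers lifecycle policy.

```javascript
// Hypothetical Sub-B planner: turns the flat pulls/ listing into a list of
// moves. ASSUMES the "drop last two digits, append xx" bucket rule.
const PULLS_DIR = 'resources/content/pulls';
const ARCHIVE_DIR = 'resources/content/pr-archive';

function chunkFor(prNumber) {
    return `${Math.floor(prNumber / 100)}xx`;
}

// `fileNames` are bare names found directly in pulls/ (e.g. "pr-11117.md").
// `isArchived` is an assumed predicate; by default everything stays in pulls/.
function planPrMigration(fileNames, isArchived = () => false) {
    const moves = [];
    const skipped = [];
    for (const name of fileNames) {
        const match = /^pr-(\d+)\.md$/.exec(name);
        if (!match) {
            skipped.push(name); // not a PR file: leave it untouched
            continue;
        }
        const prNumber = Number(match[1]);
        const targetRoot = isArchived(prNumber) ? ARCHIVE_DIR : PULLS_DIR;
        moves.push({
            from: `${PULLS_DIR}/${name}`,
            to: `${targetRoot}/${chunkFor(prNumber)}/${name}`
        });
    }
    return { moves, skipped };
}

// Print the planned moves as commands for review instead of executing them.
const demoPlan = planPrMigration(['pr-11117.md', 'pr-999.md', 'README.md'], n => n === 999);
for (const { from, to } of demoPlan.moves) {
    console.log(`git mv ${from} ${to}`);
}
```

The planner only matches top-level pr-*.md names, so re-running it after the move finds nothing left to do. Note that `git mv` requires the destination directory to exist, so the chunk folders have to be created first. Because file contents are unchanged, Git will normally show the commit as pure renames, which keeps the Sub-B review pass short.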

Acceptance Criteria

  • PullRequestSyncer.mjs writes new PRs to chunked sub-folder per chosen convention
  • resources/content/pr-archive/ exists with same chunked structure
  • All build-script consumers updated for recursive readdir (mirror PR #11114's 5-consumer pattern)
  • Existing 713 PR markdown files migrated into chunked structure
  • No broken cross-references in consumers (build / sync / portal)
  • Naming convention documented (inheriting from #11114 cycle-3 outcome — defer to whatever convention lands there to keep a single substrate)
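Several of these criteria can be checked mechanically. The hypothetical `auditPrLayout` below takes the repository-relative paths of the PR markdown files and reports two kinds of problem. It flags any file that is not in the chunk folder its number implies, and any folder holding more than 1000 of the supplied files. It assumes XXxx means dropping the last two digits and appending xx. It does not cover the cross-reference criterion for build, sync, and portal consumers.

```javascript
// Hypothetical acceptance-check sketch. Input: repository-relative paths of
// PR markdown files (e.g. gathered by a recursive scan). ASSUMES the
// "drop last two digits, append xx" bucket rule.
const FOLDER_CAP = 1000; // GitHub's web UI truncates folder listings beyond this

function auditPrLayout(paths, cap = FOLDER_CAP) {
    const problems = [];
    const perFolder = new Map();
    for (const path of paths) {
        const parts = path.split('/');
        const name = parts[parts.length - 1];
        const folder = parts.slice(0, -1).join('/');
        perFolder.set(folder, (perFolder.get(folder) || 0) + 1);

        const match = /^pr-(\d+)\.md$/.exec(name);
        if (!match) continue; // only PR files are checked for placement
        const expectedChunk = `${Math.floor(Number(match[1]) / 100)}xx`;
        const actualChunk = parts[parts.length - 2];
        if (actualChunk !== expectedChunk) {
            // Also catches leftover flat files such as pulls/pr-11117.md.
            problems.push(`${path}: expected chunk ${expectedChunk}`);
        }
    }
    // Counts only the paths supplied, so pass every file in each folder.
    for (const [folder, count] of perFolder) {
        if (count > cap) problems.push(`${folder}: ${count} files exceeds cap ${cap}`);
    }
    return problems;
}

console.log(auditPrLayout([
    'resources/content/pulls/111xx/pr-11117.md',
    'resources/content/pulls/pr-11114.md'
])); // [ 'resources/content/pulls/pr-11114.md: expected chunk 111xx' ]
```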

Out of Scope

  • Naming-convention re-decision — defer to #11114 cycle-3 outcome; this ticket inherits whatever convention lands there. Don't introduce a parallel divergent shape.
  • Generalized GH-content-chunking utility — could be extracted as shared substrate after issues + PRs + discussions are all chunked individually. Premature abstraction now.
  • Discussion chunking — separate sibling ticket (filed parallel to this one).
  • Active vs archive lifecycle policy — currently merged/closed PRs stay in pulls/; archive-separation is a UX improvement, not a functional contract change. Define the policy in a separate ticket if scope requires.

Avoided Traps / Gold Standards Rejected

Decision Matrix

  1. Mirror #11113/#11114 chunking + archive pattern directly (Selected): Reuses established substrate primitives + naming convention. Lowest implementation cost. Aligned with single-substrate-primitive-across-content-types shape.

  2. Introduce a different chunking convention specific to PRs: Rejected. Divergent conventions across content types (issues vs PRs vs discussions) compound cognitive load + tooling complexity. Single substrate primitive is the substrate-correct shape.

  3. Defer until cap is actually hit: Rejected. At the current cadence, growth from 713 to 1000 files is a matter of weeks. Preventive substrate evolution is cheaper than a reactive fix (no broken-UI period for the swarm).

  4. Generalize as a single GH-content-chunking utility before fixing PRs: Rejected as premature abstraction. Pattern is established by #11114 post-merge; abstract after we have 3+ instances (issues + PRs + discussions). Extract substrate when the friction of duplication exceeds the friction of the abstraction itself.

  • Trap: Treating pulls/ chunking as independent of issues/ chunking. Rejection: Same friction class, same cap, same consumer pattern (sync writer + recursive readdir consumers). Substrate-correct shape is single primitive applied across content types.

Related

  • #11113 (canonical substrate ticket for issue chunking + archive)
  • PR #11114 (in flight; #11113 implementation; cycle-2 naming-convention defense pending)
  • #11116 (sister substrate-evolution: code-vs-data-migration commit-shape discipline; informs sub-ticket split shape for this ticket)
  • Operator framing 2026-05-10 (this session, A2A-only — [paraphrase] for peer corroboration: "we also sync discussions and pr conversations now. we do not have archives and chunking in place there yet.")
  • Sibling ticket: discussion chunking (filed in parallel)
  • Substrate-quality observation worth [RETROSPECTIVE]: GH-content sync substrate should evolve as a SHARED primitive once 3+ content types (issues + PRs + discussions) hit the chunking pattern. Not in this ticket's scope; flag as future generalization candidate.

Origin Session ID: c2912891-b459-4a03-b2af-154d5e264df1
Retrieval Hint: "PR conversations chunking", "resources/content/pulls 1k cap", "PullRequestSyncer chunked path"

tobiu closed this issue on May 10, 2026, 6:17 PM
tobiu referenced in commit 83c0eb0 - "refactor(github-workflow): implement chunked storage for pull requests (#11123)" on May 10, 2026, 6:17 PM