Frontmatter
id: 11187
title: Adopt single-root archive with lazy 100-item chunking (Discussion #11180 graduation)
state: Closed
labels: epic, no auto close, ai, architecture, model-experience
assignees: []
createdAt: May 11, 2026, 10:16 AM
updatedAt: May 16, 2026, 12:53 PM
githubUrl: https://github.com/neomjs/neo/issues/11187
author: neo-opus-4-7
commentsCount: 15
parentIssue: null
subIssues:
11189 Phase 1: Refactor Config for Archive Shape Management
11190 Add archivePath helper for sealed archive chunks
11196 PullRequestService archive-write path refactor for lazy ordinal chunking (#11187 AC4)
11197 DiscussionService flat active-tier and lazy ordinal archive refactor (#11187 AC5)
11198 IssueService archive-write path refactor for lazy ordinal chunking (#11187 AC3)
11281 Preserve archive planning metadata in MetadataManager
11283 Preserve required lifecycle fields in MetadataManager.save
11284 Collapse active discussions to flat structure
11285 Implement dual-search read-path fallback in LocalFileService
11286 Update CI pipelines and build scripts for new archive substrate
11287 Cleanup consumer paths for TicketSource, SEO, and analyzers
11288 Validate archive migration and document anomaly hooks
11290 Fail loudly on stale GitHub Workflow archive config
11291 Migrate PR archives to versioned archive/pulls paths
11358 Epic #11187 Phase 0B: Fail-Closed Write Guard for SyncService
11360 Substrate cleanup: legacy folders + 195-PR collision + active-tier residue + syncer fixes
subIssuesCompleted: 16
subIssuesTotal: 16
blockedBy: []
blocking: []
closedAt: May 16, 2026, 12:53 PM

Adopt single-root archive with lazy 100-item chunking (Discussion #11180 graduation)

neo-opus-4-7 commented on May 11, 2026, 10:16 AM

Context

Graduation Epic from Discussion #11180 — 3-way swarm + operator-correction convergence on type-separated archive symmetry with lazy density-tuned chunking.

Origin friction-anchor: PR #11174 (cosmetic naming-normalization of discussion-100xx/ → 100xx/) triggered operator architectural review on 2026-05-11. Cross-family ideation revealed:

  • Asymmetric archive substrate: issue-archive/ uses HYBRID version+chunk; pr-archive/ uses ID-only-no-version; no discussion-archive/ exists
  • Sparse-folder anti-pattern: at low type-density (~100 discussions total), 100-ID-range chunking produces 1-5-item folders
  • v13 release-cap risk: ~1500 issues closed-since-v12.1 currently in active issues/ would exceed 1000-cap if archived without chunking
  • Leverage opportunity: .github/workflows/prevent-reopen.yml CI-enforces closedAt immutability past 24h grace, enabling one-way archive placement

Empirical anchors verified:

  • ~1070 active issue files (open + closed-since-v12.1, approaching 1000-cap)
  • ~734 active PR files
  • ~68 active discussion files (sparse across 23 chunked subfolders averaging 3 items/folder)
  • ~3000 archived issue files (across 165 version folders, mostly ≤100 items each)
  • 212 archived PR files (ID-only chunks, no version layer)
  • 0 archived discussion files (substrate doesn't exist)

[Cycle 2 body amendment 2026-05-11 per GPT epic-review]:

  • Blocker 1 (Discussion #11180 body) RESOLVED by @neo-gemini-3-1-pro at 2026-05-11T08:26:38Z: Option A struck through and marked Withdrawn; full divergence matrix A through E''+S documented; OQ1-OQ3 marked [RESOLVED]; OQ2 specifically uses closedAt + answerChosenAt (raw updatedAt reserved for sync-freshness only); RESOLVED_TO_AC comment + Epic #11187 cited as canonical.
  • Blocker 2 (active-tier ordinal chunking) RESOLVED by this amendment: Active-tier issues + pulls retain XXxx ID-range chunking (Math.floor(id/100)*100 deterministic lookup via LocalFileService#getIssueById). Ordinal chunk-N semantics now applies only to sealed archive chunks where closedAt immutability + version-folder boundary makes chunk membership stable. AC6 + AC7 (active-tier ordinal migration) DROPPED; AC8 renumbered to AC6. Subsequent ACs renumbered. Avoided Traps + architectural shape diagram + Lazy chunking algorithm section updated to reflect the active/archive split.

Phase 1 fully unblocked. AC1 lane (#11189 @neo-gemini-3-1-pro), AC2 lane (@neo-gpt), AC3 lane (@neo-opus-4-7) cleared to proceed.

The Problem

Three coupled architectural defects in the current substrate:

  1. Top-level fragmentation: 6 archive-related top-level folders (issue-archive/, pr-archive/, discussion-archive/ to-be-created + 3 active) vs cleaner consolidation under one archive/ root
  2. Density-blind chunking: 100-ID-range chunking forces sparse folders for low-density types (discussions) while undersizing for high-density (v13 issues exceed even chunking-safe ranges per release)
  3. Two-way data flow: without leveraging prevent-reopen.yml, syncer would need continuous re-validation + file moves on closedAt shifts; with the lean, archive placement is one-way + sealed-chunk semantic preserved

The Architectural Reality

File:line surfaces touched (V-B-A'd via operator reads):

  • ai/mcp/server/github-workflow/config.template.mjs — substrate config
  • ai/services/github-workflow/IssueService.mjs + IssueSyncer
  • ai/services/github-workflow/PullRequestService.mjs + PullRequestSyncer
  • ai/services/github-workflow/DiscussionService.mjs + DiscussionSyncer
  • ai/services/github-workflow/SyncService.mjs — orchestrator
  • ai/services/github-workflow/LocalFileService.mjs — file path resolution (deterministic ID-based; preserved for active tier)
  • ai/services/github-workflow/HealthService.mjs — path-presence checks
  • New archivePath() helper — lazy chunking primitive (archive-tier only)
  • buildScripts/release/publish.mjs — release-cut archive logic + GH_SyncService.runFullSync()
  • buildScripts/release/analyzeClosedSinceRelease.mjs — analyzer
  • .github/workflows/data-sync-pipeline.yml — pages-repo sync paths (currently hardcodes issues/ + issue-archive/)
  • .github/workflows/prevent-reopen.yml — leveraged as substrate-correctness primitive (closedAt immutability)
  • Migration scripts (multiple, per phase)
  • learn/agentos/sandman-handoff-format.md, learn/agentos/GitHubWorkflow.md — docs

The Fix

Adopt Option E''+S from Discussion #11180:

Architectural shape

resources/content/
├── issues/             (active, XXxx ID-range — deterministic ID lookup; UNCHANGED)
├── pulls/              (active, pr-XXxx ID-range — deterministic ID lookup; UNCHANGED)
├── discussions/        (active, flat — collapse current sparse XXxx; 68 items)
├── archive/
│   ├── issues/vN.M.K/    (flat ≤100; chunked >100 via chunk-N ordinal — sealed)
│   ├── pulls/vN.M.K/     (flat ≤100; chunked >100 via chunk-N ordinal — sealed)
│   └── discussions/vN.M.K/ (flat indefinitely at current density)
└── release-notes/

Active-tier (issues + pulls): UNCHANGED XXxx ID-range

  • 100-ID-range folders (110xx/, 111xx/, ...) — Math.floor(id/100)*100 deterministic lookup
  • Preserved per LocalFileService#getIssueById semantic — O(1) folder location from ID
  • Discussions collapse to flat (current density is 68 items, so the sparse XXxx folders are flattened)
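
The ID-range rule above can be sketched as a pure function. This is a minimal illustration, not the code in LocalFileService: the function name and the prefix map are hypothetical, and only the Math.floor(id/100)*100 rule and the XXxx / pr-XXxx folder names are taken from this Epic.

```javascript
// Sketch only: activeFolderFor and ACTIVE_FOLDER_PREFIX are hypothetical names.
// The Math.floor(id / 100) * 100 rule and the XXxx / pr-XXxx folder shapes come from the Epic.
const ACTIVE_FOLDER_PREFIX = new Map([
    ['issues', ''],
    ['pulls',  'pr-']
]);

function activeFolderFor(type, id) {
    if (!ACTIVE_FOLDER_PREFIX.has(type)) {
        throw new RangeError(`No ID-range layout for type "${type}"`);
    }
    if (!Number.isInteger(id) || id < 0) {
        throw new TypeError(`Expected a non-negative integer id, got ${id}`);
    }

    const rangeStart = Math.floor(id / 100) * 100; // 11187 -> 11100
    const folder     = `${rangeStart / 100}xx`;    // 11100 -> "111xx"

    // No directory scan is needed: the folder follows from the ID alone.
    return `${type}/${ACTIVE_FOLDER_PREFIX.get(type)}${folder}`;
}
```

Because the folder is derived from the ID alone, a lookup never has to list directories, which is the property the active tier keeps.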

Archive-tier lazy chunking algorithm (sealed chunks only)

  • ≤ 100 items in archive version-folder: flat (no chunk-N/ wrapper)
  • > 100 items: split into 100-item ordinal-count chunks named chunk-N/ (non-numeric prefix, sequential)
  • Chunks sealed once full: closedAt immutability post-prevent-reopen.yml-24h-grace makes membership stable
  • New items at release-cut go to first non-full chunk or fresh chunk-N+1/
  • Ordinal-count chosen for archive-tier because: (a) sealed-chunk semantic makes membership deterministic by version + insertion order, (b) avoids sparse-folder anti-pattern for low-density types (discussions)
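
A minimal sketch of the lazy rule above, under stated assumptions. It is not the archivePath() helper itself: the function name, the options object, the zero-based `ordinal` (insertion position within the version folder), `total` (item count known at release-cut) and chunk numbering starting at 1 are all assumptions made for illustration. The version string in the usage note is an example value.

```javascript
// Sketch only: sketchArchiveDir and its parameters are hypothetical.
// From the Epic: flat at <= 100 items, chunk-N/ ordinal folders of 100 items above that.
const CHUNK_SIZE = 100;

function sketchArchiveDir({root = 'resources/content/archive', type, version, ordinal, total}) {
    if (!Number.isInteger(total) || total < 1) {
        throw new RangeError(`total must be a positive integer, got ${total}`);
    }
    if (!Number.isInteger(ordinal) || ordinal < 0 || ordinal >= total) {
        throw new RangeError(`ordinal must be in [0, ${total}), got ${ordinal}`);
    }

    const versionDir = `${root}/${type}/${version}`;

    // Low-density case: no chunk-N/ wrapper at all.
    if (total <= CHUNK_SIZE) {
        return versionDir;
    }

    // High-density case: assumed 1-based chunk number from the insertion ordinal.
    const chunk = Math.floor(ordinal / CHUNK_SIZE) + 1;
    return `${versionDir}/chunk-${chunk}`;
}
```

With `total: 68` every item resolves to the bare version folder; with `total: 1500` the item at `ordinal: 1499` resolves to `chunk-15`. Unlike the active-tier rule, the result depends on insertion order, which is why this shape only fits sealed chunks.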

Substrate-correctness leverage

  • closedAt (and mergedAt) treated as immutable post-prevent-reopen.yml-24h-grace
  • Archive placement is one-way: items moved at release-cut, never re-moved
  • Sealed-chunk semantic preserved (no mid-archive rebalancing)
  • closedAt-shift anomaly → flagged for human review, not auto-corrected
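
The flag-not-fix behaviour above could look roughly like the following. This is a hedged sketch: the function names, the record shape ({id, closedAt} with ISO-8601 strings) and the `action` label are hypothetical, and the only facts taken from this Epic are the 24h grace window and the rule that a shifted closedAt is reported for human review rather than auto-corrected.

```javascript
// Sketch only: function names and record shape are assumptions for illustration.
const GRACE_MS = 24 * 60 * 60 * 1000; // 24h grace window enforced by prevent-reopen.yml

// True once closedAt is old enough to be treated as immutable.
function isPastGrace(closedAt, now = new Date()) {
    if (!closedAt) {
        return false; // still open, nothing to seal
    }
    return now.getTime() - new Date(closedAt).getTime() >= GRACE_MS;
}

// Compares the archived record with freshly synced data.
// Returns an anomaly report or null. It never moves or rewrites files.
function detectClosedAtShift(archivedRecord, incomingRecord) {
    const incomingClosedAt = incomingRecord.closedAt ?? null;

    if (archivedRecord.closedAt === incomingClosedAt) {
        return null;
    }

    return {
        id              : archivedRecord.id,
        archivedClosedAt: archivedRecord.closedAt,
        incomingClosedAt,
        action          : 'flag-for-human-review'
    };
}
```

Keeping the detector side-effect free preserves the one-way placement rule: an anomaly produces a report, and the archived file stays where it was sealed.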

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
|---|---|---|---|---|---|
| archive/{type}/vN.M.K/ paths | This Epic + Discussion #11180 RESOLVED_TO_AC | Single-root containing per-type subfolders; version-folder per release | Legacy gh-workflow config fallbacks (issueSync.archiveDir, defaultArchiveVersion) retired by #11363; remaining data migration handled by AC7, not active config alias | Updated GitHubWorkflow.md + sandman-handoff-format.md | Phase 1 path-helper test coverage |
| archivePath(type, version, id, count) helper | New | Lazy chunking: flat ≤100, chunk-N/ ordinal >100 | None (single canonical path resolver) | JSDoc on helper | Phase 1 unit tests for boundary cases |
| prevent-reopen.yml substrate-leverage | Existing CI workflow | Trusted for closedAt-immutability post-24h-grace | Anomaly-detection + human-review hook for shift cases | Phase 4 docs | Phase 5 AC validation |
| data-sync-pipeline.yml push paths | Updated for new substrate | Mirrors new archive/ shape to neomjs/pages | Existing push pattern aliased during transition | YAML inline comments | Phase 4 CI run validation |

Acceptance Criteria — Phase-decomposed

Phase 1 — Foundation (~5 sub-tickets, code-only, no data move):

  • AC1: Config refactor: archiveRoot + per-type sub-keys (archive.issues, archive.pulls, archive.discussions); legacy issueSync.archiveDir / defaultArchiveVersion config fallbacks retired by #11363
  • AC2: archivePath() helper: lazy 100-item ordinal chunking + sealed-chunk semantics (archive-tier only)
  • AC3: IssueService + IssueSyncer refactor to use new helper (archive-write path); active LocalFileService#getIssueById ID-range semantic preserved
  • AC4: PullRequestService + PullRequestSyncer refactor (archive-write path); active pr-XXxx ID-range semantic preserved
  • AC5: DiscussionService + DiscussionSyncer refactor (also collapse active to flat)
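
For orientation, the AC1 config shape might be sketched as below. Everything here is an assumption apart from the key names archiveRoot, archive.issues, archive.pulls, archive.discussions and the retired legacy keys: the values, the exact nesting of defaultArchiveVersion, and both function names are hypothetical and do not reflect config.template.mjs.

```javascript
// Sketch only: values and helper names are illustrative assumptions.
const exampleConfig = {
    archiveRoot: 'resources/content/archive',
    archive: {
        issues     : 'issues',
        pulls      : 'pulls',
        discussions: 'discussions'
    }
};

// Fails loudly when a retired legacy key is still present.
// Assumption: defaultArchiveVersion lived at the top level of the config.
function assertNoLegacyArchiveConfig(cfg) {
    const stale = [];

    if (cfg.issueSync && 'archiveDir' in cfg.issueSync) {
        stale.push('issueSync.archiveDir');
    }
    if ('defaultArchiveVersion' in cfg) {
        stale.push('defaultArchiveVersion');
    }
    if (stale.length > 0) {
        throw new Error(`Stale archive config keys: ${stale.join(', ')}`);
    }
}

// Resolves the version folder for one content type.
function resolveArchiveDir(cfg, type, version) {
    assertNoLegacyArchiveConfig(cfg);

    const subDir = cfg.archive ? cfg.archive[type] : undefined;

    if (typeof subDir !== 'string') {
        throw new RangeError(`No archive sub-key configured for type "${type}"`);
    }
    return `${cfg.archiveRoot}/${subDir}/${version}`;
}
```

Throwing on stale keys instead of silently aliasing them matches the ledger entry that lists no active config alias as a fallback.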

Phase 2 — Active-tier adjustment (~1 sub-ticket, discussions only):

  • AC6: Collapse active discussions/XXxx/ → flat discussions/discussion-NNNN.md (68 items currently, sparse across 23 XXxx folders)
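
The collapse in AC6 can be planned as a pure path mapping before any file is touched. The sketch below is illustrative only: the function name and the {from, to} move shape are hypothetical, and it assumes every active discussion file sits somewhere below discussions/ with a unique file name.

```javascript
// Sketch only: planDiscussionFlatten is a hypothetical name.
// Maps discussions/<any subfolder>/discussion-NNNN.md to discussions/discussion-NNNN.md.
function planDiscussionFlatten(paths) {
    const moves = [];
    const seen  = new Map(); // target path -> source path, used to detect collisions

    for (const from of paths) {
        const segments = from.split('/');
        const fileName = segments[segments.length - 1];
        const to       = `discussions/${fileName}`;

        if (seen.has(to)) {
            throw new Error(`Collision: ${from} and ${seen.get(to)} both map to ${to}`);
        }
        seen.set(to, from);

        // Files that are already flat need no move.
        if (from !== to) {
            moves.push({from, to});
        }
    }
    return moves;
}
```

Producing the full move list first makes a name collision surface as an error before the migration starts rather than as an overwritten file.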

Dropped (Cycle 2 amendment per GPT epic-review Blocker 2):

  • AC6 (original): Migrate active issues/XXxx/ → issues/chunk-N/ ordinal
  • AC7 (original): Migrate active pulls/pr-XXxx/ → pulls/chunk-N/ ordinal

Rationale: Active-tier items churn (open/close — though prevent-reopen.yml limits reopens). Ordinal chunk-N is INSERTION-ORDER-dependent, NOT deterministic from item ID. LocalFileService#getIssueById(id) currently computes Math.floor(id/100)*100 to locate the XXxx folder — O(1) deterministic. Switching to ordinal would force a folder-scan per lookup. ID-range XXxx remains correct shape for active tier; ordinal chunk-N applies only to sealed archive chunks where closedAt immutability guarantees membership.

Phase 3 — Archive-tier reshape (~3 sub-tickets, per existing type):

  • AC7: Reshape issue-archive/v*/XXxx/ → archive/issues/v*/{flat|chunk-N}/
  • AC8: Reshape pr-archive/XXxx/ → archive/pulls/v*/{flat|chunk-N}/ (via release-mapping)
  • AC9: Create archive/discussions/; populate lazily at next release-cut

Phase 4 — Release + distribution (~3 sub-tickets):

  • AC10: Update buildScripts/release/publish.mjs archive-cutting logic for new paths
  • AC11: Update buildScripts/release/analyzeClosedSinceRelease.mjs + other analyzers
  • AC12: Update .github/workflows/data-sync-pipeline.yml pages-repo push paths

Phase 5 — Validation + docs (~3 sub-tickets):

  • AC13: Test coverage: path resolution + chunking boundary + migration paths
  • AC14: Docs: sandman-handoff-format.md + GitHubWorkflow.md + analyzer JSDoc
  • AC15: prevent-reopen-lean documented + closedAt-shift anomaly-detection hook implemented

Phase 6 — Post-merge verification (open until 2026-09-01):

  • AC16: After v13 release-cut, verify archive shape is correct + no item moves happened post-cut
  • AC17: Empirical: no agent reports archive-related sync errors for 30 days post-final-sub-merge
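
One way to approach the AC16 shape check is a validator over a directory listing. This is a sketch under assumptions: the function name and the input shape ({files, chunks}) are hypothetical, and the rules that a version folder never mixes flat files with chunk folders and that chunk numbering starts at 1 are assumptions not spelled out in this Epic.

```javascript
// Sketch only: validateVersionFolder and its input shape are hypothetical.
// files: file names directly in the version folder; chunks: {folderName: [fileNames]}.
const CHUNK_SIZE = 100;

function validateVersionFolder({files = [], chunks = {}}) {
    const problems = [];
    const names    = Object.keys(chunks);

    // Flat layout: only the 100-item cap applies.
    if (names.length === 0) {
        if (files.length > CHUNK_SIZE) {
            problems.push(`flat folder holds ${files.length} items (> ${CHUNK_SIZE})`);
        }
        return problems;
    }

    // Assumption: a chunked version folder holds no loose files.
    if (files.length > 0) {
        problems.push('flat files mixed with chunk folders');
    }

    const ordinals = [];

    for (const name of names) {
        const match = /^chunk-(\d+)$/.exec(name);

        if (!match) {
            problems.push(`unexpected folder name: ${name}`);
            continue;
        }
        ordinals.push(Number(match[1]));

        if (chunks[name].length > CHUNK_SIZE) {
            problems.push(`${name} holds ${chunks[name].length} items (> ${CHUNK_SIZE})`);
        }
    }

    // Assumption: ordinals run 1, 2, 3, ... without gaps.
    ordinals.sort((a, b) => a - b).forEach((ordinal, index) => {
        if (ordinal !== index + 1) {
            problems.push(`expected chunk-${index + 1}, found chunk-${ordinal}`);
        }
    });

    return problems;
}
```

An empty result means the folder matches the expected shape. Comparing two listings taken before and after the release-cut would additionally be needed to show that no item moved post-cut.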

Out of Scope

  • Active-tier ordinal-chunking refactor (deferred — would require LocalFileService deterministic-lookup redesign; ID-range remains correct shape per Cycle 2 amendment)
  • Active-tier consolidation under active/ root (deferred; bigger migration; operator didn't request)
  • Extending prevent-reopen.yml pattern to PRs + discussions (deferred; rare events; manually handle edge cases if surfaces)
  • Cross-substrate identity canonicalization audit (#11181 family + #11182 Layer 4 — already separate)
  • Broader release-process refactoring beyond archive-cutting (publish.mjs has other concerns)
  • Migration of neomjs/pages historical content beyond next post-merge sync cycle

Avoided Traps

  • Option B (always-chunk version-outer + chunk-inner): 3-way converged but operator-challenged on sparse-tree anti-pattern. Rejected: produces 1-5-item folders for discussion-archive at current density.
  • Option E (lazy chunking with 800-threshold): Rejected — 800-flat folder is browsing-hostile (GitHub UI + portal app concerns).
  • Option F (strictly flat, no chunking ever): Rejected — breaks at v13 issues (1500+ items would exceed 1000-cap).
  • Option G (per-type fixed shape: always-chunk-or-always-flat): Rejected — hardcodes type-density assumption; less elegant than lazy detection.
  • ID-range chunks under non-numeric prefix for ARCHIVE: Rejected — preserves sparse-folder problem for low-density types. Ordinal-count chunks deliver consistent 100-item density (archive-tier only).
  • Ordinal chunk-N for ACTIVE tier: Rejected (Cycle 2 per GPT epic-review Blocker 2) — LocalFileService#getIssueById relies on Math.floor(id/100)*100 to locate XXxx folder in O(1). Ordinal chunk-N is insertion-order-dependent and not deterministic from ID; active tier churns; folder-scan-per-lookup is the regression. Ordinal chunking applies only to sealed archive chunks where closedAt immutability + version-folder boundary makes the chunk membership stable. Active tier retains XXxx ID-range.
  • Two-way data flow with continuous closedAt re-validation: Rejected — leverage prevent-reopen.yml CI primitive for immutability.

Related

  • Discussion #11180 (parent ideation; full divergence-matrix + 3-way swarm convergence + 5 operator-corrections; body update by @neo-gemini-3-1-pro per GPT epic-review Blocker 1 completed 2026-05-11T08:26:38Z, see Cycle 2 amendment)
  • PR #11174 (closed; original cosmetic-naming-normalization that triggered architectural review)
  • Epic #11120 (original chunking arc; this Epic supersedes its substrate direction)
  • #11113, #11116, #11118, #11121, #11122 (original chunking sub-tickets; superseded)
  • PR #11114 (original chunking implementation; precedent for XXxx primitive)
  • PR #11186 (substrate-discipline doc: stale-magic-close-in-commit-bodies; merged)
  • Discussion #11188 (Extended V-B-A divergent-thinking discipline; GPT's epic-review on this Epic IS the first positive empirical anchor for the proposed discipline)
  • .github/workflows/prevent-reopen.yml (leveraged substrate primitive)
  • buildScripts/release/publish.mjs (release-cut archive logic)
  • .github/workflows/data-sync-pipeline.yml (pages-repo sync paths)
  • ai/services/github-workflow/LocalFileService.mjs (deterministic ID-based path resolution; preserved for active tier per Cycle 2 amendment)

Origin Session ID

c2912891-b459-4a03-b2af-154d5e264df1

Handoff Retrieval Hints

  • query_raw_memories(query="single root archive substrate lazy 100-item chunking E prime prime S Discussion 11180")
  • ask_knowledge_base(query="archive folder structure version chunking GitHubWorkflow")
  • Git commit-range anchor: git log --oneline --grep="11180" --since="2026-05-11" for graduation-context
  • File:line anchors: ai/services/github-workflow/{IssueService,PullRequestService,DiscussionService,SyncService,LocalFileService}.mjs + buildScripts/release/publish.mjs + .github/workflows/data-sync-pipeline.yml
tobiu referenced in commit a7f7d2d - "feat(agents): graduate Step 2.5 Architectural Step-Back (#11192) (#11194)" on May 11, 2026, 1:50 PM
tobiu referenced in commit c459245 - "refactor(github-workflow): use archivePath for pull requests and discussions (#11187) (#11199)" on May 11, 2026, 2:04 PM
tobiu referenced in commit 48456be - "feat(github-workflow): dual-search read-path in LocalFileService for archive transition (#11285) (#11289)" on May 13, 2026, 12:18 PM
tobiu referenced in commit e223b9d - "feat(github-workflow): pipeline + publish archive substrate updates for Epic #11187 (#11286) (#11296)" on May 13, 2026, 12:18 PM
tobiu referenced in commit 86ec2e6 - "feat(github-workflow): fail loudly on stale archive config in syncer entry points (#11290) (#11297)" on May 13, 2026, 2:59 PM
tobiu referenced in commit 559c73d - "fix(github-workflow): substrate cleanup — legacy archives + 195-PR collision + syncer fallback + pr- prefix (#11360) (#11362)" on May 14, 2026, 5:40 PM
tobiu referenced in commit d8ba634 - "docs(agentos): land ADR 0004 GitHub content architecture (#11367) (#11368)" on May 14, 2026, 9:10 PM
tobiu referenced in commit 0e4c016 - "feat(github-workflow/shared): consolidate path primitives into universal contentPath.mjs (#11379) (#11381)" on May 15, 2026, 10:19 AM