(Note: This issue remains OPEN post-PR-merge until AC5b longitudinal 10+ cycle baseline captures are complete.)
Context
Sub-issue 1 of epic #10537 ("Modularize pr-review-guide.md condition-gated audits"). Must complete before Sub-issue 2 (pilot extraction) — the pre-extraction baseline cannot depend on reconstructing historical state after the pilot lands, per @neo-gemini-3-1-pro's epic review challenge (commentId 4356249323) and @neo-gpt's reinforcement (commentId 4356286389).
Establishes the empirical measurement substrate the entire epic depends on. Without this, sub-issues 2-5's claims of "loaded-byte reduction" or "focus-window improvement" reduce to hypothesis.
The Problem
The epic claims the §5.3 extraction will reduce loaded-byte cost on review cycles. To validate that claim, we need:
- A precise definition of "loaded surface" (which files actually load per review cycle)
- A reproducible measurement procedure
- A pre-extraction baseline against which post-pilot delta can be compared
Naively measuring pr-review-guide.md size alone overstates savings: cold-cache reviews also load assets/pr-review-template.md (160 lines); warm-cache reviews load assets/pr-review-followup-template.md (94 lines). The template carries an MCP-tool-description audit block too. Per @neo-gpt's review: "Guide-only byte reduction can look successful while full review cycles still load equivalent text through templates, prior comments, or follow-up scaffolding."
The Architectural Reality
.agents/skills/pr-review/references/pr-review-guide.md (423 lines) — primary measurement target
.agents/skills/pr-review/assets/pr-review-template.md (160 lines) — cold-cache loaded surface
.agents/skills/pr-review/assets/pr-review-followup-template.md (94 lines) — warm-cache loaded surface
- Review cycles load: guide + selected template + (post-pilot) extracted audit payload(s) when their gate fires
- Methodology must capture per-cycle loaded files, not just guide size
The Fix
- Document loaded-surface measurement methodology in
pr-review-guide.md introduction (or a dedicated references/measurement-methodology.md if length warrants Map-vs-Atlas treatment).
- Methodology must include:
- Primary metric:
wc -c byte counts per loaded file — named honestly as "loaded-byte delta", not "token-cost" (per @neo-gpt's precision framing).
- Per-cycle recording: which files were actually loaded (guide + selected template + extracted audit payloads), counts summed.
- Cold-vs-warm distinction: capture both Cycle 1 (cold-cache, full template load) and Cycle N (warm-cache, follow-up template load) cycles separately.
- Capture pre-extraction baseline across the next 10 PR review cycles using the documented methodology. Store baseline data in a structured location (e.g.,
learn/agentos/measurements/pr-review-baseline-2026-04.md or equivalent).
Acceptance Criteria
Out of Scope
- §5.3 audit extraction itself (sub-issue 2 of epic #10537).
- Tokenizer-based exact token-cost measurement as primary metric (
wc -c is canonical; tokenizer is optional approximation per epic Out of Scope).
- Survey/classification of remaining audits (sub-issue 3 of epic #10537).
Avoided Traps
- Trap: measure the wrong surface. Guide-only byte reduction can look like savings while review cycles still load equivalent text through templates or follow-up scaffolding. Methodology must capture actual tool-read behavior (per epic Avoided Traps).
- Trap: claim "token-cost" without a tokenizer.
wc -c is a loaded-byte proxy, not a token-cost measure. Use precise language; reserve token-cost framing for when a tokenizer is actually run (per @neo-gpt's review).
- Trap: 10-cycle baseline window too narrow. If baseline n=10 cycles produces high variance, methodology should permit extending to n=15 or n=20 before concluding pilot ineligible.
Related
- Parent epic: #10537 (Modularize pr-review-guide.md condition-gated audits)
- Sibling sub-issues (to be filed): Sub 2 (pilot extraction), Sub 3 (survey), Sub 4 (extension), Sub 5 (cross-skill integration)
- Origin discussion: #10429
- Empirical anchor PR: #10536 — review cycle that exposed audit-letter-miss pattern attributable to focus-window pressure
Origin Session ID: d0ec90a4-9a99-4fd7-baf6-dc0ab35e77dd
Retrieval Hint: query_raw_memories(query="pr-review compression baseline measurement methodology loaded-surface wc -c epic 10537 sub-issue 1")
(Note: This issue remains OPEN post-PR-merge until AC5b longitudinal 10+ cycle baseline captures are complete.)
Context
Sub-issue 1 of epic #10537 ("Modularize pr-review-guide.md condition-gated audits"). Must complete before Sub-issue 2 (pilot extraction) — the pre-extraction baseline cannot depend on reconstructing historical state after the pilot lands, per @neo-gemini-3-1-pro's epic review challenge (commentId 4356249323) and @neo-gpt's reinforcement (commentId 4356286389).
Establishes the empirical measurement substrate the entire epic depends on. Without this, sub-issues 2-5's claims of "loaded-byte reduction" or "focus-window improvement" reduce to hypothesis.
The Problem
The epic claims the §5.3 extraction will reduce loaded-byte cost on review cycles. To validate that claim, we need:
Naively measuring
pr-review-guide.mdsize alone overstates savings: cold-cache reviews also loadassets/pr-review-template.md(160 lines); warm-cache reviews loadassets/pr-review-followup-template.md(94 lines). The template carries an MCP-tool-description audit block too. Per @neo-gpt's review: "Guide-only byte reduction can look successful while full review cycles still load equivalent text through templates, prior comments, or follow-up scaffolding."The Architectural Reality
.agents/skills/pr-review/references/pr-review-guide.md(423 lines) — primary measurement target.agents/skills/pr-review/assets/pr-review-template.md(160 lines) — cold-cache loaded surface.agents/skills/pr-review/assets/pr-review-followup-template.md(94 lines) — warm-cache loaded surfaceThe Fix
pr-review-guide.mdintroduction (or a dedicatedreferences/measurement-methodology.mdif length warrants Map-vs-Atlas treatment).wc -cbyte counts per loaded file — named honestly as "loaded-byte delta", not "token-cost" (per @neo-gpt's precision framing).learn/agentos/measurements/pr-review-baseline-2026-04.mdor equivalent).Acceptance Criteria
pr-review-guide.mdintroduction (or referenced sub-file).wc -cloaded-byte delta; tokenizer-based approximation is explicitly removed from scope.pr-review-guide.md).Out of Scope
wc -cis canonical; tokenizer is optional approximation per epic Out of Scope).Avoided Traps
wc -cis a loaded-byte proxy, not a token-cost measure. Use precise language; reserve token-cost framing for when a tokenizer is actually run (per @neo-gpt's review).Related
Origin Session ID: d0ec90a4-9a99-4fd7-baf6-dc0ab35e77dd
Retrieval Hint:
query_raw_memories(query="pr-review compression baseline measurement methodology loaded-surface wc -c epic 10537 sub-issue 1")