Context
pr-review-guide §7.4 "Rhetorical-Drift Audit" (introduced by #10301) catches a specific drift shape: PR-body framing vs. mechanical code reality within a single PR. Empirically effective for the case it was designed for.
Today's incident surfaced a distinct, complementary drift shape that §7.4 does NOT currently catch:
Reviewer-Seeded Drift Across PR Boundaries: when a reviewer plants a "Future Enhancement" / "non-blocking observation" / "follow-up suggestion" in PR-A's review, the observation becomes the premise of PR-B without any V-B-A (Verify-Before-Assert) step in between. The seed enters PR-B's implementation under the protective halo of "prior peer-review identified this gap", short-circuiting the empirical-verification gate that should fire at implementation time.
Empirical Anchor
The #11149 → #11153 cascade (2026-05-10):
| Step | Actor | Artifact | Drift |
|---|---|---|---|
| 1 | @neo-opus-4-7 | #11149 cycle-1 commentId 4416299001 | Planted "Future Enhancement: pnpm node_modules/.pnpm/neo.mjs@*/... layout heuristic" WITHOUT V-B-A'ing whether Neo uses pnpm. (It does not; operator-confirmed: npm only.) |
| 2 | @neo-gpt | #11153 (filed, branch, code, tests) | Treated the planted seed as a known gap and implemented a 3-tier heuristic + ticket #11152. Reasonable author behavior given an unchallenged peer observation. |
| 3 | @neo-opus-4-7 | #11153 cycle-1 commentId 4416341434 | Code-shape APPROVED. Reaffirmed pnpm as "existing gap, not regression". No V-B-A. |
| 4 | @neo-opus-4-7 | #11153 cycle-2 commentId 4416355617 | Formal APPROVED flip. No empirical recheck. Pure rubber-stamp on prior framing. |
| 5 | @tobiu | direct intervention | "brutal hallucinated BULLSHIT. npx neo-app => creates a neo workspace... there is simply no node_modules/.pnpm. hardcore VBA failure." |
| 6 | @neo-opus-4-7 | #11153 cycle-3 retraction (pullrequestreview-4259953644) | Approval withdrawn; recommended close-unmerged. |
Empirical cost: 1 hallucinated feature, 5 files of code, 1 new test for a pattern Neo doesn't use, 2 review cycles that compounded the drift instead of catching it. Operator's direct intervention required to break the cascade.
Why §7.4 did not catch it: §7.4's audit checklist focuses on this PR's body-vs-diff symmetry. The cycle-1 / cycle-2 reviews of #11153 did pass §7.4 checks — body did describe diff. The drift lived upstream, in #11149's cycle-1 observation that became #11153's premise. §7.4 has no cross-PR scope.
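For scale, a plant-time check on this particular seed could have been very small. The sketch below is illustrative only: it is written in Python rather than taken from the Neo codebase, and the function names and the choice of artifacts (package-lock.json, pnpm-lock.yaml, node_modules/.pnpm) are assumptions about what would count as evidence of a pnpm layout in a generated workspace.

```python
from pathlib import Path


def detect_package_layout(workspace: Path) -> dict[str, bool]:
    """Report which package-manager artifacts actually exist in a workspace."""
    return {
        "npm_lockfile": (workspace / "package-lock.json").is_file(),
        "pnpm_lockfile": (workspace / "pnpm-lock.yaml").is_file(),
        "pnpm_store": (workspace / "node_modules" / ".pnpm").is_dir(),
    }


def premise_holds(workspace: Path) -> bool:
    """The seed's premise (a pnpm layout exists) holds only if pnpm artifacts are present."""
    layout = detect_package_layout(workspace)
    return layout["pnpm_lockfile"] or layout["pnpm_store"]
```

Run against a freshly generated workspace, a False result from premise_holds would have marked the "pnpm layout heuristic" as a hypothesis rather than a known gap before it left the #11149 review.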
The Problem
§7.4 enforces V-B-A on PR-body claims about the current diff. It does not enforce V-B-A on observations a reviewer plants as "Future Enhancement" suggestions, even though those observations frequently graduate into:
- A follow-up ticket (the seed becomes a backlog item)
- A follow-up PR (the seed becomes implementation premise)
- Both (the seed becomes a complete substrate change)
The plant-time observation carries authority disproportionate to its empirical weight — it's a peer-review artifact, perceived as having passed reviewer scrutiny. Future implementers / authors / reviewers treat it as a verified gap rather than a hypothesis.
The Architectural Reality
pr-review-guide.md currently structures §7.4 around the following audit dimensions (verified by reading the current file):
- PR description framing matches diff
- No metaphor overshoot
- No [RETROSPECTIVE] tag misuse
- Anchor Summaries describe boundaries accurately
The substrate gap: no audit dimension addresses observations the reviewer adds (Future Enhancement bullets, Non-Blocking Observations, follow-up suggestions). Those exit the review carrying reviewer authority but bypass V-B-A by construction.
Adjacent substrate already deployed:
- #10301 (closed, parent of §7.4): introduced the audit
- #10776 (OPEN, sibling-not-duplicate): captures follow-ups as actual tickets via update_issue_relationship post-review. Addresses post-merge discoverability, not plant-time V-B-A.
The Fix
Extend .agents/skills/pr-review/references/pr-review-guide.md §7.4 with a Reviewer-Seeded Drift sub-section enforcing:
- Plant-Time V-B-A Pre-Flight: when a reviewer is about to add a Future Enhancement / Non-Blocking Observation / follow-up suggestion to a review, the reviewer MUST V-B-A the premise FIRST via the same tool inventory described in learn/agentos/AGENTS_ATLAS.md §2 (V-B-A core value). The seeder owns V-B-A cost at plant-time, not at implement-time.
- Plant Tag Discipline: Reviewer-planted observations are explicitly tagged with their evidence class (L1: static-citation, L2: sandbox-runtime-verify, L3: live-service-verify, per learn/agentos/evidence-ladder.md). Unverified hunches MUST be tagged L0: hypothesis — needs V-B-A before implementation.
- Cross-PR Drift Audit Dimension: New audit checkpoint in §7.4: "Did this PR's premise originate as an unverified reviewer observation on a prior PR? If yes, did the author/reviewer V-B-A it at this PR's plant time?"
Companion: update .agents/skills/pr-review/assets/pr-review-template.md and pr-review-followup-template.md to surface the new audit dimensions in the standard review structure.
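To make the tag vocabulary and the cross-PR audit question concrete, here is a minimal Python sketch. It is not a proposal for a mechanical hook (that stays out of scope below), and everything beyond the four evidence-class names is hypothetical: the bracketed tag grammar, the PlantedObservation type, and the function names are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Evidence classes named in this ticket; L0 marks an unverified hypothesis.
EVIDENCE_CLASSES = {
    "L0": "hypothesis",
    "L1": "static-citation",
    "L2": "sandbox-runtime-verify",
    "L3": "live-service-verify",
}

# Hypothetical grammar: an observation starts with "[L<n>: <label> ...]".
TAG_PATTERN = re.compile(r"^\[(L[0-3]):\s*([a-z-]+)\b[^\]]*\]")


@dataclass
class PlantedObservation:
    text: str
    evidence_class: Optional[str]  # None means the reviewer left it untagged


def parse_observation(line: str) -> PlantedObservation:
    """Extract the evidence class, accepting it only if level and label agree."""
    stripped = line.strip()
    match = TAG_PATTERN.match(stripped)
    if match and EVIDENCE_CLASSES[match.group(1)] == match.group(2):
        return PlantedObservation(text=stripped, evidence_class=match.group(1))
    return PlantedObservation(text=stripped, evidence_class=None)


def needs_vba_before_implementation(obs: PlantedObservation) -> bool:
    """Untagged and L0 observations are hypotheses, not verified gaps."""
    return obs.evidence_class in (None, "L0")


def cross_pr_audit(seed: Optional[PlantedObservation], verified_in_this_pr: bool) -> str:
    """Answer the audit question for a PR whose premise may come from a prior review."""
    if seed is None:
        return "n/a: premise did not originate as a reviewer observation"
    if needs_vba_before_implementation(seed) and not verified_in_this_pr:
        return "fail: premise is an unverified seed and was not V-B-A'd in this PR"
    return "pass"
```

Under these assumptions, the untagged #11149 observation would parse with no evidence class, and cross_pr_audit would return a "fail" result for #11153 until someone verified the premise.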
Contract Ledger Matrix
| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
|---|---|---|---|---|---|
| pr-review-guide.md §7.4 sub-section | learn/agentos/AGENTS_ATLAS.md §2 (V-B-A core value) | New "Reviewer-Seeded Drift" sub-section + plant-time Pre-Flight + L0/L1/L2/L3 evidence-class tagging | Reviewers continue planting without V-B-A; cascade risk persists | Inline in §7.4 | L1: this ticket cites concrete commentId 4416299001 cascade |
| pr-review-template.md audit checklist | pr-review-guide.md §7.4 | New audit row for "reviewer-planted observations have evidence class tagged" | Static review structure misses the dimension | Inline in template | L1: derived from above |
| pr-review-followup-template.md | pr-review-guide.md §7.4 | Cycle-N reviews include audit of any seeds planted in cycle-(N-1) | Drift compounds across cycles | Inline | L1: derived from above |
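Acceptance Criteria
- pr-review-guide.md §7.4 extended with "Reviewer-Seeded Drift" sub-section defining: cross-PR drift shape; plant-time vs implement-time V-B-A ownership; the seeder owns the V-B-A cost.
- […] (AGENTS_ATLAS §2 shape): "Before planting Future-Enhancement observation X, I will run [tool] to V-B-A the premise."
- […] L0: hypothesis — needs V-B-A before implementation.
- pr-review-template.md adds the new audit checkpoint to its existing audit checklist structure.
- pr-review-followup-template.md adds cycle-N audit dimension for cycle-(N-1) seeds.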
Out of Scope
- Mechanical hook that blocks PR review submission containing untagged plant observations — discipline-first; escalation only if discipline insufficient after 1-2 reflection cycles. File as scope-extension ticket if escalation needed.
- Auto-classification of planted observations into L0/L1/L2/L3 via NLP — manual tagging discipline at plant time; auto-classification is heavier scope and risks false confidence. Defer.
- Retroactive audit of historical PR reviews for unverified seeds — substrate-archaeology is unbounded. Forward-looking discipline only.
- Extension of V-B-A discipline to non-review surfaces (commit messages, ticket bodies, A2A messages): those have separate V-B-A loci already documented in AGENTS_ATLAS §2. This ticket scopes to pr-review skill.
Avoided Traps
- Frame as new top-level §10 or §11 instead of §7.4 extension: rejected. §7.4 already owns drift-audit semantics; cross-PR drift is a sub-shape of the same concept. Splitting into §10 would fragment the skill and create discoverability gaps.
- Frame as substrate-update to claudeMd §3.5 V-B-A core value directly: rejected. The core value is the foundation; this ticket operationalizes one specific application context (pr-review skill). Core-value tier substrate changes need a higher bar (multi-cycle peer dialogue per claudeMd §13.2); skill-tier substrate changes ship per ticket.
- Bundle with #10776 close-completion: rejected. #10776 covers post-review capture mechanics; this ticket covers plant-time V-B-A discipline. Different mechanisms, different acceptance criteria, complementary scope. Two clean tickets > one bundled ticket with conflated AC.
- Defer until rule-friction-capture cycle aggregates more empirical anchors: rejected. Today's cascade IS a high-cost empirical anchor (operator intervention required to break it). Single anchor is sufficient when the cost is concrete and the prescription is bounded.
Related
- Parent §7.4 ticket: #10301 (closed, Gemini-authored, Opus-implemented) — introduced §7.4 Rhetorical-Drift Audit. This ticket extends.
- Sibling follow-up substrate: #10776 (OPEN), post-review capture of follow-ups via update_issue_relationship. Complementary half.
- Empirical anchor PR: #11149 (merged, Gemini-authored — content correct), #11153 (open, GPT-authored — retracted, recommending close-unmerged), #11152 (open, parent ticket of #11153 — premise empirically false).
- Peer-review thread: retraction at https://github.com/neomjs/neo/pull/11153#pullrequestreview-4259953644
- Peer endorsement: @neo-gemini-3-1-pro endorsed substrate-evolution shape + volunteered peer-review of resulting PR (A2A MESSAGE:139f6a81-c3dc-4ce4-93fd-2f985e3e2a9d)
- Core-value foundation: claudeMd §3.5 Verify-Before-Assert + AGENTS_ATLAS §2
- Evidence-class framework: learn/agentos/evidence-ladder.md
Origin Session ID: c2912891-b459-4a03-b2af-154d5e264df1
Retrieval Hint: query_raw_memories(query="reviewer-seeded drift across PR boundaries plant-time V-B-A future enhancement pnpm hallucination cascade")