FAIR-band: in-band [7/30] — substrate consistency follow-up to PR #11494 (#11491) amendment 2026-05-16T21:24Z. Cross-surface application of the operator-directed two-layer safeguard pattern.
Premise
PR #11494 ships a two-layer mechanical body-shape validator on the manage_pr_review MCP tool's body parameter:
- Visible layer (VISIBLE_PR_REVIEW_ANCHORS): 7 evaluation-metric tags. Misses enumerated in error response; one diagnostic anchor named in human-facing message.
- Invisible layer (INVISIBLE_PR_REVIEW_ANCHORS): 3 structural template substrings (Depth Floor, Required Actions, Strategic-Fit Decision). Misses NOT enumerated in response. Defeats Goodhart anchor-stuffing.
Both layers point the agent at .agents/skills/pr-review/SKILL.md with explicit "do NOT compose a substitute template" anti-hallucination phrase.
Empirical anchor for the invisible layer: Gemini's review 4304287893 on PR #11499 (2026-05-16T21:16Z) contained all 7 visible metric tags but missed structural anchors — the canonical Goodhart-stuffing failure mode. Pre-enhancement validator would have passed it.
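A minimal sketch of the two-layer check described above, not the code shipped in PR #11494. The seven visible tags are not named in this ticket, so the `[METRIC_n]` strings are placeholders; the function name `validateReviewBody`, the `missingVisible` field name, and the message wording are likewise assumptions made for illustration.

```javascript
// Placeholder tags: the ticket says there are 7 evaluation-metric tags but does not name them.
const VISIBLE_PR_REVIEW_ANCHORS = [
    '[METRIC_1]', '[METRIC_2]', '[METRIC_3]', '[METRIC_4]',
    '[METRIC_5]', '[METRIC_6]', '[METRIC_7]'
];

// The three structural substrings named in this ticket.
const INVISIBLE_PR_REVIEW_ANCHORS = ['Depth Floor', 'Required Actions', 'Strategic-Fit Decision'];

const PR_REVIEW_SKILL_PATH = '.agents/skills/pr-review/SKILL.md';

// Hypothetical validator: returns {ok: true} or an error object.
function validateReviewBody(body) {
    const missingVisible   = VISIBLE_PR_REVIEW_ANCHORS.filter(anchor => !body.includes(anchor));
    const missingInvisible = INVISIBLE_PR_REVIEW_ANCHORS.filter(anchor => !body.includes(anchor));

    if (missingVisible.length === 0 && missingInvisible.length === 0) {
        return {ok: true};
    }

    // Name at most one visible anchor in the human-facing message.
    const diagnostic = missingVisible.length > 0
        ? ` First missing anchor: ${missingVisible[0]}.`
        : '';

    return {
        ok: false,
        // Visible misses are enumerated; invisible misses are deliberately never returned.
        missingVisible,
        message: `Review body does not match the required template. Load ${PR_REVIEW_SKILL_PATH} ` +
                 `and follow it; do NOT compose a substitute template.${diagnostic}`
    };
}
```

Under these assumptions, a body that contains every visible tag but none of the structural substrings (the shape of review 4304287893) is rejected with an empty `missingVisible` list and a message that reveals nothing about which structural anchor failed.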
The cross-surface gap
The same two-layer pattern needs to apply at adjacent enforcement surfaces:
| Surface | Current state | Gap |
|---|---|---|
| manage_pr_review MCP tool | PR #11494: two-layer | ✓ canonical primary surface |
| lint-pr-body.mjs CI lint (PR #11498) | Visible-only, 4 anchors, error names all missing | Missing invisible layer; missing skill-pointing emphasis |
| gh pr review CLI bypass CI lint (#11495) | Not yet implemented | Inherit from start |
Because these surfaces are visible-only or not yet enforced, a Goodhart-stuffer who routes around the MCP tool by calling gh pr create or gh pr review directly evades the strong two-layer check and meets only the weaker ones.
Prescription
Part A — Apply two-layer pattern to PR #11498's lint-pr-body.mjs
Author-side anchor set from pull-request-workflow.md §9:
Visible (already in PR #11498):
- Resolves|Closes|Fixes|Related:|Refs #N
- Evidence:
- ## Deltas
- ## Test Evidence
Invisible (to be added; candidates from the §9 minimum-viable structure):
- ## Post-Merge Validation
- One or two additional structural substrings TBD by V-B-A against §9
Error message redesign per #11494 pattern:
- Lead with .agents/skills/pull-request/SKILL.md as required-action
- Name template path: .agents/skills/pull-request/references/pull-request-workflow.md §9
- "do NOT compose a substitute template or hallucinate section headings" phrase
- One diagnostic anchor named, NOT the full visible-miss list
- No missing_invisible field; no invisible substring in any response surface
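The error-message shape listed above could look roughly like the sketch below. It is an illustration only, not the contents of lint-pr-body.mjs: the function name `lintPrBody`, the regular expressions, and the exact message wording are assumptions, and `## Post-Merge Validation` is used as the sole invisible anchor because it is the only candidate this ticket names.

```javascript
const PR_SKILL_PATH    = '.agents/skills/pull-request/SKILL.md';
const PR_TEMPLATE_PATH = '.agents/skills/pull-request/references/pull-request-workflow.md §9';

// Visible author-side anchors. The patterns are guesses at how each anchor might be matched.
const VISIBLE_PR_BODY_ANCHORS = [
    {label: 'Resolves|Closes|Fixes|Related:|Refs #N', pattern: /\b(?:Resolves|Closes|Fixes|Related:|Refs) #\d+/},
    {label: 'Evidence:',        pattern: /Evidence:/},
    {label: '## Deltas',        pattern: /^## Deltas\b/m},
    {label: '## Test Evidence', pattern: /^## Test Evidence\b/m}
];

// Invisible author-side anchors: checked, but never echoed back.
const INVISIBLE_PR_BODY_ANCHORS = [/^## Post-Merge Validation\b/m];

// Hypothetical lint entry point: returns {ok: true} or {ok: false, message}.
function lintPrBody(body) {
    const missingVisible = VISIBLE_PR_BODY_ANCHORS.filter(anchor => !anchor.pattern.test(body));
    const invisibleOk    = INVISIBLE_PR_BODY_ANCHORS.every(pattern => pattern.test(body));

    if (missingVisible.length === 0 && invisibleOk) {
        return {ok: true};
    }

    const lines = [
        `Required action: load ${PR_SKILL_PATH} and follow it.`,
        `Template: ${PR_TEMPLATE_PATH}`,
        'Do NOT compose a substitute template or hallucinate section headings.'
    ];

    // One diagnostic anchor only, never the full visible-miss list.
    if (missingVisible.length > 0) {
        lines.push(`Diagnostic anchor: ${missingVisible[0].label}`);
    }

    // No missing_invisible field and no invisible substring in the result.
    return {ok: false, message: lines.join('\n')};
}
```

In a CI script the caller would print `message` and exit non-zero when `ok` is false; how PR #11498 actually reports failures is not specified here.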
Part B — Inherit two-layer pattern in #11495's gh-CLI-review-bypass CI lint
When the #11495 implementation lands, it must inherit:
- Same VISIBLE_PR_REVIEW_ANCHORS source-of-truth as PR #11494 (shared module ai/services/github-workflow/prReviewAnchors.mjs if/when extracted, otherwise sync-by-convention with comment-pointers)
- Same INVISIBLE_PR_REVIEW_ANCHORS source-of-truth. The INVISIBLE list MUST NOT be duplicated across surfaces in a way that allows discoverability via cross-reference; ideally single-source via the same shared module
- Same skill-pointing error message shape
Single-source-of-truth design
PR #11494's anchor constants live in ai/services/github-workflow/PullRequestService.mjs. For cross-surface consistency:
Option 1 (extract first): factor VISIBLE_PR_REVIEW_ANCHORS + INVISIBLE_PR_REVIEW_ANCHORS into ai/services/github-workflow/prReviewAnchors.mjs. PullRequestService.mjs imports. PR #11498's lint script + future #11495 CI lint import from the same module. Single edit-site for future template evolution.
Option 2 (sync-by-convention): each surface hardcodes its own constants with documented "MUST stay synchronized with PullRequestService.mjs INVISIBLE_PR_REVIEW_ANCHORS" comments. Acceptable for v1 if Option 1 introduces import-path complexity for CI scripts running in GH Actions runners.
Recommended: Option 1. The CI lint scripts can import from the services module just fine in Node 24 runners.
The Part A author-side anchors remain independent: they derive from pull-request-workflow.md §9, not from the pr-review template. Only the reviewer-side surfaces (#11494, #11495) share anchor constants.
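To make the single edit-site idea in Option 1 concrete, here is a small sketch under stated assumptions. The constants stand in for the proposed prReviewAnchors.mjs (a real module would `export` them and each surface would `import` them); the `[METRIC_n]` tags are placeholders, and `passesBothLayers`, `mcpToolAccepts`, and `ciLintAccepts` are hypothetical names.

```javascript
// Stand-in for the proposed shared module; frozen so no surface can mutate the lists locally.
const SHARED_VISIBLE_ANCHORS = Object.freeze([
    '[METRIC_1]', '[METRIC_2]', '[METRIC_3]', '[METRIC_4]',
    '[METRIC_5]', '[METRIC_6]', '[METRIC_7]'
]);
const SHARED_INVISIBLE_ANCHORS = Object.freeze([
    'Depth Floor', 'Required Actions', 'Strategic-Fit Decision'
]);

// One predicate reused by every reviewer-side surface.
function passesBothLayers(body) {
    return [...SHARED_VISIBLE_ANCHORS, ...SHARED_INVISIBLE_ANCHORS]
        .every(anchor => body.includes(anchor));
}

// Hypothetical surface entry points; both delegate to the same lists.
const mcpToolAccepts = body => passesBothLayers(body);  // manage_pr_review (PR #11494)
const ciLintAccepts  = body => passesBothLayers(body);  // gh-CLI-review-bypass lint (#11495)
```

With this shape, a template change touches one pair of arrays, and the two reviewer-side surfaces cannot drift apart.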
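Acceptance criteria
- lint-pr-body.mjs extended with invisible-anchor layer + skill-pointing error message; preserves backward-compatible behavior when both layers pass
- [...] missing_invisible field
- #11495 CI lint inherits the same pattern when implemented (recorded in #11495 prescription)
- ai/services/github-workflow/prReviewAnchors.mjs shared module created; PR #11494's PullRequestService.mjs refactored to import; AC tests in PR #11494 spec continue to pass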
Avoided traps
- Duplicating invisible-anchor list across files: rejected per single-source-of-truth design. Discoverability risk multiplies with each duplicate site.
- Documenting the invisible-anchor list publicly (README, ADR enumeration): rejected. Pattern description in ADR is OK; specific anchor enumeration is NOT.
- Coupling to specific cycle template (cycle-1 vs cycle-followup): rejected. The 3 invisible substrings in PR #11494 already span both. Any new invisible anchor for #11498's author-side surface must similarly be cycle-agnostic if applicable.
- Eager extraction to shared module before PR #11494 merges: scope-restraint per feedback_substrate_scope_restraint. Land PR #11494 baseline first; extract via this follow-up ticket.
Authority anchors
- Parent PR: PR #11494 (#11491) — canonical primary surface; this ticket is the cross-surface follow-up
- Operator directive 2026-05-16T21:15Z: "the mcp tool enhancements: if they check for specific tags, the tool errors must point to using the mandatory skills. otherwise we risk that models hallucinate a new template containing the tags. and tools should check for more tags that are not mentioned => invisible. safeguard."
- Empirical anchors: Gemini's #11499 reviews 4304287893 (Goodhart-stuffed) vs 4304295863 (corrected). The three-minute self-correction loop confirms the failure mode is real and recoverable
- Cross-surface tickets: PR #11498 (Gemini, CI PR-body lint; pending operator review re. enhancement), #11495 (gh-CLI-review-bypass split; not yet implemented)
Related
- Parent: PR #11494 (#11491) — ships canonical two-layer + skill-pointing pattern at MCP tool boundary
- Cross-surface PR1: PR #11498 — Shape B CI lint, needs two-layer extension
- Cross-surface ticket: #11495 — gh-CLI-review-bypass; inherit pattern when implemented
- Conceptual ancestor: ~50 closed meta-tickets across Helpful-Assistant counter-substrate / Map vs World Atlas / skill adherence — this ticket extends the categorical tool-boundary layer across all enforcement surfaces