LearnNewsExamplesServices
Frontmatter
id10469
titleCodify Verify-Before-Assert Pre-Flight Check in AGENTS.md core directives
stateClosed
labels
documentationenhancementaiarchitecture
assigneesneo-opus-4-7
createdAtApr 28, 2026, 12:41 PM
updatedAtMay 15, 2026, 2:44 PM
githubUrlhttps://github.com/neomjs/neo/issues/10469
authorneo-opus-4-7
commentsCount1
parentIssuenull
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]
closedAtMay 1, 2026, 12:46 PM

Codify Verify-Before-Assert Pre-Flight Check in AGENTS.md core directives

Closeddocumentationenhancementaiarchitecture
neo-opus-4-7
neo-opus-4-7 commented on Apr 28, 2026, 12:41 PM

Context

Surfaced as the root finding of a 24-hour arc spanning the SQLite IN-clause overflow incident (#10463 / #10464 / #10465 / #10466), the demo cancellation, the Antigravity wake editor-corruption bug (#10467 / #10468), and the panic-test retrospective. @tobiu identified the meta-pattern: under pressure (and even without it), agents repeatedly skipped empirical validation when the falsifying tools were one tool-call away. Existing per-phase tickets (#9975, #9969, #9948, #9812, #9844) each address ONE phase of the discipline (intake / review / commit / creation). What's missing is the umbrella Pre-Flight Check that applies BEFORE any factual assertion in any public artifact at any phase.

The Problem

Five empirical anchors from this session arc, each demonstrating the same root cognitive failure:

  1. Cursor speculation in #10411 (Gemini, 2026-04-27). Added appName === 'Cursor' → 'l' to the bridge daemon dispatch without empirically smoke-testing Cursor end-to-end for 2026. Removed via #10468 commit-2 the next day under @tobiu's Truth-in-Code framing. Tool not used: smoke-test on a Cursor harness.
  2. "Merge gate violation" hallucination (Gemini, 2026-04-28). Asserted in an A2A retrospective that "we (or one of us) overstepped and executed a merge" based on @tobiu's rhetorical comment ("you overruled the merge gate policy"). Almost propagated into a public Discussion. Self-corrected via gh pr view --json mergedBy ~14 minutes later, which showed all merges executed by tobiu, zero by agents. Tool not used: gh pr view --json mergedBy (read-only, available throughout).
  3. 4-options Fix framing in ticket #10467 (Claude, 2026-04-28). Filed an actionable ticket with 4 abstract options A-D instead of the single empirical fix. Edit-in-placed after @tobiu flagged the noise. Tool not used: read the existing dispatch at bridge-daemon.mjs:506 first to identify the SPECIFIC missing knob (Antigravity wasn't in the appName switch). Pattern-abstraction substituted for empirical specificity.
  4. PR review template-discipline skip (Claude, 2026-04-28). Posted PR reviews on #10464 / #10466 via gh CLI under demo time pressure with topical substance but flat structure, missing the canonical sectioned form the Retrospective daemon regex-matches. Edit-in-placed after @tobiu flagged. Tool not used: load .agent/skills/pr-review/assets/pr-review-template.md before composing.
  5. Cmd+L shortcut challenge to @tobiu (Claude, 2026-04-28, no time pressure). Challenged the merged #10468 dispatch's 'Antigravity' → 'l' mapping based on extrapolation from VS Code's "Go to Line" keybinding semantics. The challenge was wrong — Cmd+L in Antigravity exists, focuses the agent input field, and is idempotent across visible/hidden panel states (empirically confirmed by @tobiu after my challenge). Tool not used: WebSearch for "Antigravity shortcut to focus agent" — the canonical authoritative answer (Google's own Codelabs + Antigravity Cheat Sheet 2026 + Antigravity Lab + antigravity.google/docs) is a single web-search away and unambiguously confirms Cmd+L. Internal-repo tools (gh, sqlite3, etc.) cannot answer "what does Antigravity do" — Antigravity launched 2026, post-training-cutoff for most models, so WebSearch is the appropriate falsifying tool for this class of question. Familiar-system-semantics extrapolation (VS Code's "Go to Line") substituted for actual-system verification.

Pattern across all five: the failing agent had the right tool one tool-call away. None were tool-availability gaps. All were stress-compression OR familiar-pattern-extrapolation bypasses of empirical validation. Anchor #5 specifically rules out "stress" as the root cause — the discipline failure happens under low-pressure conditions too, when familiar-system semantics seduce the agent into skipping verification.

The Architectural Reality

Files this ticket modifies:

  • AGENTS.md — adds a new section codifying the Verify-Before-Assert Pre-Flight Check, modeled on the existing reasoning-statement Pre-Flights (§3 Gate 1 Pre-Commit Check, §4.2 Consolidate-Then-Save). The Pre-Flight is a commitment-statement the agent makes in its internal reasoning BEFORE writing a factual claim into a public artifact: "To assert X, I will run [specific tool] and let the result determine the assertion."
  • .agent/skills/pr-review/references/pr-review-guide.md — references the AGENTS.md section as a §7-tier discipline floor (alongside §7.1 Minimum-One-Challenge, §7.4 Rhetorical-Drift Audit). Adds a check item: "Before asserting any factual claim about prior PRs, commits, or repository state in a review, run the empirical tool that would falsify it (gh pr view, git log, gh issue view, etc.)."
  • .agent/skills/ticket-create/references/ticket-create-workflow.md — references the AGENTS.md section in §2 Stage 2 Prescription discipline. The 5-stage challenge already implicitly assumes empirical grounding; making it explicit closes the gap that produced the 4-options framing in #10467.
  • .agent/skills/ticket-intake/references/ticket-intake-workflow.md (or its current equivalent) — references the AGENTS.md section as the discipline that catches premise-without-verification at intake.

Memory file already saved (private layer, not part of this ticket): feedback_verify_before_assert.md codifies the umbrella discipline at the agent's persistent-memory layer. This ticket is the public-substrate counterpart.

The Fix

Single prescription. Add a new section to AGENTS.md at an appropriate placement (suggested: between §3 Pre-Commit Hard Gates and §4 Memory Core Protocol, since Verify-Before-Assert is the cognitive substrate the per-phase gates ride on top of). Section content is the codification of the discipline:

  • Statement of the discipline: before asserting any factual claim, architectural premise, or framing in a public artifact (PR review, ticket body, Discussion, comment, commit message, public memory entry), run the empirical tool that would falsify it. The tools are always available, always read-only, always cheap.
  • Pre-Flight Check shape: state in internal reasoning BEFORE writing the claim — "To assert X, I will run [specific tool] and let the result determine the assertion." The commitment-statement is the gate that permits writing the claim.
  • Tool inventory (non-exhaustive): gh pr view --json [field], git log, gh issue view, query_documents, query_raw_memories, query_summaries, ask_knowledge_base, grep, direct read-only file inspection, sqlite3 read queries, ps/mdfind/lsof for system state. For subjects outside the model's training-data cutoff (e.g., 2026-released tools, recent product changes): WebSearch against authoritative sources (vendor docs, official cheat-sheets, primary specs). Internal-repo tools cannot falsify external claims; web search is the appropriate substrate for "what does product X actually do as of [recent year]" questions.
  • Anti-patterns: familiar-system-semantics extrapolation (see anchor #5), pattern-abstraction without empirical specificity (#3), self-recrimination assertions under stress without verification (#2), inherited-speculation propagation (#1), template-discipline shortcuts under time pressure (#4).
  • Cross-skill enforcement: the per-phase skills (pr-review, ticket-create, ticket-intake) reference this section as the discipline floor underneath their phase-specific gates.

Followed by per-skill reference updates: each skill file gains a 2-3 line reference to the AGENTS.md section, naming the specific tool-set most relevant to that phase.

Acceptance Criteria

  • AGENTS.md gains a new section codifying Verify-Before-Assert as a Pre-Flight Check (statement of the discipline + Pre-Flight reasoning-statement shape + tool inventory + anti-patterns + cross-skill enforcement note)
  • .agent/skills/pr-review/references/pr-review-guide.md references the new AGENTS.md section in its §7 Depth Floor cluster
  • .agent/skills/ticket-create/references/ticket-create-workflow.md references the new AGENTS.md section in §2 Stage 2 Prescription
  • .agent/skills/ticket-intake/references/ticket-intake-workflow.md (or equivalent intake skill reference) references the new AGENTS.md section as the discipline catching premise-without-verification
  • Empirical anchors section in the new AGENTS.md content cites the five 2026-04-28 incidents above (linking #10411, #10467, #10468) — concrete reference material that survives context-pruning across future sessions
  • Existing related tickets (#9975, #9969, #9948, #9812, #9844) explicitly cited in the new AGENTS.md section as per-phase instances the umbrella subsumes
  • Commit subject ends with (#TICKET_ID) per AGENTS.md §3 Gate 1; type=docs, scope=agents (or ai if convention prefers); no <noreply@*> Co-Authored-By footers per §0 invariant 4

Out of Scope

  • Tooling-layer enforcement (linter, hook, automated check that flags ungrounded claims pre-write). Worth a future Discussion if the discipline-layer codification proves insufficient. This ticket is discipline-layer only.
  • Retroactive cleanup of past public artifacts that violated Verify-Before-Assert (e.g., the original 4-options #10467 body — already edit-in-placed; the unstructured #10464/#10466 reviews — already edit-in-placed). Forward-looking discipline; past artifacts are case-studies, not cleanup targets.
  • Codifying specific tool-invocation thresholds (e.g., "run gh pr view if the assertion mentions a PR" — too prescriptive, and the discipline is about reasoning-state-before-assertion, not mechanical tool-routing). Keep the rule abstract; trust the agent to map situation → falsifying tool.
  • The Cmd+L→tabShortcut subscription configuration fix (separate ticket: Gemini's WAKE_SUB:b3d1179c has tabShortcut: null which suppresses the dispatch). Sibling scope, file separately.

Avoided Traps

  • Filing as a Discussion instead of an Issue. Rejected. The discipline is concretely actionable (AGENTS.md updated, skill files reference) with defined success criteria. Per ticket-create-workflow §9, Discussions are for shaping open questions; Issues are for actionable work. We don't have open questions — we have a clearly diagnosed discipline that needs codification.
  • Over-prescribing Pre-Flight templates per artifact type. Rejected. The discipline is "before asserting X, run the falsifying tool"; mechanical templates per artifact type calcify the rule into checklist-following rather than reasoning-state cultivation. Keep the AGENTS.md section abstract; let the per-skill references add phase-specific tool-inventory examples.
  • Treating this as duplicate of #9975 or #9969. Both cover specific phases (intake-time empirical verification, ticket-intake scaffolding). Verify-Before-Assert is the cross-phase umbrella that subsumes them as instance-coverage. Ticket explicitly cites both in Related so the relationship is graph-traversable.
  • Including a tooling-layer prescription in this ticket. Rejected. Discipline-layer codification first; if discipline proves insufficient, a separate Discussion can shape the tooling-layer question. Mixing layers in one ticket creates scope-creep risk.
  • Bundling the Cmd+L subscription-config fix into the same ticket. Rejected. That's a substrate-config bug; this is a discipline codification. Different scopes, different lifecycles, different reviewers may want to weigh in. File separately.

Related

  • Per-phase instances this umbrella subsumes:
    • #9975 — Hardening Agent Intake via Empirical Verification (intake-time empirical proof)
    • #9969 — ticket-intake skill (Pre-Execution Reflection Gate)
    • #9948 — Stepping Back Self-Reflection Protocol (PR self-review)
    • #9812 — Meta Gate 0 Deduplication (creation-time pre-check)
    • #9844 — Safe Commit Pipeline / CommitGate (pre-commit validation)
  • Empirical-anchor incidents:
    • #10411 — PR introducing Cursor speculation (anchor #1)
    • #10467 — bridge daemon wake editor corruption ticket (anchor #3, my 4-options framing)
    • #10468 — bridge daemon wake fix PR (anchors #4, #5 — review template skip + Cmd+L challenge)
  • Adjacent substrate fixes from the same arc (cross-link for graph traversal, not direct dependencies):
  • Sibling not-yet-filed: the subscription-configuration fix where WAKE_SUB:b3d1179c has tabShortcut: null suppressing the merged dispatch. Out of scope here; file separately.

Origin Session ID: 4bb6859b-860f-440d-9055-320e20b0ee22

Retrieval Hint: verify before assert Pre-Flight Check empirical validation falsifying tool AGENTS.md core directive cross-phase umbrella discipline familiar-system-semantics extrapolation panic-test retrospective

tobiu referenced in commit bd4cc65 - "docs(agent): Codify Verify-Before-Assert Pre-Flight Check in core directives (#10471) on Apr 28, 2026, 1:28 PM
tobiu referenced in commit 0137bbb - "docs(agents): elevate friction→gold + verify-before-assert as core values (#11092) (#11098) on May 10, 2026, 2:05 PM