LearnNewsExamplesServices
Frontmatter
id11195
titleValidate Step 2.5 Architectural Step-Back post-merge (30-day observation)
stateOpen
labels
documentationenhancementaimodel-experience
assigneesneo-opus-ada
createdAtMay 11, 2026, 11:05 AM
updatedAtJun 7, 2026, 7:10 PM
githubUrlhttps://github.com/neomjs/neo/issues/11195
authorneo-opus-ada
commentsCount2
parentIssuenull
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]

Validate Step 2.5 Architectural Step-Back post-merge (30-day observation)

Open Backlog/active-chunk-11 documentationenhancementaimodel-experience
neo-opus-ada
neo-opus-ada commented on May 11, 2026, 11:05 AM

Context

Post-merge validation follow-up split from #11192 per @neo-gpt's PR #11194 Cycle 1 review — the close-target evidence mismatch.

#11192 (graduation Issue) carries implementation ACs (AC1-AC6 covering ideation-sandbox-workflow.md §5.2 + peer-role-mode.md §8 + AGENTS.md §3.5 + AGENTS.md §21 + Discussion #11188 body update marker + cross-family review). All 6 are static-review-verifiable + ship with PR #11194. But AC7 (≥2 Step 2.5 firings within 30 days of merge) + AC8 (substrate accretion net-byte audit by reviewer) are explicitly post-merge — they cannot exist at PR-merge time.

Per pull-request-workflow §9 close-target-evidence semantics, closing #11192 immediately would record completion before AC7+AC8 evidence exists. The clean shape is: split AC7+AC8 into this validation tracker, let #11192 close on PR #11194 merge for implementation ACs, then this ticket carries the 30-day post-merge verification.

The Problem

After PR #11194 merges (graduates Step 2.5 Architectural Step-Back), the discipline needs empirical verification across real ideation cycles to confirm:

  1. The 8-point cross-substrate sweep checklist actually surfaces substrate gaps that would have caused post-graduation friction.
  2. The high-blast-radius trigger criteria are calibrated correctly (not over-triggering on routine ideation, not under-triggering on actually-load-bearing decisions).
  3. The convergence-rate tripwire in peer-role-mode.md §8 fires at the right cadence.
  4. Substrate accretion stays within AGENTS.md §13 budget — net-byte audit of all 4 surfaces (ideation-sandbox-workflow.md, peer-role-mode.md, AGENTS.md).

Without this validation tracker, the discipline ships without an explicit "did it work?" check. Per AGENTS.md §13 Substrate Accretion Defense, every substrate-mutation needs decay-mitigation rationale; the verification IS the mitigation.

The Architectural Reality

Validation surfaces:

  • Memory Core: query_summaries(query="Step 2.5 STEP_BACK") after 30 days — count firings.
  • GitHub: search STEP_BACK comments on Discussions filed since PR #11194 merge.
  • A2A mailbox archive: count convergence-rate tripwire fires from peer-role-mode.md §8.
  • Static byte-count audit of the 3 substrate files post-merge vs pre-merge.

Source-of-truth: Memory Core (operator + swarm session memories) is the canonical record of ideation arcs. GitHub Discussions + STEP_BACK comments are the empirical surface.

The Fix

30-day observation window starting at PR #11194 merge timestamp:

  1. Day 0: capture baseline byte-counts of ideation-sandbox-workflow.md, peer-role-mode.md, AGENTS.md post-merge.
  2. Day 0 + 30: query Memory Core + GitHub for Step 2.5 firings; count actual cycles where the 8-point sweep ran.
  3. Day 0 + 30: net-byte audit: did the 3 substrate files grow beyond their landed baselines? If yes, audit reasons (organic refinement vs accretion).
  4. Verify ≥1 substrate gap surfaced: at least one of the captured Step 2.5 sweeps must have flagged a substrate gap that the proposal otherwise would have shipped with. If zero gaps surfaced across ≥2 firings, the discipline may be over-triggering or under-detecting; file substrate-evolution ticket to recalibrate.

Acceptance Criteria

  • AC1: Baseline byte-counts captured at PR #11194 merge timestamp for ideation-sandbox-workflow.md (§5.2 add), peer-role-mode.md (§8 third trigger), AGENTS.md (§3.5 pointer + §21 row extension). Capture as Day-0 anchor comment on this ticket.
  • AC2: At Day 30 post-merge, query Memory Core for Step 2.5 firings: query_summaries(query="Step 2.5 STEP_BACK 8-point cross-substrate sweep"). Count ≥2 firings → AC2 satisfied. If <2 firings, file recalibration ticket (under-triggering candidate).
  • AC3: At Day 30 post-merge, verify ≥1 captured firing surfaced a substrate gap (authority drift, path-determinism break, consumer-sweep miss, etc.) that the proposal would have shipped with. If 0 gaps surfaced across ≥2 firings, file recalibration ticket (false-positive candidate).
  • AC4: Net-byte audit of all 4 surfaces (ideation-sandbox-workflow.md, peer-role-mode.md, AGENTS.md §3.5, AGENTS.md §21) at Day 30. Document any growth + classify (organic refinement vs accretion). Sunset criteria from #11192 AC8 inherit here.
  • AC5: Post 30-day retrospective comment on this ticket summarizing: total firings, gap-detection rate, byte-growth, calibration findings. Close ticket as completed OR file recalibration ticket if discipline-tier substrate needs adjustment.
  • AC6 (added 2026-05-11 per #11209 backfill): Lead/peer coordination protocol compliance audit. Track next 3 /lead-role sessions post-merge of PR #11223 (#11209 Option A-prime graduation):
    • Did lead name strategic focus (lead-role-mode.md §2.3)? Y/N + scope-correct grain Y/N (per §2.3 too-broad / sample-correct / too-narrow table)
    • Did lead use explicit /peer-role skill-trigger phrases (lead-role-mode.md §2.2)? Y/N per substrate-validation A2A
    • Did peers send [lane-claim] A2A before write-operations (peer-role-mode.md §6.5)? Y/N per PR-open + issue-assignment
    • Did peers run source-of-authority collision check (peer-role-mode.md §6.6)? Y/N per lane-claim
    • Were any conflicts resolved by Authority-hierarchy (Current Public Authority > Handoff A2A > Recent Lane Claim)? Y/N + outcome
    • Pass criterion: ≥80% compliance across captured sessions; <80% → file recalibration ticket per #11209 AC6 escalation path.

Out of Scope

  • Implementation of Step 2.5 (done in PR #11194#11192).
  • Automated firing-counter / metric instrumentation (deferred Option C from Discussion #11188; this validation is manual observation by design).
  • Cross-substrate identity canonicalization audit (#11182 Layer 4 — separate concern).

Avoided Traps

  • Automating Step 2.5 firing-counter as part of this ticket: rejected — Option C was deferred per Discussion #11188 consensus. This ticket is empirical observation, not automation.
  • Bundling AC7+AC8 with implementation in #11192: rejected per @neo-gpt's PR #11194 review — closes ticket before evidence exists, violates close-target evidence discipline.
  • 30-day window without baseline byte-count anchor: rejected — without Day-0 anchor, AC4 net-byte audit is unverifiable.

Related

Origin Session ID

c2912891-b459-4a03-b2af-154d5e264df1

Handoff Retrieval Hints

  • query_raw_memories(query="Step 2.5 STEP_BACK 8-point sweep post-merge validation 30-day observation")
  • query_summaries(query="Step 2.5 firings substrate gap detection")
  • Git anchor: git log --oneline --grep="11192" --since="2026-05-11" for graduation merge point
tobiu referenced in commit a7f7d2d - "feat(agents): graduate Step 2.5 Architectural Step-Back (#11192) (#11194) on May 11, 2026, 1:50 PM
tobiu referenced in commit a5c9aa4 - "feat(agents): mandate explicit /peer-role skill-trigger naming in lead-role A2As (#11205) (#11208) on May 11, 2026, 6:16 PM
tobiu referenced in commit 3675493 - "feat(agents): consensus mandate for ideation-sandbox graduation + PR-merge gate (#11217) (#11219) on May 11, 2026, 6:21 PM
tobiu removed the ai-generated label on May 28, 2026, 12:16 AM