Frontmatter
id: 14812
title: The May-2026 holdout ceremony: single-shot execution + labeled-sample adjudication + the skill report artifact
state: Open
labels
enhancement, ai, not-code-ready, deferred-by-design
assignees: []
createdAt: 5:54 PM
updatedAt: 7:28 PM
githubUrl: https://github.com/neomjs/neo/issues/14812
author: neo-fable
commentsCount: 1
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
contentTrust
projected
quarantined: 0
signals: []
blockedBy: []
blocking: []

The May-2026 holdout ceremony: single-shot execution + labeled-sample adjudication + the skill report artifact

Open · Backlog/active-chunk-4 · enhancement, ai, not-code-ready, deferred-by-design
neo-fable commented on 5:54 PM

Context

The ceremony half was split out of #14569 at PR review: the harness merged with the June gate PASSING and the May door LOCKED (runHoldout throws without {singleShot: true, operatorProvenance}; no May data ships in-repo). This ticket is the ceremony itself: deliberately human-involving, deliberately run once.
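To make the LOCKED door concrete, here is a minimal sketch of the kind of guard described above. It is not the merged harness code: the function name runHoldoutSketch, the assumption that operatorProvenance is a non-empty string pointer, the error messages, and the shape of the return value are all hypothetical.

```javascript
// Hypothetical stand-in for the guard in #14569; the real runHoldout may differ.
function runHoldoutSketch(options = {}) {
    const {singleShot, operatorProvenance} = options;

    // The door stays locked unless the caller explicitly opts into the single shot.
    if (singleShot !== true) {
        throw new Error('May holdout is locked: pass {singleShot: true}');
    }

    // Assumption: provenance is a non-empty string pointing at the adjudication record.
    if (typeof operatorProvenance !== 'string' || operatorProvenance.trim() === '') {
        throw new Error('May holdout is locked: operatorProvenance is required');
    }

    // Placeholder result only; the actual scoring lives in the harness.
    return {singleShot, operatorProvenance, executedAt: new Date().toISOString()};
}
```

Calling runHoldoutSketch() or runHoldoutSketch({singleShot: true}) throws; only a call carrying both options gets through.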

The protocol (from the recorded five-step hindcast design)

  1. Assemble May-2026 history OUTSIDE the repo (the divergence month: narrative said shutdown; the repo built 664 non-chore commits — exactly where volume-blind attribution misreads with confidence).
  2. Labeled sample: operator + ≥2 agents independently label a sample of May items; disagreements adjudicated; the adjudication recorded.
  3. Single-shot run: runHoldout({singleShot: true, operatorProvenance: <the adjudication record's pointer>, ...}) — once. The result stands regardless of outcome and is recorded verbatim.
  4. F2: compare attributed v_D against the volume-blind baseline on the known outcomes — if attribution misreads May WORSE than the baseline, the attribute-then-aggregate fidelity claim fails and the composition question reopens (recorded, not hidden).
  5. The skill report artifact: per-horizon miss-rates from the June gate + the May shot — the render leaf's (#14570) input contract. No skill at a horizon = no render at that horizon, by design.
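Steps 4 and 5 can be sketched as follows, under loudly stated assumptions: outcomes are treated as categorical labels, "misreads" is measured as a plain miss rate, a tie with the baseline counts as a pass (the ticket only fails attribution when it is WORSE), and "skill at a horizon" is modeled as a miss rate at or below a caller-supplied threshold. The function names, the threshold, and the horizon keys are hypothetical and are not taken from the harness.

```javascript
// Fraction of items where the predicted label differs from the known outcome.
function missRate(predictions, outcomes) {
    if (predictions.length !== outcomes.length || outcomes.length === 0) {
        throw new Error('predictions and outcomes must be non-empty and of equal length');
    }
    const misses = predictions.filter((label, i) => label !== outcomes[i]).length;
    return misses / outcomes.length;
}

// F2 (assumed form): fail only if attribution misreads worse than the volume-blind baseline.
function compareF2({attributed, baseline, outcomes}) {
    const attributedMiss = missRate(attributed, outcomes);
    const baselineMiss   = missRate(baseline, outcomes);
    return {attributedMiss, baselineMiss, pass: attributedMiss <= baselineMiss};
}

// Skill report (assumed shape): one entry per horizon; no skill means no render.
function buildSkillReport(missRateByHorizon, maxMissRate) {
    return Object.entries(missRateByHorizon).map(([horizon, rate]) => ({
        horizon,
        missRate: rate,
        render  : rate <= maxMissRate
    }));
}
```

With made-up illustrative labels, compareF2({attributed: ['build', 'build', 'shutdown', 'build'], baseline: ['shutdown', 'shutdown', 'shutdown', 'build'], outcomes: ['build', 'build', 'build', 'build']}) yields an attributedMiss of 0.25 against a baselineMiss of 0.75, so pass is true. A failing comparison would be returned the same way, which keeps both outcomes recordable as results.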

Acceptance Criteria

  • The labeled-sample adjudication exists as a durable record (operator + ≥2 agents named).
  • Exactly ONE runHoldout execution, provenance-pointed; result recorded verbatim whatever it says.
  • F2 comparison recorded (pass or fail — both are results).
  • The skill report artifact emitted and linked from #14570.

Related

#14569 (the merged harness — the door this ceremony unlocks once) · #14570 (consumes the skill report) · the June gate (already passed, in-repo) · the divergence-holdout design record.

Operator-involving by construction; agent legs claimable when the operator schedules it.

Origin Session ID: b9b95ac6-42f5-47a3-b58f-6071f79657e8
Retrieval Hint: "May holdout ceremony single shot labeled sample adjudication skill report F2 divergence"