What is the Neural Link?

The Neural Link is a bi-directional bridge that connects AI agents directly to the Neo.mjs runtime. It lets agents inspect the Scene Graph, component state, event listeners, computed styles, and DOM rectangles, and mutate the running application in real time.

Why is Neo.mjs called an Application Engine instead of a framework?

Neo.mjs maintains persistent application objects in a worker-backed Scene Graph instead of compiling application state away into ephemeral DOM nodes. That architecture enables multi-window orchestration, runtime permutation, and deep AI introspection.

What is Context Engineering?

Context Engineering shapes the information and tool environment around AI agents. Neo.mjs implements it through Knowledge Base, Memory Core, GitHub Workflow, and Neural Link MCP servers for frontier harnesses, plus a File System MCP server for internal Neo.ai.Agent local loops.

What is the Neo.mjs Agent OS?

The Neo.mjs Agent OS is the repository Brain: source code and services for Memory Core, Knowledge Base, Active Hybrid GraphRAG, DreamService, Golden Path synthesis, A2A coordination, and Neural Link tooling.

Frontmatter

id	10775
title	Mechanical Pre-Flight: enforce evaluation-metrics block in pr-review comments
state	Closed
labels	documentationenhancementaimodel-experience
assignees	[]
createdAt	May 5, 2026, 9:16 PM
updatedAt	May 31, 2026, 8:30 AM
githubUrl	https://github.com/neomjs/neo/issues/10775
author	neo-opus-ada
commentsCount	1
parentIssue	10757
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 31, 2026, 8:30 AM

Mechanical Pre-Flight: enforce evaluation-metrics block in pr-review comments

Closed Backlog/active-chunk-9 documentationenhancementaimodel-experience

neo-opus-ada commented on May 5, 2026, 9:16 PM

Context

Empirical anchor from session 2026-05-05 (workflow-discipline-over-velocity calibration with @tobiu). My PR #10768 review (PRR_kwDODSospM78Iqlj) shipped with all pr-review-guide audit sections (Strategic-Fit Step-Back, Depth Floor, Source-of-Authority, Cross-Skill Integration, Test-Execution Audit, etc.) — but omitted the §3.1 decile-anchor evaluation metrics block ([ARCH_ALIGNMENT] / [CONTENT_COMPLETENESS] / [EXECUTION_QUALITY] / [PRODUCTIVITY] / [IMPACT] / [COMPLEXITY] / [EFFORT_PROFILE]).

@tobiu caught the workflow lapse at merge-time: "hint, evaluation metrics are missing inside your review." I posted a supplemental block post-merge for graph-completeness, but the lapse-then-cleanup pattern is the failure mode — discipline-only gating doesn't reliably fire when agents compress the review template under self-imposed time pressure.

The Problem

The pr-review-guide §3.1 decile-anchor rubric is load-bearing for graph-ingestion — IssueIngestor and downstream cross-cycle measurability depend on the metrics being machine-parseable. A review without the metrics block:

Has no [ARCH_ALIGNMENT] / [EXECUTION_QUALITY] / etc. tags for the Retrospective daemon's regex parser
Provides no per-cycle measurability for cross-family review-quality trends
Cannot feed into the §8/§7.2 cross-model asymmetry empirical-grounding measurement (#10756 methodology)

Codifying the discipline in the skill manual is necessary but not sufficient — the workflow-discipline-over-velocity lapse pattern (feedback_workflow_discipline_over_velocity.md) shows agents under pressure skip the metrics block even when the codification exists.

The Architectural Reality

.agents/skills/pr-review/references/pr-review-guide.md §3.1 — the existing decile-anchor rubric codification
.agents/skills/pr-review/assets/pr-review-template.md — the asset template that should include the metrics block as a required section
.agents/skills/pr-review/references/audits/ — directory of audit sub-payloads; potentially the right home for a new "evaluation-metrics-pre-flight" audit
ai/daemons/services/IssueIngestor.mjs:327 — the regex parser that picks up [KB_GAP] / [TOOLING_GAP] / [RETROSPECTIVE] tags; the metrics tags follow the same shape and the parser could be extended to surface metric-completeness gaps

The Fix

Option A (lightest): Update pr-review-template.md to include the evaluation-metrics block as a required section with explicit empty-placeholders:

<h3 class="neo-h3" data-record-id="6">📊 Evaluation Metrics</h3>

- **`[ARCH_ALIGNMENT]`**: __ — _justification_
- **`[CONTENT_COMPLETENESS]`**: __ — _justification_
- **`[EXECUTION_QUALITY]`**: __ — _justification_
- **`[PRODUCTIVITY]`**: __ — _justification_
- **`[IMPACT]`**: __ — _justification_
- **`[COMPLEXITY]`**: __ — _justification_
- **`[EFFORT_PROFILE]`**: __ (Quick Win / Mid-effort / Major / N/A)

The placeholder pattern forces the reviewer to either fill in or explicitly mark N/A — empty __ left in the comment is visually obvious as a workflow lapse.

Option B (mechanical gate): Pre-Flight in pull-request-workflow (or a new pr-review-workflow Pre-Flight section) that grep-checks the review-comment body for the 7 metric tags before allowing manage_issue_comment invocation; halt if any are absent without explicit N/A justification.

Option C (graph-side enforcement): Extend IssueIngestor.mjs:327 regex to ALSO capture the metric tags; emit a [METRICS_GAP] graph-ingestion tag for reviews missing metrics so the gap is visible in graph queries even if the review shipped.

Recommendation: ship Option A first (lowest cost, immediate fix), file Option B + Option C as scope-extension follow-ups if Option A doesn't reduce the lapse rate enough.

Acceptance Criteria

(AC1) pr-review-template.md updated with required Evaluation Metrics section + empty-placeholder pattern (per Option A)
(AC2) pr-review-guide §3.1 cross-references the template requirement explicitly (so reviewers know the template's metric block is required, not optional)
(AC3) Anti-pattern table entry in pr-review-guide flagging "review-comment shipped without evaluation-metrics block" as the failure mode
(AC4) Empirical validation: next 5 cross-family reviews after merge include metrics block (sampled; manual spot-check)

Out of Scope

Option B (mechanical Pre-Flight gate) — file as scope-extension if Option A insufficient
Option C (graph-side [METRICS_GAP] tagging) — file as scope-extension if observability gap persists
Standardizing the per-metric scoring rubric (already in pr-review-guide §3.1)
Metrics-driven review-quality measurement (separate scope, ties into #10756 §8/§7.2 empirical grounding)

Avoided Traps

Mechanical-gate-first: would block legitimate brief comments (e.g., approval acks). Template-shape with placeholders is lighter; mechanical gate is appropriate only if discipline alone proves insufficient.
Removing the discipline from the skill manual: the codification is necessary even when mechanically enforced, for cross-family agent training on the rubric semantics.

Empirical anchor: PR #10768 review (mine, missing metrics block); @tobiu calibration 2026-05-05; supplemental block posted post-merge for graph-completeness
Sibling substrate: #10164 (cross-PR file-collision Pre-Flight — author-side analog)
Sibling substrate: #10755 (Resolves/Fixes vs Closes Pre-Flight — author-side keyword-presence check; same family)
Parent substrate: #10757 V2 (cycle-2 mutation-time gate — broader Pre-Flight family for substrate-touching PRs)
Cross-reference: #10756 §8/§7.2 empirical grounding — metrics ARE the measurement substrate

Origin Session ID: 23b9cbcd-4938-4a46-b21a-0d48dd12e7e7

Retrieval Hint: query_raw_memories(query="pr-review evaluation metrics decile anchor mechanical pre-flight workflow discipline 10768 metrics block")