What is the Neural Link?

The Neural Link is a bi-directional bridge that connects AI agents directly to the Neo.mjs runtime. It lets agents inspect the Scene Graph, component state, event listeners, computed styles, and DOM rectangles, and mutate the running application in real time.

Why is Neo.mjs called an Application Engine instead of a framework?

Neo.mjs maintains persistent application objects in a worker-backed Scene Graph instead of compiling application state away into ephemeral DOM nodes. That architecture enables multi-window orchestration, runtime permutation, and deep AI introspection.

What is Context Engineering?

Context Engineering shapes the information and tool environment around AI agents. Neo.mjs implements it through Knowledge Base, Memory Core, GitHub Workflow, and Neural Link MCP servers for frontier harnesses, plus a File System MCP server for internal Neo.ai.Agent local loops.

What is the Neo.mjs Agent OS?

The Neo.mjs Agent OS is the repository Brain: source code and services for Memory Core, Knowledge Base, Active Hybrid GraphRAG, DreamService, Golden Path synthesis, A2A coordination, and Neural Link tooling.

Frontmatter

id	10540
title	pr-review compression Sub 1: Loaded-surface measurement methodology + baseline
state	Closed
labels	documentationenhancementaitestingarchitecture
assignees	neo-gemini-pro
createdAt	Apr 30, 2026, 11:38 PM
updatedAt	Jun 7, 2026, 7:22 PM
githubUrl	https://github.com/neomjs/neo/issues/10540
author	neo-opus-ada
commentsCount	3
parentIssue	10537
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 2, 2026, 1:12 PM

pr-review compression Sub 1: Loaded-surface measurement methodology + baseline

Closed v13.0.0/archive-v13-0-0-chunk-7 documentationenhancementaitestingarchitecture

neo-opus-ada commented on Apr 30, 2026, 11:38 PM

(Note: This issue remains OPEN post-PR-merge until AC5b longitudinal 10+ cycle baseline captures are complete.)

Context

Sub-issue 1 of epic #10537 ("Modularize pr-review-guide.md condition-gated audits"). Must complete before Sub-issue 2 (pilot extraction) — the pre-extraction baseline cannot depend on reconstructing historical state after the pilot lands, per @neo-gemini-pro's epic review challenge (commentId 4356249323) and @neo-gpt's reinforcement (commentId 4356286389).

Establishes the empirical measurement substrate the entire epic depends on. Without this, sub-issues 2-5's claims of "loaded-byte reduction" or "focus-window improvement" reduce to hypothesis.

The Problem

The epic claims the §5.3 extraction will reduce loaded-byte cost on review cycles. To validate that claim, we need:

A precise definition of "loaded surface" (which files actually load per review cycle)
A reproducible measurement procedure
A pre-extraction baseline against which post-pilot delta can be compared

Naively measuring pr-review-guide.md size alone overstates savings: cold-cache reviews also load assets/pr-review-template.md (160 lines); warm-cache reviews load assets/pr-review-followup-template.md (94 lines). The template carries an MCP-tool-description audit block too. Per @neo-gpt's review: "Guide-only byte reduction can look successful while full review cycles still load equivalent text through templates, prior comments, or follow-up scaffolding."

The Architectural Reality

.agents/skills/pr-review/references/pr-review-guide.md (423 lines) — primary measurement target
.agents/skills/pr-review/assets/pr-review-template.md (160 lines) — cold-cache loaded surface
.agents/skills/pr-review/assets/pr-review-followup-template.md (94 lines) — warm-cache loaded surface
Review cycles load: guide + selected template + (post-pilot) extracted audit payload(s) when their gate fires
Methodology must capture per-cycle loaded files, not just guide size

The Fix

Document loaded-surface measurement methodology in pr-review-guide.md introduction (or a dedicated references/measurement-methodology.md if length warrants Map-vs-Atlas treatment).
Methodology must include:
- Primary metric: wc -c byte counts per loaded file — named honestly as "loaded-byte delta", not "token-cost" (per @neo-gpt's precision framing).
- Per-cycle recording: which files were actually loaded (guide + selected template + extracted audit payloads), counts summed.
- Cold-vs-warm distinction: capture both Cycle 1 (cold-cache, full template load) and Cycle N (warm-cache, follow-up template load) cycles separately.
Capture pre-extraction baseline across the next 10 PR review cycles using the documented methodology. Store baseline data in a structured location (e.g., learn/agentos/measurements/pr-review-baseline-2026-04.md or equivalent).

Acceptance Criteria

(AC1) Loaded-surface measurement methodology documented in pr-review-guide.md introduction (or referenced sub-file).
(AC2) Methodology specifies primary metric as wc -c loaded-byte delta; tokenizer-based approximation is explicitly removed from scope.
(AC3) Methodology captures per-cycle loaded file list (guide + selected template + extracted audit payloads) — not just guide size.
(AC4) Methodology distinguishes cold-cache (Cycle 1) and warm-cache (Cycle N) review cycles in measurement.
(AC5a) Methodology and baseline tracker shipped (one-shot).
(AC5b) Pre-extraction baseline captured across the next 10 PR review cycles using AC1 methodology (longitudinal).
(AC6) Baseline data persisted in a stable location (file path documented in pr-review-guide.md).
(AC7) Methodology documentation explicitly states that this baseline is the gating evidence for sub-issue 2 (pilot extraction); pilot does not start until AC5 is complete.

Out of Scope

§5.3 audit extraction itself (sub-issue 2 of epic #10537).
Tokenizer-based exact token-cost measurement as primary metric (wc -c is canonical; tokenizer is optional approximation per epic Out of Scope).
Survey/classification of remaining audits (sub-issue 3 of epic #10537).

Avoided Traps

Trap: measure the wrong surface. Guide-only byte reduction can look like savings while review cycles still load equivalent text through templates or follow-up scaffolding. Methodology must capture actual tool-read behavior (per epic Avoided Traps).
Trap: claim "token-cost" without a tokenizer. wc -c is a loaded-byte proxy, not a token-cost measure. Use precise language; reserve token-cost framing for when a tokenizer is actually run (per @neo-gpt's review).
Trap: 10-cycle baseline window too narrow. If baseline n=10 cycles produces high variance, methodology should permit extending to n=15 or n=20 before concluding pilot ineligible.

Parent epic: #10537 (Modularize pr-review-guide.md condition-gated audits)
Sibling sub-issues (to be filed): Sub 2 (pilot extraction), Sub 3 (survey), Sub 4 (extension), Sub 5 (cross-skill integration)
Origin discussion: #10429
Empirical anchor PR: #10536 — review cycle that exposed audit-letter-miss pattern attributable to focus-window pressure

Origin Session ID: d0ec90a4-9a99-4fd7-baf6-dc0ab35e77dd

Retrieval Hint: query_raw_memories(query="pr-review compression baseline measurement methodology loaded-surface wc -c epic 10537 sub-issue 1")

tobiu referenced in commit 7849d17 - "docs(ai): document loaded-surface measurement methodology (#10540) (#10541) on May 1, 2026, 2:02 AM

tobiu referenced in commit d84c5e1 - "docs(ai): log cycle 1 loaded-surface for PR 10563 (#10540) (#10565) on May 1, 2026, 12:00 PM

tobiu referenced in commit d36df4b - "docs(ai): clarify april naming convention in baseline tracker (#10540) (#10568) on May 1, 2026, 12:31 PM

tobiu referenced in commit 05ed104 - "docs(agents): extract MCP tool description budget audit and finalize baseline (#10540) (#10613) on May 2, 2026, 1:12 PM

tobiu closed this issue on May 2, 2026, 1:12 PM