Frontmatter

id	10734
title	Baseline & inventory: per-harness loaded-surface measurement methodology
state	Closed
labels	documentation, enhancement, contributor-experience, ai
assignees	neo-gemini-pro
createdAt	May 5, 2026, 11:04 AM
updatedAt	May 5, 2026, 4:42 PM
githubUrl	https://github.com/neomjs/neo/issues/10734
author	neo-opus-ada
commentsCount	2
parentIssue	10733
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 5, 2026, 4:42 PM

Baseline & inventory: per-harness loaded-surface measurement methodology

Closed v13.0.0/archive-v13-0-0-chunk-8 documentation enhancement contributor-experience ai

neo-opus-ada commented on May 5, 2026, 11:04 AM

Context

Sub 1 of Epic #10733 — measurement-first sequencing. Gating sub — Subs 2-5 reference the baseline this sub captures. Without it, post-edit deltas can't be measured, and #10512's partial-scope outcome would repeat.

The Problem

#10537 ships a pr-review-only loaded-surface measurement methodology (pr-review-guide.md introduction + measurement-methodology.md). This Epic broadens the scope to AGENTS.md, boot ramp, and all skill payloads — but #10537's methodology has two gaps for the broader scope (per @neo-gpt's external-source addendum on Discussion #10732):

Repo line counts ≠ true prompt load. Imports/concatenation behavior differs per harness. Splitting files into multiple references doesn't reduce true loaded bytes if the harness concatenates them at boot.
Loaded-byte delta is necessary but not sufficient. Lower bytes + higher correction cycles = false win. The deeper failure mode is template-skip / audit-letter-miss / Cycle-2.5 churn, not file size alone.
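
The first gap can be illustrated with a minimal Python sketch. It assumes a harness that simply concatenates every referenced file at boot; real harnesses differ, which is why harness-native verification is needed. The function name `loaded_bytes` and the sample strings are hypothetical and not part of #10537's methodology.

```python
def loaded_bytes(parts: list[str]) -> int:
    """Byte count of the prompt if the harness concatenates all parts at boot.

    Mirrors `wc -c`: counts UTF-8 encoded bytes, not characters or lines.
    """
    return sum(len(part.encode("utf-8")) for part in parts)


# Hypothetical content: one monolithic file vs. the same text split in two.
monolith = "# Rules\nAlways run tests.\n# Style\nUse 4 spaces.\n"
split = ["# Rules\nAlways run tests.\n", "# Style\nUse 4 spaces.\n"]

# Each split file has fewer lines than the monolith ...
assert max(len(s.splitlines()) for s in split) < len(monolith.splitlines())
# ... but the concatenated load is unchanged.
assert loaded_bytes(split) == loaded_bytes([monolith])
```

Under that concatenation assumption, per-file line counts drop while the loaded total stays the same, so only the combined figure reflects the true prompt load.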

The Architectural Reality

.agents/skills/pr-review/references/measurement-methodology.md — existing #10537 methodology, file-size-only
Per-harness verification primitives:
- Gemini CLI: /memory show exposes the actual concatenated GEMINI.md prompt
- Claude Code: /memory displays loaded memory contents
- Codex Desktop: active-instruction audit via project_doc_max_bytes config + harness-specific introspection
Correction-cycle metrics: PR Request-Changes count, A2A round-trip count per PR, Cycle-N count per review (already partially captured in feedback_pr_review_iteration_calibration and graph-ingestion data)

The Fix

Extend measurement-methodology.md (or fork into a sibling under .agents/skills/ if scope diverges) to cover:

The full cognitive surface (boot + AGENTS.md + all 21 skill payloads + all assets)
Per-harness combined-prompt verification using harness-native primitives
Correction-cycle metrics paired with loaded-byte counts
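
One possible way to pair the two kinds of metric is sketched below in Python. The `SurfaceSample` type, the `classify` helper, and the choice to sum the two correction counts unweighted are all assumptions made for illustration; the issue does not specify how the metrics are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceSample:
    """One measurement: loaded size plus correction-cycle metrics."""
    loaded_bytes: int      # wc -c proxy for the loaded surface
    request_changes: int   # PR Request-Changes count
    a2a_round_trips: int   # A2A round-trip count per PR

    @property
    def correction_cycles(self) -> int:
        # Assumption: a plain sum; any weighting is left unspecified.
        return self.request_changes + self.a2a_round_trips


def classify(before: SurfaceSample, after: SurfaceSample) -> str:
    """Label a post-edit delta; fewer bytes with more corrections is a false win."""
    smaller = after.loaded_bytes < before.loaded_bytes
    worse = after.correction_cycles > before.correction_cycles
    if smaller and worse:
        return "false win"
    if smaller:
        return "win"
    return "no byte reduction"
```

For example, `classify(SurfaceSample(1000, 1, 2), SurfaceSample(800, 2, 3))` returns `"false win"`: the surface shrank, but correction cycles rose.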

Capture a pre-edit baseline snapshot for every surface this Epic targets, per active harness, before Subs 2-5 fire. Store the baseline as a committed artifact so post-edit deltas can be measured against a stable reference.

Acceptance Criteria

(AC0) Per-harness combined-prompt-load measured pre-edit using harness-native primitives: /memory show (Gemini CLI), /memory (Claude Code), active-instruction audit (Codex Desktop). Repo line counts captured separately as a complementary metric, NOT as the primary one
(AC1) Loaded-surface measurement methodology documented — extending #10537's measurement-methodology.md to cover boot ramp + AGENTS.md + per-skill payloads + per-harness true-prompt-load
(AC2) Methodology records BOTH loaded-byte counts (wc -c proxy) AND correction-cycle metrics (Request-Changes count + A2A round-trip count per PR)
(AC3) Pre-edit baseline captured for every surface this Epic targets (boot, AGENTS.md §-by-§, all 21 skill payloads, all assets) per-harness — committed as .agents/skills/.../baselines/cognitive-load-baseline-YYYY-MM-DD.md or equivalent
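
A baseline artifact of the kind AC3 describes could be produced along the following lines. This is a sketch only: the helper names and the table layout are hypothetical, and it covers just the repo-side `wc -c` proxy. The per-harness combined prompt from AC0 would still be captured manually with the harness-native commands.

```python
from datetime import date
from pathlib import Path


def baseline_filename(day: date) -> str:
    """File name following the cognitive-load-baseline-YYYY-MM-DD.md pattern."""
    return f"cognitive-load-baseline-{day.isoformat()}.md"


def measure(paths: list[Path]) -> dict[str, int]:
    """Byte size per file, equivalent to `wc -c`."""
    return {str(p): p.stat().st_size for p in paths}


def render_baseline(harness: str, sizes: dict[str, int]) -> str:
    """Render one harness's measurements as a Markdown table with a total row."""
    lines = [f"Harness: {harness}", "", "| Surface | Bytes |", "| --- | ---: |"]
    lines += [f"| {name} | {size} |" for name, size in sorted(sizes.items())]
    lines.append(f"| Total | {sum(sizes.values())} |")
    return "\n".join(lines) + "\n"
```

Writing the output of `render_baseline` to a file named by `baseline_filename(date.today())` and committing it gives Subs 2-5 a stable reference for post-edit deltas.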

Out of Scope

Editing any of AGENTS.md, AGENTS_STARTUP.md, skill payloads, or templates — those are Subs 2-5
Building automated tooling to replace manual /memory show / /memory invocations — manual capture is sufficient for the baseline; automation is a follow-up if Sub 1 proves it's worth the substrate cost

Parent Epic: #10733
Predecessor methodology: #10537 (measurement-methodology.md — Sub 1 extends, not replaces)
Origin discussion: #10732 (especially @neo-gpt's external-source addendum 16813972)

Origin Session ID: 7e52099b-9632-4c67-a2a1-4e1a1ad1c414

Retrieval Hint: query_raw_memories(query="baseline measurement methodology per-harness loaded-surface correction-cycle metrics 10733 10732")

tobiu referenced in commit 4c3766d - "docs(agentos): baseline and inventory for loaded-surface methodology (#10734) (#10746)" on May 5, 2026, 4:42 PM

tobiu closed this issue on May 5, 2026, 4:42 PM