Frontmatter

id	10903
title	Audit unit-test failures in clean CI env (substrate-data + browser gaps)
state	Closed
labels	bug, ai, testing, architecture
assignees	neo-gpt
createdAt	May 7, 2026, 3:23 PM
updatedAt	May 7, 2026, 8:21 PM
githubUrl	https://github.com/neomjs/neo/issues/10903
author	neo-opus-4-7
commentsCount	1
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 7, 2026, 8:21 PM

Audit unit-test failures in clean CI env (substrate-data + browser gaps)

neo-opus-4-7 commented on May 7, 2026, 3:23 PM

Context

Surfaced 2026-05-07 by #10897 Lane C CI workflow (PR #10899) on first unit matrix-row run. ~30 unit tests fail in a truly clean CI environment (ubuntu-latest, fresh checkout, no .neo-ai-data/ content). Locally these tests pass for developers because their machines carry accumulated substrate data that CI lacks.

This is a silent CI-locality coupling that has been hidden until automated CI gating arrived.

The Problem

Lane C CI run 25497549380 unit job reported:

1006 passed (5.2m)
~30 failed
5 flaky
11 skipped
6 did not run
1 error was not part of any test

Failure clusters observed in the log:

1. Native Edge Graph collection-not-initialized errors:

   IssueIngestor GET error: Error: [CollectionProxy] get() failed: 
    No underlying collection available for type 'graph'
    at test/playwright/unit/ai/daemons/DreamServiceGoldenPath.spec.mjs:95:9

   The SQLite graph collection isn't initialized in CI. The test depends on a substrate-bootstrap step that doesn't fire in a clean env.

2. MCP Server Health (McpServersHealth.spec.mjs): 'github-workflow', 'knowledge-base', 'memory-core' server bootstrap tests all fail. Likely missing config, env vars, or substrate paths.
3. MCP Server Isolation (McpServersIsolation.spec.mjs): "Broken server should still boot in degraded mode" + "Healthy server unaffected by sibling failure" both depend on the same MCP server bootstrap path as cluster 2.
4. MCP Authorization (Authorization.spec.mjs): 'should allow access with a valid token' fails (auth substrate gap).
5. GitHub Workflow PullRequestService (PullRequestService.spec.mjs): getPullRequestDiff tests fail (#10748 substrate). Likely needs gh CLI setup or test-env GitHub auth.
6. Memory Core Services: QueryReRanker, SessionService, SessionService.ResumeValidation, SessionSummarization (#10725 + #10643 + #10673 substrate). All depend on Memory Core data substrate.
7. DestructiveOperationGuard (DestructiveOperationGuard.spec.mjs): #10845 substrate; depends on memory-core graph being initialized.
8. AI Scripts (checkAllAgentIdle, checkSunsetted, resumeHarness): #10643, #10673, #10674, #10677 substrate; depend on agent-state SQLite tables.
9. Grid tests (LockedColumns, Pooling, Teleportation): Frontend tests that likely need Playwright browser binaries (the CI workflow doesn't currently run playwright install).

The Architectural Reality

Test substrate dependencies fall into 4 buckets:
- (a) Substrate-data — needs pre-populated SQLite collections, ChromaDB rows, or .neo-ai-data/ content.
- (b) Browser binaries — Playwright tests using chromium need npx playwright install step.
- (c) External tools — gh CLI auth, git config, etc. Some are present on ubuntu-latest by default; some need explicit setup.
- (d) Env vars — provider API keys, OIDC config, etc. Test fixtures may rely on process.env.* that's absent in CI.
Why passing locally ≠ passing in CI: developers accumulate substrate over months (graph SQLite has rows, Chroma collections exist, .neo-ai-data/ has content). CI starts each run from git clone + npm ci + this workflow's empty .neo-ai-data/ guard (which I added in #10897 to skip the heavyweight KB download).
Per-test-isolation discipline gap: strong isolation would mock these substrates in unit tests. The current test corpus depends on shared substrate state — passes the local-dev experience but breaks the unit-test isolation contract.
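
To make the isolation point concrete, here is a minimal sketch of what a self-contained fixture could look like. Everything in it is hypothetical: `InMemoryGraphCollection`, `IssueLookup`, and `createFixture` are illustrative names, not existing Neo.mjs classes, and the real collection API is likely asynchronous. The only idea it shows is that the service receives its collection from the test instead of resolving one from shared on-disk state.

```javascript
// Hypothetical in-memory stand-in for a 'graph' collection.
// A real collection would be backed by SQLite and probably be async.
class InMemoryGraphCollection {
    constructor() {
        this.nodes = new Map();
    }

    upsert(node) {
        // Store a copy so tests cannot mutate fixture state by reference.
        this.nodes.set(node.id, {...node});
        return node.id;
    }

    get(id) {
        return this.nodes.has(id) ? {...this.nodes.get(id)} : null;
    }
}

// Hypothetical service under test. It takes the collection as a
// constructor argument rather than looking it up globally.
class IssueLookup {
    constructor({graph}) {
        if (!graph) {
            throw new Error('IssueLookup requires a graph collection');
        }
        this.graph = graph;
    }

    titleOf(id) {
        const node = this.graph.get(id);
        return node ? node.title : null;
    }
}

// Each test builds its own substrate, so nothing leaks between specs
// and nothing depends on .neo-ai-data/ content being present.
function createFixture(seed = []) {
    const graph = new InMemoryGraphCollection();

    for (const node of seed) {
        graph.upsert(node);
    }

    return {graph, service: new IssueLookup({graph})};
}
```

A spec written this way seeds only the rows it asserts on, which is why it behaves the same on a developer machine and on a fresh ubuntu-latest runner.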

The Fix (Investigation-Shaped)

This ticket is diagnostic-first rather than prescription-first because the right fix depends on per-cluster investigation. Estimated 4 sub-fixes correspond to the 4 buckets above:

Investigation Phase

Triage each failing test into bucket (a)/(b)/(c)/(d) by reading the spec source + observed CI failure output.
For bucket (b) — Playwright browsers: confirm hypothesis via local npx playwright install skip-test then re-run. If correct, fix is ONE line in Lane C workflow: add npx playwright install --with-deps chromium step.
For bucket (a) — substrate-data: identify which specs lack proper test-fixture setup. Per-spec fixture additions OR shared bootstrap step.
For bucket (c) — external tools: audit ubuntu-latest defaults vs assumed availability.
For bucket (d) — env vars: audit which env vars tests assume; document required CI secrets vs provide test-mode defaults.
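
The triage step above could be partly mechanized. The following is a rough sketch under stated assumptions: the rule patterns are guesses inferred from the failure clusters in this issue and would need tuning against the real CI log, and `BUCKET_RULES` and `triage` are illustrative names rather than existing helpers. Anything that matches no rule is left for manual review, since some failures may be real test bugs.

```javascript
// Hypothetical heuristics, checked in order; the first matching rule wins.
const BUCKET_RULES = [
    {bucket: 'b', reason: 'browser binaries', pattern: /playwright install|Executable doesn't exist/i},
    {bucket: 'a', reason: 'substrate-data',   pattern: /No underlying collection available|\.neo-ai-data|SQLITE|no such table/i},
    {bucket: 'c', reason: 'external tools',   pattern: /\bgh\b.*(not found|auth)|command not found/i},
    {bucket: 'd', reason: 'env vars',         pattern: /API[_ ]?KEY|process\.env|token (is )?(missing|undefined)/i}
];

// Maps one failure's output to a bucket plus a file:line reference.
function triage(failureText) {
    // First "<path>.spec.mjs:<line>:<column>" occurrence in the output.
    const location = failureText.match(/(\S+\.spec\.mjs):(\d+):\d+/);
    const rule     = BUCKET_RULES.find(r => r.pattern.test(failureText));

    return {
        bucket: rule ? rule.bucket : 'unclassified',
        reason: rule ? rule.reason : 'needs manual review',
        file  : location ? location[1] : null,
        line  : location ? Number(location[2]) : null
    };
}
```

Fed the Native Edge Graph log excerpt quoted earlier, this returns bucket 'a' with file test/playwright/unit/ai/daemons/DreamServiceGoldenPath.spec.mjs and line 95, which is the shape the Phase 1 triage table asks for.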

Fix Phase (per-bucket sub-PRs)

Each bucket likely needs its own dedicated sub-PR for cross-family review. Investigation-phase output: list of file:line references + per-spec bucket assignment + per-bucket fix shape.
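
As an illustration of a possible fix shape for bucket (d), the sketch below separates variables that must come from CI secrets from variables that can fall back to a test-mode default. The variable names in `ENV_SPEC` are placeholders invented for this example, not real Neo.mjs configuration, and `resolveTestEnv` is a hypothetical helper.

```javascript
// Hypothetical spec of the variables tests read. Names are placeholders.
const ENV_SPEC = {
    EXAMPLE_PROVIDER_API_KEY: {secret: true},
    EXAMPLE_OIDC_ISSUER     : {secret: false, testDefault: 'http://localhost/oidc'}
};

// Resolves each variable from `env`, falling back to a test-mode default
// for non-secret values, and reports what is still missing.
function resolveTestEnv(env, spec = ENV_SPEC) {
    const resolved  = {};
    const defaulted = [];
    const missing   = [];

    for (const [name, rule] of Object.entries(spec)) {
        const value = env[name];

        if (value !== undefined && value !== '') {
            resolved[name] = value;
        } else if (!rule.secret && rule.testDefault !== undefined) {
            resolved[name] = rule.testDefault;
            defaulted.push(name);
        } else {
            // Secrets never get a default; they must be provided by CI.
            missing.push(name);
        }
    }

    return {resolved, defaulted, missing};
}
```

Calling `resolveTestEnv(process.env)` in a shared test setup would give the audit a single place to list required CI secrets (`missing`) and documented defaults (`defaulted`).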

Coordination with Lane C (#10899)

Lane C will drop unit from the matrix until this audit's per-bucket fixes land, then add unit back as a matrix row in a follow-up PR. Avoids landing a CI workflow with known-failing matrix rows.

Acceptance Criteria

Phase 1 (Investigation): Triage table mapping each of the ~30 failing tests to bucket (a)/(b)/(c)/(d), with file:line references and observed-failure-shape per spec.
Phase 2 (Per-bucket fixes): Sub-tickets filed per bucket with concrete prescriptions (sub-tickets may merge concurrently or sequentially as they touch different surfaces).
Phase 3 (Lane C re-enablement): Once per-bucket fixes land in dev, follow-up PR adds unit back to Lane C's matrix; CI green on unit matrix row demonstrated empirically.
Optional: Document the substrate-vs-isolation discipline gap in learn/guides/testing/UnitTesting.md — what it takes for a unit test to pass in clean CI, what fixtures are required, what's the boundary between "needs substrate" and "should mock".

Out of Scope

Fixing all bucket failures in this single ticket — investigation-shaped, not prescription-shaped. Each bucket gets its own ticket+PR.
CI workflow design — Lane C #10897 already owns that surface; this ticket is about test-substrate alignment.
Refactoring the unit test corpus to use proper mocking everywhere — a "long arc" architectural concern; this audit fixes the immediate clean-CI-pass gap, deeper isolation refactor is future ticket.
Frontend Grid tests if they require browser substrate beyond playwright install — may surface as needing OffscreenCanvas, font rendering, etc.; if discovered, file as separate ticket per-bucket.

Avoided Traps / Gold Standards Rejected

Rejected: bundle all sub-fixes into a single PR. Cross-cutting concern affecting 4+ subsystems (DreamService, MCP servers, Memory Core, Grid). Per-bucket PRs keep cross-family review focused.
Rejected: skip the failing tests via test.skip annotations. Treats symptom not cause; future regressions hide. Substrate fixes preserve coverage.
Rejected: provision substrate at CI workflow level (Lane C #10897) — couples test substrate to CI infra. Test fixtures should be self-contained per-spec (bucket a) or workflow-level only for tools (bucket b).
Rejected: assume all failures are CI-environment issues rather than real test bugs. Some may be: investigation phase will distinguish.

Surfacing context: Lane C CI workflow PR #10899 (closes #10897) — first clean-CI run revealing these failures.
Sibling substrate fix (CI-prerequisite): #10902 (Dockerfile prepare-lifecycle bug) — also surfaced by Lane C CI; integration matrix row blocked until that fix.
Existing related ticket: #10714 (Codex sandbox bootstrap probe for .neo-ai-data/sqlite access) — adjacent shape (per-harness substrate-bootstrap), different cause (sandbox permissions vs CI clean env).

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="unit test failures clean CI substrate data .neo-ai-data Lane C audit per-bucket investigation")

tobiu referenced in commit 30ad551 - "feat(testing): NEO_TEST_SKIP_CI guard for heavy-SLM + auth specs (#10903) (#10907)" on May 7, 2026, 5:58 PM

tobiu referenced in commit f039bbb - "test(ci): add bucket c clean-ci skip guards (#10903) (#10910)" on May 7, 2026, 6:03 PM

tobiu referenced in commit 4fb4bca - "feat(ci): test matrix workflow gating PRs on unit + integration suites (#10897) (#10899)" on May 7, 2026, 8:19 PM

tobiu closed this issue on May 7, 2026, 8:21 PM

tobiu referenced in commit 4cb5184 - "test(ci): add bucket B+D clean-ci skip guards (#10903) (#10921)" on May 7, 2026, 8:21 PM

tobiu referenced in commit 98897fc - "feat(ci): re-add unit suite to matrix post-Bucket-G substrate (#10939) (#10953)" on May 8, 2026, 2:43 PM