Frontmatter

id	9792
title	Optimize OpenAiCompatible Provider by Natively Wrapping LLM Streaming API
state	Closed
labels	enhancement, ai
assignees	tobiu
createdAt	Apr 8, 2026, 7:22 PM
updatedAt	Apr 8, 2026, 7:22 PM
githubUrl	https://github.com/neomjs/neo/issues/9792
author	tobiu
commentsCount	1
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	Apr 8, 2026, 7:22 PM

Optimize OpenAiCompatible Provider by Natively Wrapping LLM Streaming API

Closed v13.0.0/archive-v13-0-0-chunk-3 enhancement, ai

tobiu commented on Apr 8, 2026, 7:22 PM

Context & Architectural Strategy

The Memory Core relies heavily on local LLMs (served by LM Studio or Ollama through OpenAiCompatible endpoints) for graph extraction (Tri-Vector Synthesis, Topological Conflicts). By default, generation calls are sent with stream: false, which forces the local model server to buffer and serialize the entire JSON response before releasing the HTTP response. On local Apple Silicon machines extracting large graphs, this blocking wait inflates generation times drastically.

Performance analysis showed that enforcing stream: true (which moves token string concatenation into V8 instead of holding an HTTP buffer on the server) reduces latency by ~30% compared with the existing monolithic request architecture, and nearly halves Topological inference latency.
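
To make the stream: true path more concrete, here is a minimal sketch of how a streamed response can be reassembled on the client. It assumes the server follows the OpenAI chat completions streaming convention: server-sent-event lines of the form `data: {json}`, text fragments in `choices[0].delta.content`, and a closing `data: [DONE]` sentinel. `collectSseContent` and the sample payload are hypothetical and not taken from the Neo.mjs code base.

```javascript
// Hypothetical helper (not part of Neo.mjs): rebuilds the full text of an
// OpenAI-style streaming response from its server-sent-event payload.
function collectSseContent(sseText) {
    let content = '';

    for (const line of sseText.split('\n')) {
        const trimmed = line.trim();

        // Only "data:" lines carry payloads; skip blank separator lines.
        if (!trimmed.startsWith('data:')) continue;

        const payload = trimmed.slice('data:'.length).trim();

        // "[DONE]" marks the end of the stream.
        if (payload === '[DONE]') break;

        // Each event carries one incremental text fragment (may be absent).
        const delta = JSON.parse(payload).choices?.[0]?.delta?.content;

        if (typeof delta === 'string') content += delta;
    }

    return content;
}

// Made-up sample payload with two content fragments.
const sample = [
    'data: {"choices":[{"delta":{"content":"Neo"}}]}',
    '',
    'data: {"choices":[{"delta":{"content":".mjs"}}]}',
    '',
    'data: [DONE]'
].join('\n');

console.log(collectSseContent(sample));
```

A real client would read the fragments incrementally from the HTTP response body rather than from one complete string; the sketch only shows the concatenation step that the streaming mode moves to the caller.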

Actionable Scope

Refactor the internals of Neo.ai.provider.OpenAiCompatible's default generate() method:

Map .generate() to run the internal this.stream() AsyncGenerator.
Concatenate the chunk strings safely in the Node environment.
Preserve the existing response contract (returning { content, raw }) unchanged, so downstream consumers stay decoupled from the streaming optimization.
Adhere strictly to the "Anchor & Echo" semantic tagging (@summary) strategy.
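
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `SketchProvider` is a hypothetical stand-in for Neo.ai.provider.OpenAiCompatible, its stream() yields hard-coded chunks instead of reading an HTTP response, and the shape of `raw` (here, the list of chunks) is a guess because the ticket does not specify it.

```javascript
// Hypothetical stand-in for the provider. Only the relationship between
// generate() and stream() is illustrated; the real class may differ.
class SketchProvider {
    // Stand-in for the streaming call: yields text chunks one at a time.
    // A real implementation would read them from the HTTP response body.
    async *stream(prompt) {
        for (const chunk of ['{"edges"', ':', '[]}']) {
            yield chunk;
        }
    }

    // Keeps the existing contract: resolves to a single { content, raw }.
    async generate(prompt) {
        const chunks = [];

        // Drain the async generator and collect every chunk.
        for await (const chunk of this.stream(prompt)) {
            chunks.push(chunk);
        }

        return {
            content: chunks.join(''),
            raw    : chunks // assumption: the real `raw` shape is not given in the ticket
        };
    }
}

// Callers still await one resolved object, exactly as with a non-streaming call.
new SketchProvider().generate('extract graph').then(({content}) => {
    console.log(JSON.parse(content));
});
```

Because the chunk handling stays inside generate(), existing callers that destructure `content` from the awaited result would not need to change.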

Implementation Details

Modified Neo.ai.provider.OpenAiCompatible to handle the streaming logic internally, resulting in:

Tri-Vector Extraction Latency: -6% isolated drop
Topological Conflict Extraction: -50% isolated drop

Avoided Pitfalls

Contract Violation: Do not expose a chunked interface from .generate(). Existing scripts expect a single resolved response, and changing that would break standard API client expectations.

tobiu added the enhancement label on Apr 8, 2026, 7:22 PM

tobiu added the ai label on Apr 8, 2026, 7:22 PM

tobiu referenced in commit dbd6a24 - "feat(ai/provider/OpenAiCompatible.mjs): natively wrap streaming generator to bypass LM Studio serialization buffers (#9792)" on Apr 8, 2026, 7:22 PM

tobiu assigned to @tobiu on Apr 8, 2026, 7:22 PM

tobiu Apr 8, 2026, 7:22 PM

Input from Antigravity (Gemini 3.1 Pro):

✦ Successfully refactored OpenAiCompatible.generate() to wrap the stream: true iteration internally. This bypasses the monolithic buffer serialization penalty inside local LLM endpoints (LM Studio, llama.cpp) and decreases graph extraction latency without altering external function signatures.

Verification Results:

Tri-Vector Extraction latency: Reduced by ~6%

Topological Conflict Extraction latency: Reduced by ~50%

Validated manually by running the continuous DreamService REM-sleep background extraction via runSandman.mjs.

Closing ticket as Definition of Done is met.

tobiu closed this issue on Apr 8, 2026, 7:22 PM