What is the 'Neural Link'?

The Neural Link is a bi-directional bridge that connects AI agents (like Gemini or Claude) directly to the Neo.mjs runtime. It allows agents to 'see' the application's Scene Graph, inspect component state, verify event listeners, and even mutate the running application in real-time. This turns the application into a 'glass box' for AI, enabling autonomous debugging and feature development.

Why is Neo.mjs called an 'Application Engine' instead of a framework?

Traditional frameworks are general-purpose libraries (like a Toyota) that help you organize code, but they compile away into ephemeral DOM nodes. Neo.mjs is a precision-engineered runtime (like an F1 car), similar to Unreal Engine for games. It maintains a persistent 'Scene Graph' of objects in a separate worker thread. These objects retain their identity, state, and relationships, allowing for advanced capabilities like multi-window orchestration, runtime permutation, and deep AI introspection that are impossible with 'melted plastic' DOM-based frameworks.

What is 'Context Engineering'?

Context Engineering is the practice of curating the environment and information flow for AI agents to maximize their autonomy. In Neo.mjs, this is implemented via three Model Context Protocol (MCP) servers: a Knowledge Base for semantic code understanding, a Memory Core for learning from past sessions, and a GitHub Workflow server for project management. This ecosystem allows agents to work as fully integrated members of the development team.

What is 'Object Permanence' in the context of Neo.mjs?

In Neo.mjs, UI components are persistent JavaScript objects living in the App Worker, not just transient rendering results. This 'Object Permanence' means a component (like a Dashboard) maintains its state (scroll position, user input, internal logic) even if it is detached from the DOM or moved to a different browser window. This is the 'Lego Technic' approach versus the 'Duplo' approach of traditional frameworks.

What is an 'Agent Operating System'?

Neo.mjs v11 introduced the concept of an Agent OS, where the platform itself provides the tools and interfaces for AI agents to operate. It combines a standalone, type-safe AI SDK for autonomous 'Code Execution' with the Neural Link for runtime control. This enables agents to monitor, debug, and heal the application autonomously, effectively acting as an operating system for synthetic intelligence.

How does Neo.mjs handle multi-window applications?

Neo.mjs uses shared web workers to run a single application instance across multiple browser windows. All windows share the same application state and data in real-time. Components can even move between windows while retaining their JavaScript instances. This enables desktop-class experiences like multi-window IDEs, browser-based email clients, and multi-screen control rooms.

What makes Neo.mjs different from React, Angular, or Vue?

Neo.mjs is fundamentally different in its architecture: (1) It uses True Multithreading via web workers to prevent UI jank, (2) It is an AI-Native Application Engine with built-in bridges for AI collaboration, and (3) It treats components as persistent objects in a Scene Graph, enabling native multi-window support and runtime permutation.

Frontmatter

id	11393
title	Memory Core add_memory must retry-on-model-unload when embedding provider drops idle models
state	Closed
labels	bug, ai, agent-task:pending, model-experience
assignees	[]
createdAt	May 15, 2026, 5:12 AM
updatedAt	May 15, 2026, 7:21 AM
githubUrl	https://github.com/neomjs/neo/issues/11393
author	neo-opus-4-7
commentsCount	0
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 15, 2026, 7:21 AM

Memory Core add_memory must retry-on-model-unload when embedding provider drops idle models

neo-opus-4-7 commented on May 15, 2026, 5:12 AM

Context

Memory Core's add_memory MCP tool depends on the configured embedding provider (per ai/mcp/server/memory-core/config.mjs, default embeddingProvider: 'openAiCompatible' at http://127.0.0.1:1234 consuming text-embedding-qwen3-embedding-8b). The local-development pattern (/Users/Shared/github/neomjs/neo workstation per #11380 anchor) uses LM Studio as the openai-compatible host.

LM Studio's default behavior is JIT model loading — models are loaded on first request and automatically unloaded after an idle timeout (configurable; default ≈5-15 minutes depending on RAM-pressure heuristics). This is correct LM Studio behavior for a desktop UI tool. It becomes a substrate-friction footgun for Memory Core sessions that intersperse activity with idle windows (e.g., overnight nightshift sessions awaiting cross-family review cycles, sessions parked at a human-merge-gate, agents in legitimate post-review-pickup §4 halt-state).

The Problem

Empirical anchor (this session, Origin Session ID: e095c569-beac-4743-998f-e07d4344492e):

00:00Z session start; LM Studio loaded with gemma-4-31b-it for inference + text-embedding-qwen3-embedding-8b for embeddings; add_memory works (verified via MESSAGE:b2008e98 GPT sanity-ping context + multiple successful add_memory calls through 01:08Z).
~01:08Z – ~03:04Z session enters legitimate halt-state awaiting cross-family review cycles + operator merge. No add_memory calls during this window.

03:04Z wake event arrives; add_memory retry-test fails with:

  openAiCompatible embedding error HTTP 400: {"error":"Model was unloaded while the request was still in queue.."}

Recovery: curl -sS http://127.0.0.1:1234/v1/models confirms LM Studio is still listening + the model is in its catalog, but JIT-unloaded out of resident memory.

The failure mode is mechanically deterministic: any Memory Core session that idles past LM Studio's auto-unload threshold then attempts add_memory hits HTTP 400, breaking the AGENTS.md §0 Invariant 5 ("No skipping add_memory at end of turn") gate.

Current workaround substrate: agents use add_message self-DM fallback per AGENTS.md §4.3 ("Un-savable Turns"). This works because add_message writes to SQLite only — the embedding-provider dependency lives only in add_memory's vector-embedding step. The fallback preserves the turn-memory text content but loses the embedded-vector semantic indexing until the operator manually reloads the model or the next session boot re-warms the provider.
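
The path asymmetry behind this workaround can be sketched as follows. This is a minimal illustration, not Memory Core's code: `saveTurn`, `addMemory`, and `addMessage` are hypothetical stand-ins for the MCP tool calls, injected as plain async functions so the control flow is visible.

```javascript
/**
 * Hypothetical sketch of the fallback: try the embedding-backed write first,
 * then fall back to the SQLite-only self-DM when the embedding step fails.
 * addMemory / addMessage are injected stand-ins, not the real MCP tool signatures.
 */
async function saveTurn(turnText, {addMemory, addMessage}) {
    try {
        await addMemory(turnText);
        return {path: 'add_memory', embedded: true};
    } catch (error) {
        // add_message has no embedding dependency, so it can still persist the text
        await addMessage(`[un-savable turn] ${turnText}`);
        return {path: 'add_message', embedded: false, reason: error.message};
    }
}
```

The returned `embedded: false` flag stands for what the fallback gives up: the text is stored, but no vector is indexed until the provider is warm again.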

The Architectural Reality

Embedding callsite: ai/services/memory-core/ChromaManager.mjs (or sibling — exact path resolution per current source) constructs the /v1/embeddings POST against the openai-compatible host configured by aiConfig.openAiCompatible.host.
Error surface: the LM Studio response shape {"error":"Model was unloaded while the request was still in queue.."} is the HTTP-400 body that surfaces to add_memory's error path.
Existing retry semantics: none for this specific failure mode. The error propagates directly to the MCP-tool caller.
Companion substrate: #11380 (orchestrator daemon managing MC Chroma) demonstrates the broader pattern — local-supporting-services should be daemon-managed for substrate-evolution-flywheel robustness. The embedding provider is currently NOT under that pattern; it's operator-side desktop-app responsibility.
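
For orientation, a single embeddings call against an OpenAI-compatible host might look roughly like the sketch below. The function names are illustrative and not taken from ChromaManager.mjs; the request body follows the standard `/v1/embeddings` shape (`model` plus `input`), and the global `fetch` of Node 18+ is assumed.

```javascript
/**
 * Builds the arguments for one embeddings POST against an OpenAI-compatible host.
 * Function name and parameter list are illustrative only.
 */
function buildEmbeddingRequest(host, model, input) {
    return {
        url : new URL('/v1/embeddings', host).href,
        init: {
            method : 'POST',
            headers: {'Content-Type': 'application/json'},
            body   : JSON.stringify({model, input})
        }
    };
}

/**
 * Sends the request and returns the raw status and body text, so a caller
 * can inspect an error body such as the model-unloaded response.
 */
async function postEmbedding(host, model, input) {
    const {url, init} = buildEmbeddingRequest(host, model, input);
    const response    = await fetch(url, init);

    return {status: response.status, bodyText: await response.text()};
}
```

Returning the body as text rather than parsed JSON keeps the error path simple: with no retry layer in between, whatever the host sends back is what reaches the add_memory caller.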

The Fix

Narrow prescription (this ticket): add retry-on-model-unload semantics inside the openai-compatible embedding client path in Memory Core.

When the embedding callsite receives an HTTP 400 with the LM Studio model-unloaded shape:

1. Detect the unload-error pattern in the response body (substring match on "Model was unloaded" is fine; LM Studio's error shape is stable).
2. Sleep briefly (e.g., 500ms) to allow LM Studio's load-on-next-request semantics to kick in.
3. Retry the same /v1/embeddings POST up to N=3 times. Each subsequent request triggers LM Studio's automatic warm-load (the model gets loaded back into RAM transparently within the retry window; typical warmup is 5-15s for an 8B-param model).
4. If still failing after N retries, propagate the original error to the caller (existing behavior preserved as last-resort fallback).
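
The four steps above can be sketched as a small wrapper around one POST attempt. This is a hedged illustration under stated assumptions, not the merged fix: `embedWithUnloadRetry`, `postOnce`, and the option names are hypothetical, and `postOnce` is assumed to resolve to `{status, bodyText}` for a single `/v1/embeddings` request.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

/**
 * Hypothetical retry wrapper. postOnce performs exactly one embeddings POST
 * and resolves to {status, bodyText}; it is injected so the loop stays testable.
 */
async function embedWithUnloadRetry(postOnce, {retryCount = 3, retryDelayMs = 500, log = console.warn} = {}) {
    let firstFailure = null;

    // attempt 0 is the initial call, attempts 1..retryCount are the retries
    for (let attempt = 0; attempt <= retryCount; attempt++) {
        const {status, bodyText} = await postOnce();

        if (status >= 200 && status < 300) {
            return JSON.parse(bodyText);
        }

        const error = new Error(`openAiCompatible embedding error HTTP ${status}: ${bodyText}`);

        // Step 1: only the model-unloaded HTTP 400 is retry-eligible
        if (!(status === 400 && bodyText.includes('Model was unloaded'))) {
            throw error;
        }

        firstFailure = firstFailure || error;

        if (attempt < retryCount) {
            log(`embedding-provider model-unload detected, retrying (${attempt + 1}/${retryCount})`);
            await sleep(retryDelayMs); // Step 2: brief pause before the next request
        }
    }

    // Step 4: retries exhausted, surface the original error
    throw firstFailure;
}
```

The delay is deliberately short because, per the description above, the retried request itself triggers the warm-load; the pause only spaces the attempts.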

This handles 95%+ of the failure cases empirically (operator-tested at LM Studio default 8B-embedding behavior). The remaining 5% (operator has the model evicted from disk cache + needs to re-download) correctly surfaces as an error.

Acceptance Criteria

AC1: Embedding-callsite in Memory Core's openai-compatible client detects the LM-Studio model-unloaded HTTP 400 error shape and triggers retry-with-warmup-delay.
AC2: Retry count is configurable (default N=3) via aiConfig.openAiCompatible.unloadRetryCount or equivalent named config field.
AC3: Warmup-delay-per-retry is configurable (default 500ms) via aiConfig.openAiCompatible.unloadRetryDelayMs or equivalent.
AC4: After exhausting retries, the original HTTP 400 error propagates unchanged (existing error-path behavior preserved).
AC5: Unit test covering: (a) first-call-succeeds-no-retry path, (b) first-call-fails-second-call-succeeds path with mock client, (c) exhausted-retry-final-failure path. Spec located per unit-test.md canonical convention.
AC6: Diagnostic log entry written via Memory Core's existing logger when retry fires, naming "embedding-provider model-unload detected, retrying" so operator-side observability surfaces the substrate-friction transparency (this also enables monitoring: a spike in retry-log-events would signal LM-Studio-idle-threshold needs tuning).
AC7: Path-asymmetry semantic preserved — add_message (SQLite-only, no embedding dependency) remains unaffected by embedding-provider failures. This ticket fixes the add_memory retry-path WITHOUT changing the add_message fallback contract. (Already true; AC7 documents the no-regression invariant.)
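
As an illustration of AC2 and AC3, the two tunables could be resolved from the config with defaults roughly as below. `resolveUnloadRetryConfig` is a hypothetical helper; only the field names and default values come from the criteria above, and the validation rule (non-negative integers only) is an assumption.

```javascript
// Defaults named in AC2 and AC3; the resolver itself is a hypothetical helper.
const UNLOAD_RETRY_DEFAULTS = Object.freeze({
    unloadRetryCount  : 3,
    unloadRetryDelayMs: 500
});

function resolveUnloadRetryConfig(aiConfig = {}) {
    const provider = aiConfig.openAiCompatible || {};

    const pick = key => {
        const value = provider[key];
        // Assumption: accept only non-negative integers, otherwise use the default
        return Number.isInteger(value) && value >= 0 ? value : UNLOAD_RETRY_DEFAULTS[key];
    };

    return {
        unloadRetryCount  : pick('unloadRetryCount'),
        unloadRetryDelayMs: pick('unloadRetryDelayMs')
    };
}
```

Under this sketch, an operator setting `unloadRetryCount: 0` would switch retries off entirely, which keeps the pre-fix behavior reachable through configuration.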

Out of Scope

Daemon-managed embedding endpoint (architectural-shape change to spawn LM Studio under orchestrator-daemon supervision similar to MC Chroma in #11380). That's a broader-scope follow-up ticket if this narrow fix proves insufficient. Likely premature — LM Studio is a desktop UI tool, not a daemon-shaped service, and operators value the desktop-tool ergonomics.
LM-Studio-side configuration mandates (e.g., "operator must disable JIT unload"). Operator-side workaround is documented but not load-bearing — the substrate should be robust to default LM Studio behavior, not require operator-side configuration.
Switching default embedding provider away from openai-compatible to a non-JIT-unload provider. Provider choice is operator-side; this ticket fixes the substrate-friction within the current default.
Gemini-API embedding provider parity changes. This ticket is scoped to the openAiCompatible client path. Gemini API doesn't have the JIT-unload failure mode (managed cloud service); no analogous retry path needed there.

Avoided Traps

Infinite retry loop on persistent failure — rejected. AC4 enforces propagation-after-N-retries to prevent the agent harness from spinning on a permanently-broken embedding provider.
Treating ALL HTTP 400 from embedding as retry-eligible — rejected. The retry MUST specifically detect the LM-Studio model-unloaded error shape (substring match on "Model was unloaded"). Treating generic HTTP 400 as retry-eligible risks masking real configuration bugs (wrong model name, malformed request) under retry-noise.
Hardcoding the LM Studio error shape without operator-tunable detection — rejected as fragile. The error-shape detection should be a small constant in the client code with a JSDoc-comment naming LM Studio as the source. If LM Studio changes the error shape in a future version, the constant gets updated; the architecture doesn't churn.
Sleep-100ms-and-pray — rejected as too short for an 8B model warm-load. 500ms default is empirically-grounded but tunable per AC3. Operators with smaller embedding models (e.g., qwen3-embedding-4b at ≈4B params) may shorten; operators with larger may extend.
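
The narrow-detection point from the second and third traps could be captured in one constant plus one predicate, as in this sketch. The names `LM_STUDIO_UNLOAD_MARKER` and `isModelUnloadError` are hypothetical, and the "model not found" body is an invented example of a generic HTTP 400, not a real LM Studio response.

```javascript
/**
 * Substring that identifies LM Studio's "model was JIT-unloaded" response body.
 * Source: LM Studio (OpenAI-compatible server). Update here if the wording changes.
 */
const LM_STUDIO_UNLOAD_MARKER = 'Model was unloaded';

/**
 * True only for an HTTP 400 whose body carries the unload marker.
 * Every other failure stays non-retryable so configuration bugs surface at once.
 */
function isModelUnloadError(status, bodyText) {
    return status === 400 &&
        typeof bodyText === 'string' &&
        bodyText.includes(LM_STUDIO_UNLOAD_MARKER);
}

isModelUnloadError(400, '{"error":"Model was unloaded while the request was still in queue.."}'); // true
isModelUnloadError(400, '{"error":"model not found"}');                                           // false
isModelUnloadError(500, 'Model was unloaded');                                                    // false
```

Keeping the marker in a single documented constant means a future change to LM Studio's wording needs a one-line update rather than a change to the retry logic.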

Companion substrate ticket: #11380 — daemon-managed MC Chroma; demonstrates the broader "local-supporting-services under orchestrator-daemon supervision" pattern. Future Lane B follow-up could extend this pattern to the embedding provider if narrow retry-fix proves insufficient.
AGENTS.md §0 Invariant 5: "No skipping add_memory at end of turn" — current failure mode breaks this gate under idle-window conditions. Fix restores reliability.
AGENTS.md §4.3 Un-savable Turns: documents the self-DM fallback path; this ticket reduces the frequency of that fallback firing, but doesn't replace it (the fallback remains the last-resort path under permanent embedding-provider failure).
Path-asymmetry architecture: add_message writes to the SQLite graph layer only; add_memory writes to both SQLite + Chroma (embedded vectors). Embedding-provider failures break the latter, not the former. This ticket preserves that boundary.

Origin Session

Origin Session ID: e095c569-beac-4743-998f-e07d4344492e
Empirical anchor message: MESSAGE:3af300ee-661a-40d6-9f4f-37d893668431 (self-DM fallback turn-memory capturing the empirical failure shape verbatim)

Retrieval Hint

Search for LM Studio embedding model unload JIT idle add_memory retry openAiCompatible.

tobiu referenced in commit a07e6f4 - "fix(memory-core): implement retry-on-unload for openAiCompatible embeddings (#11393) (#11394)" on May 15, 2026, 7:21 AM

tobiu closed this issue on May 15, 2026, 7:21 AM