Frontmatter

id	11524
title	Separate MLX launch model from OpenAI API label
state	Closed
labels	bug, ai, architecture, model-experience
assignees	neo-gemini-pro, neo-gpt
createdAt	May 17, 2026, 8:56 AM
updatedAt	May 17, 2026, 9:43 AM
githubUrl	https://github.com/neomjs/neo/issues/11524
author	neo-gpt
commentsCount	0
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 17, 2026, 9:43 AM

Separate MLX launch model from OpenAI API label

Closed v13.0.0/archive-v13-0-0-chunk-11 bug, ai, architecture, model-experience

neo-gpt commented on May 17, 2026, 8:56 AM

Context

Operator ran npm run ai:orchestrator on 2026-05-17 and the supervisor-started MLX child failed immediately:

mlx inference stderr: HTTP Request: GET https://huggingface.co/api/models/gemma-4-31b-it/revision/main "HTTP/1.1 401 Unauthorized"

This is a follow-up to #11471 / PR #11473. That PR correctly removed the stale Gemma 2 launch target. The new failure is different: the orchestrator now passes the bare OpenAI-compatible API model label gemma-4-31b-it directly to mlx_lm.server --model, so huggingface_hub resolves it as a bare Hugging Face repo id.

Current live V-B-A:

Official Hugging Face model id for Gemma 4 31B IT is google/gemma-4-31B-it.
The MLX community Gemma 4 collection uses explicit repository ids such as mlx-community/gemma-4-31b-it-bf16, mlx-community/gemma-4-31b-it-4bit, mlx-community/gemma-4-31b-it-8bit, and related assistant/drafter variants.
ai/daemons/TaskDefinitions.mjs currently defaults DEFAULT_MLX_MODEL to bare gemma-4-31b-it and reads process.env.NEO_OPENAI_COMPATIBLE_MODEL for the mlx_lm.server --model argument.
ai/mcp/server/memory-core/scripts/setup_mlx.sh documents that the MLX server expects --model <repo_id>.
This checkout has no top-level ai/config.mjs, and ai/config.template.mjs currently only exposes orchestrator.devSyncRoots; there is no orchestrator-specific MLX launch model config surface.

Duplicate sweep:

ask_knowledge_base(type='ticket', query='open ticket orchestrator MLX launch model OpenAI compatible model label Hugging Face repo id Gemma 4 31B') found no open duplicate; it surfaced closed #11471 as the predecessor.
Live open PR list contains #11450, #9917, #9537; none cover this surface.
Local repo search found no NEO_MLX* / MLX launch-model config surface.

The Problem

NEO_OPENAI_COMPATIBLE_MODEL names the model identifier sent to an already-running OpenAI-compatible API. That string is not necessarily the same as the model artifact identifier needed to start an MLX server.

For Gemma 4 31B, the distinction matters:

API model label: what Neo sends in /v1/chat/completions payloads after the server is already running.
MLX launch model: a Hugging Face repo id or local path passed to mlx_lm.server --model so the child process can load weights.

PR #11473 accidentally collapsed those two contracts. The result is a supervisor restart loop against a bare repo id (gemma-4-31b-it) that can return a misleading 401 Unauthorized. A Hugging Face token might still be needed for a chosen upstream model, but the orchestrator must first pass a valid Gemma 4 31B launch id/path.
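
To make the two contracts concrete, here is a minimal sketch of where each identifier ends up. The function names, both constants, and the port value are hypothetical and exist only for illustration. The launch repo id is one of the MLX community ids listed above, picked arbitrarily, not a confirmed default.

```javascript
// Hypothetical constants: neither name is taken from the repository.
const API_MODEL_LABEL  = 'gemma-4-31b-it';                    // sent in request payloads
const MLX_LAUNCH_MODEL = 'mlx-community/gemma-4-31b-it-4bit'; // passed to mlx_lm.server --model

// Contract 1: JSON body for POST /v1/chat/completions on an already-running server.
function buildChatPayload(userText, model = API_MODEL_LABEL) {
    return {
        model,
        messages: [{role: 'user', content: userText}]
    };
}

// Contract 2: arguments for the child process that loads the weights.
// The port default is a placeholder, not the project's DEFAULT_MLX_PORT.
function buildMlxLaunchArgs(launchModel = MLX_LAUNCH_MODEL, port = 8080) {
    return ['--model', launchModel, '--port', String(port)];
}
```

Changing `API_MODEL_LABEL` in this sketch has no effect on the launch arguments, which is the independence this ticket asks for.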

The Architectural Reality

ai/daemons/TaskDefinitions.mjs#buildTaskDefinitions() is the child-process command factory for the orchestrator's continuous tasks.
Orchestrator.poll() treats mlx as a continuous supervised task and restarts it when it exits.
ai/mcp/server/memory-core/config.template.mjs#openAiCompatible.model and NEO_OPENAI_COMPATIBLE_MODEL are consumed by provider clients that call an OpenAI-compatible endpoint.
mlx_lm.server --model consumes a Hugging Face repo id or local model path.
ai/config.template.mjs is already the top-level orchestrator config substrate and is the natural owner for orchestrator-only launch details.

The Fix

Add a dedicated orchestrator MLX launch-model contract instead of reusing NEO_OPENAI_COMPATIBLE_MODEL as the child-process artifact id.

Recommended shape:

Add an orchestrator config/env surface for the MLX launch model, e.g. orchestrator.mlx.model plus NEO_MLX_MODEL or NEO_ORCHESTRATOR_MLX_MODEL.
Default that launch surface to a current Gemma 4 31B MLX-compatible repo id/path, not Gemma 2 and not the bare OpenAI API label.
Preserve explicit buildTaskDefinitions({mlxModel}) test override and DEFAULT_MLX_PORT behavior.
Update unit tests so:
- the default launch args use the dedicated MLX launch model;
- NEO_OPENAI_COMPATIBLE_MODEL no longer changes the mlx_lm.server --model launch argument;
- the dedicated MLX env/config override does change the launch argument;
- stale Gemma 2 remains rejected.
If the selected Gemma 4 31B MLX repo still requires credentials, document that as operator setup (HF_TOKEN) after the repo id split is fixed, rather than conflating credentials with model identity.
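
One way the recommended shape could look is sketched below. It is not the merged implementation. The precedence order (explicit option, then `NEO_MLX_MODEL`, then `orchestrator.mlx.model`, then the default), the default repo id, the port value, and the returned object layout are all assumptions, and the real `buildTaskDefinitions()` signature may differ.

```javascript
// Assumed default: one of the MLX community repo ids named in this ticket.
const DEFAULT_MLX_MODEL = 'mlx-community/gemma-4-31b-it-4bit';
// Placeholder value; the project's actual DEFAULT_MLX_PORT is not stated in this ticket.
const DEFAULT_MLX_PORT = 8080;

// Resolve the launch model without ever consulting NEO_OPENAI_COMPATIBLE_MODEL.
function resolveMlxLaunchModel({mlxModel, env = process.env, config = {}} = {}) {
    return mlxModel
        || env.NEO_MLX_MODEL
        || config.orchestrator?.mlx?.model
        || DEFAULT_MLX_MODEL;
}

// Simplified stand-in for the orchestrator's child-process command factory.
function buildTaskDefinitions({mlxModel, env = process.env, config = {}, mlxPort = DEFAULT_MLX_PORT} = {}) {
    const model = resolveMlxLaunchModel({mlxModel, env, config});

    return {
        mlx: {
            command: 'mlx_lm.server',
            args   : ['--model', model, '--port', String(mlxPort)]
        }
    };
}
```

With this shape, the unit assertions listed above reduce to comparing `buildTaskDefinitions(...).mlx.args` for different `env` and `config` inputs, including a check that the default never contains a Gemma 2 id.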

Contract Ledger Matrix

Target Surface	Source of Authority	Proposed Behavior	Fallback	Docs	Evidence
`buildTaskDefinitions().mlx.args`	`mlx_lm.server --model <repo_id>` contract + this ticket	Uses a dedicated MLX launch repo id/path for Gemma 4 31B	Explicit `mlxModel` option remains available for tests/local overrides	JSDoc on `buildTaskDefinitions()`	Unit assertions over generated args
`NEO_OPENAI_COMPATIBLE_MODEL`	OpenAI-compatible chat-completions provider contract	Names the API payload model only; does not implicitly control MLX child startup	Existing provider behavior unchanged	Memory Core docs remain valid	Unit asserts env var no longer changes child launch model
New MLX launch env/config	Orchestrator-owned child-process lifecycle	Operator can select HF repo id or local path for `mlx_lm.server --model`	Default current Gemma 4 31B MLX-compatible repo id	`ai/config.template.mjs` / adjacent docs if needed	Unit asserts env/config override
Operator startup	`npm run ai:orchestrator` log sample	Supervisor no longer fetches bare `gemma-4-31b-it` or Gemma 2	If selected repo requires auth, error explicitly points at credential/license state	Post-merge validation note	Manual restart/log verification

Acceptance Criteria

ai/daemons/TaskDefinitions.mjs no longer reads NEO_OPENAI_COMPATIBLE_MODEL as the mlx_lm.server --model launch argument.
A dedicated orchestrator MLX launch-model config/env surface exists and is documented enough for operator use.
The default launch model remains Gemma 4 31B family and does not regress to any Gemma 2 target.
Unit coverage proves the OpenAI-compatible API model label and the MLX launch model are independent.
Unit coverage proves the dedicated launch override controls the child-process model argument.
Operator validation can distinguish invalid repo id/path from genuine Hugging Face credential/license failure.
npm run ai:orchestrator no longer attempts to fetch bare gemma-4-31b-it after restart.
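
For the operator-validation criterion, a pre-launch shape check could flag a suspicious launch value before any network request is made. The sketch below is a hypothetical helper, not part of the repository, and the category names and regular expressions are illustrative assumptions. A bare id is reported as a warning category rather than an error, since some Hugging Face repos do resolve without an owner segment.

```javascript
// Hypothetical helper: classify the value that would be passed to `mlx_lm.server --model`.
function classifyLaunchModel(value) {
    if (typeof value !== 'string' || value.trim() === '') {
        return 'invalid';
    }

    const v = value.trim();

    // Absolute, home-relative, or dot-relative paths are treated as local model directories.
    if (/^(\/|~\/|\.\.?\/)/.test(v)) {
        return 'local-path';
    }

    // Namespaced Hugging Face repo id of the form <owner>/<name>.
    if (/^[\w.-]+\/[\w.-]+$/.test(v)) {
        return 'repo-id';
    }

    // No owner segment: the shape that produced the misleading 401 in this ticket.
    if (/^[\w.-]+$/.test(v)) {
        return 'bare-id';
    }

    return 'invalid';
}
```

A relative path without a leading `./`, such as `models/gemma`, is indistinguishable from a repo id by shape alone, so this check cannot replace an actual filesystem or hub lookup. It only helps separate "the id looks wrong" from "the credentials are wrong" in operator logs.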

Out of Scope

Reverting to mlx-community/gemma-2-27b-it-4bit or any Gemma 2 target.
Changing embedding provider/model/vector dimension behavior.
Migrating the OpenAI-compatible host/port contract.
Reworking the full inference lifecycle service.
Solving all Hugging Face credential/licensing cases beyond documenting the operator setup surface if needed.

Avoided Traps

Gemma 2 fallback. Rejected. The target remains Gemma 4 31B.
Assuming 401 means token first. Rejected. The current request URL is already structurally suspect because it targets a bare repo id.
Using one env var for two contracts. Rejected. API model labels and launch artifact ids can intentionally differ.
Hardcoding a machine-local path. Rejected. The default must be portable; local paths belong in operator config/env.

Related

#11471 / PR #11473: removed the stale Gemma 2 launch target but collapsed the API model label and the MLX launch id.
#11380 / PR #11382: prior orchestrator child process failure lane.
ai/daemons/TaskDefinitions.mjs
ai/config.template.mjs
ai/mcp/server/memory-core/scripts/setup_mlx.sh

Origin Session ID: 6e5b995a-c68e-4179-840c-a4cc48d449da

Handoff Retrieval Hints:

orchestrator mlx gemma-4-31b-it bare repo id 401
separate NEO_OPENAI_COMPATIBLE_MODEL from mlx_lm.server --model
google/gemma-4-31B-it mlx-community/gemma-4-31b-it-bf16

tobiu referenced in commit cac84c1 - "fix(ai): separate mlx launch model from api label (#11524) (#11525)" on May 17, 2026, 9:43 AM

tobiu closed this issue on May 17, 2026, 9:43 AM