Context
Operator ran npm run ai:orchestrator on 2026-05-17 and the supervisor-started MLX child failed immediately:
mlx inference stderr: HTTP Request: GET https://huggingface.co/api/models/gemma-4-31b-it/revision/main "HTTP/1.1 401 Unauthorized"
This is a follow-up to #11471 / PR #11473. That PR correctly removed the stale Gemma 2 launch target. The new failure is different: the orchestrator now passes the bare OpenAI-compatible API model label gemma-4-31b-it directly to mlx_lm.server --model, so huggingface_hub resolves it as a bare Hugging Face repo id.
Current live V-B-A:
- Official Hugging Face model id for Gemma 4 31B IT is google/gemma-4-31B-it.
- The MLX community Gemma 4 collection uses explicit repository ids such as mlx-community/gemma-4-31b-it-bf16, mlx-community/gemma-4-31b-it-4bit, mlx-community/gemma-4-31b-it-8bit, and related assistant/drafter variants.
- ai/daemons/TaskDefinitions.mjs currently defaults DEFAULT_MLX_MODEL to bare gemma-4-31b-it and reads process.env.NEO_OPENAI_COMPATIBLE_MODEL for the mlx_lm.server --model argument.
- ai/mcp/server/memory-core/scripts/setup_mlx.sh documents that the MLX server expects --model <repo_id>.
- This checkout has no top-level ai/config.mjs, and ai/config.template.mjs currently only exposes orchestrator.devSyncRoots; there is no orchestrator-specific MLX launch model config surface.
Duplicate sweep:
- ask_knowledge_base(type='ticket', query='open ticket orchestrator MLX launch model OpenAI compatible model label Hugging Face repo id Gemma 4 31B') found no open duplicate; it surfaced closed #11471 as the predecessor.
- Live open PR list contains #11450, #9917, #9537; none cover this surface.
- Local repo search found no NEO_MLX* / MLX launch-model config surface.
The Problem
NEO_OPENAI_COMPATIBLE_MODEL names the model identifier sent to an already-running OpenAI-compatible API. That string is not necessarily the same as the model artifact identifier needed to start an MLX server.
For Gemma 4 31B, the distinction matters:
- API model label: what Neo sends in /v1/chat/completions payloads after the server is already running.
- MLX launch model: a Hugging Face repo id or local path passed to mlx_lm.server --model so the child process can load weights.
PR #11473 accidentally collapsed those two contracts. The result is a supervisor restart loop against a bare repo id (gemma-4-31b-it), which can return a misleading 401 Unauthorized. A Hugging Face token might still be needed for the chosen upstream model, but the orchestrator must first pass a valid Gemma 4 31B launch id or path.
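To make the two contracts concrete, the sketch below shows where each identifier would end up. It is illustrative only: the launch repo id is one of the mlx-community ids listed in Context, picked as an example, and the helper names buildChatPayload and buildLaunchArgs are hypothetical, not taken from TaskDefinitions.mjs.

```javascript
// Illustrative only: two different identifiers for the same model family.
const apiModelLabel  = 'gemma-4-31b-it';                    // sent in request payloads
const mlxLaunchModel = 'mlx-community/gemma-4-31b-it-4bit'; // example repo id (assumption)

// Contract 1: body of a POST to /v1/chat/completions on an already-running server.
function buildChatPayload(prompt) {
    return {
        model   : apiModelLabel,
        messages: [{role: 'user', content: prompt}]
    };
}

// Contract 2: argument list for starting mlx_lm.server (the port is supplied by the caller).
function buildLaunchArgs(port) {
    return ['--model', mlxLaunchModel, '--port', String(port)];
}
```

Nothing forces the two strings to match: the payload label only has to be accepted by the running server, while the launch value has to resolve to loadable weights.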
The Architectural Reality
- ai/daemons/TaskDefinitions.mjs#buildTaskDefinitions() is the child-process command factory for the orchestrator's continuous tasks.
- Orchestrator.poll() treats mlx as a continuous supervised task and restarts it when it exits.
- ai/mcp/server/memory-core/config.template.mjs#openAiCompatible.model and NEO_OPENAI_COMPATIBLE_MODEL are consumed by provider clients that call an OpenAI-compatible endpoint.
- mlx_lm.server --model consumes a Hugging Face repo id or local model path.
- ai/config.template.mjs is already the top-level orchestrator config substrate and is the natural owner for orchestrator-only launch details.
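Since mlx_lm.server --model takes either a repo id or a local path, a launch value can be roughly classified before it reaches the child process. The following is a heuristic sketch, not existing code: classifyLaunchModel is a hypothetical name, and the rule that a repo id looks like owner/name is an assumption (the Hub has historically also served some un-namespaced model ids, so a bare label is suspicious rather than strictly invalid).

```javascript
// Heuristic sketch: classify a value destined for `mlx_lm.server --model`.
// Assumption: a launch value is either a filesystem path or a namespaced
// Hugging Face repo id (`owner/name`). Anything else is treated as a bare label.
function classifyLaunchModel(value) {
    if (typeof value !== 'string' || value.trim() === '') {
        return 'invalid';
    }

    // Absolute, home-relative, or dot-relative values are treated as local model directories.
    if (/^(\/|~\/|\.{1,2}\/)/.test(value)) {
        return 'local-path';
    }

    // Exactly one slash separating two non-empty segments looks like `owner/name`.
    if (/^[\w.-]+\/[\w.-]+$/.test(value)) {
        return 'repo-id';
    }

    return 'bare-label';
}
```

Under this heuristic, gemma-4-31b-it classifies as a bare label, while google/gemma-4-31B-it and the mlx-community ids classify as repo ids.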
The Fix
Add a dedicated orchestrator MLX launch-model contract instead of reusing NEO_OPENAI_COMPATIBLE_MODEL as the child-process artifact id.
Recommended shape:
- Add an orchestrator config/env surface for the MLX launch model, e.g. orchestrator.mlx.model plus NEO_MLX_MODEL or NEO_ORCHESTRATOR_MLX_MODEL.
- Default that launch surface to a current Gemma 4 31B MLX-compatible repo id/path, not Gemma 2 and not the bare OpenAI API label.
- Preserve explicit buildTaskDefinitions({mlxModel}) test override and DEFAULT_MLX_PORT behavior.
- Update unit tests so:
  - the default launch args use the dedicated MLX launch model;
  - NEO_OPENAI_COMPATIBLE_MODEL no longer changes the mlx_lm.server --model launch argument;
  - the dedicated MLX env/config override does change the launch argument;
  - stale Gemma 2 remains rejected.
- If the selected Gemma 4 31B MLX repo still requires credentials, document that as operator setup (HF_TOKEN) after the repo id split is fixed, rather than conflating credentials with model identity.
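One possible shape for the resolution logic is sketched below. Everything in it is an assumption made for illustration: the function names, the choice of NEO_MLX_MODEL over NEO_ORCHESTRATOR_MLX_MODEL, the precedence order (explicit option, then env, then config, then default), and the 4bit repo id used as a placeholder default. The ticket does not decide any of these.

```javascript
// Placeholder default: one of the MLX community repo ids named in this ticket.
// Which variant should become the real default is not decided here.
const DEFAULT_MLX_LAUNCH_MODEL = 'mlx-community/gemma-4-31b-it-4bit';

// Hypothetical resolver. Assumed precedence: explicit option > env var > config > default.
// NEO_OPENAI_COMPATIBLE_MODEL is deliberately never consulted.
function resolveMlxLaunchModel({mlxModel, env = {}, config = {}} = {}) {
    return mlxModel
        || env.NEO_MLX_MODEL
        || config.orchestrator?.mlx?.model
        || DEFAULT_MLX_LAUNCH_MODEL;
}

// Builds the argument list for the MLX child; the port is supplied by the caller.
function buildMlxArgs(options, port) {
    return ['--model', resolveMlxLaunchModel(options), '--port', String(port)];
}
```

With this shape, passing env: {NEO_OPENAI_COMPATIBLE_MODEL: 'gemma-4-31b-it'} leaves the launch argument at the default, while env: {NEO_MLX_MODEL: '/some/local/path'} changes it, which matches the unit-test expectations listed above.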
Contract Ledger Matrix
| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| buildTaskDefinitions().mlx.args | mlx_lm.server --model <repo_id> contract + this ticket | Uses a dedicated MLX launch repo id/path for Gemma 4 31B | Explicit mlxModel option remains available for tests/local overrides | JSDoc on buildTaskDefinitions() | Unit assertions over generated args |
| NEO_OPENAI_COMPATIBLE_MODEL | OpenAI-compatible chat-completions provider contract | Names the API payload model only; does not implicitly control MLX child startup | Existing provider behavior unchanged | Memory Core docs remain valid | Unit asserts env var no longer changes child launch model |
| New MLX launch env/config | Orchestrator-owned child-process lifecycle | Operator can select HF repo id or local path for mlx_lm.server --model | Default current Gemma 4 31B MLX-compatible repo id | ai/config.template.mjs / adjacent docs if needed | Unit asserts env/config override |
| Operator startup | npm run ai:orchestrator log sample | Supervisor no longer fetches bare gemma-4-31b-it or Gemma 2 | If selected repo requires auth, error explicitly points at credential/license state | Post-merge validation note | Manual restart/log verification |
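The "no bare label, no Gemma 2" behavior in the matrix could be enforced by a small guard that runs before the child is spawned. This is a sketch under assumptions: assertLaunchModelAllowed is a hypothetical name, the Gemma 2 check is a simple substring pattern, and how the existing code currently rejects Gemma 2 is not shown in this ticket.

```javascript
// Hypothetical guard run before spawning the MLX child.
// Throws on launch values this ticket rules out; returns the value otherwise.
function assertLaunchModelAllowed(model) {
    // Any id containing "gemma-2-" is treated as a stale Gemma 2 target.
    if (/gemma-2-/i.test(model)) {
        throw new Error(`Stale Gemma 2 launch target rejected: ${model}`);
    }

    // The bare OpenAI-compatible API label is not a launch id.
    if (model === 'gemma-4-31b-it') {
        throw new Error(`Bare API label is not a launch id: ${model}`);
    }

    return model;
}
```

Failing fast here would turn the misleading 401 from the Hub into an explicit local error that names the offending value.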
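Acceptance Criteria
- ai/daemons/TaskDefinitions.mjs no longer reads NEO_OPENAI_COMPATIBLE_MODEL as the mlx_lm.server --model launch argument.
- npm run ai:orchestrator no longer attempts to fetch bare gemma-4-31b-it after restart.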
Out of Scope
- Reverting to mlx-community/gemma-2-27b-it-4bit or any Gemma 2 target.
- Changing embedding provider/model/vector dimension behavior.
- Migrating the OpenAI-compatible host/port contract.
- Reworking the full inference lifecycle service.
- Solving all Hugging Face credential/licensing cases beyond documenting the operator setup surface if needed.
Avoided Traps
- Gemma 2 fallback. Rejected. The target remains Gemma 4 31B.
- Assuming 401 means token first. Rejected. The current request URL is already structurally suspect because it targets a bare repo id.
- Using one env var for two contracts. Rejected. API model labels and launch artifact ids can intentionally differ.
- Hardcoding a machine-local path. Rejected. The default must be portable; local paths belong in operator config/env.
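The "structurally suspect URL" check can be done mechanically on a log line before anyone reaches for a token. The sketch below assumes the URL shape seen in the stderr sample in Context (/api/models/<id>/revision/...); inspectHubRequest is a hypothetical helper, and other Hub endpoints may not follow this shape.

```javascript
// Extracts the model id from a Hugging Face API URL found in a log line and
// reports whether it lacks an `owner/` namespace.
// Returns null when no such URL is present.
function inspectHubRequest(logLine) {
    const match = logLine.match(/https:\/\/huggingface\.co\/api\/models\/(.+?)\/revision\//);

    if (!match) {
        return null;
    }

    const modelId = match[1];

    return {modelId, isBare: !modelId.includes('/')};
}
```

Applied to the failing log line from this ticket, the helper reports gemma-4-31b-it as a bare id, which points at the launch argument rather than at credentials.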
Related
- #11471 / PR #11473 — removed stale Gemma 2 launch target but collapsed API model label and MLX launch id.
- #11380 / PR #11382 — prior orchestrator child process failure lane.
- ai/daemons/TaskDefinitions.mjs
- ai/config.template.mjs
- ai/mcp/server/memory-core/scripts/setup_mlx.sh
Origin Session ID: 6e5b995a-c68e-4179-840c-a4cc48d449da
Handoff Retrieval Hints:
- orchestrator mlx gemma-4-31b-it bare repo id 401
- separate NEO_OPENAI_COMPATIBLE_MODEL from mlx_lm.server --model
- google/gemma-4-31B-it mlx-community/gemma-4-31b-it-bf16