Frontmatter
id: 11524
title: Separate MLX launch model from OpenAI API label
state: Closed
labels: bug, ai, architecture, model-experience
assignees: neo-gemini-pro, neo-gpt
createdAt: May 17, 2026, 8:56 AM
updatedAt: May 17, 2026, 9:43 AM
githubUrl: https://github.com/neomjs/neo/issues/11524
author: neo-gpt
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 17, 2026, 9:43 AM

Separate MLX launch model from OpenAI API label

Closed | v13.0.0/archive-v13-0-0-chunk-11 | bug, ai, architecture, model-experience
neo-gpt commented on May 17, 2026, 8:56 AM

Context

Operator ran npm run ai:orchestrator on 2026-05-17 and the supervisor-started MLX child failed immediately:

mlx inference stderr: HTTP Request: GET https://huggingface.co/api/models/gemma-4-31b-it/revision/main "HTTP/1.1 401 Unauthorized"

This is a follow-up to #11471 / PR #11473. That PR correctly removed the stale Gemma 2 launch target. The new failure is different: the orchestrator now passes the bare OpenAI-compatible API model label gemma-4-31b-it directly to mlx_lm.server --model, so huggingface_hub resolves it as a bare Hugging Face repo id.
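To make the failure concrete, the lookup URL from the log line can be rebuilt from the value passed to --model. This is a minimal sketch: modelInfoUrl is a hypothetical helper, not an API of huggingface_hub or of the Neo orchestrator, and it only mirrors the URL shape visible in the stderr output above.

```javascript
// Hypothetical helper: rebuilds the model-info URL shape seen in the stderr log.
// It is NOT part of huggingface_hub or the Neo code base.
function modelInfoUrl(repoId, revision = 'main') {
    return `https://huggingface.co/api/models/${repoId}/revision/${revision}`;
}

// Bare API label, as currently passed to mlx_lm.server --model
console.log(modelInfoUrl('gemma-4-31b-it'));

// Namespaced repo id, as used by the MLX community collection
console.log(modelInfoUrl('mlx-community/gemma-4-31b-it-4bit'));
```

The first call reproduces the request from the log. The second shows that a namespaced id targets a different resource, which is why the request URL is worth checking before any token is.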

Current live V-B-A:

  • Official Hugging Face model id for Gemma 4 31B IT is google/gemma-4-31B-it.
  • The MLX community Gemma 4 collection uses explicit repository ids such as mlx-community/gemma-4-31b-it-bf16, mlx-community/gemma-4-31b-it-4bit, mlx-community/gemma-4-31b-it-8bit, and related assistant/drafter variants.
  • ai/daemons/TaskDefinitions.mjs currently defaults DEFAULT_MLX_MODEL to bare gemma-4-31b-it and reads process.env.NEO_OPENAI_COMPATIBLE_MODEL for the mlx_lm.server --model argument.
  • ai/mcp/server/memory-core/scripts/setup_mlx.sh documents that the MLX server expects --model <repo_id>.
  • This checkout has no top-level ai/config.mjs, and ai/config.template.mjs currently only exposes orchestrator.devSyncRoots; there is no orchestrator-specific MLX launch model config surface.

Duplicate sweep:

  • ask_knowledge_base(type='ticket', query='open ticket orchestrator MLX launch model OpenAI compatible model label Hugging Face repo id Gemma 4 31B') found no open duplicate; it surfaced closed #11471 as the predecessor.
  • Live open PR list contains #11450, #9917, #9537; none cover this surface.
  • Local repo search found no NEO_MLX* / MLX launch-model config surface.

The Problem

NEO_OPENAI_COMPATIBLE_MODEL names the model identifier sent to an already-running OpenAI-compatible API. That string is not necessarily the same as the model artifact identifier needed to start an MLX server.

For Gemma 4 31B, the distinction matters:

  • API model label: what Neo sends in /v1/chat/completions payloads after the server is already running.
  • MLX launch model: a Hugging Face repo id or local path passed to mlx_lm.server --model so the child process can load weights.

PR #11473 accidentally collapsed those two contracts. The result is a supervisor restart loop against a bare repo id (gemma-4-31b-it), which can surface as a misleading 401 Unauthorized. A Hugging Face token might still be needed for the chosen upstream model, but the orchestrator must first pass a valid Gemma 4 31B launch id/path.
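The two contracts can be pictured as two independent values that never need to be equal. The sketch below is illustrative only: buildChatPayload and buildMlxLaunchArgs are hypothetical names, the port 8080 is a placeholder for DEFAULT_MLX_PORT, and the 4-bit repo id is just one of the MLX community variants listed above.

```javascript
// Sketch only: function names and values are assumptions, not the Neo implementation.

// Contract 1: API model label, sent in the request body to an already-running server.
function buildChatPayload(apiModelLabel, messages) {
    return {model: apiModelLabel, messages};
}

// Contract 2: launch model, a Hugging Face repo id or local path used to load weights.
function buildMlxLaunchArgs(launchModel, port) {
    return ['--model', launchModel, '--port', String(port)];
}

const payload = buildChatPayload('gemma-4-31b-it', [{role: 'user', content: 'ping'}]);
const args    = buildMlxLaunchArgs('mlx-community/gemma-4-31b-it-4bit', 8080);

console.log(payload.model); // the label the client sends
console.log(args[1]);       // the artifact the child process loads
```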

The Architectural Reality

  • ai/daemons/TaskDefinitions.mjs#buildTaskDefinitions() is the child-process command factory for the orchestrator's continuous tasks.
  • Orchestrator.poll() treats mlx as a continuous supervised task and restarts it when it exits.
  • ai/mcp/server/memory-core/config.template.mjs#openAiCompatible.model and NEO_OPENAI_COMPATIBLE_MODEL are consumed by provider clients that call an OpenAI-compatible endpoint.
  • mlx_lm.server --model consumes a Hugging Face repo id or local model path.
  • ai/config.template.mjs is already the top-level orchestrator config substrate and is the natural owner for orchestrator-only launch details.

The Fix

Add a dedicated orchestrator MLX launch-model contract instead of reusing NEO_OPENAI_COMPATIBLE_MODEL as the child-process artifact id.

Recommended shape:

  1. Add an orchestrator config/env surface for the MLX launch model, e.g. orchestrator.mlx.model plus NEO_MLX_MODEL or NEO_ORCHESTRATOR_MLX_MODEL.
  2. Default that launch surface to a current Gemma 4 31B MLX-compatible repo id/path, not Gemma 2 and not the bare OpenAI API label.
  3. Preserve explicit buildTaskDefinitions({mlxModel}) test override and DEFAULT_MLX_PORT behavior.
  4. Update unit tests so:
    • the default launch args use the dedicated MLX launch model;
    • NEO_OPENAI_COMPATIBLE_MODEL no longer changes the mlx_lm.server --model launch argument;
    • the dedicated MLX env/config override does change the launch argument;
    • stale Gemma 2 remains rejected.
  5. If the selected Gemma 4 31B MLX repo still requires credentials, document that as operator setup (HF_TOKEN) after the repo id split is fixed, rather than conflating credentials with model identity.
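The recommended shape could be sketched roughly as follows. Everything here is an assumption for illustration: resolveMlxLaunchModel is a hypothetical helper, NEO_MLX_MODEL and orchestrator.mlx.model are the candidate names from step 1, the precedence order is one plausible choice, and the default repo id is a placeholder rather than the value the fix actually ships.

```javascript
// Sketch under assumptions: names follow the proposal in this ticket,
// and the default repo id is a placeholder chosen for illustration.
const DEFAULT_MLX_LAUNCH_MODEL = 'mlx-community/gemma-4-31b-it-4bit';

function resolveMlxLaunchModel({mlxModel, env = {}, config = {}} = {}) {
    // Assumed precedence: explicit option (tests) > env var > config > default.
    // NEO_OPENAI_COMPATIBLE_MODEL is deliberately never read here.
    const model = mlxModel
        || env.NEO_MLX_MODEL
        || config.orchestrator?.mlx?.model
        || DEFAULT_MLX_LAUNCH_MODEL;

    // Keep rejecting the stale Gemma 2 launch target.
    if (/gemma-2-/i.test(model)) {
        throw new Error(`Stale Gemma 2 launch model rejected: ${model}`);
    }

    return model;
}
```

Passing env and config as arguments instead of reading process.env inside the function keeps the unit tests from step 4 free of global state.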

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| buildTaskDefinitions().mlx.args | mlx_lm.server --model <repo_id> contract + this ticket | Uses a dedicated MLX launch repo id/path for Gemma 4 31B | Explicit mlxModel option remains available for tests/local overrides | JSDoc on buildTaskDefinitions() | Unit assertions over generated args |
| NEO_OPENAI_COMPATIBLE_MODEL | OpenAI-compatible chat-completions provider contract | Names the API payload model only; does not implicitly control MLX child startup | Existing provider behavior unchanged | Memory Core docs remain valid | Unit asserts env var no longer changes child launch model |
| New MLX launch env/config | Orchestrator-owned child-process lifecycle | Operator can select HF repo id or local path for mlx_lm.server --model | Default current Gemma 4 31B MLX-compatible repo id | ai/config.template.mjs / adjacent docs if needed | Unit asserts env/config override |
| Operator startup | npm run ai:orchestrator log sample | Supervisor no longer fetches bare gemma-4-31b-it or Gemma 2 | If selected repo requires auth, error explicitly points at credential/license state | Post-merge validation note | Manual restart/log verification |

Acceptance Criteria

  • ai/daemons/TaskDefinitions.mjs no longer reads NEO_OPENAI_COMPATIBLE_MODEL as the mlx_lm.server --model launch argument.
  • A dedicated orchestrator MLX launch-model config/env surface exists and is documented well enough for operator use.
  • The default launch model remains in the Gemma 4 31B family and does not regress to any Gemma 2 target.
  • Unit coverage proves the OpenAI-compatible API model label and the MLX launch model are independent.
  • Unit coverage proves the dedicated launch override controls the child-process model argument.
  • Operator validation can distinguish invalid repo id/path from genuine Hugging Face credential/license failure.
  • npm run ai:orchestrator no longer attempts to fetch bare gemma-4-31b-it after restart.
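For the operator-validation criterion, one way to tell a suspect launch id apart from a likely credential problem is a small heuristic over the launch value and the stderr line. This is a sketch, not existing Neo code: classifyMlxFailure and its return labels are hypothetical. Un-namespaced ids are not always invalid on Hugging Face (some legacy models use them), so the result is a hint for the operator rather than a verdict.

```javascript
// Heuristic sketch; all names and return labels here are hypothetical.
function isLocalPath(target) {
    return ['/', './', '../', '~'].some(prefix => target.startsWith(prefix));
}

function isNamespacedRepoId(target) {
    // "<owner>/<name>" with exactly one slash
    return /^[\w.-]+\/[\w.-]+$/.test(target);
}

function classifyMlxFailure(launchModel, stderrLine) {
    // Match status lines such as: "HTTP/1.1 401 Unauthorized"
    const authError = /HTTP\/\d(?:\.\d)? (401|403)\b/.test(stderrLine);

    // Local paths do not trigger a Hub lookup, so an auth error is not about them.
    if (!authError || isLocalPath(launchModel)) {
        return 'other';
    }

    // A bare id plus an auth error points at the id first, not the token.
    return isNamespacedRepoId(launchModel) ? 'check-credentials' : 'check-launch-id';
}

const logLine = 'HTTP Request: GET https://huggingface.co/api/models/gemma-4-31b-it/revision/main "HTTP/1.1 401 Unauthorized"';

console.log(classifyMlxFailure('gemma-4-31b-it', logLine));
console.log(classifyMlxFailure('google/gemma-4-31B-it', logLine));
```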

Out of Scope

  • Reverting to mlx-community/gemma-2-27b-it-4bit or any Gemma 2 target.
  • Changing embedding provider/model/vector dimension behavior.
  • Migrating the OpenAI-compatible host/port contract.
  • Reworking the full inference lifecycle service.
  • Solving all Hugging Face credential/licensing cases beyond documenting the operator setup surface if needed.

Avoided Traps

  • Gemma 2 fallback. Rejected. The target remains Gemma 4 31B.
  • Assuming 401 means token first. Rejected. The current request URL is already structurally suspect because it targets a bare repo id.
  • Using one env var for two contracts. Rejected. API model labels and launch artifact ids can intentionally differ.
  • Hardcoding a machine-local path. Rejected. The default must be portable; local paths belong in operator config/env.

Related

  • #11471 / PR #11473 — removed stale Gemma 2 launch target but collapsed API model label and MLX launch id.
  • #11380 / PR #11382 — prior orchestrator child process failure lane.
  • ai/daemons/TaskDefinitions.mjs
  • ai/config.template.mjs
  • ai/mcp/server/memory-core/scripts/setup_mlx.sh

Origin Session ID: 6e5b995a-c68e-4179-840c-a4cc48d449da

Handoff Retrieval Hints:

  • orchestrator mlx gemma-4-31b-it bare repo id 401
  • separate NEO_OPENAI_COMPATIBLE_MODEL from mlx_lm.server --model
  • google/gemma-4-31B-it mlx-community/gemma-4-31b-it-bf16
tobiu referenced in commit cac84c1 - "fix(ai): separate mlx launch model from api label (#11524) (#11525)" on May 17, 2026, 9:43 AM
tobiu closed this issue on May 17, 2026, 9:43 AM