Frontmatter
id: 11395
title: Broaden TextEmbeddingService model-unload retry to cover LM Studio 'Failed to load model' / 'Operation canceled' shape
state: Closed
labels: bug, ai, agent-task:pending, model-experience
assignees: neo-gemini-3-1-pro
createdAt: May 15, 2026, 5:26 AM
updatedAt: May 15, 2026, 10:23 AM
githubUrl: https://github.com/neomjs/neo/issues/11395
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 15, 2026, 10:23 AM

Broaden TextEmbeddingService model-unload retry to cover LM Studio 'Failed to load model' / 'Operation canceled' shape

neo-opus-4-7 commented on May 15, 2026, 5:26 AM

Context

Follow-up to #11393 / PR #11394 (fix(memory-core): implement retry-on-unload for openAiCompatible embeddings). The original ticket, filed at 03:13Z, scoped the retry-on-unload detection to the single error shape that had been empirically observed at that time ("Model was unloaded while the request was still in queue.."). PR #11394 implemented that literal scope correctly and landed with a Cycle-1 APPROVED review.

Post-filing empirical evidence (same session): at 03:17Z (~4 minutes after #11393 was filed) the same session hit a SECOND distinct LM-Studio failure shape, structurally representing the same substrate-friction class:

openAiCompatible embedding error HTTP 400:
{"error":"Failed to load model \"text-embedding-qwen3-embedding-8b\". Error: Operation canceled."}

This is captured verbatim in MESSAGE:36b84262-2e96-49c6-9a48-51528bc65fea (the self-DM fallback turn-memory from that failure occurrence). The literal "Model was unloaded" detection in PR #11394 does NOT match this error shape, so the retry path does not fire and the error propagates unchanged — failing the AGENTS.md §0 Invariant 5 reliability gate that #11393's full-class fix intended to restore.

The Problem

Two distinct LM-Studio error shapes empirically observed in a single nightshift session represent the same substrate-friction class:

  1. Shape A (Model was unloaded while the request was still in queue..): emitted when LM Studio JIT-unloads the model after an idle timeout while a request is still waiting in its queue. Currently handled by PR #11394's detection.
  2. Shape B (Failed to load model "<MODEL>". Error: Operation canceled.): emitted when LM Studio attempts to JIT-warm a model on a fresh request but the load operation is canceled (likely a different RAM-pressure / queueing-race condition, possibly when LM Studio's load operation itself fails partway through). NOT currently handled.

Both shapes:

  • Surface as HTTP 400 from /v1/embeddings
  • Indicate the same architectural state ("embedding model not resident, request cannot be served")
  • Have the same correct semantic response (retry-with-warmup-delay; the warmup-delay gives LM Studio a chance to complete the load operation that was either pending or canceled)

The current PR #11394 detection's substring match on "Model was unloaded" is the narrowest possible interpretation of LM-Studio's substrate-friction class.
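The shared class can be made explicit with a small classifier. The following is a minimal sketch, assuming the message format quoted above (an `HTTP 400` marker followed by LM Studio's JSON body); the helper name `classifyModelLoadError` is hypothetical and does not exist in TextEmbeddingService.

```javascript
// Hypothetical helper: maps an error message to the shape label used in this ticket.
// Returns 'A', 'B', or null when the message is not a model-load failure.
function classifyModelLoadError(message) {
    if (typeof message !== 'string' || !message.includes('HTTP 400')) {
        return null;
    }
    if (message.includes('Model was unloaded')) {
        return 'A'; // JIT-unload while the request was queued
    }
    if (message.includes('Failed to load model') && message.includes('Operation canceled')) {
        return 'B'; // JIT-warm load was canceled
    }
    return null; // any other HTTP 400 is outside the class
}

// Shape B as observed at 03:17Z (body copied from the Context section)
const shapeB = 'openAiCompatible embedding error HTTP 400: ' +
    '{"error":"Failed to load model \\"text-embedding-qwen3-embedding-8b\\". Error: Operation canceled."}';

console.log(classifyModelLoadError(shapeB));
```

Returning a label rather than a boolean keeps the retry decision and the log annotation derived from a single check.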

The Architectural Reality

  • Detection callsite: ai/services/memory-core/TextEmbeddingService.mjs:115-119 (post-#11394-merge)
  • Current detection: err.message.includes('HTTP 400') && err.message.includes('Model was unloaded')
  • Empirical evidence: self-DM MESSAGE:36b84262 (same-session post-filing observation of Shape B); self-DM MESSAGE:3af300ee (session anchor of Shape A)

The Fix

Broaden the LM-Studio substrate-friction-class detection to catch BOTH error shapes (and remain extensible for any future LM-Studio variants of the same class). Proposed prescription:

// ai/services/memory-core/TextEmbeddingService.mjs — replace the current condition:
const isModelLoadError = err.message.includes('HTTP 400') && (
    err.message.includes('Model was unloaded') ||              // Shape A — JIT-unload-then-queued-request
    (err.message.includes('Failed to load model') &&            // Shape B — JIT-warm-load-canceled
     err.message.includes('Operation canceled'))
);

if (retriesLeft > 0 && isModelLoadError) {
    logger.log(`[TextEmbeddingService] embedding-provider model-load failure detected (Shape ${err.message.includes('Model was unloaded') ? 'A' : 'B'}), retrying (remaining retries: ${retriesLeft})`);
    await new Promise(r => setTimeout(r, unloadRetryDelayMs));
    return this.#postOpenAiCompatible(inputData, retriesLeft - 1);
}

The Shape-A-vs-Shape-B annotation in the log line gives operators visibility into which substrate-friction variant fired, which is useful for future tuning if one shape dominates the other.

Acceptance Criteria

  • AC1: Detection condition expanded to match both Shape A and Shape B error patterns as enumerated above.
  • AC2: Existing PR #11394 spec coverage extended with a new test case: 'first-call-fails-shape-b-second-call-succeeds path with mock client' — mock-server emits the "Failed to load model ... Operation canceled" shape on call 1, success on call 2; assertion: 2 requests total, retry fires.
  • AC3: Existing PR #11394 spec coverage extended with: 'exhausted-retry-final-failure path with shape b' — mock-server emits Shape B on all calls; assertion: error propagates, request count matches retry-count + 1.
  • AC4: Log-line includes the Shape-A-vs-Shape-B annotation per the prescription above; spec verifies the log substance via captured logger output (if existing logger-mocking pattern in the spec supports it; if not, this AC is N/A for spec coverage and remains a manual-verification observation).
  • AC5: Non-load-class HTTP 400 errors still propagate without retry. The existing PR #11394 spec 'propagates non-unload HTTP 400 errors immediately without retries' must continue passing after the condition expansion; it currently emits "Some other bad request error", which matches neither the Shape A nor the Shape B substrings.
  • AC6: No regression on the original Shape A detection — existing PR #11394 specs continue passing.
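The request-count assertions in AC2, AC3, and AC5 can be sketched without any test framework. This is an illustration only: `postWithRetry` is a stand-in for the service's retry path (not the real `#postOpenAiCompatible`), `createMockClient` is a hypothetical scripted mock, and the retry count of 2 and the 1 ms delay are assumed values chosen to keep the run fast.

```javascript
// Stand-in for the service's retry path, using the detection condition from The Fix.
async function postWithRetry(client, retriesLeft, delayMs) {
    try {
        return await client.post();
    } catch (err) {
        const isModelLoadError = err.message.includes('HTTP 400') && (
            err.message.includes('Model was unloaded') ||
            (err.message.includes('Failed to load model') &&
             err.message.includes('Operation canceled'))
        );
        if (retriesLeft > 0 && isModelLoadError) {
            await new Promise(r => setTimeout(r, delayMs));
            return postWithRetry(client, retriesLeft - 1, delayMs);
        }
        throw err; // non-load-class errors and exhausted retries propagate
    }
}

// Hypothetical mock: replays scripted outcomes (repeating the last one) and counts requests.
function createMockClient(script) {
    const client = {
        requests: 0,
        async post() {
            const step = script[Math.min(client.requests, script.length - 1)];
            client.requests++;
            if (step instanceof Error) throw step;
            return step;
        }
    };
    return client;
}

const SHAPE_B = new Error('openAiCompatible embedding error HTTP 400: ' +
    '{"error":"Failed to load model \\"m\\". Error: Operation canceled."}');
const OTHER = new Error('openAiCompatible embedding error HTTP 400: ' +
    '{"error":"Some other bad request error"}');

async function runChecks() {
    // AC2: Shape B on call 1, success on call 2
    const recovering = createMockClient([SHAPE_B, {embedding: [0.1]}]);
    await postWithRetry(recovering, 2, 1);

    // AC3: Shape B on every call, so retries are exhausted
    const failing = createMockClient([SHAPE_B]);
    const ac3Error = await postWithRetry(failing, 2, 1).catch(e => e);

    // AC5: a generic HTTP 400 must not trigger any retry
    const other = createMockClient([OTHER]);
    const ac5Error = await postWithRetry(other, 2, 1).catch(e => e);

    return {
        ac2Requests: recovering.requests,    // 2
        ac3Requests: failing.requests,       // 3 (retry count + 1)
        ac3Propagated: ac3Error === SHAPE_B, // true
        ac5Requests: other.requests,         // 1
        ac5Propagated: ac5Error === OTHER    // true
    };
}

runChecks().then(result => console.log(result));
```

In the real spec the same counts would be asserted against the mock server that the PR #11394 tests already use.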

Out of Scope

  • Regex-based vs substring-based detection — prescription uses substring matching for consistency with PR #11394's existing pattern. Switching to regex would be a broader refactor; not load-bearing here.
  • LM-Studio-version-specific error-shape detection — the substrate is robust as long as LM-Studio's error-shapes stay stable. If a future LM-Studio version emits a new shape, file another follow-up; don't try to enumerate future shapes speculatively.
  • Daemon-managed embedding-endpoint pattern — companion to #11380 broader-scope; still out-of-scope for the narrow retry-detection fix. May become relevant as a Lane B follow-up if narrow retry continues to surface edge-shapes.
  • Provider-side coordination — fixing LM Studio's load-cancellation behavior is operator/LM-Studio's responsibility, not Memory Core's. We mitigate via retry-with-warmup-delay.

Avoided Traps

  • Treat ALL "Failed to load model" errors as retry-eligible: rejected. The "Operation canceled" co-condition is what specifically signals "load-attempt-canceled-but-retry-might-succeed". A "Failed to load model" paired with a cause other than "Operation canceled" (e.g., "Model file not found") indicates a different failure class (e.g., model evicted from disk) that retry won't fix. The substring AND-condition preserves the narrow class-shape.
  • Catch-all HTTP-400 retry: rejected (already enforced in PR #11394 via the "Some other bad request error" test case). Generic 400 retry would mask real configuration bugs.
  • Add the Shape-B detection in PR #11394 mid-Cycle-1: rejected as ticket-author goalpost-moving (per the Strategic-Fit rationale in the Cycle-1 review on #11394). PR #11394 implemented the literal #11393 AC1; the broader class is this follow-up's scope.
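The difference between the rejected broad match and the chosen narrow match can be seen by running both against the same pair of messages. This is a sketch under assumptions: `broadMatch` and `narrowMatch` are illustrative names, and the exact wording LM Studio would use for a missing model file is not confirmed by this ticket.

```javascript
// Rejected variant: treats every "Failed to load model" as retry-eligible.
const broadMatch = message =>
    message.includes('HTTP 400') && message.includes('Failed to load model');

// Chosen variant: additionally requires the "Operation canceled" co-condition.
const narrowMatch = message =>
    message.includes('HTTP 400') &&
    message.includes('Failed to load model') &&
    message.includes('Operation canceled');

const canceled = 'HTTP 400: {"error":"Failed to load model \\"m\\". Error: Operation canceled."}';
// Illustrative only: the real wording for a missing model file is an assumption.
const missing = 'HTTP 400: {"error":"Failed to load model \\"m\\". Error: Model file not found."}';

for (const message of [canceled, missing]) {
    console.log({broad: broadMatch(message), narrow: narrowMatch(message)});
}
```

Only the narrow match declines the second message, so a failure that retry cannot fix propagates immediately instead of consuming the retry budget.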

Related

  • Predecessor ticket: #11393 — original narrower-scope retry-on-unload ticket.
  • Predecessor PR: #11394 — Cycle-1 APPROVED+Follow-Up; merged when @tobiu executes merge gate. This ticket's implementation builds on PR #11394's #postOpenAiCompatible private-method refactor.
  • Companion broader-scope substrate-pattern: #11380 — daemon-managed local-supporting-services; future Lane B if narrow retry continues to surface edge-shapes.
  • Empirical anchor self-DMs (private mailbox; A2A graph nodes): MESSAGE:3af300ee (Shape A) + MESSAGE:36b84262 (Shape B).
  • AGENTS.md §0 Invariant 5: "No skipping add_memory at end of turn" — both shapes break this gate; this ticket completes the substrate-friction-class coverage that #11393 started narrower.

Origin Session

  • Origin Session ID: e095c569-beac-4743-998f-e07d4344492e

Retrieval Hint

Search for LM Studio embedding model load Operation canceled JIT warm-load retry shape-b broader-class.

tobiu closed this issue on May 15, 2026, 10:23 AM
tobiu referenced in commit 805f779 - "fix(memory-core): broaden embedding-retry detection to LM Studio Shape B (#11395) (#11396)" on May 15, 2026, 10:23 AM