Context
Follow-up to #11393 / PR #11394 (fix(memory-core): implement retry-on-unload for openAiCompatible embeddings). The original ticket filed at 03:13Z scoped the retry-on-unload detection to the single empirically-observed error shape at that time ("Model was unloaded while the request was still in queue.."). PR #11394 implemented the literal scope correctly + landed Cycle-1 APPROVED.
Post-filing empirical evidence (same session): at 03:17Z (~4 minutes after #11393 was filed) the same session hit a SECOND distinct LM-Studio failure shape, structurally representing the same substrate-friction class:
openAiCompatible embedding error HTTP 400:
{"error":"Failed to load model \"text-embedding-qwen3-embedding-8b\". Error: Operation canceled."}
This is captured verbatim in MESSAGE:36b84262-2e96-49c6-9a48-51528bc65fea (the self-DM fallback turn-memory from that failure occurrence). The literal "Model was unloaded" detection in PR #11394 does NOT match this error shape, so the retry path does not fire and the error propagates unchanged — failing the AGENTS.md §0 Invariant 5 reliability gate that #11393's full-class fix intended to restore.
The Problem
Two distinct LM-Studio error shapes empirically observed in a single nightshift session represent the same substrate-friction class:
- Shape A (Model was unloaded while the request was still in queue..): emitted when LM Studio has JIT-unloaded the model due to idle timeout and a new request arrives while the unload state is still in the request queue. Currently handled by PR #11394's detection.
- Shape B (Failed to load model "<MODEL>". Error: Operation canceled.): emitted when LM Studio attempts to JIT-warm a model on a fresh request but the load operation is canceled (likely a different RAM-pressure / queueing-race condition, possibly when LM Studio's load operation itself fails partway through). NOT currently handled.
Both shapes:
- Surface as HTTP 400 from /v1/embeddings
- Indicate the same architectural state ("embedding model not resident, request cannot be served")
- Have the same correct semantic response (retry-with-warmup-delay; the warmup-delay gives LM Studio a chance to complete the load operation that was either pending or canceled)
The current PR #11394 detection's substring match on "Model was unloaded" is the narrowest possible interpretation of LM-Studio's substrate-friction class.
The Architectural Reality
- Detection callsite: ai/services/memory-core/TextEmbeddingService.mjs:115-119 (post-#11394-merge)
- Current detection: err.message.includes('HTTP 400') && err.message.includes('Model was unloaded')
- Empirical evidence: self-DM MESSAGE:36b84262 (same-session post-filing observation of Shape B); self-DM MESSAGE:3af300ee (session anchor of Shape A)
The Fix
Broaden the LM-Studio substrate-friction-class detection to catch BOTH error shapes (and remain extensible for any future LM-Studio variants of the same class). Proposed prescription:
    // ai/services/memory-core/TextEmbeddingService.mjs: replace the current condition
    const isModelLoadError = err.message.includes('HTTP 400') && (
        err.message.includes('Model was unloaded') ||          // Shape A: JIT-unload-then-queued-request
        (err.message.includes('Failed to load model') &&       // Shape B: JIT-warm-load-canceled
         err.message.includes('Operation canceled'))
    );

    if (retriesLeft > 0 && isModelLoadError) {
        logger.log(`[TextEmbeddingService] embedding-provider model-load failure detected (Shape ${err.message.includes('Model was unloaded') ? 'A' : 'B'}), retrying (remaining retries: ${retriesLeft})`);
        // Give LM Studio time to finish (or restart) the model load before retrying.
        await new Promise(r => setTimeout(r, unloadRetryDelayMs));
        return this.#postOpenAiCompatible(inputData, retriesLeft - 1);
    }
The Shape-A-vs-Shape-B annotation in the log line shows operators which substrate-friction variant fired, which is useful for future tuning if one shape dominates the other.
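To make the retry behaviour concrete, here is a minimal stand-alone sketch of the retry-with-warmup-delay loop driven by a fake endpoint. It is not the real TextEmbeddingService: postWithRetry, makeFakePost, and the {embedding: [...]} result shape are hypothetical names invented for illustration, and only the substring conditions are taken from the prescription above.

```javascript
// Hypothetical harness; names below are illustrative, not from the codebase.

// Shape B error text as observed in the ticket (JSON body with escaped quotes).
const SHAPE_B = 'openAiCompatible embedding error HTTP 400: ' +
    '{"error":"Failed to load model \\"text-embedding-qwen3-embedding-8b\\". Error: Operation canceled."}';

// Same substring conditions as the proposed detection.
function isModelLoadError(message) {
    return message.includes('HTTP 400') && (
        message.includes('Model was unloaded') ||
        (message.includes('Failed to load model') && message.includes('Operation canceled'))
    );
}

// `post` is assumed to be an async function that resolves with a result or throws an Error.
async function postWithRetry(post, retriesLeft, delayMs) {
    try {
        return await post();
    } catch (err) {
        if (retriesLeft > 0 && isModelLoadError(err.message)) {
            // Warmup delay: give the provider time to complete the model load.
            await new Promise(r => setTimeout(r, delayMs));
            return postWithRetry(post, retriesLeft - 1, delayMs);
        }
        throw err; // Not retry-eligible, or retries exhausted.
    }
}

// Fake endpoint that throws Shape B for the first `failures` calls, then succeeds.
function makeFakePost(failures) {
    const state = {calls: 0};
    const post = async () => {
        state.calls++;
        if (state.calls <= failures) {
            throw new Error(SHAPE_B);
        }
        return {embedding: [0.1, 0.2]};
    };
    return {post, state};
}
```

With makeFakePost(1) and one retry, the fake endpoint is called twice and the second call's result is returned. With a fake that always fails and two retries, the Shape B error propagates after three calls (retry count + 1).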
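Acceptance Criteria
- 'first-call-fails-shape-b-second-call-succeeds path with mock client': mock-server emits the "Failed to load model ... Operation canceled" shape on call 1, success on call 2; assertion: 2 requests total, retry fires.
- 'exhausted-retry-final-failure path with shape b': mock-server emits Shape B on all calls; assertion: error propagates, request count matches retry-count + 1.
- The 'propagates non-unload HTTP 400 errors immediately without retries' spec must continue passing after the condition expansion; it currently emits "Some other bad request error", which matches neither the Shape A nor the Shape B substrings.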
Out of Scope
- Regex-based vs substring-based detection — prescription uses substring matching for consistency with PR #11394's existing pattern. Switching to regex would be a broader refactor; not load-bearing here.
- LM-Studio-version-specific error-shape detection — the substrate is robust as long as LM-Studio's error-shapes stay stable. If a future LM-Studio version emits a new shape, file another follow-up; don't try to enumerate future shapes speculatively.
- Daemon-managed embedding-endpoint pattern — companion to #11380 broader-scope; still out-of-scope for the narrow retry-detection fix. May become relevant as a Lane B follow-up if narrow retry continues to surface edge-shapes.
- Provider-side coordination — fixing LM Studio's load-cancellation behavior is operator/LM-Studio's responsibility, not Memory Core's. We mitigate via retry-with-warmup-delay.
Avoided Traps
- Treat ALL "Failed to load model" as retry-eligible: rejected. The "Operation canceled" co-condition is what specifically signals "load-attempt-canceled-but-retry-might-succeed". A "Failed to load model" paired with a cause other than "Operation canceled" (e.g., "Model file not found") indicates a different failure class (e.g., model evicted from disk) that retry won't fix. Substring AND-condition preserves the narrow class-shape.
- Catch-all HTTP-400 retry: rejected (already enforced in PR #11394 via the "Some other bad request error" test case). Generic 400 retry would mask real configuration bugs.
- Add the Shape-B detection in PR #11394 mid-Cycle-1 — rejected as ticket-author goalpost-moving (per Cycle-1 review on #11394's Strategic-Fit rationale). PR #11394 implemented per literal #11393 AC1; broader-class is this follow-up's scope.
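The boundaries drawn by these rejected alternatives can be illustrated with a small classifier sketch. The helper name classifyModelLoadError is hypothetical, and the sample strings are shortened stand-ins: the "Model file not found" wording in particular is only the illustrative cause named above, not an observed LM Studio message.

```javascript
// Hypothetical helper: returns 'A' or 'B' for retry-eligible shapes, otherwise null.
function classifyModelLoadError(message) {
    if (!message.includes('HTTP 400')) {
        return null;
    }
    if (message.includes('Model was unloaded')) {
        return 'A'; // JIT-unload while the request was queued
    }
    if (message.includes('Failed to load model') && message.includes('Operation canceled')) {
        return 'B'; // JIT warm-load canceled
    }
    return null; // Any other HTTP 400 is not retried
}

// Shortened, illustrative messages (not verbatim provider output).
const samples = [
    'HTTP 400: {"error":"Model was unloaded while the request was still in queue.."}',
    'HTTP 400: {"error":"Failed to load model \\"m\\". Error: Operation canceled."}',
    'HTTP 400: {"error":"Failed to load model \\"m\\". Error: Model file not found"}',
    'HTTP 400: {"error":"Some other bad request error"}'
];

const results = samples.map(classifyModelLoadError);
console.log(results); // [ 'A', 'B', null, null ]
```

Only the first two samples are classified as retry-eligible. The load failure with a different cause and the generic 400 both fall through to null, so they would propagate immediately.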
Related
- Predecessor ticket: #11393 — original narrower-scope retry-on-unload ticket.
- Predecessor PR: #11394 — Cycle-1 APPROVED+Follow-Up; merged when @tobiu executes merge gate. This ticket's implementation builds on PR #11394's #postOpenAiCompatible private-method refactor.
- Companion broader-scope substrate-pattern: #11380 — daemon-managed local-supporting-services; future Lane B if narrow retry continues to surface edge-shapes.
- Empirical anchor self-DMs (private mailbox; A2A graph nodes): MESSAGE:3af300ee (Shape A) + MESSAGE:36b84262 (Shape B).
- AGENTS.md §0 Invariant 5: "No skipping add_memory at end of turn" — both shapes break this gate; this ticket completes the substrate-friction-class coverage that #11393 started narrower.
Origin Session
- Origin Session ID: e095c569-beac-4743-998f-e07d4344492e
Retrieval Hint
Search for LM Studio embedding model load Operation canceled JIT warm-load retry shape-b broader-class.