Problem
ai/provider/Ollama.mjs:108 hardcodes keep_alive: "1h" in generate(). With aiConfig.orchestrator.intervals.dreamMs: HOUR_MS (60-minute cadence), keep_alive equals the cadence, which is exactly the timing edge case: cycle N completes → keep_alive expires at the next cycle boundary → cycle N+1 starts cold. The result, by design, is zero KV-cache reuse across cycles.
Empirical anchor (PR #12076 benchmark on local LM Studio gemma-4-31b-it):
- Cold call TTFT: 18,846ms
- Warm call TTFT: 193ms
- 99% delta: the leverage point keep_alive is meant to capture
- With keep_alive == cadence, this delta applies only WITHIN a cycle, not across cycles
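The cycle-boundary edge case can be made concrete with a small timing model. This is a sketch under stated assumptions, not Neo's scheduler code. It assumes the keep_alive timer restarts when the last request of a cycle finishes. It also assumes the orchestrator re-arms dreamMs after each cycle completes, so the idle gap between cycles equals the full cadence. isWarmAtNextRequest is a hypothetical helper.

```javascript
// Hypothetical timing model (not Neo's scheduler). Assumptions:
//  - the keep_alive timer restarts when the last request of a cycle finishes;
//  - the next cycle's first request arrives gapMs later;
//  - a gap equal to keep_alive counts as a miss (the boundary is a race).

const HOUR_MS = 60 * 60 * 1000;

/**
 * @param {number} keepAliveMs keep_alive in ms; negative = never unload, 0 = unload at once
 * @param {number} gapMs       idle time between the last request and the next one
 * @returns {boolean} true if the next request should hit a loaded (warm) model
 */
function isWarmAtNextRequest(keepAliveMs, gapMs) {
  if (keepAliveMs < 0) return true; // -1: service primitive, never evicted by the timer
  return gapMs < keepAliveMs;       // evicted once the idle gap reaches keep_alive
}

// Scheduler assumed to re-arm dreamMs after each cycle: the gap is the full cadence.
const dreamMs = HOUR_MS;
console.log(isWarmAtNextRequest(HOUR_MS, dreamMs)); // false: "1h" == cadence, cycle N+1 is cold
console.log(isWarmAtNextRequest(-1, dreamMs));      // true: -1 keeps the model loaded
```

Under these assumptions, any finite keep_alive that does not exceed the cadence loses the warm state between cycles. Only a longer window or -1 carries the cold/warm TTFT delta across the boundary.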
Operator framing 2026-05-27 ~08:30Z:
"i would strongly agree on keep alive forever. like chroma db. we do not kill it, spawn again, do one query and kill it again."
Symmetry mandate, 2026-05-27 ~08:40Z:
"ollama is ONE provider. we use lms server or lm studio. also keep alive. open ai compatible => must work the same. symmetry."
Fix
Mirror the ChromaDB service-primitive pattern (model is long-lived, not per-call) across both provider families.
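The shape of the fix is a minimal sketch: one shared default-fallback step that both provider families call from generate() and stream(). The result is that both send the same top-level keep_alive field, as the Contract Ledger specifies. All names here are hypothetical: DEFAULT_KEEP_ALIVE, applyKeepAliveDefault, buildOllamaPayload and buildOpenAiCompatiblePayload. One detail is worth checking when the default changes. The sketch uses ?? rather than the existing if (!payload.keep_alive) guard. A truthiness check treats an explicit keep_alive: 0 ("unload immediately") as missing and would overwrite it with -1.

```javascript
// Hypothetical sketch of the shared default; not Neo's actual provider code.
const DEFAULT_KEEP_ALIVE = -1; // keep the model resident, like a long-lived ChromaDB service

/**
 * Returns a copy of payload with keep_alive filled in only when the caller omitted it.
 * ?? falls back only for null/undefined, so an explicit 0 or "5m" is preserved.
 */
function applyKeepAliveDefault(payload, defaultKeepAlive = DEFAULT_KEEP_ALIVE) {
  return { ...payload, keep_alive: payload.keep_alive ?? defaultKeepAlive };
}

// Both provider families funnel through the same helper (symmetry mandate).
function buildOllamaPayload(model, prompt, options = {}) {
  return applyKeepAliveDefault({ model, prompt, ...options });
}

function buildOpenAiCompatiblePayload(model, messages, options = {}) {
  return applyKeepAliveDefault({ model, messages, ...options });
}

console.log(buildOllamaPayload("gemma", "hi").keep_alive);                       // -1
console.log(buildOpenAiCompatiblePayload("gemma", []).keep_alive);               // -1
console.log(buildOllamaPayload("gemma", "hi", { keep_alive: "5m" }).keep_alive); // 5m (caller wins)
console.log(buildOllamaPayload("gemma", "hi", { keep_alive: 0 }).keep_alive);    // 0 (not overwritten)
```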
keep_alive semantics
Per Ollama (and OpenAI-compatible servers that honor the Ollama extension):
- Duration string: "5m", "1h", "24h"
- -1: load and don't unload until an explicit override or OOM
- 0: unload immediately after the request
The default switches from "1h" (breaks at the cycle boundary) to -1 (service primitive) on BOTH providers.
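The value forms above map onto a residency window, which is what the cadence comparison needs. The hypothetical keepAliveToMs helper below is a sketch that covers only those forms. It also handles plain numbers, which Ollama's documentation describes as seconds. It is not Ollama's own parser, which accepts full Go duration syntax.

```javascript
// Hypothetical helper: maps a keep_alive value to a residency window in milliseconds.
// Handles only the forms listed above plus plain numbers (seconds).
// Compound Go-style durations such as "1h30m" are out of scope and throw.

const UNIT_MS = { s: 1000, m: 60 * 1000, h: 60 * 60 * 1000 };

function keepAliveToMs(keepAlive) {
  if (typeof keepAlive === "number") {
    // Any negative number (e.g. -1) means "never unload"; 0 means "unload right away".
    return keepAlive < 0 ? Infinity : keepAlive * 1000;
  }
  const text = String(keepAlive).trim();
  if (text === "0") return 0; // a bare "0" needs no unit
  const match = /^(-?\d+(?:\.\d+)?)([smh])$/.exec(text);
  if (!match) throw new Error(`Unsupported keep_alive value: ${keepAlive}`);
  const amount = Number(match[1]);
  // A negative duration string such as "-1m" is treated like -1.
  return amount < 0 ? Infinity : amount * UNIT_MS[match[2]];
}

console.log(keepAliveToMs("1h"));  // 3600000
console.log(keepAliveToMs("24h")); // 86400000
console.log(keepAliveToMs(-1));    // Infinity
console.log(keepAliveToMs(0));     // 0
```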
Contract Ledger
| Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case |
|---|---|---|---|
| Ollama.generate() line 108 default | This ticket | Default keep_alive: -1 (forever), replacing the hardcoded "1h" | Caller-supplied options.keep_alive overrides (existing if (!payload.keep_alive) guard preserved) |
| Ollama.stream() default (post-PR-#12083 promotion) | PR #12083 + this ticket | Same default-fallback shape, applied for symmetry. stream() currently has NO default, only caller-supplied propagation; this ticket adds the same -1 default when the caller omits it | Caller-supplied wins; if absent, default -1 |
| OpenAiCompatible.generate() default | This ticket (symmetry mandate) | Default keep_alive: -1, propagated through preparePayload → top-level JSON payload | Caller-supplied wins; server-honor variance (LM Studio honors it, llama.cpp may vary, vLLM TBD) is the server's behavior; the Neo provider sends consistently |
| OpenAiCompatible.stream() default | This ticket (symmetry mandate) | Same as OpenAiCompatible.generate() | Same |
| Configurable via aiConfig.ollama.keep_alive + aiConfig.openAiCompatible.keep_alive | This ticket | NEW config fields on both provider blocks; default -1; env-overridable via NEO_OLLAMA_KEEP_ALIVE + NEO_OPENAI_COMPATIBLE_KEEP_ALIVE | If unset, falls back to the module-level default -1 |
| Downstream-deployment alignment | Operator-side compose.yml change in the downstream-deployment repo | OLLAMA_KEEP_ALIVE=-1 + OLLAMA_CONTEXT_LENGTH=262144 + OLLAMA_MEMORY_LIMIT=24g (operator-side, in parallel) | Override only if RAM pressure or an eviction signal is proven |
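The configuration row of the ledger can be sketched as a three-step lookup: env var → aiConfig field → module default -1. The sketch makes three assumptions. The env var wins over the config field, which is what "env-overridable" suggests. aiConfig is a plain object keyed by provider. resolveKeepAlive and coerceKeepAlive are hypothetical names, not existing Neo functions. Coercing purely numeric strings is a defensive choice. Env values always arrive as strings, and the JSON string "-1" is not the same value as the number -1 described in the semantics above.

```javascript
// Hypothetical config resolution; env var names come from this ticket.
const MODULE_DEFAULT_KEEP_ALIVE = -1;

// "-1" → -1 and "0" → 0; duration strings like "1h" are passed through unchanged.
function coerceKeepAlive(raw) {
  if (typeof raw !== "string") return raw;
  const trimmed = raw.trim();
  return /^-?\d+$/.test(trimmed) ? Number(trimmed) : trimmed;
}

/**
 * @param {"ollama"|"openAiCompatible"} provider
 * @param {object} aiConfig e.g. { ollama: { keep_alive: "5m" }, openAiCompatible: {} }
 * @param {object} env      usually process.env
 */
function resolveKeepAlive(provider, aiConfig = {}, env = {}) {
  const envName = provider === "ollama"
    ? "NEO_OLLAMA_KEEP_ALIVE"
    : "NEO_OPENAI_COMPATIBLE_KEEP_ALIVE";
  const fromEnv = env[envName];
  if (typeof fromEnv === "string" && fromEnv.trim() !== "") return coerceKeepAlive(fromEnv);
  const fromConfig = aiConfig[provider]?.keep_alive;
  if (fromConfig !== undefined && fromConfig !== null) return coerceKeepAlive(fromConfig);
  return MODULE_DEFAULT_KEEP_ALIVE; // unset everywhere: service-primitive default
}

console.log(resolveKeepAlive("ollama"));                                   // -1
console.log(resolveKeepAlive("ollama", { ollama: { keep_alive: "5m" } })); // 5m
console.log(resolveKeepAlive("ollama", { ollama: { keep_alive: "5m" } },
  { NEO_OLLAMA_KEEP_ALIVE: "-1" }));                                       // -1 (env wins)
```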
Acceptance Criteria
- Ollama.generate() defaults to keep_alive: -1 (replaces the hardcoded "1h")
- Ollama.stream() honors the same default: caller-supplied wins; absence falls back to -1
- OpenAiCompatible.generate() defaults to keep_alive: -1 (symmetric with Ollama)
- OpenAiCompatible.stream() defaults to keep_alive: -1 (symmetric)
- ollama.keep_alive in ai/config.template.mjs, defaulting to -1; env-overridable via NEO_OLLAMA_KEEP_ALIVE
- openAiCompatible.keep_alive in ai/config.template.mjs, defaulting to -1; env-overridable via NEO_OPENAI_COMPATIBLE_KEEP_ALIVE
- learn/agentos/operator-cookbook describing the service-primitive semantics, the RAM-pressure tradeoff, symmetry across providers, and the cadence-vs-keep_alive invariant
- No implicit "1h" default: opting in to a shorter keep_alive requires explicitly setting the env var or a config override
- OLLAMA_KEEP_ALIVE=-1 + OLLAMA_MEMORY_LIMIT=24g change merged operator-side in parallel (separate downstream-side MR; tracked here as a related deployment-side fix)
Avoided Traps
- ❌ Don't compute the default from aiConfig.orchestrator.intervals.dreamMs * factor: that couples too tightly across substrates. A static -1 is cleaner and matches the ChromaDB analogy.
- ❌ Don't enforce OLLAMA_MEMORY_LIMIT adjustments via a Neo PR: the downstream-deployment memory cap is operator policy. It is flagged in the downstream-side MR as related but out of scope.
- ❌ Don't scope to one provider: per the operator symmetry mandate, BOTH Ollama and OpenAiCompatible ship the same default. Server-honor variance is the server's problem; the Neo provider SENDS consistently.
Related
- Epic #12065 (Orchestrator-as-SSOT REM pipeline): this fix prevents the cold-start regression class
- #12080 / PR #12083: Ollama.stream() top-level keep_alive promotion (companion SDK fix; this ticket completes the keep_alive substrate-evolution arc by adding the DEFAULT layer)
- PR #12076: gemma4 benchmark surfacing the 99% TTFT-delta empirical anchor
- Downstream-deployment compose.yml changes (operator-side, in parallel with this Neo-side fix)
Operator anchors
"i would strongly agree on keep alive forever. like chroma db. we do not kill it, spawn again, do one query and kill it again." (@tobiu, 2026-05-27 ~08:30Z)
"ollama is ONE provider. we use lms server or lm studio. also keep alive. open ai compatible => must work the same. symmetry." (@tobiu, 2026-05-27 ~08:40Z)