Frontmatter
id: 12089
title: Provider keep_alive symmetry: Ollama + OpenAiCompatible default to -1 (ChromaDB-style service primitive); fix cadence-vs-keep_alive invariant
state: Closed
labels: enhancement, ai, architecture, performance
assignees: neo-gpt
createdAt: May 27, 2026, 10:32 AM
updatedAt: Jun 7, 2026, 7:16 PM
githubUrl: https://github.com/neomjs/neo/issues/12089
author: neo-opus-ada
commentsCount: 1
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 28, 2026, 8:34 AM

Provider keep_alive symmetry: Ollama + OpenAiCompatible default to -1 (ChromaDB-style service primitive); fix cadence-vs-keep_alive invariant

Closed Backlog/active-chunk-15 enhancement ai architecture performance
neo-opus-ada commented on May 27, 2026, 10:32 AM

Problem

ai/provider/Ollama.mjs:108 hardcodes keep_alive: "1h" in generate(). With aiConfig.orchestrator.intervals.dreamMs set to HOUR_MS (a 60-minute cadence), keep_alive equals the cycle interval, which produces the following edge case: cycle N completes, keep_alive expires at the next cycle boundary, and cycle N+1 starts cold. The result is zero KV-cache reuse across cycles, by design.

Empirical anchor (PR #12076 benchmark on local LM Studio gemma-4-31b-it):

  • Cold call TTFT: 18,846ms
  • Warm call TTFT: 193ms
  • Roughly a 99% reduction in TTFT: this is the gain keep_alive is meant to capture
  • With keep_alive == cadence, the reduction only applies within a cycle, not across cycles
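
The timing argument above can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the Neo repository: isWarmAtNextCycle is a hypothetical helper, and it assumes the idle gap between the last request of cycle N and the first request of cycle N+1 equals the orchestrator cadence. The TTFT numbers are the benchmark figures quoted above.

```javascript
// Hypothetical helper for illustration only; not part of the Neo codebase.
const HOUR_MS = 60 * 60 * 1000;

/**
 * Returns true if the model is still resident when the next cycle starts.
 * keepAliveMs: negative means "never unload", 0 means "unload immediately".
 * idleGapMs:   assumed time between the last request of cycle N and the
 *              first request of cycle N+1.
 */
function isWarmAtNextCycle(keepAliveMs, idleGapMs) {
    if (keepAliveMs < 0) {
        return true; // resident until explicitly overridden
    }
    // Equality is a race between expiry and the next request: treat it as cold.
    return keepAliveMs > idleGapMs;
}

// TTFT figures from the benchmark quoted in this issue.
const coldTtftMs = 18846;
const warmTtftMs = 193;
const reduction  = (coldTtftMs - warmTtftMs) / coldTtftMs;

console.log(isWarmAtNextCycle(HOUR_MS, HOUR_MS)); // false ("1h" vs. 60min cadence)
console.log(isWarmAtNextCycle(-1, HOUR_MS));      // true  (resident default)
console.log((reduction * 100).toFixed(1) + '%');  // 99.0%
```

Under these assumptions, any finite keep_alive has to exceed the cadence to survive the idle gap, whereas -1 removes the dependency on the cadence altogether.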

Operator framing 2026-05-27 ~08:30Z:

"i would strongly agree on keep alive forever. like chroma db. we do not kill it, spawn again, do one query and kill it again."

And symmetry mandate 2026-05-27 ~08:40Z:

"ollama is ONE provider. we use lms server or lm studio. also keep alive. open ai compatible => must work the same. symmetry."

Fix

Mirror the ChromaDB service-primitive pattern (model is long-lived, not per-call) across both provider families.

keep_alive semantics

Per Ollama / OpenAI-compat-with-Ollama-extension:

  • Duration string: "5m", "1h", "24h"
  • -1: load and don't unload until explicit override or OOM
  • 0: unload immediately after request
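
To make the three value shapes above concrete, here is a small sketch that normalises a keep_alive value to milliseconds. keepAliveToMs is a hypothetical helper, not an Ollama or Neo API. It only handles single-unit duration strings such as "5m" or "24h", and it assumes that a bare non-negative number is interpreted as seconds, which is how Ollama documents numeric values.

```javascript
// Hypothetical helper for illustration; handles single-unit durations only.
const UNIT_MS = {ms: 1, s: 1000, m: 60 * 1000, h: 60 * 60 * 1000};

/**
 * Normalises a keep_alive value to milliseconds.
 * Negative values map to Infinity (never unload); 0 maps to 0 (unload now).
 */
function keepAliveToMs(value) {
    if (typeof value === 'number') {
        // Assumption: bare numbers are seconds, negative means "forever".
        return value < 0 ? Infinity : value * 1000;
    }

    const match = /^(-?\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(String(value).trim());

    if (!match) {
        throw new TypeError(`Unsupported keep_alive value: ${value}`);
    }

    const amount = Number(match[1]);

    return amount < 0 ? Infinity : amount * UNIT_MS[match[2]];
}

console.log(keepAliveToMs('5m'));  // 300000
console.log(keepAliveToMs('1h'));  // 3600000
console.log(keepAliveToMs(-1));    // Infinity
console.log(keepAliveToMs(0));     // 0
```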

Default switches from "1h" (cycle-boundary-broken) to -1 (service-primitive) on BOTH providers.

Contract Ledger

| Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case |
| --- | --- | --- | --- |
| Ollama.generate() line 108 default | This ticket | Default keep_alive: -1 (forever); replaces hardcoded "1h" | Caller-supplied options.keep_alive overrides (existing if (!payload.keep_alive) guard preserved) |
| Ollama.stream() default (post-PR-#12083 promotion) | PR #12083 + this ticket | Same default-fallback shape applied for symmetry. Currently stream() has NO default, only caller-supplied propagation. This ticket adds the same -1 default if the caller omits it | Caller-supplied wins; if absent, default -1 |
| OpenAiCompatible.generate() default | This ticket (symmetry mandate) | Default keep_alive: -1 propagated through preparePayload → top-level JSON payload | Caller-supplied wins; server-honor variance (LM Studio honors, llama.cpp may vary, vLLM TBD) is the server's behavior; the Neo provider sends consistently |
| OpenAiCompatible.stream() default | This ticket (symmetry mandate) | Same as OpenAiCompatible.generate() | Same |
| Configurable via aiConfig.ollama.keep_alive + aiConfig.openAiCompatible.keep_alive | This ticket | NEW config fields on both provider blocks; default -1; env-overridable via NEO_OLLAMA_KEEP_ALIVE + NEO_OPENAI_COMPATIBLE_KEEP_ALIVE | If unset, falls back to module-level default -1 |
| Downstream-deployment alignment | Operator-side compose.yml change in the downstream-deployment repo | OLLAMA_KEEP_ALIVE=-1 + OLLAMA_CONTEXT_LENGTH=262144 + OLLAMA_MEMORY_LIMIT=24g (operator-side parallel) | Override only if RAM pressure or eviction signal proven |
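
The precedence described in the ledger (caller-supplied value, then provider config, then module-level default) could be expressed as below. This is a sketch under stated assumptions, not the merged implementation: resolveKeepAlive and buildPayload are hypothetical names, and the payload shape is simplified to show only the top-level keep_alive field.

```javascript
// Hypothetical names for illustration; the real providers may be structured differently.
const DEFAULT_KEEP_ALIVE = -1; // module-level default: keep the model resident

/**
 * Precedence: caller-supplied option > provider config > module default.
 * Nullish coalescing keeps an explicit 0 ("unload immediately") intact.
 */
function resolveKeepAlive(options = {}, providerConfig = {}) {
    return options.keep_alive ?? providerConfig.keep_alive ?? DEFAULT_KEEP_ALIVE;
}

/** Adds keep_alive as a top-level field of the request payload. */
function buildPayload(base, options, providerConfig) {
    return {...base, keep_alive: resolveKeepAlive(options, providerConfig)};
}

console.log(buildPayload({model: 'demo'}, {}, {}));                   // keep_alive: -1
console.log(buildPayload({model: 'demo'}, {keep_alive: '5m'}, {}));   // keep_alive: '5m'
console.log(buildPayload({model: 'demo'}, {}, {keep_alive: '24h'}));  // keep_alive: '24h'
console.log(buildPayload({model: 'demo'}, {keep_alive: 0}, {}));      // keep_alive: 0
```

One design note: a truthiness check such as the if (!payload.keep_alive) guard quoted in the ledger treats an explicit 0 as absent in JavaScript, so a caller asking for immediate unload would receive the default instead. Whether that matters depends on how the real providers assemble the payload, which is not shown in this issue.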

Acceptance Criteria

  • AC1: Ollama.generate() default keep_alive: -1 (replaces hardcoded "1h")
  • AC2: Ollama.stream() honors same default — caller-supplied wins; absence falls to -1
  • AC3: OpenAiCompatible.generate() defaults keep_alive: -1 (symmetric with Ollama)
  • AC4: OpenAiCompatible.stream() defaults keep_alive: -1 (symmetric)
  • AC5: NEW config field ollama.keep_alive in ai/config.template.mjs defaulting to -1; env-overridable via NEO_OLLAMA_KEEP_ALIVE
  • AC6: NEW config field openAiCompatible.keep_alive in ai/config.template.mjs defaulting to -1; env-overridable via NEO_OPENAI_COMPATIBLE_KEEP_ALIVE
  • AC7: Unit tests assert default + override paths for all 4 callsites (Ollama.generate, Ollama.stream, OpenAiCompatible.generate, OpenAiCompatible.stream)
  • AC8: Documentation update at learn/agentos/ operator-cookbook describing the service-primitive semantics + RAM-pressure tradeoff + symmetry across providers + cadence-vs-keep_alive invariant
  • AC9: Migration note in PR body for operators currently relying on the hardcoded "1h" default — explicit opt-in to shorter keep_alive requires setting env var or config override
  • AC10: Downstream-deployment compose.yml OLLAMA_KEEP_ALIVE=-1 + OLLAMA_MEMORY_LIMIT=24g change merged operator-side in parallel (separate downstream-side MR; tracked here as related-deployment-side fix)
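
For AC5 and AC6, environment variables arrive as strings, so "-1" and "0" need to become numbers while duration strings such as "30m" pass through unchanged. The following is a minimal sketch of that parsing. parseKeepAliveEnv is a hypothetical helper, and how ai/config.template.mjs actually reads the environment is not shown in this issue; only the variable names are taken from the acceptance criteria.

```javascript
// Hypothetical helper for illustration; not the actual config.template.mjs code.

/**
 * Converts a raw environment value into a keep_alive setting.
 * Integer strings become numbers; anything else is passed through as a
 * duration string. Unset or blank values fall back to the default.
 */
function parseKeepAliveEnv(raw, fallback = -1) {
    if (raw === undefined || raw.trim() === '') {
        return fallback;
    }

    const trimmed = raw.trim();

    return /^-?\d+$/.test(trimmed) ? Number(trimmed) : trimmed;
}

// Variable names from AC5 / AC6; the surrounding object shape is an assumption.
const keepAliveConfig = {
    ollama          : parseKeepAliveEnv(process.env.NEO_OLLAMA_KEEP_ALIVE),
    openAiCompatible: parseKeepAliveEnv(process.env.NEO_OPENAI_COMPATIBLE_KEEP_ALIVE)
};

console.log(keepAliveConfig);
```

An operator who relied on the previous behaviour could, under this sketch, restore it by exporting NEO_OLLAMA_KEEP_ALIVE=1h before starting the process.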

Avoided Traps

  • ❌ Don't compute default from aiConfig.orchestrator.intervals.dreamMs * factor — too coupled across substrates. Static -1 is cleaner + matches the ChromaDB analogy.
  • ❌ Don't enforce OLLAMA_MEMORY_LIMIT adjustments via Neo PR — downstream-deployment memory cap is operator-policy; flagged in the downstream-side MR as related-but-out-of-scope.
  • ❌ Don't scope to one provider — per operator symmetry mandate, BOTH Ollama and OpenAiCompatible ship same default. Server-honor variance is the server's problem; Neo provider SENDS consistently.

Related

  • Epic #12065 (Orchestrator-as-SSOT REM pipeline) — this fix prevents cold-start regression class
  • #12080 / PR #12083 — Ollama.stream() top-level keep_alive promotion (companion SDK fix; this ticket completes the keep_alive substrate-evolution arc by adding the DEFAULT layer)
  • PR #12076 — gemma4 benchmark surfacing the 99% TTFT-delta empirical anchor
  • Downstream-deployment compose.yml changes (operator-side, parallel to this neo-side fix)

Operator anchors

"i would strongly agree on keep alive forever. like chroma db. we do not kill it, spawn again, do one query and kill it again." — @tobiu 2026-05-27 ~08:30Z

"ollama is ONE provider. we use lms server or lm studio. also keep alive. open ai compatible => must work the same. symmetry." — @tobiu 2026-05-27 ~08:40Z

tobiu referenced in commit 73fa3da - "fix(ai): default provider keep_alive to resident (#12089) (#12093)" on May 27, 2026, 3:35 PM
tobiu referenced in commit 991b7a1 - "fix(deploy): align local-model keep_alive defaults (#12089) (#12120)" on May 28, 2026, 8:34 AM
tobiu closed this issue on May 28, 2026, 8:34 AM