Frontmatter

id	12089
title	Provider keep_alive symmetry: Ollama + OpenAiCompatible default to -1 (ChromaDB-style service primitive); fix cadence-vs-keep_alive invariant
state	Closed
labels	enhancement, ai, architecture, performance
assignees	neo-gpt
createdAt	May 27, 2026, 10:32 AM
updatedAt	Jun 7, 2026, 7:16 PM
githubUrl	https://github.com/neomjs/neo/issues/12089
author	neo-opus-ada
commentsCount	1
parentIssue	null
subIssues	[]
subIssuesCompleted	0
subIssuesTotal	0
blockedBy	[]
blocking	[]
closedAt	May 28, 2026, 8:34 AM

Provider keep_alive symmetry: Ollama + OpenAiCompatible default to -1 (ChromaDB-style service primitive); fix cadence-vs-keep_alive invariant

Closed v13.0.0/archive-v13-0-0-chunk-14 enhancement, ai, architecture, performance

neo-opus-ada commented on May 27, 2026, 10:32 AM

Problem

ai/provider/Ollama.mjs:108 hardcodes keep_alive: "1h" in generate(). With aiConfig.orchestrator.intervals.dreamMs: HOUR_MS (60min cadence), the timing edge-case is exactly: cycle N completes → keep_alive expires at next cycle boundary → cycle N+1 is cold. Zero KV-cache reuse by design.
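The boundary case can be made concrete with a small timing model. This is a sketch under assumptions: `isWarmAtNextRequest` and its parameters are hypothetical names, and the model treats an idle gap equal to keep_alive as already expired, which is the edge case described above.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: is the model still resident when the next request arrives?
// keepAliveMs < 0 models "-1" (never unload); 0 models "unload immediately".
function isWarmAtNextRequest(idleGapMs, keepAliveMs) {
    if (keepAliveMs < 0) {
        return true; // resident until explicit override or OOM
    }
    // Assumption: an idle gap equal to keep_alive counts as expired (the boundary case).
    return idleGapMs < keepAliveMs;
}

console.log(isWarmAtNextRequest(HOUR_MS, HOUR_MS)); // false: keep_alive == cadence
console.log(isWarmAtNextRequest(5000, HOUR_MS));    // true: calls within one cycle
console.log(isWarmAtNextRequest(HOUR_MS, -1));      // true: resident across cycles
```

With keep_alive equal to the cadence, only calls inside a cycle land in the warm branch; a negative keep_alive removes the dependency on the cadence entirely.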

Empirical anchor (PR #12076 benchmark on local LM Studio gemma-4-31b-it):

Cold call TTFT: 18,846ms
Warm call TTFT: 193ms
99% delta — the leverage point keep_alive is meant to capture
With keep_alive == cadence, this delta only applies WITHIN a cycle, not across cycles
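For reference, the arithmetic behind the 99% figure, using only the two TTFT values quoted above. The variable names are illustrative.

```javascript
const coldTtftMs = 18846;
const warmTtftMs = 193;

const savedMs      = coldTtftMs - warmTtftMs;      // absolute saving per call
const reductionPct = (savedMs / coldTtftMs) * 100; // relative TTFT reduction
const speedup      = coldTtftMs / warmTtftMs;      // cold-to-warm ratio

console.log(`${savedMs} ms saved, ${reductionPct.toFixed(1)}% lower TTFT, ${speedup.toFixed(1)}x faster`);
// 18653 ms saved, 99.0% lower TTFT, 97.6x faster
```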

Operator framing 2026-05-27 ~08:30Z:

"i would strongly agree on keep alive forever. like chroma db. we do not kill it, spawn again, do one query and kill it again."

And symmetry mandate 2026-05-27 ~08:40Z:

"ollama is ONE provider. we use lms server or lm studio. also keep alive. open ai compatible => must work the same. symmetry."

Fix

Mirror the ChromaDB service-primitive pattern (model is long-lived, not per-call) across both provider families.

keep_alive semantics

Per Ollama / OpenAI-compat-with-Ollama-extension:

Duration string: "5m", "1h", "24h"
-1: load and don't unload until explicit override or OOM
0: unload immediately after request
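The three value shapes above can be normalized into a residency window, which is handy for logging or for tests. This is a minimal sketch, not provider code: `keepAliveToMs` is a hypothetical helper, it handles only the single-unit duration strings listed above, and it assumes that bare non-negative numbers are seconds and that any negative number means "keep loaded" (both match Ollama's documented behavior as far as I know, but servers may differ).

```javascript
const UNIT_MS = {s: 1000, m: 60 * 1000, h: 60 * 60 * 1000};

// Hypothetical helper: map a keep_alive value to a residency window in ms.
// Returns Infinity for "never unload" and 0 for "unload immediately".
function keepAliveToMs(value) {
    if (typeof value === 'number') {
        // Assumption: negative means "keep loaded"; non-negative numbers are seconds.
        return value < 0 ? Infinity : value * 1000;
    }

    const match = /^(\d+)([smh])$/.exec(String(value).trim());

    if (!match) {
        throw new TypeError(`Unsupported keep_alive value: ${value}`);
    }

    return Number(match[1]) * UNIT_MS[match[2]];
}

console.log(keepAliveToMs('5m')); // 300000
console.log(keepAliveToMs('1h')); // 3600000
console.log(keepAliveToMs(-1));   // Infinity
console.log(keepAliveToMs(0));    // 0
```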

Default switches from "1h" (cycle-boundary-broken) to -1 (service-primitive) on BOTH providers.
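A minimal sketch of the default-with-override shape, assuming a payload object and a caller `options` bag. `withKeepAliveDefault` and `DEFAULT_KEEP_ALIVE` are hypothetical names and this is not the actual `Ollama.mjs` code.

```javascript
const DEFAULT_KEEP_ALIVE = -1;

// Hypothetical helper: apply the service-primitive default unless the caller set a value.
function withKeepAliveDefault(payload, options = {}) {
    return {
        ...payload,
        // ?? falls back only on null/undefined, so an explicit 0 ("unload immediately") survives.
        keep_alive: options.keep_alive ?? DEFAULT_KEEP_ALIVE
    };
}

console.log(withKeepAliveDefault({model: 'demo'}).keep_alive);                     // -1
console.log(withKeepAliveDefault({model: 'demo'}, {keep_alive: '5m'}).keep_alive); // 5m
console.log(withKeepAliveDefault({model: 'demo'}, {keep_alive: 0}).keep_alive);    // 0
```

One design note: a truthiness guard of the form `if (!payload.keep_alive)` treats a caller-supplied `0` as absent and would replace it with the default. The sketch uses `??` so that `0` stays distinguishable from "not set"; whether the real provider code needs that distinction is not established here.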

Contract Ledger

Surface	Source of Authority	Proposed Behavior	Fallback / Edge Case
`Ollama.generate()` line 108 default	This ticket	Default `keep_alive: -1` (forever) — replaces hardcoded `"1h"`	Caller-supplied `options.keep_alive` overrides (existing `if (!payload.keep_alive)` guard preserved)
`Ollama.stream()` default (post-PR-#12083 promotion)	PR #12083 + this ticket	Same default-fallback shape applied for symmetry — currently `stream()` has NO default, only caller-supplied propagation. This ticket adds the same `-1` default if caller omits	Caller-supplied wins; if absent, default `-1`
`OpenAiCompatible.generate()` default	This ticket — symmetry mandate	Default `keep_alive: -1` propagated through `preparePayload` → top-level JSON payload	Caller-supplied wins; server-honor variance (LM Studio honors, llama.cpp may vary, vLLM TBD) is server's behavior — Neo provider sends consistently
`OpenAiCompatible.stream()` default	This ticket — symmetry mandate	Same as `OpenAiCompatible.generate()`	Same
Configurable via `aiConfig.ollama.keep_alive` + `aiConfig.openAiCompatible.keep_alive`	This ticket	NEW config fields on both provider blocks; default `-1`; env-overridable via `NEO_OLLAMA_KEEP_ALIVE` + `NEO_OPENAI_COMPATIBLE_KEEP_ALIVE`	If unset, falls back to module-level default `-1`
Downstream-deployment alignment	Operator-side compose.yml change in the downstream-deployment repo	`OLLAMA_KEEP_ALIVE=-1` + `OLLAMA_CONTEXT_LENGTH=262144` + `OLLAMA_MEMORY_LIMIT=24g` (operator-side parallel)	Override only if RAM pressure or eviction signal proven
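The layering in the ledger (caller-supplied value, env override, config field, module-level default) could be resolved along these lines. This is a sketch under assumptions: `parseKeepAliveEnv` and `resolveKeepAlive` are hypothetical helpers, and the exact precedence between env and config is inferred from "env-overridable", not taken from the implementation.

```javascript
const MODULE_DEFAULT_KEEP_ALIVE = -1;

// Hypothetical parser: env values arrive as strings, so "-1" and "0" become numbers,
// while duration strings such as "5m" pass through unchanged.
function parseKeepAliveEnv(raw) {
    if (raw === undefined || raw.trim() === '') {
        return undefined;
    }

    const trimmed = raw.trim();

    return /^-?\d+$/.test(trimmed) ? Number(trimmed) : trimmed;
}

// Precedence in this sketch: caller option, then env var, then config field, then module default.
function resolveKeepAlive({callerValue, envValue, configValue} = {}) {
    return callerValue ?? parseKeepAliveEnv(envValue) ?? configValue ?? MODULE_DEFAULT_KEEP_ALIVE;
}

console.log(resolveKeepAlive());                                       // -1
console.log(resolveKeepAlive({envValue: '0', configValue: -1}));       // 0
console.log(resolveKeepAlive({callerValue: '24h', envValue: '0'}));    // 24h
```

In a provider, `envValue` would come from something like `process.env.NEO_OLLAMA_KEEP_ALIVE` and `configValue` from `aiConfig.ollama.keep_alive`.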

Acceptance Criteria

AC1: Ollama.generate() default keep_alive: -1 (replaces hardcoded "1h")
AC2: Ollama.stream() honors same default — caller-supplied wins; absence falls to -1
AC3: OpenAiCompatible.generate() defaults keep_alive: -1 (symmetric with Ollama)
AC4: OpenAiCompatible.stream() defaults keep_alive: -1 (symmetric)
AC5: NEW config field ollama.keep_alive in ai/config.template.mjs defaulting to -1; env-overridable via NEO_OLLAMA_KEEP_ALIVE
AC6: NEW config field openAiCompatible.keep_alive in ai/config.template.mjs defaulting to -1; env-overridable via NEO_OPENAI_COMPATIBLE_KEEP_ALIVE
AC7: Unit tests assert default + override paths for all 4 callsites (Ollama.generate, Ollama.stream, OpenAiCompatible.generate, OpenAiCompatible.stream)
AC8: Documentation update at learn/agentos/ operator-cookbook describing the service-primitive semantic + RAM-pressure tradeoff + symmetry across providers + cadence-vs-keep_alive invariant
AC9: Migration note in PR body for operators currently relying on the hardcoded "1h" default — explicit opt-in to shorter keep_alive requires setting env var or config override
AC10: Downstream-deployment compose.yml OLLAMA_KEEP_ALIVE=-1 + OLLAMA_MEMORY_LIMIT=24g change merged operator-side in parallel (separate downstream-side MR; tracked here as related-deployment-side fix)
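AC7 lends itself to a table-driven check over the four callsites. The sketch below uses stand-in providers that only build the payload they would send, so it runs without a model server. `createStubProvider` and `check` are hypothetical, and the real tests would target the actual provider classes with the project's own test runner.

```javascript
const DEFAULT_KEEP_ALIVE = -1;

// Stand-in for a provider: both methods only build and return the payload they would send.
function createStubProvider(name) {
    const build = (prompt, options = {}) => ({
        model     : `${name}-model`,
        prompt,
        keep_alive: options.keep_alive ?? DEFAULT_KEEP_ALIVE
    });

    return {generate: build, stream: build};
}

// Dependency-free assertion helper for this sketch.
function check(label, actual, expected) {
    if (actual !== expected) {
        throw new Error(`${label}: expected ${expected}, got ${actual}`);
    }
}

const callsites = [];

for (const name of ['Ollama', 'OpenAiCompatible']) {
    const provider = createStubProvider(name);
    callsites.push([`${name}.generate`, provider.generate], [`${name}.stream`, provider.stream]);
}

for (const [label, call] of callsites) {
    check(`${label} default`, call('hi').keep_alive, -1);
    check(`${label} duration override`, call('hi', {keep_alive: '5m'}).keep_alive, '5m');
    check(`${label} zero override`, call('hi', {keep_alive: 0}).keep_alive, 0);
}

console.log(`${callsites.length} callsites checked`); // 4 callsites checked
```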

Avoided Traps

❌ Don't compute default from aiConfig.orchestrator.intervals.dreamMs * factor — too coupled across substrates. Static -1 is cleaner + matches the ChromaDB analogy.
❌ Don't enforce OLLAMA_MEMORY_LIMIT adjustments via Neo PR — downstream-deployment memory cap is operator-policy; flagged in the downstream-side MR as related-but-out-of-scope.
❌ Don't scope to one provider — per operator symmetry mandate, BOTH Ollama and OpenAiCompatible ship same default. Server-honor variance is the server's problem; Neo provider SENDS consistently.

Related

Epic #12065 (Orchestrator-as-SSOT REM pipeline): this fix prevents the cold-start regression class
#12080 / PR #12083 — Ollama.stream() top-level keep_alive promotion (companion SDK fix; this ticket completes the keep_alive substrate-evolution arc by adding the DEFAULT layer)
PR #12076 — gemma4 benchmark surfacing the 99% TTFT-delta empirical anchor
Downstream-deployment compose.yml changes (operator-side, parallel to this neo-side fix)

Operator anchors

"i would strongly agree on keep alive forever. like chroma db. we do not kill it, spawn again, do one query and kill it again." — @tobiu 2026-05-27 ~08:30Z

"ollama is ONE provider. we use lms server or lm studio. also keep alive. open ai compatible => must work the same. symmetry." — @tobiu 2026-05-27 ~08:40Z

tobiu referenced in commit 73fa3da - "fix(ai): default provider keep_alive to resident (#12089) (#12093)" on May 27, 2026, 3:35 PM

tobiu referenced in commit 991b7a1 - "fix(deploy): align local-model keep_alive defaults (#12089) (#12120)" on May 28, 2026, 8:34 AM

tobiu closed this issue on May 28, 2026, 8:34 AM