LearnNewsExamplesServices
Frontmatter
id10911
titleChroma healthcheck still fails: `python` binary not in PATH (chromadb slim image uses `python3`)
stateClosed
labels
bugairegressionbuild
assigneesneo-opus-4-7
createdAtMay 7, 2026, 6:10 PM
updatedAtMay 9, 2026, 11:15 PM
githubUrlhttps://github.com/neomjs/neo/issues/10911
authorneo-opus-4-7
commentsCount0
parentIssuenull
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]
closedAtMay 7, 2026, 6:17 PM

Chroma healthcheck still fails: python binary not in PATH (chromadb slim image uses python3)

Closedbugairegressionbuild
neo-opus-4-7
neo-opus-4-7 commented on May 7, 2026, 6:10 PM

Context

Iteration on #10908 (Chroma healthcheck curl-missing). The PR #10909 merged the curlpython urllib switch, but Lane C #10899 integration row STILL fails with the same 60s healthcheck timeout. Empirical anchor: Lane C run 25507191916 — same Chroma banner shown, same dependency failed to start: container neo-integration-test-chroma-1 is unhealthy after exactly 60s.

The Problem

The current healthcheck in ai/deploy/docker-compose.test.yml:10:

test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat',timeout=2).read() else 1)"]

The chromadb/chroma:1.5.9 image is built on python:3.11-slim-bookworm. In Debian slim Python images, the python symlink may not be present — only python3 (and python3.11) is guaranteed in PATH. When Docker runs the healthcheck command, python -c '...' fails with exit 127 (command not found) silently — Docker reports the container as unhealthy without surfacing the underlying error in the compose log.

The Chroma server itself runs fine (banner output observed at the canonical timestamp on every run) — chromadb's entrypoint uses python3 internally, so the lack of python symlink doesn't break the server, just our healthcheck.

The Architectural Reality

  • ai/deploy/docker-compose.test.yml:10 — current healthcheck (after #10909).
  • chromadb/chroma:1.5.9 base: python:3.11-slim-bookworm. Slim variants typically lack the python shim.
  • The probe target /api/v2/heartbeat is the canonical Chroma 1.5.x endpoint; URL is correct.
  • The Python expression itself is sound — it's the binary path that doesn't resolve.

The Fix

Two surgical changes:

-    test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat',timeout=2).read() else 1)"]
+    test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat', timeout=2)"]
  1. pythonpython3 — addresses the binary path issue.
  2. Drop the sys.exit(0 if ... else 1) wrapper — redundant. If urlopen raises (any HTTP error, connection refused, timeout), the exception propagates and Python exits non-zero naturally. If it succeeds, Python exits 0 implicitly. Same semantic, less code, easier to debug.

Acceptance Criteria

  • ai/deploy/docker-compose.test.yml:10 uses python3 and the simplified expression.
  • Inline comment updated to clarify python3 rationale.
  • Lane C #10899 Tests / integration matrix row passes on next rebase + CI run.

Out of Scope

  • Switching to a different probe entirely (e.g., chroma CLI, TCP probe). python3 + urllib is the cleanest HTTP-readiness probe for this image; only the binary path needed correction.
  • Increasing start_period for cold-start tolerance — the 60s × 12 retries window with a working probe should be ample on ubuntu-latest. Re-evaluate only if this fix doesn't pass.

Avoided Traps / Gold Standards Rejected

  • Rejected: drop the healthcheck and use service_started. Race condition: kb/mc may start before Chroma's HTTP server is ready, causing connection-refused on first MCP request. Healthcheck is the right substrate.
  • Rejected: switch to bash /dev/tcp TCP probe. chromadb slim image isn't guaranteed to have bash — sh-only.
  • Rejected: install python symlink in a custom Dockerfile. Adds image-build complexity for a 1-character fix.

Related

  • Predecessor: #10908 → PR #10909 (curl-missing fix; established the Python urllib pattern but used wrong binary name).
  • Surfacing context: Lane C CI run 25507191916 integration job — 3rd Lane-C-exposed substrate-config bug in this lineage (after #10902 Dockerfile prepare-lifecycle and #10908 curl-missing).
  • Downstream impact: Lane C #10899 integration row blocked until this lands.

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="chroma healthcheck python3 binary path Lane C CI substrate fix iteration")

tobiu referenced in commit 2a897c9 - "fix(deploy): use python3 binary in Chroma healthcheck (#10911) (#10912) on May 7, 2026, 6:17 PM
tobiu closed this issue on May 7, 2026, 6:17 PM