Context
Iteration on #10908 (Chroma healthcheck curl-missing). The PR #10909 merged the curl → python urllib switch, but Lane C #10899 integration row STILL fails with the same 60s healthcheck timeout. Empirical anchor: Lane C run 25507191916 — same Chroma banner shown, same dependency failed to start: container neo-integration-test-chroma-1 is unhealthy after exactly 60s.
The Problem
The current healthcheck in ai/deploy/docker-compose.test.yml:10:
test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat',timeout=2).read() else 1)"]
The chromadb/chroma:1.5.9 image is built on python:3.11-slim-bookworm. In Debian slim Python images, the python symlink may not be present — only python3 (and python3.11) is guaranteed in PATH. When Docker runs the healthcheck command, python -c '...' fails with exit 127 (command not found) silently — Docker reports the container as unhealthy without surfacing the underlying error in the compose log.
The Chroma server itself runs fine (banner output observed at the canonical timestamp on every run) — chromadb's entrypoint uses python3 internally, so the lack of python symlink doesn't break the server, just our healthcheck.
The Architectural Reality
ai/deploy/docker-compose.test.yml:10 — current healthcheck (after #10909).
chromadb/chroma:1.5.9 base: python:3.11-slim-bookworm. Slim variants typically lack the python shim.
- The probe target
/api/v2/heartbeat is the canonical Chroma 1.5.x endpoint; URL is correct.
- The Python expression itself is sound — it's the binary path that doesn't resolve.
The Fix
Two surgical changes:
- test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat',timeout=2).read() else 1)"]
+ test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat', timeout=2)"]
python → python3 — addresses the binary path issue.
- Drop the
sys.exit(0 if ... else 1) wrapper — redundant. If urlopen raises (any HTTP error, connection refused, timeout), the exception propagates and Python exits non-zero naturally. If it succeeds, Python exits 0 implicitly. Same semantic, less code, easier to debug.
Acceptance Criteria
Out of Scope
- Switching to a different probe entirely (e.g., chroma CLI, TCP probe).
python3 + urllib is the cleanest HTTP-readiness probe for this image; only the binary path needed correction.
- Increasing
start_period for cold-start tolerance — the 60s × 12 retries window with a working probe should be ample on ubuntu-latest. Re-evaluate only if this fix doesn't pass.
Avoided Traps / Gold Standards Rejected
- Rejected: drop the healthcheck and use
service_started. Race condition: kb/mc may start before Chroma's HTTP server is ready, causing connection-refused on first MCP request. Healthcheck is the right substrate.
- Rejected: switch to bash
/dev/tcp TCP probe. chromadb slim image isn't guaranteed to have bash — sh-only.
- Rejected: install
python symlink in a custom Dockerfile. Adds image-build complexity for a 1-character fix.
Related
- Predecessor: #10908 → PR #10909 (curl-missing fix; established the Python urllib pattern but used wrong binary name).
- Surfacing context: Lane C CI run 25507191916 integration job — 3rd Lane-C-exposed substrate-config bug in this lineage (after #10902 Dockerfile prepare-lifecycle and #10908 curl-missing).
- Downstream impact: Lane C #10899 integration row blocked until this lands.
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="chroma healthcheck python3 binary path Lane C CI substrate fix iteration")
Context
Iteration on #10908 (Chroma healthcheck curl-missing). The PR #10909 merged the
curl→python urllibswitch, but Lane C #10899 integration row STILL fails with the same 60s healthcheck timeout. Empirical anchor: Lane C run 25507191916 — same Chroma banner shown, samedependency failed to start: container neo-integration-test-chroma-1 is unhealthyafter exactly 60s.The Problem
The current healthcheck in
ai/deploy/docker-compose.test.yml:10:test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat',timeout=2).read() else 1)"]The
chromadb/chroma:1.5.9image is built onpython:3.11-slim-bookworm. In Debian slim Python images, thepythonsymlink may not be present — onlypython3(andpython3.11) is guaranteed in PATH. When Docker runs the healthcheck command,python -c '...'fails with exit 127 (command not found) silently — Docker reports the container as unhealthy without surfacing the underlying error in the compose log.The Chroma server itself runs fine (banner output observed at the canonical timestamp on every run) — chromadb's entrypoint uses
python3internally, so the lack ofpythonsymlink doesn't break the server, just our healthcheck.The Architectural Reality
ai/deploy/docker-compose.test.yml:10— current healthcheck (after #10909).chromadb/chroma:1.5.9base:python:3.11-slim-bookworm. Slim variants typically lack thepythonshim./api/v2/heartbeatis the canonical Chroma 1.5.x endpoint; URL is correct.The Fix
Two surgical changes:
- test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat',timeout=2).read() else 1)"] + test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/v2/heartbeat', timeout=2)"]python→python3— addresses the binary path issue.sys.exit(0 if ... else 1)wrapper — redundant. Ifurlopenraises (any HTTP error, connection refused, timeout), the exception propagates and Python exits non-zero naturally. If it succeeds, Python exits 0 implicitly. Same semantic, less code, easier to debug.Acceptance Criteria
ai/deploy/docker-compose.test.yml:10usespython3and the simplified expression.python3rationale.Tests / integrationmatrix row passes on next rebase + CI run.Out of Scope
python3 + urllibis the cleanest HTTP-readiness probe for this image; only the binary path needed correction.start_periodfor cold-start tolerance — the 60s × 12 retries window with a working probe should be ample on ubuntu-latest. Re-evaluate only if this fix doesn't pass.Avoided Traps / Gold Standards Rejected
service_started. Race condition: kb/mc may start before Chroma's HTTP server is ready, causing connection-refused on first MCP request. Healthcheck is the right substrate./dev/tcpTCP probe. chromadb slim image isn't guaranteed to have bash — sh-only.pythonsymlink in a custom Dockerfile. Adds image-build complexity for a 1-character fix.Related
Origin Session ID:
7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571Retrieval Hint:
query_raw_memories(query="chroma healthcheck python3 binary path Lane C CI substrate fix iteration")