Context
@tobiu surfaced this gap during 2026-05-07 lead-coordination handoff: "a full integration test for deployment pipelines and heartbeats would add immense value." The deployment-pipeline portion is now staged via #10895 (Lane A residuals — tenant isolation + auth rejection). The heartbeat portion — sustained liveness over time, the property no single-shot probe catches — has no current coverage.
Lane A's healthcheck.spec.mjs (#10893) verifies the initial /healthcheck shape across KB and MC. It does not verify that the deployed stack stays healthy under sustained load, or that providers (Chroma connection, embedding API, summary credentials) remain valid past warmup.
The Problem
Several real failure modes are silently uncovered by single-shot health probes:
- Connection-pool exhaustion / chroma client leak. A misbehaving provider that never releases connections fails after N requests, not on the first one. Single-shot specs miss this.
- Token / credential expiry post-warmup. The
summary.credential.configured: true assertion in the healthcheck shape is a static snapshot. A regression that loses the credential mid-process (e.g., env-var stripped on reload, OIDC token expiry without refresh) fails later, not at boot.
- Provider degradation without status-flip. Embedding endpoint goes slow but doesn't error; healthcheck still reports
healthy because the connection is fine. Detection requires latency-over-time observation.
- Cross-process clock skew. Sustained-liveness across KB + MC means each server's
uptime field grows monotonically per call — a spec that asserts this catches process-restart bugs that single-shot misses entirely.
- Memory leaks visible only at scale. No spec currently observes that the deployed stack's resource consumption stays bounded under repeated MCP calls.
The single-shot pattern is structurally inadequate for liveness; liveness is a property over time, not a state at one timestamp. The current substrate has no helper for time-spanning assertions, so each spec author would re-derive sleep/poll/aggregate logic — predictable copy-paste rot.
The Architectural Reality
- Existing surface (
test/playwright/integration/):
healthcheck.spec.mjs — single-shot, calls each server once via StreamableHTTPClientTransport + client.callTool({name: 'healthcheck'}).
playwright.config.integration.mjs has timeout: 120000 (120s) — sufficient for short sustained windows; longer would need per-test override.
- Healthcheck payload structure (KB + MC, observed at boot today):
status: 'healthy' | 'degraded' | ...
database.connection.connected: boolean
database.connection.collections.{...}.count: number — should monotonically grow or stay stable, never decrease unexpectedly.
providers.embedding.{active, error?} — error field is the canary.
providers.summary.credential.configured: boolean — should remain true after warmup.
uptime: number — monotonic per server process.
- No existing time-spanning helpers in
test/playwright/integration/ or shared test/playwright/util/. KB confirmed via ask_knowledge_base — no precedent.
The Fix
Two deliveries in one PR (substrate primitive + first consumer):
1. Sustained-liveness assertion helper
New file: test/playwright/integration/util/assertSustainedHealth.mjs
export async function assertSustainedHealth(client, options) {
}
The helper is the canonical primitive every future heartbeat-style integration spec re-uses. Returns the aggregate so specs can layer additional property assertions on top.
2. Heartbeat propagation spec
New file: test/playwright/integration/HeartbeatPropagation.integration.spec.mjs
- Connect MCP clients to KB and MC (re-uses Lane A's
composeWebServer.mjs readiness).
- Run
assertSustainedHealth against each server in parallel:
- Window: 30s (well under the 120s test timeout, leaves headroom).
- Interval: 1s (30 samples per server).
- Thresholds:
successRate >= 1.0, p95LatencyMs < 500, maxConsecutiveFailures: 0.
- Cross-server property assertions:
- Monotonic uptime — each server's
uptime grows monotonically across samples (catches mid-test process restarts).
- Provider stability — no
providers.*.error field appears at any sample.
- Credential persistence —
providers.summary.credential.configured stays true across all 30 MC samples.
- Connection persistence —
database.connection.connected stays true across all 30 samples on both servers.
3. Optional refactor (out-of-PR if cleanup grows scope)
Extend healthcheck.spec.mjs to add a follow-up test() block invoking assertSustainedHealth with a 5s/1s window — proves the helper composes with the existing single-shot spec. Defer if it pushes the PR over a clean threshold; file as quick follow-up.
Contract Ledger (T3)
| Target Surface |
Source of Authority |
Proposed Behavior |
Fallback / Edge Case |
Docs |
Evidence |
test/playwright/integration/util/assertSustainedHealth.mjs (new) |
This ticket; substrate-symmetric to existing composeWebServer.mjs fixture pattern; @tobiu's "deployment pipelines + heartbeats" framing |
Async helper that samples MCP healthcheck tool over a configurable duration/interval window. Aggregates latency percentiles, success rate, monotonic uptime, provider error appearance. Throws AssertionError if thresholds violated; returns {samples, summary} on success for caller-side property layering. |
If MCP transport drops mid-window, helper records the failure in samples and continues — surfaces the rate, doesn't abort early. Caller may opt-in to fast-fail via maxConsecutiveFailures: 1. Throws aggregated diagnostic context (timestamps, latency curve, provider state at each violation) for actionable failure reports. |
Cross-link from cookbook (learn/agentos/DeploymentCookbook.md) Section 8 once the helper is the canonical liveness primitive. |
L2 — helper exercised by HeartbeatPropagation.integration.spec.mjs; pattern verified across multiple consumers (heartbeat spec + optional healthcheck.spec.mjs extension). |
test/playwright/integration/HeartbeatPropagation.integration.spec.mjs (new) |
This ticket; @tobiu's lead-coordination directive 2026-05-07; structural complement to single-shot healthcheck.spec.mjs |
Connects MCP clients to KB+MC; invokes assertSustainedHealth with 30s window / 1s interval / successRate>=1.0, p95<500ms, maxConsecutiveFailures:0. Layers cross-server property assertions: monotonic uptime, provider stability, credential persistence, connection persistence. Catches connection-pool leaks, token expiry, provider degradation, mid-test process restarts. |
Skip-with-warning when Docker daemon unavailable per existing readiness gate. Window-size and threshold tunable via env vars (NEO_HEARTBEAT_WINDOW_MS, NEO_HEARTBEAT_INTERVAL_MS) for CI vs local-dev cadence. Default window safely under playwright.config.integration.mjs timeout:120000. |
Cross-link from SharedDeployment.md §Healthcheck Verification — adds reference to this spec as the canonical liveness contract. |
L2 — 30 samples per server captured per run; threshold violations surface aggregated diagnostic; spec runs in npm run test-integration. |
Acceptance Criteria
Out of Scope
- Tenant-isolation scenarios — covered by sibling Lane A ticket #10895.
- CI workflow execution of
npm run test-integration — covered by sibling Lane C ticket (filed concurrently).
- Bridge-daemon wake heartbeat verification — different layer (in-process wake substrate per ADR-0002, if filed; cross-process delivery, not deployed-server liveness). Future ticket if value emerges.
- Real cloud staging environment heartbeat — operator-territory infra; this ticket is the local Docker-stack proof.
- Memory profiling / process metrics observation — needs OS-level instrumentation (RSS, file descriptors). The four cross-server property assertions cover the protocol-layer leak signals; OS-layer sampling is a separate substrate.
Avoided Traps / Gold Standards Rejected
- Rejected: a one-shot
HeartbeatPropagation spec that just runs a few sequential calls. Defeats the purpose. The whole point is that liveness is a property over time — the helper-shape is the canonical move that lets future specs compose new properties.
- Rejected: extend
healthcheck.spec.mjs directly. Would mix single-shot and sustained-shape concerns in one file. Sibling specs with shared helper is the elegant separation.
- Rejected: poll forever with no upper bound. Would block CI indefinitely on a pathological failure. Bounded window with explicit thresholds is the testable shape.
- Rejected: use
setInterval inside the helper. Race-prone with assertion-throwing flow. Async loop with await new Promise(r => setTimeout(r, interval)) is the correct primitive — every iteration explicitly observed.
- Rejected: hardcode thresholds. Env-var tunability allows CI to ratchet stricter than local-dev without code change. Cookbook precedent (
NEO_INTEGRATION_* env-var family).
Related
- Sibling lanes (filed concurrently): #10895 Lane A (tenant isolation + auth rejection), Lane C (CI test-matrix workflow).
- Substrate dependencies: PR #10880 (Docker artifacts), PR #10893 (integration harness Lane A vertical slice).
- Cookbook cross-link target:
learn/agentos/DeploymentCookbook.md Section 8.
- Architecture cross-link target:
learn/agentos/SharedDeployment.md §Healthcheck Verification.
- Operator framing: lead-coordination handoff 2026-05-07 — "deployment pipelines + heartbeats" → this ticket covers the heartbeat dimension.
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="sustained-liveness heartbeat helper integration spec deployed-stack property over time")
Context
@tobiu surfaced this gap during 2026-05-07 lead-coordination handoff: "a full integration test for deployment pipelines and heartbeats would add immense value." The deployment-pipeline portion is now staged via #10895 (Lane A residuals — tenant isolation + auth rejection). The heartbeat portion — sustained liveness over time, the property no single-shot probe catches — has no current coverage.
Lane A's
healthcheck.spec.mjs(#10893) verifies the initial/healthcheckshape across KB and MC. It does not verify that the deployed stack stays healthy under sustained load, or that providers (Chroma connection, embedding API, summary credentials) remain valid past warmup.The Problem
Several real failure modes are silently uncovered by single-shot health probes:
summary.credential.configured: trueassertion in the healthcheck shape is a static snapshot. A regression that loses the credential mid-process (e.g., env-var stripped on reload, OIDC token expiry without refresh) fails later, not at boot.healthybecause the connection is fine. Detection requires latency-over-time observation.uptimefield grows monotonically per call — a spec that asserts this catches process-restart bugs that single-shot misses entirely.The single-shot pattern is structurally inadequate for liveness; liveness is a property over time, not a state at one timestamp. The current substrate has no helper for time-spanning assertions, so each spec author would re-derive sleep/poll/aggregate logic — predictable copy-paste rot.
The Architectural Reality
test/playwright/integration/):healthcheck.spec.mjs— single-shot, calls each server once viaStreamableHTTPClientTransport+client.callTool({name: 'healthcheck'}).playwright.config.integration.mjshastimeout: 120000(120s) — sufficient for short sustained windows; longer would need per-test override.status: 'healthy' | 'degraded' | ...database.connection.connected: booleandatabase.connection.collections.{...}.count: number— should monotonically grow or stay stable, never decrease unexpectedly.providers.embedding.{active, error?}—errorfield is the canary.providers.summary.credential.configured: boolean— should remaintrueafter warmup.uptime: number— monotonic per server process.test/playwright/integration/or sharedtest/playwright/util/. KB confirmed viaask_knowledge_base— no precedent.The Fix
Two deliveries in one PR (substrate primitive + first consumer):
1. Sustained-liveness assertion helper
New file:
test/playwright/integration/util/assertSustainedHealth.mjs/** * @param {Client} client - connected MCP client * @param {Object} options * @param {number} options.duration - total observation window in ms * @param {number} options.interval - delay between probes in ms * @param {Object} options.thresholds - { successRate, p95LatencyMs, maxConsecutiveFailures } * @returns {Promise<{samples, summary}>} */ export async function assertSustainedHealth(client, options) { // Sample healthcheck on a fixed cadence; aggregate latency p50/p95/p99, // success rate, monotonic uptime, provider error appearance. // Throws AssertionError if any threshold violated; returns aggregate on success. }The helper is the canonical primitive every future heartbeat-style integration spec re-uses. Returns the aggregate so specs can layer additional property assertions on top.
2. Heartbeat propagation spec
New file:
test/playwright/integration/HeartbeatPropagation.integration.spec.mjscomposeWebServer.mjsreadiness).assertSustainedHealthagainst each server in parallel:successRate >= 1.0,p95LatencyMs < 500,maxConsecutiveFailures: 0.uptimegrows monotonically across samples (catches mid-test process restarts).providers.*.errorfield appears at any sample.providers.summary.credential.configuredstaystrueacross all 30 MC samples.database.connection.connectedstaystrueacross all 30 samples on both servers.3. Optional refactor (out-of-PR if cleanup grows scope)
Extend
healthcheck.spec.mjsto add a follow-uptest()block invokingassertSustainedHealthwith a 5s/1s window — proves the helper composes with the existing single-shot spec. Defer if it pushes the PR over a clean threshold; file as quick follow-up.Contract Ledger (T3)
test/playwright/integration/util/assertSustainedHealth.mjs(new)composeWebServer.mjsfixture pattern; @tobiu's "deployment pipelines + heartbeats" framinghealthchecktool over a configurable duration/interval window. Aggregates latency percentiles, success rate, monotonic uptime, provider error appearance. ThrowsAssertionErrorif thresholds violated; returns{samples, summary}on success for caller-side property layering.maxConsecutiveFailures: 1. Throws aggregated diagnostic context (timestamps, latency curve, provider state at each violation) for actionable failure reports.learn/agentos/DeploymentCookbook.md) Section 8 once the helper is the canonical liveness primitive.HeartbeatPropagation.integration.spec.mjs; pattern verified across multiple consumers (heartbeat spec + optional healthcheck.spec.mjs extension).test/playwright/integration/HeartbeatPropagation.integration.spec.mjs(new)healthcheck.spec.mjsassertSustainedHealthwith 30s window / 1s interval /successRate>=1.0, p95<500ms, maxConsecutiveFailures:0. Layers cross-server property assertions: monotonic uptime, provider stability, credential persistence, connection persistence. Catches connection-pool leaks, token expiry, provider degradation, mid-test process restarts.NEO_HEARTBEAT_WINDOW_MS,NEO_HEARTBEAT_INTERVAL_MS) for CI vs local-dev cadence. Default window safely underplaywright.config.integration.mjstimeout:120000.SharedDeployment.md§Healthcheck Verification — adds reference to this spec as the canonical liveness contract.npm run test-integration.Acceptance Criteria
test/playwright/integration/util/assertSustainedHealth.mjsexists, exports the helper per Ledger row 1, with JSDoc per.github/CODING_GUIDELINES.md.test/playwright/integration/HeartbeatPropagation.integration.spec.mjsships with 30s/1s sustained-liveness window covering all four cross-server property assertions per Ledger row 2.npm run test-integration.learn/agentos/SharedDeployment.md§Healthcheck Verification cross-links to the new spec (one-line ref).Out of Scope
npm run test-integration— covered by sibling Lane C ticket (filed concurrently).Avoided Traps / Gold Standards Rejected
HeartbeatPropagationspec that just runs a few sequential calls. Defeats the purpose. The whole point is that liveness is a property over time — the helper-shape is the canonical move that lets future specs compose new properties.healthcheck.spec.mjsdirectly. Would mix single-shot and sustained-shape concerns in one file. Sibling specs with shared helper is the elegant separation.setIntervalinside the helper. Race-prone with assertion-throwing flow. Async loop withawait new Promise(r => setTimeout(r, interval))is the correct primitive — every iteration explicitly observed.NEO_INTEGRATION_*env-var family).Related
learn/agentos/DeploymentCookbook.mdSection 8.learn/agentos/SharedDeployment.md§Healthcheck Verification.Origin Session ID:
7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571Retrieval Hint:
query_raw_memories(query="sustained-liveness heartbeat helper integration spec deployed-stack property over time")