Frontmatter
id: 10896
title: Sustained-liveness helper + heartbeat propagation integration spec
state: Closed
labels: enhancement, ai, testing, architecture
assignees: neo-gemini-3-1-pro
createdAt: May 7, 2026, 2:55 PM
updatedAt: May 7, 2026, 5:24 PM
githubUrl: https://github.com/neomjs/neo/issues/10896
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 7, 2026, 5:24 PM

Sustained-liveness helper + heartbeat propagation integration spec

Closed · enhancement, ai, testing, architecture
neo-opus-4-7 commented on May 7, 2026, 2:55 PM

Context

@tobiu surfaced this gap during 2026-05-07 lead-coordination handoff: "a full integration test for deployment pipelines and heartbeats would add immense value." The deployment-pipeline portion is now staged via #10895 (Lane A residuals — tenant isolation + auth rejection). The heartbeat portion — sustained liveness over time, the property no single-shot probe catches — has no current coverage.

Lane A's healthcheck.spec.mjs (#10893) verifies the initial /healthcheck shape across KB and MC. It does not verify that the deployed stack stays healthy under sustained load, or that providers (Chroma connection, embedding API, summary credentials) remain valid past warmup.

The Problem

Several real failure modes are silently uncovered by single-shot health probes:

  1. Connection-pool exhaustion / chroma client leak. A misbehaving provider that never releases connections fails after N requests, not on the first one. Single-shot specs miss this.
  2. Token / credential expiry post-warmup. The summary.credential.configured: true assertion in the healthcheck shape is a static snapshot. A regression that loses the credential mid-process (e.g., env-var stripped on reload, OIDC token expiry without refresh) fails later, not at boot.
  3. Provider degradation without status-flip. Embedding endpoint goes slow but doesn't error; healthcheck still reports healthy because the connection is fine. Detection requires latency-over-time observation.
  4. Cross-process clock skew. Sustained-liveness across KB + MC means each server's uptime field grows monotonically per call — a spec that asserts this catches process-restart bugs that single-shot misses entirely.
  5. Memory leaks visible only at scale. No spec currently observes that the deployed stack's resource consumption stays bounded under repeated MCP calls.

The single-shot pattern is structurally inadequate for liveness; liveness is a property over time, not a state at one timestamp. The current substrate has no helper for time-spanning assertions, so each spec author would re-derive sleep/poll/aggregate logic — predictable copy-paste rot.

The Architectural Reality

  • Existing surface (test/playwright/integration/):
    • healthcheck.spec.mjs — single-shot, calls each server once via StreamableHTTPClientTransport + client.callTool({name: 'healthcheck'}).
    • playwright.config.integration.mjs has timeout: 120000 (120s) — sufficient for short sustained windows; longer would need per-test override.
  • Healthcheck payload structure (KB + MC, observed at boot today):
    • status: 'healthy' | 'degraded' | ...
    • database.connection.connected: boolean
    • database.connection.collections.{...}.count: number — should monotonically grow or stay stable, never decrease unexpectedly.
    • providers.embedding.{active, error?}: the error field is the canary.
    • providers.summary.credential.configured: boolean — should remain true after warmup.
    • uptime: number — monotonic per server process.
  • No existing time-spanning helpers in test/playwright/integration/ or shared test/playwright/util/. KB confirmed via ask_knowledge_base — no precedent.
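
To make the payload fields above concrete, here is a minimal sketch of how one healthcheck payload could be reduced to the signals a time-spanning spec cares about. The function name `extractHealthSignals` and the collection name in the usage note are hypothetical; only the field paths come from this ticket, and the real payload may carry more fields.

```javascript
// Hypothetical helper (not part of the repo): flattens one healthcheck payload
// into the signals listed in this ticket. Missing fields degrade to safe
// defaults instead of throwing, so a malformed sample can still be recorded.
function extractHealthSignals(payload) {
    const providers   = payload.providers ?? {};
    const collections = payload.database?.connection?.collections ?? {};

    return {
        status   : payload.status,
        connected: payload.database?.connection?.connected === true,
        // Per-collection counts, keyed by collection name
        collectionCounts: Object.fromEntries(
            Object.entries(collections).map(([name, c]) => [name, c?.count])
        ),
        // Names of providers that currently carry an `error` field (the canary)
        providerErrors: Object.keys(providers).filter(name => providers[name]?.error != null),
        credentialConfigured: providers.summary?.credential?.configured === true,
        uptime: payload.uptime
    };
}
```

For example, a payload with `providers.embedding.error` set would yield `providerErrors: ['embedding']`, which a spec can compare across samples without re-walking the nested structure each time.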

The Fix

Two deliveries in one PR (substrate primitive + first consumer):

1. Sustained-liveness assertion helper

New file: test/playwright/integration/util/assertSustainedHealth.mjs

/**
 * @param {Client} client - connected MCP client
 * @param {Object} options
 * @param {number} options.duration - total observation window in ms
 * @param {number} options.interval - delay between probes in ms
 * @param {Object} options.thresholds - { successRate, p95LatencyMs, maxConsecutiveFailures }
 * @returns {Promise<{samples, summary}>}
 */
export async function assertSustainedHealth(client, options) {
    // Sample healthcheck on a fixed cadence; aggregate latency p50/p95/p99,
    // success rate, monotonic uptime, provider error appearance.
    // Throws AssertionError if any threshold violated; returns aggregate on success.
}

The helper is the canonical primitive every future heartbeat-style integration spec re-uses. Returns the aggregate so specs can layer additional property assertions on top.
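
One possible shape for the body, offered as a sketch rather than the shipped implementation. Assumptions, none of which are confirmed by this ticket: the healthcheck tool returns its payload as JSON text in `result.content[0].text`; a sample counts as successful when `status === 'healthy'`; percentiles use the nearest-rank method; thresholds are evaluated once at the end of the window (no early abort). The ticket specifies `AssertionError`; the sketch throws a plain `Error` to stay import-free, and omits `export` for the same reason.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sorted, p) {
    if (sorted.length === 0) return NaN;
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(0, rank - 1)];
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function assertSustainedHealth(client, {duration, interval, thresholds = {}}) {
    const {successRate = 1, p95LatencyMs = Infinity, maxConsecutiveFailures = Infinity} = thresholds;
    const samples  = [];
    const deadline = Date.now() + duration;
    let streak = 0, worstStreak = 0;

    while (true) {
        const startedAt = Date.now();
        const sample    = {startedAt, ok: false, latencyMs: null, payload: null, error: null};

        try {
            const result   = await client.callTool({name: 'healthcheck'});
            sample.payload = JSON.parse(result.content[0].text); // ASSUMED response shape
            sample.ok      = sample.payload.status === 'healthy';
        } catch (e) {
            sample.error = e.message; // transport drop: record it and keep sampling
        }

        sample.latencyMs = Date.now() - startedAt;
        streak           = sample.ok ? 0 : streak + 1;
        worstStreak      = Math.max(worstStreak, streak);
        samples.push(sample);

        if (Date.now() + interval > deadline) break; // no room for another probe
        await sleep(interval);
    }

    const latencies = samples.map(s => s.latencyMs).sort((a, b) => a - b);
    const summary   = {
        count      : samples.length,
        successRate: samples.filter(s => s.ok).length / samples.length,
        p50        : percentile(latencies, 50),
        p95        : percentile(latencies, 95),
        p99        : percentile(latencies, 99),
        worstStreak
    };

    const violations = [];
    if (summary.successRate < successRate) violations.push(`successRate ${summary.successRate} < ${successRate}`);
    if (summary.p95 >= p95LatencyMs) violations.push(`p95 ${summary.p95}ms >= ${p95LatencyMs}ms`);
    if (worstStreak > maxConsecutiveFailures) violations.push(`${worstStreak} consecutive failures`);

    if (violations.length > 0) {
        const err   = new Error(`Sustained health violated: ${violations.join('; ')}`);
        err.summary = summary; // aggregated diagnostic context for the failure report
        err.samples = samples;
        throw err;
    }

    return {samples, summary};
}
```

The loop always takes at least one sample and skips the trailing sleep, so a 30s window with a 1s interval yields roughly 30 samples. Monotonic-uptime and provider-error checks are left to the caller here, operating on the returned `samples`.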

2. Heartbeat propagation spec

New file: test/playwright/integration/HeartbeatPropagation.integration.spec.mjs

  • Connect MCP clients to KB and MC (re-uses Lane A's composeWebServer.mjs readiness).
  • Run assertSustainedHealth against each server in parallel:
    • Window: 30s (well under the 120s test timeout, leaves headroom).
    • Interval: 1s (30 samples per server).
    • Thresholds: successRate >= 1.0, p95LatencyMs < 500, maxConsecutiveFailures: 0.
  • Cross-server property assertions:
    • Monotonic uptime — each server's uptime grows monotonically across samples (catches mid-test process restarts).
    • Provider stability — no providers.*.error field appears at any sample.
    • Credential persistence: providers.summary.credential.configured stays true across all 30 MC samples.
    • Connection persistence: database.connection.connected stays true across all 30 samples on both servers.
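
The four property assertions listed above could be written as small pure functions over the payloads collected for one server, in sample order. This is a sketch with hypothetical function names; it treats "monotonic" as non-decreasing, on the assumption that uptime may be reported at a resolution coarser than the probe interval.

```javascript
// Each check returns a list of human-readable violations; an empty list means
// the property held for every sample.

// A restart shows up as uptime dropping between two consecutive samples.
function checkMonotonicUptime(payloads) {
    const violations = [];
    for (let i = 1; i < payloads.length; i++) {
        const prev = payloads[i - 1].uptime, cur = payloads[i].uptime;
        if (cur < prev) violations.push(`uptime dropped at sample ${i}: ${prev} -> ${cur}`);
    }
    return violations;
}

// No providers.*.error field may appear at any sample.
function checkNoProviderErrors(payloads) {
    const violations = [];
    payloads.forEach((payload, i) => {
        for (const [name, provider] of Object.entries(payload.providers ?? {})) {
            if (provider?.error != null) {
                violations.push(`providers.${name}.error at sample ${i}: ${provider.error}`);
            }
        }
    });
    return violations;
}

// Generic "flag stays true" check, used for both credential and connection.
function checkStaysTrue(payloads, label, read) {
    return payloads
        .map((payload, i) => (read(payload) === true ? null : `${label} not true at sample ${i}`))
        .filter(Boolean);
}

// Combines all four properties for one server's samples.
function checkLivenessProperties(payloads) {
    return [
        ...checkMonotonicUptime(payloads),
        ...checkNoProviderErrors(payloads),
        ...checkStaysTrue(payloads, 'providers.summary.credential.configured',
            p => p.providers?.summary?.credential?.configured),
        ...checkStaysTrue(payloads, 'database.connection.connected',
            p => p.database?.connection?.connected)
    ];
}
```

Returning violation lists instead of throwing on the first mismatch lets a spec report every broken property in one failure message. For a server where the credential check does not apply, a spec would call the individual checks it needs rather than `checkLivenessProperties`.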

3. Optional refactor (out-of-PR if cleanup grows scope)

Extend healthcheck.spec.mjs to add a follow-up test() block invoking assertSustainedHealth with a 5s/1s window — proves the helper composes with the existing single-shot spec. Defer if it pushes the PR over a clean threshold; file as quick follow-up.

Contract Ledger (T3)

Columns: Target Surface · Source of Authority · Proposed Behavior · Fallback / Edge Case · Docs · Evidence

Row 1
  • Target Surface: test/playwright/integration/util/assertSustainedHealth.mjs (new)
  • Source of Authority: This ticket; substrate-symmetric to existing composeWebServer.mjs fixture pattern; @tobiu's "deployment pipelines + heartbeats" framing.
  • Proposed Behavior: Async helper that samples the MCP healthcheck tool over a configurable duration/interval window. Aggregates latency percentiles, success rate, monotonic uptime, provider error appearance. Throws AssertionError if thresholds are violated; returns {samples, summary} on success for caller-side property layering.
  • Fallback / Edge Case: If the MCP transport drops mid-window, the helper records the failure in samples and continues; it surfaces the rate and doesn't abort early. Caller may opt in to fast-fail via maxConsecutiveFailures: 1. Throws aggregated diagnostic context (timestamps, latency curve, provider state at each violation) for actionable failure reports.
  • Docs: Cross-link from cookbook (learn/agentos/DeploymentCookbook.md) Section 8 once the helper is the canonical liveness primitive.
  • Evidence: L2. Helper exercised by HeartbeatPropagation.integration.spec.mjs; pattern verified across multiple consumers (heartbeat spec + optional healthcheck.spec.mjs extension).

Row 2
  • Target Surface: test/playwright/integration/HeartbeatPropagation.integration.spec.mjs (new)
  • Source of Authority: This ticket; @tobiu's lead-coordination directive 2026-05-07; structural complement to single-shot healthcheck.spec.mjs.
  • Proposed Behavior: Connects MCP clients to KB+MC; invokes assertSustainedHealth with 30s window / 1s interval / successRate>=1.0, p95<500ms, maxConsecutiveFailures:0. Layers cross-server property assertions: monotonic uptime, provider stability, credential persistence, connection persistence. Catches connection-pool leaks, token expiry, provider degradation, mid-test process restarts.
  • Fallback / Edge Case: Skip-with-warning when the Docker daemon is unavailable, per the existing readiness gate. Window size and interval tunable via env vars (NEO_HEARTBEAT_WINDOW_MS, NEO_HEARTBEAT_INTERVAL_MS) for CI vs local-dev cadence. Default window safely under playwright.config.integration.mjs timeout:120000.
  • Docs: Cross-link from SharedDeployment.md §Healthcheck Verification; adds a reference to this spec as the canonical liveness contract.
  • Evidence: L2. 30 samples per server captured per run; threshold violations surface aggregated diagnostic; spec runs in npm run test-integration.

Acceptance Criteria

  • test/playwright/integration/util/assertSustainedHealth.mjs exists, exports the helper per Ledger row 1, with JSDoc per .github/CODING_GUIDELINES.md.
  • test/playwright/integration/HeartbeatPropagation.integration.spec.mjs ships with 30s/1s sustained-liveness window covering all four cross-server property assertions per Ledger row 2.
  • Spec passes locally via npm run test-integration.
  • Window-size + interval + thresholds tunable via env vars; default values explicitly documented in JSDoc.
  • learn/agentos/SharedDeployment.md §Healthcheck Verification cross-links to the new spec (one-line ref).
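
The env-var tunability criterion could be met with a small config reader along these lines. This is a sketch under assumptions: the function names are hypothetical, and only `NEO_HEARTBEAT_WINDOW_MS` and `NEO_HEARTBEAT_INTERVAL_MS` are named in this ticket, so threshold variables are deliberately left out rather than invented.

```javascript
// Defaults mirror the ticket's 30s window / 1s interval.
const HEARTBEAT_DEFAULTS = {windowMs: 30000, intervalMs: 1000};

// Parses a positive integer from an env value; unset or empty means "use the default".
function readPositiveInt(raw, fallback, name) {
    if (raw === undefined || raw === '') return fallback;
    const value = Number(raw);
    if (!Number.isInteger(value) || value <= 0) {
        throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return value;
}

// testTimeoutMs defaults to the 120000 ms timeout in playwright.config.integration.mjs.
function readHeartbeatConfig(env = process.env, testTimeoutMs = 120000) {
    const windowMs   = readPositiveInt(env.NEO_HEARTBEAT_WINDOW_MS, HEARTBEAT_DEFAULTS.windowMs, 'NEO_HEARTBEAT_WINDOW_MS');
    const intervalMs = readPositiveInt(env.NEO_HEARTBEAT_INTERVAL_MS, HEARTBEAT_DEFAULTS.intervalMs, 'NEO_HEARTBEAT_INTERVAL_MS');

    // Fail at config time instead of letting Playwright kill the test mid-window.
    if (windowMs >= testTimeoutMs) {
        throw new Error(`window ${windowMs}ms must stay under the ${testTimeoutMs}ms test timeout`);
    }
    if (intervalMs > windowMs) {
        throw new Error(`interval ${intervalMs}ms exceeds window ${windowMs}ms`);
    }

    return {windowMs, intervalMs};
}
```

Validating the window against the test timeout up front turns a confusing mid-run timeout into an immediate, readable configuration error. A window longer than the default timeout would still need the per-test override mentioned earlier in this ticket.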

Out of Scope

  • Tenant-isolation scenarios — covered by sibling Lane A ticket #10895.
  • CI workflow execution of npm run test-integration — covered by sibling Lane C ticket (filed concurrently).
  • Bridge-daemon wake heartbeat verification — different layer (in-process wake substrate per ADR-0002, if filed; cross-process delivery, not deployed-server liveness). Future ticket if value emerges.
  • Real cloud staging environment heartbeat — operator-territory infra; this ticket is the local Docker-stack proof.
  • Memory profiling / process metrics observation — needs OS-level instrumentation (RSS, file descriptors). The four cross-server property assertions cover the protocol-layer leak signals; OS-layer sampling is a separate substrate.

Avoided Traps / Gold Standards Rejected

  • Rejected: a one-shot HeartbeatPropagation spec that just runs a few sequential calls. Defeats the purpose. The whole point is that liveness is a property over time — the helper-shape is the canonical move that lets future specs compose new properties.
  • Rejected: extend healthcheck.spec.mjs directly. Would mix single-shot and sustained-shape concerns in one file. Sibling specs with shared helper is the elegant separation.
  • Rejected: poll forever with no upper bound. Would block CI indefinitely on a pathological failure. Bounded window with explicit thresholds is the testable shape.
  • Rejected: use setInterval inside the helper. Race-prone with assertion-throwing flow. Async loop with await new Promise(r => setTimeout(r, interval)) is the correct primitive — every iteration explicitly observed.
  • Rejected: hardcode thresholds. Env-var tunability allows CI to ratchet stricter than local-dev without code change. Cookbook precedent (NEO_INTEGRATION_* env-var family).
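
One concrete way the setInterval approach goes wrong, shown as a self-contained illustration: when a probe takes longer than the interval, timer ticks keep firing and probes overlap, whereas the awaited loop never starts a probe before the previous one settles. Everything here (fake probe, function names, timings) is hypothetical and chosen only to make the overlap visible; it is one reading of "race-prone", not necessarily the only one the ticket has in mind.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Fake probe that takes `costMs` and tracks how many calls are in flight at once.
function makeProbe(costMs) {
    const stats = {calls: 0, inFlight: 0, maxInFlight: 0};
    const probe = async () => {
        stats.calls++;
        stats.inFlight++;
        stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
        await sleep(costMs);
        stats.inFlight--;
    };
    return {probe, stats};
}

// Awaited loop: every iteration is explicitly observed before the next begins.
async function sampleWithLoop(probe, {count, interval}) {
    for (let i = 0; i < count; i++) {
        await probe();
        await sleep(interval);
    }
}

// setInterval: ticks fire whether or not the previous probe has finished.
function sampleWithSetInterval(probe, {count, interval}) {
    return new Promise(resolve => {
        const pending = [];
        const timer   = setInterval(() => {
            pending.push(probe());
            if (pending.length === count) {
                clearInterval(timer);
                resolve(Promise.all(pending));
            }
        }, interval);
    });
}
```

With a 30 ms probe and a 5 ms interval, the loop variant keeps at most one probe in flight, while the setInterval variant stacks several. Overlapping probes distort latency measurements and make it unclear which sample a thrown assertion belongs to.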

Related

  • Sibling lanes (filed concurrently): #10895 Lane A (tenant isolation + auth rejection), Lane C (CI test-matrix workflow).
  • Substrate dependencies: PR #10880 (Docker artifacts), PR #10893 (integration harness Lane A vertical slice).
  • Cookbook cross-link target: learn/agentos/DeploymentCookbook.md Section 8.
  • Architecture cross-link target: learn/agentos/SharedDeployment.md §Healthcheck Verification.
  • Operator framing: lead-coordination handoff 2026-05-07 — "deployment pipelines + heartbeats" → this ticket covers the heartbeat dimension.

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="sustained-liveness heartbeat helper integration spec deployed-stack property over time")

tobiu referenced in commit e5ae441 - "test(integration): implement sustained-liveness heartbeat assertion primitive (#10896) (#10898)" on May 7, 2026, 5:24 PM
tobiu closed this issue on May 7, 2026, 5:24 PM