Context
Surfaced 2026-05-07 by Lane C #10899 integration row CI on rebased head 56ed56e92 after the full substrate-fix cascade (#10904 + #10914 + #10916) merged. HeartbeatPropagation.integration.spec.mjs:11 fails because two consecutive samples report the same uptime value (0.362441034), breaking the toBeGreaterThan(prev.uptime) monotonic check.
Per agreement with @tobiu, deferring this with a NEO_TEST_SKIP_CI skip-guard to unblock Lane C, with this dedicated ticket for separate investigation.
The Problem
test/playwright/integration/HeartbeatPropagation.integration.spec.mjs:11:5 › "Sustained healthcheck property assertions (30s window)"
Symptom:
Error: assertSustainedHealth: Exceeded max consecutive failures (0). Last error:
expect(received).toBeGreaterThan(expected)
Expected: > 0.362441034
Received: 0.362441034
Two consecutive healthcheck samples 1s apart return the EXACT same uptime field value (0.362441034).
What's surprising:
- Both HealthService.mjs:764 (MC) and HealthService.mjs:197 (KB) implement uptime: process.uptime(), which returns the number of seconds the current Node.js process has been running. It should ALWAYS advance.
- 0.362 seconds is a suspiciously LOW uptime — suggests the process JUST started, OR something is capturing/freezing the value between calls.
Two Hypotheses
Hypothesis A: Per-session McpServer regression from #10916
#10916 introduced the per-session McpServer factory pattern. Each callHealthcheck call creates a NEW client → connects → triggers createMcpServer() → new McpServer instance. If the healthcheck handler is somehow capturing process.uptime() at HANDLER REGISTRATION time (when the McpServer is created) rather than at CALL time, two close-in-time sessions might return cached values.
Counter-evidence: the basic healthcheck.spec.mjs PASSES, and the Sustained-liveness 5s/1s composability check (against the SAME healthcheck) also PASSES. So uptime values must be advancing for those at least.
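The capture pattern Hypothesis A describes can be sketched in isolation. This is a minimal illustration, not the actual createMcpServer() or HealthService code: both factory functions and the fake clock are hypothetical names, and the clock is injected so the difference shows up without real waiting.

```javascript
// Hypothetical illustration only: these factories are NOT the real createMcpServer().

// Suspected buggy shape: uptime is read once, when the handler is registered.
function createHandlerCapturedAtRegistration(readUptime) {
    const uptime = readUptime(); // evaluated a single time
    return () => ({status: 'healthy', uptime});
}

// Intended shape: uptime is read again on every call.
function createHandlerReadAtCall(readUptime) {
    return () => ({status: 'healthy', uptime: readUptime()});
}

// Deterministic fake clock standing in for process.uptime(); advances by `step` per read.
function createFakeClock(start, step) {
    let now = start;
    return () => {
        const value = now;
        now += step;
        return value;
    };
}

const frozen = createHandlerCapturedAtRegistration(createFakeClock(0.362441034, 1));
const live   = createHandlerReadAtCall(createFakeClock(0.362441034, 1));

const frozenSamples = [frozen().uptime, frozen().uptime];
const liveSamples   = [live().uptime, live().uptime];

console.log(frozenSamples[1] > frozenSamples[0]); // false: same captured value twice
console.log(liveSamples[1] > liveSamples[0]);     // true: value re-read per call
```

In real code the injected clock would be `() => process.uptime()`. Whether the healthcheck handler actually has the first shape is exactly what the investigation needs to establish.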
Hypothesis B: Test-spec strictness / timing edge case
toBeGreaterThan is strict; two samples within sub-microsecond timing could potentially return identical process.uptime() values on slow CI hardware. toBeGreaterThanOrEqual would be more forgiving.
Counter-evidence: the value 0.362441034 has 9 decimal places (nanosecond precision), and the samples are taken 1s apart, so an exact collision should be vanishingly rare even on slow CI.
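The Architectural Reality
- test/playwright/integration/HeartbeatPropagation.integration.spec.mjs: Lane B's spec from #10898. The checkProperties callback compares sample.uptime against previousSamples[length - 2].uptime.
- test/playwright/integration/util/assertSustainedHealth.mjs: invokes onSample after each probe.
- HealthService.mjs:764 (MC): uptime: process.uptime().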
The Fix (Investigation Required)
Phase 1: confirm which hypothesis (A or B) by adding diagnostic logging — print all samples[i].uptime values before the failing assertion. If the SAME 0.362441034 appears for samples 1, 2, 3, ... → Hypothesis A (per-session capture). If only 1 and 2 collide and others advance → Hypothesis B (test strictness).
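The Phase 1 decision rule could be applied to the logged values with a small helper along these lines. This is a sketch under assumptions: classifyUptimeSamples and its verdict labels are hypothetical, and it expects a plain array of the collected samples[i].uptime numbers. With fewer than three samples a stall cannot be attributed to either hypothesis, so the helper reports that case separately.

```javascript
// Hypothetical helper: classify a run of uptime samples per the Phase 1 rule.
function classifyUptimeSamples(uptimes) {
    // Indices i where uptimes[i] did not advance past uptimes[i - 1].
    const stalls = [];
    for (let i = 1; i < uptimes.length; i++) {
        if (!(uptimes[i] > uptimes[i - 1])) {
            stalls.push(i);
        }
    }

    if (stalls.length === 0) {
        return {verdict: 'advancing', stalls};
    }
    // Two samples cannot tell a frozen value from a one-off collision.
    if (uptimes.length < 3) {
        return {verdict: 'inconclusive', stalls};
    }
    // Every adjacent pair stalled: the value looks frozen (points at Hypothesis A).
    if (stalls.length === uptimes.length - 1) {
        return {verdict: 'frozen', stalls};
    }
    // Some pairs stalled while others advanced (points at Hypothesis B).
    return {verdict: 'isolated-collision', stalls};
}

// Example: log every sample, then the verdict, before the assertion fires.
const observed = [0.362441034, 0.362441034, 0.362441034];
observed.forEach((uptime, i) => console.log(`sample ${i}: uptime=${uptime}`));
console.log(classifyUptimeSamples(observed).verdict); // frozen
```

For the verdict to be meaningful, the diagnostic run has to keep sampling after the first equal pair rather than aborting on it.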
Phase 2A (if Hypothesis A): fix the per-session McpServer healthcheck capture pattern in #10916's substrate. Likely cause: HealthService.mjs reads uptime at handler registration instead of at call time.
Phase 2B (if Hypothesis B): change toBeGreaterThan(prev.uptime) → toBeGreaterThanOrEqual(prev.uptime) in HeartbeatPropagation.integration.spec.mjs. Cleaner: use a different monotonic-progress signal (e.g., sample timestamp) rather than server-reported uptime.
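One possible shape for the Phase 2B alternative is sketched below. It assumes each healthcheck payload is stamped on the client with a monotonic timestamp; stampSample, checkProgress and the sampledAtNs field are hypothetical names that do not exist in the spec today. process.hrtime.bigint() is the standard Node.js monotonic clock in nanoseconds.

```javascript
// Hypothetical: attach a monotonic client-side timestamp to a healthcheck payload.
function stampSample(payload) {
    return {...payload, sampledAtNs: process.hrtime.bigint()};
}

// Hypothetical check: tolerate equal uptime (non-strict), but still fail if uptime
// goes backwards or the client-side clock does not advance between samples.
function checkProgress(prev, sample) {
    if (sample.uptime < prev.uptime) {
        return {ok: false, reason: 'uptime went backwards'};
    }
    if (sample.sampledAtNs <= prev.sampledAtNs) {
        return {ok: false, reason: 'sample timestamp did not advance'};
    }
    return {ok: true, reason: null};
}

// Usage with fixed values mirroring the failing run: equal uptime, later timestamp.
const prevSample = {uptime: 0.362441034, sampledAtNs: 1000000000n};
const nextSample = {uptime: 0.362441034, sampledAtNs: 2000000000n};
console.log(checkProgress(prevSample, nextSample).ok); // true
```

Note the trade-off: this check would pass on the observed failure, so it only makes sense once Phase 1 has ruled out a frozen uptime value.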
Acceptance Criteria
Out of Scope
- CrossTenantIsolation alice-isError bug (filed as separate sibling ticket).
- Broader uptime-tracking refactor in HealthService.
Related
- Surfacing context: Lane C CI run 25511367136 integration job.
- Predecessor substrate work: #10915 → #10916 (per-session McpServer factory — possibly causative per Hypothesis A).
- Originating spec: #10896 → #10898 Lane B (Gemini authored the heartbeat propagation spec).
- Skip-guard PR: TBD (filed concurrently with this ticket).
Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571
Retrieval Hint: query_raw_memories(query="HeartbeatPropagation uptime equality consecutive samples Lane C deferred application")