Frontmatter
id: 10918
title: HeartbeatPropagation uptime equality: consecutive samples report identical process.uptime() values
state: Closed
labels: bug, ai, testing
assignees: neo-opus-4-7
createdAt: May 7, 2026, 7:31 PM
updatedAt: May 9, 2026, 11:15 PM
githubUrl: https://github.com/neomjs/neo/issues/10918
author: neo-opus-4-7
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 7, 2026, 8:20 PM

HeartbeatPropagation uptime equality: consecutive samples report identical process.uptime() values

neo-opus-4-7 commented on May 7, 2026, 7:31 PM

Context

Surfaced 2026-05-07 by the Lane C #10899 integration-row CI run on rebased head 56ed56e92, after the full substrate-fix cascade (#10904 + #10914 + #10916) merged. HeartbeatPropagation.integration.spec.mjs:11 fails because two consecutive samples report the identical value uptime: 0.362441034, breaking the toBeGreaterThan(prev.uptime) monotonic check.

Per agreement with @tobiu, deferring this with a NEO_TEST_SKIP_CI skip-guard to unblock Lane C, with this dedicated ticket for separate investigation.

The Problem

test/playwright/integration/HeartbeatPropagation.integration.spec.mjs:11:5 "Sustained healthcheck property assertions (30s window)"

Symptom:

Error: assertSustainedHealth: Exceeded max consecutive failures (0). Last error:
expect(received).toBeGreaterThan(expected)
Expected: > 0.362441034
Received:   0.362441034

Two consecutive healthcheck samples 1s apart return the EXACT same uptime field value (0.362441034).

What's surprising:

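  • Both HealthService.mjs:764 (MC) and HealthService.mjs:197 (KB) implement uptime: process.uptime(), which returns the number of seconds (including fractions of a second) that the current Node.js process has been running. Read at call time, it should ALWAYS advance between samples taken 1s apart.
  • 0.362 seconds is a suspiciously LOW uptime: it suggests either that the process had JUST started, or that something is capturing/freezing the value between calls.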
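
As a minimal sketch of that expectation (assuming nothing about HealthService.mjs beyond the process.uptime() call quoted above), two reads of process.uptime() separated by even a few milliseconds of work should differ. The helper name sampleUptimeTwice is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper, not part of the neo codebase.
// Reads process.uptime() twice with a short busy-wait in between.
function sampleUptimeTwice(gapMs = 5) {
    const first = process.uptime(); // seconds, with a fractional part

    // Spin for roughly gapMs milliseconds (synchronous, so no timers are needed).
    const deadline = Date.now() + gapMs;
    while (Date.now() < deadline) { /* busy-wait */ }

    const second = process.uptime();

    return {first, second, advanced: second > first};
}

console.log(sampleUptimeTwice());
```

If a live read advances over a gap of milliseconds, identical values over a gap of 1s point at something other than the clock itself.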

Two Hypotheses

Hypothesis A: Per-session McpServer regression from #10916

#10916 introduced a per-session McpServer factory pattern. Each callHealthcheck call creates a NEW client → connects → triggers createMcpServer() → new McpServer instance. If the healthcheck handler is somehow capturing process.uptime() at HANDLER REGISTRATION time (when the McpServer is created) rather than at CALL time, two close-in-time sessions might return cached values.

Counter-evidence: the basic healthcheck.spec.mjs PASSES, and the Sustained-liveness 5s/1s composability check (against the SAME healthcheck) also PASSES. So uptime values must be advancing for those at least.
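
The capture pattern that Hypothesis A suspects can be sketched as follows. This is an illustration under stated assumptions, not the actual HealthService.mjs or McpServer code: createFrozenHealthHandler, createLiveHealthHandler, spin, and the status field are all hypothetical names.

```javascript
// Hypothetical handler factories; none of these names come from the neo codebase.

// Suspected buggy shape: uptime is read once, when the handler is created,
// and every later call returns that same captured number.
function createFrozenHealthHandler() {
    const uptime = process.uptime();
    return () => ({status: 'healthy', uptime});
}

// Intended shape: uptime is read inside the handler, on every call.
function createLiveHealthHandler() {
    return () => ({status: 'healthy', uptime: process.uptime()});
}

// Synchronous pause so the calls below are clearly separated in time.
function spin(ms) {
    const deadline = Date.now() + ms;
    while (Date.now() < deadline) { /* busy-wait */ }
}

const frozen = createFrozenHealthHandler();
const live   = createLiveHealthHandler();

const frozenFirst = frozen().uptime;
const liveFirst   = live().uptime;

spin(5);

console.log('frozen repeats:', frozen().uptime === frozenFirst);
console.log('live advances: ', live().uptime > liveFirst);
```

Note that a captured value only repeats across sessions if the capture is shared between them (for example at module scope). A capture made inside a factory that runs once per session would still differ from one session to the next, which is worth keeping in mind when reading the diagnostic output.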

Hypothesis B: Test-spec strictness / timing edge case

toBeGreaterThan is strict; two samples taken within sub-microsecond timing of each other could return identical process.uptime() values on slow CI hardware. toBeGreaterThanOrEqual would be more forgiving.

Counter-evidence: the value 0.362441034 has 9 decimal places (nanosecond precision) — sub-microsecond timing collisions should be vanishingly rare even on slow CI.

The Architectural Reality

The Fix (Investigation Required)

Phase 1: confirm which hypothesis (A or B) by adding diagnostic logging — print all samples[i].uptime values before the failing assertion. If the SAME 0.362441034 appears for samples 1, 2, 3, ... → Hypothesis A (per-session capture). If only 1 and 2 collide and others advance → Hypothesis B (test strictness).
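
The Phase 1 decision rule could be sketched as a small classifier over the collected samples. It assumes only that each sample exposes a numeric uptime field, as samples[i].uptime above implies; the function name classifyUptimeSamples and the verdict labels are hypothetical.

```javascript
// Hypothetical diagnostic helper; not part of the existing spec.
// Counts adjacent sample pairs whose uptime failed to advance and maps
// the result onto the two hypotheses from this ticket.
function classifyUptimeSamples(samples) {
    const uptimes = samples.map(sample => sample.uptime);

    let stalledPairs = 0;
    for (let i = 1; i < uptimes.length; i++) {
        if (uptimes[i] <= uptimes[i - 1]) {
            stalledPairs++;
        }
    }

    const totalPairs = Math.max(uptimes.length - 1, 0);

    let verdict;
    if (stalledPairs === 0) {
        verdict = 'advancing';    // nothing to explain
    } else if (totalPairs < 2) {
        verdict = 'inconclusive'; // one stalled pair cannot separate A from B
    } else if (stalledPairs === totalPairs) {
        verdict = 'frozen';       // every pair stalled: consistent with Hypothesis A
    } else {
        verdict = 'isolated';     // some pairs advance: consistent with Hypothesis B
    }

    return {verdict, stalledPairs, totalPairs, uptimes};
}

// Print every sampled value alongside the verdict before asserting.
console.log(classifyUptimeSamples([
    {uptime: 0.362441034},
    {uptime: 0.362441034},
    {uptime: 0.362441034}
]));
```

Logging the full uptimes array, not just the verdict, keeps the raw evidence in the CI output in case the pattern matches neither hypothesis cleanly.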

Phase 2A (if Hypothesis A): fix the per-session McpServer healthcheck capture pattern in #10916's substrate. Likely HealthService.mjs reading uptime at handler registration instead of call.

Phase 2B (if Hypothesis B): change toBeGreaterThan(prev.uptime) → toBeGreaterThanOrEqual(prev.uptime) in HeartbeatPropagation.integration.spec.mjs. Cleaner: use a different monotonic-progress signal (e.g., sample timestamp) rather than server-reported uptime.
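
One possible shape for the "cleaner" Phase 2B variant is sketched below, written with plain throws so it runs outside Playwright. The sample shape {uptime, sampledAt} and the helper names stampSample and checkProgress are assumptions for illustration; the real spec may structure its samples differently.

```javascript
// Hypothetical helpers; the sample shape {uptime, sampledAt} is an assumption.

// Stamp a healthcheck payload with a client-side monotonic timestamp (milliseconds).
function stampSample(health) {
    return {...health, sampledAt: performance.now()};
}

// Throws if curr does not show progress relative to prev.
// The strict check is carried by the client-side timestamp;
// server-reported uptime only has to be non-decreasing.
function checkProgress(prev, curr) {
    if (!(curr.sampledAt > prev.sampledAt)) {
        throw new Error(`sample timestamp did not advance: ${prev.sampledAt} -> ${curr.sampledAt}`);
    }
    if (!(curr.uptime >= prev.uptime)) {
        throw new Error(`uptime went backwards: ${prev.uptime} -> ${curr.uptime}`);
    }
}

// Equal uptime values, as in the failing CI run, no longer fail the check.
const prev = {uptime: 0.362441034, sampledAt: 1000};
const curr = {uptime: 0.362441034, sampledAt: 2000};
checkProgress(prev, curr);
```

Inside the spec, the same two comparisons would presumably be written with toBeGreaterThan and toBeGreaterThanOrEqual. A non-strict uptime comparison also accepts a permanently frozen uptime, so it is only appropriate once Phase 1 has ruled out Hypothesis A.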

Acceptance Criteria

  • Phase 1 diagnostic completes: identify which hypothesis is correct.
  • Phase 2 fix lands based on diagnostic evidence.
  • HeartbeatPropagation integration spec passes in CI.

Out of Scope

  • CrossTenantIsolation alice-isError bug (filed as separate sibling ticket).
  • Broader uptime-tracking refactor in HealthService.

Related

  • Surfacing context: Lane C CI run 25511367136 integration job.
  • Predecessor substrate work: #10915 / #10916 (per-session McpServer factory; possibly causative per Hypothesis A).
  • Originating spec: #10896 / #10898 Lane B (Gemini authored the heartbeat propagation spec).
  • Skip-guard PR: TBD (filed concurrently with this ticket).

Origin Session ID: 7e897a0b-33ce-4d6c-b1a9-a1ff93e4e571

Retrieval Hint: query_raw_memories(query="HeartbeatPropagation uptime equality consecutive samples Lane C deferred application")

tobiu referenced in commit ad04351 - "test(integration): defer 2 application-spec failures pending investigation (#10917) (#10919)" on May 7, 2026, 7:58 PM
tobiu referenced in commit 4fb4bca - "feat(ci): test matrix workflow gating PRs on unit + integration suites (#10897) (#10899)" on May 7, 2026, 8:19 PM
tobiu referenced in commit 2e66670 - "test: fix HeartbeatPropagation test strict uptime assertion (#10918) (#10920)" on May 7, 2026, 8:20 PM
tobiu closed this issue on May 7, 2026, 8:20 PM