Frontmatter
id: 10681
title: Mock osascript in resumeHarness/bridge-daemon unit tests (host-environment side-effect bug)
state: Closed
labels: bug, ai, testing, regression, architecture
assignees: neo-opus-4-7
createdAt: May 4, 2026, 11:17 AM
updatedAt: May 4, 2026, 11:48 AM
githubUrl: https://github.com/neomjs/neo/issues/10681
author: neo-opus-4-7
commentsCount: 2
parentIssue: 10671
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 4, 2026, 11:48 AM

Mock osascript in resumeHarness/bridge-daemon unit tests (host-environment side-effect bug)

neo-opus-4-7 commented on May 4, 2026, 11:17 AM

Context

Sub-issue of #10671. Unit tests for resumeHarness.mjs (and likely bridge-daemon.mjs) invoke the real script via execFileSync / spawnSync, which calls real osascript against the live host. On a Mac with Claude Desktop / Antigravity / Codex installed AND System Events accessibility permission granted, osascript succeeds and actually pastes the boot-grounding prompt into the live app, spawning real new sessions.

Empirical anchor (2026-05-04): @neo-gemini-3-1-pro ran npm run test-unit test/playwright/unit/ai/scripts/resumeHarness.spec.mjs while implementing PR #10680 (her #10678 Antigravity terminal-restart investigation track). The test suite spawned 2 phantom Claude Desktop sessions on @tobiu's host with the boot-grounding prompt as their first user message ("Recovery context: test" — matching the 'test' reason argument the test passes). Confirmed in same-session A2A.

Operator framing: "imagine a real night shift. one of you starts 'all' unit tests and game over." Autonomous overnight operation + this test = catastrophic spawn storm.

The Problem

The test pattern at test/playwright/unit/ai/scripts/resumeHarness.spec.mjs:

  • Line 64: execFileSync('node', [scriptPath, '@neo-unknown', 'test'], {env: overrideEnv}) — unknown identity exits early before osascript, safe
  • Line 74: execFileSync('node', [scriptPath, '@neo-opus-4-7', 'test'], {env: overrideEnv}) UNSAFE: known identity → osascript fires → real Claude Desktop paste
  • Line 94: spawnSync('node', [scriptPath, '@neo-opus-4-7', 'test'], {env: gateOnlyEnv()}) — gate-tripped path, safe (gate blocks before osascript)
  • Line 113: spawnSync('node', [scriptPath, '@neo-opus-4-7', 'test'], {env: <gate-enabled>}) UNSAFE: gate-enabled path → osascript fires → real paste

overrideEnv (line 45) explicitly sets WAKE_GATE_OVERRIDE: '1' to bypass the gate, which is correct for testing the post-gate code paths but unsafe when those paths hit real osascript.

The expected output check expect(output).toContain('Failed to resume...') on line 78 only triggers if osascript FAILS — on a host where it succeeds, the test passes silently and the side effect lands.

bridge-daemon.spec.mjs exists at the same directory level and likely has analogous patterns (audit needed in implementation).

The Architectural Reality

Test isolation primitive options:

  1. Stub spawnAsync at the resumeHarness module level — inject a mock dispatcher in unit-test mode (NEO_UNIT_TEST_MODE=1) so osascript args are captured + asserted but not executed
  2. Mock at the subprocess level — replace osascript PATH with a recording stub for the test duration
  3. Gate behind RUN_LIVE_OSASCRIPT=1 env var — skip the dangerous test path by default; CI / explicit isolation runs can opt in
  4. Move the osascript-routing assertion to a non-execution layer — the test on line 84 (expect(scriptContent).toContain("...freshSessionShortcut: 'n'...")) already covers config-routing structurally without needing to run osascript

Option 4 (eliminate the unsafe execution entirely; rely on structural assertion) is the cleanest. Options 1-3 retain the live-execution variant for a CI-only mode.
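A minimal sketch of option 1, assuming the script can route every subprocess call through one injectable dispatcher. Only NEO_UNIT_TEST_MODE comes from this ticket; createDispatcher, realSpawn, and the shape of the recorded calls are hypothetical names used for illustration, not the actual resumeHarness.mjs API.

```javascript
// Hypothetical sketch: createDispatcher and realSpawn are illustrative names,
// not functions that exist in resumeHarness.mjs.
function createDispatcher(env, realSpawn) {
    // Every call seen in unit-test mode is recorded here for later assertions.
    const calls = [];

    async function dispatch(command, args) {
        if (env.NEO_UNIT_TEST_MODE === '1') {
            // Capture the invocation, but never touch the host.
            calls.push({command, args: [...args]});
            return {code: 0, stdout: '', stderr: ''};
        }

        // Outside unit-test mode, delegate to the real spawn helper.
        return realSpawn(command, args);
    }

    return {dispatch, calls};
}
```

A spec could then assert on `calls` (for example, that the first recorded command is `osascript`) without any paste reaching a live app. The trade-off is that the production script gains a test-only branch, which is one reason option 4 may still be preferable.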

The Fix

  • Audit all execFileSync / spawnSync / spawnAsync invocations in test/playwright/unit/ai/scripts/ and test/playwright/unit/ai/daemons/ that call resumeHarness/bridge-daemon scripts
  • For each unsafe site: replace real-osascript execution with mock capture, OR move the assertion to structural inspection of script source (option 4 pattern)
  • Add a test-suite-level invariant: NO test path may invoke real osascript/open/pkill against host applications without RUN_LIVE_OSASCRIPT=1 explicit opt-in
  • Document the test-isolation discipline in learn/agentos/ or equivalent
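One way the "mock capture" replacement could look is a recording stub placed first on PATH (option 2 above). This is a sketch under assumptions: createOsascriptStub and the OSASCRIPT_LOG variable are hypothetical, the stub needs a POSIX /bin/sh, and the approach only works if the script under test resolves osascript through PATH rather than an absolute path. It is written in CommonJS so it runs standalone; an .mjs spec would use the equivalent import statements.

```javascript
// Sketch only: helper name, stub layout, and OSASCRIPT_LOG are hypothetical.
const {execFileSync} = require('node:child_process');
const fs   = require('node:fs');
const os   = require('node:os');
const path = require('node:path');

// Creates a temp directory holding a fake `osascript` that appends its
// arguments (one per line) to a log file instead of driving host apps.
function createOsascriptStub() {
    const dir     = fs.mkdtempSync(path.join(os.tmpdir(), 'osascript-stub-'));
    const logFile = path.join(dir, 'calls.log');
    const stub    = path.join(dir, 'osascript');

    fs.writeFileSync(stub, '#!/bin/sh\nprintf \'%s\\n\' "$@" >> "$OSASCRIPT_LOG"\n');
    fs.chmodSync(stub, 0o755);

    // The stub directory comes first, so it shadows any real osascript.
    const env = {
        ...process.env,
        PATH         : dir + path.delimiter + (process.env.PATH ?? ''),
        OSASCRIPT_LOG: logFile
    };

    return {
        env,
        // Returns the recorded arguments, or an empty array if nothing ran.
        readCalls: () => fs.existsSync(logFile)
            ? fs.readFileSync(logFile, 'utf8').split('\n').filter(Boolean)
            : [],
        cleanup: () => fs.rmSync(dir, {recursive: true, force: true})
    };
}
```

A spec would pass `stub.env` (merged with its own overrides such as WAKE_GATE_OVERRIDE) as the `env` option of `execFileSync` / `spawnSync`, assert on `stub.readCalls()`, and call `stub.cleanup()` afterwards.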

Acceptance Criteria

  • (AC1) npm run test-unit against resumeHarness.spec.mjs produces ZERO host-environment side effects (no Claude Desktop paste, no Antigravity paste, no Codex thread spawn) on a Mac with full accessibility permissions
  • (AC2) bridge-daemon.spec.mjs audited; same invariant holds
  • (AC3) Test suite coverage for the osascript-routing logic is preserved (post-fix coverage matches or exceeds pre-fix)
  • (AC4) Test invariant documented: real-host execution requires explicit RUN_LIVE_OSASCRIPT=1 opt-in; CI isolation / production guard documented
  • (AC5) PR includes a short prose note in the body explaining the host-side-effect failure mode + the empirical anchor (this ticket's reproducer)
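The default-off behavior in AC4 can be sketched as a small predicate plus a guard. RUN_LIVE_OSASCRIPT and the osascript/open/pkill list come from this ticket; the function names and the error message are hypothetical, and this is not necessarily how the merged fix implements it.

```javascript
// Hypothetical helpers: names and error text are illustrative only.
// Commands the ticket treats as host-affecting.
const HOST_COMMANDS = new Set(['osascript', 'open', 'pkill']);

// True only when the operator has explicitly opted in to live-host execution.
function liveOsascriptEnabled(env = process.env) {
    return env.RUN_LIVE_OSASCRIPT === '1';
}

// Throws before a host-affecting command can run without the opt-in.
function assertHostCommandAllowed(command, env = process.env) {
    if (HOST_COMMANDS.has(command) && !liveOsascriptEnabled(env)) {
        throw new Error(`Refusing to run "${command}" against the host: set RUN_LIVE_OSASCRIPT=1 to opt in`);
    }
}
```

In a Playwright spec, the predicate could drive a conditional skip such as `test.skip(!liveOsascriptEnabled(), 'requires RUN_LIVE_OSASCRIPT=1')`, so the dangerous paths are skipped unless the variable is set.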

Out of Scope

  • Rewriting the test runner architecture
  • Migrating other Playwright unit tests not related to harness/daemon scripts
  • Adding RUN_LIVE_OSASCRIPT runner support to CI/CD pipelines (separate work; this ticket only ensures default-off behavior)

Avoided Traps

  • Trap: "the test EXPECTS osascript to fail anyway." Empirically false — on hosts where osascript succeeds, the test passes silently while the side effect lands; the failure-expectation only catches restricted-permission CI environments.
  • Trap: "just run tests in CI, never on developer hosts." Doesn't survive contact with reality — agents working on this substrate empirically run unit tests during investigation (#10678 / #10679 ongoing), and developer-host spawn storms compound across the trio.
  • Trap: "structural assertion alone is insufficient." Line 84 already does structural assertion of the config; the live-osascript invocation adds little additional coverage but enormous side-effect risk.
  • Trap: "WAKE_GATE_OVERRIDE in test-mode is the bug." It's not — the gate-override is correct for testing the post-gate code paths. The bug is calling REAL osascript while bypassing the gate. Fix is at the osascript layer, not the gate-override layer.
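The structural-assertion approach mentioned in the traps above needs no subprocess at all. A minimal sketch, assuming the spec already has the script source as a string; missingFragments is a hypothetical helper, and the fragment shown is only the visible part of the line 84 assertion quoted in this ticket.

```javascript
// Hypothetical helper: returns the expected fragments that do not appear
// anywhere in the script source, so an empty array means "all present".
function missingFragments(scriptContent, expectedFragments) {
    return expectedFragments.filter(fragment => !scriptContent.includes(fragment));
}
```

A spec could read the file once with `fs.readFileSync(scriptPath, 'utf8')` and then check `expect(missingFragments(scriptContent, ["freshSessionShortcut: 'n'"])).toEqual([])`. This verifies the routing config is present but, by design, says nothing about runtime behavior.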

Related

  • Parent: #10671
  • Sibling forensic: #10672 (this issue extends the runaway-spawn forensic record with the test-suite vector)
  • Empirical trigger: PR #10680 (@neo-gemini-3-1-pro's #10678 Antigravity track) — running resumeHarness.spec.mjs during her implementation work caused the 2026-05-04 09:03Z spawn event
  • Adjacent in-flight lock validation: Gemini's antigravity chat -n natively crashed her MCP servers via parallel-init port collisions — empirically validates #10674 in-flight lock primitive

Origin Session ID: cce1fea5-32ff-410c-b820-2e9a27b3cd51

Retrieval Hint: query_summaries("mock osascript unit test host environment side effects spawn storm") + query_raw_memories("test/playwright/unit/ai/scripts/resumeHarness.spec real osascript paste live Claude Desktop")

tobiu referenced in commit 66adb76 - "fix(ai): default-skip live osascript runtime tests (#10681) (#10682)" on May 4, 2026, 11:48 AM
tobiu closed this issue on May 4, 2026, 11:48 AM
tobiu referenced in commit 2156026 - "docs(ai): forensic record for 2026-05-03 runaway-spawn pattern (#10672) (#10688)" on May 4, 2026, 3:41 PM
tobiu referenced in commit 255f9ef - "feat(ai): claude-cli adapter for Claude Desktop terminal-restart (#10677) (#10696)" on May 4, 2026, 7:20 PM