LearnNewsExamplesServices
Frontmatter
id11110
titlegithub-workflow MCP needs systematic boot + wiring + integration test coverage
stateClosed
labels
enhancementaimodel-experience
assigneesneo-opus-ada
createdAtMay 10, 2026, 3:35 PM
updatedAtJun 3, 2026, 3:17 AM
githubUrlhttps://github.com/neomjs/neo/issues/11110
authorneo-opus-ada
commentsCount3
parentIssuenull
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]
closedAtJun 3, 2026, 3:17 AM

github-workflow MCP needs systematic boot + wiring + integration test coverage

Closed Backlog/active-chunk-11 enhancementaimodel-experience
neo-opus-ada
neo-opus-ada commented on May 10, 2026, 3:35 PM

Context

PR #10997 (M6 github-workflow SDK migration, merged 2026-05-08) introduced TWO broken relative-import paths in ai/services/github-workflow/toolService.mjs that survived all 39 cited unit tests:

  1. ToolService import (line 11): stale 3-dot path resolving to repo-root → caught + fixed in PR #11104 (#11103) merged 12:57Z 2026-05-10
  2. openapi.yaml path (line 16): stale 1-up path resolving to non-existent ai/services/openapi.yaml → caught + fixed in PR #11106 (operator-approved piggyback under #11105 scope) currently APPROVED + CLEAN

Two co-resident bugs, same migration, same file. CI green throughout. The bugs only surfaced empirically when:

  • Operator @tobiu detected mcp__neo-mjs-github-workflow__* tools missing post-harness-restart
  • I invoked /self-repair Phase 1 step 4 (native terminal boot-test) — surfaced bug #1 in <30 seconds
  • Operator surfaced bug #2 from his post-merge review of the substrate path-resolution

The Problem

test/playwright/unit/ai/services/github-workflow/ has 6 spec files for individual reusable services (DiscussionService, IssueService, IssueSyncer, LabelService, PullRequestService, SyncService.Stage2). None test the wiring layer (toolService.mjs aggregator) NOR the MCP server boot path.

Compare to sibling structure: test/playwright/unit/ai/mcp/server/ has BaseServer.spec.mjs (23KB) + per-server subdirs (memory-core has 8 spec files). github-workflow has NO subdir under test/playwright/unit/ai/mcp/server/.

This systematic test gap lets wiring bugs survive PR review:

  • Per-service unit tests import service classes directly (IssueService, etc.) — they don't exercise the aggregator
  • Static diff review can miss path-resolution semantics that fail only at module-load time
  • Boot smoke test (just-running node mcp-server.mjs) catches BOOT crashes but misses lazy-load failures (the openapi.yaml bug fell into THIS gap — ToolService loads openapi lazily on first listTools()/callTool())

@tobiu's M6 4-item retrospective (2026-05-10) framed it directly:

"if a breaking import path keeps our CI tests green, we are missing tests for gh workflow"

The Fix

Add 3 layers of test coverage to test/playwright/unit/ai/mcp/server/github-workflow/:

  1. Boot smoke test (Boot.spec.mjs): spawn node ai/mcp/server/github-workflow/mcp-server.mjs as child process, assert it survives a 5-second alive-probe. Catches the ToolService import bug class (boot-time module-resolution failures).

  2. Wiring test (ToolService.spec.mjs): import toolService.mjs, call listTools(), assert the returned list matches the serviceMapping keys + the openapi.yaml content. Catches the openapi path bug class (lazy-load module-resolution failures + serviceMapping/openapi drift).

  3. Integration test (MCPHandshake.spec.mjs): spawn the MCP server, perform a JSON-RPC handshake via stdio, call a representative listTools + callTool round-trip. Catches downstream-of-wiring failures (e.g., openapi parse errors, serviceMapping handler binding failures).

Mirror sibling pattern: see test/playwright/unit/ai/mcp/server/memory-core/ for the convention. Same 3-layer shape would have caught BOTH PR #10997 bugs at PR-review time.

Acceptance Criteria

  • test/playwright/unit/ai/mcp/server/github-workflow/Boot.spec.mjs — 5s alive-probe + clean exit on SIGTERM
  • test/playwright/unit/ai/mcp/server/github-workflow/ToolService.spec.mjslistTools() returns expected tool names; openapi.yaml resolution matches ai/mcp/server/github-workflow/openapi.yaml
  • test/playwright/unit/ai/mcp/server/github-workflow/MCPHandshake.spec.mjs — JSON-RPC tools/list + at least one tools/call round-trip succeeds
  • All 3 specs run as part of npm run test-unit (no special harness configuration)
  • Validation: temporarily revert PR #11104's import fix → confirm Boot.spec.mjs fails. Temporarily revert PR #11106's openapi fix → confirm ToolService.spec.mjs fails. Both bugs would have been caught.

Out of Scope

  • Test coverage for OTHER M6-migrated MCP servers (memory-core, knowledge-base, neural-link) — they have existing test directories + their toolService.mjs files weren't broken in the M6 migration. Consider parallel audit in a separate ticket if needed.
  • E2E / whitebox tests (those run via whitebox-e2e skill; this ticket is unit-test-substrate scope)
  • Refactoring the ToolService aggregator pattern (see #11107 for the architectural-move follow-up)
  • CI gating policy changes (e.g., requiring all 3 spec files to pass before allowing merge)

Avoided Traps / Gold Standards Rejected

Decision Matrix

  1. 3-layer test coverage at test/playwright/unit/ai/mcp/server/github-workflow/ (Selected): Mirrors the sibling pattern (memory-core has 8 spec files, knowledge-base has its own subdir). Each layer catches a distinct bug class (boot vs lazy-load vs handshake). Lowest-risk addition; uses existing test harness conventions.

  2. Boot smoke test only: Rejected. Catches the ToolService import bug class but misses the openapi lazy-load bug class. PR #11104 demonstrates the gap: my 3s alive-probe was a primitive boot smoke test that passed even though tool calls would fail.

  3. Integration test only (MCP handshake): Rejected. Comprehensive but slow; high-friction for fast-CI cycles. Layered approach (boot + wiring + integration) lets fast unit-style probes catch the common wiring bugs without paying integration-test latency on every PR.

  4. CI lint for relative-import-path-correctness: Rejected as substrate-quality CI scope-creep. Real bugs slip through ANY lint; empirical execution is the gold-standard verifier. Linting could be a follow-up but doesn't substitute for these 3 tests.

  5. Add tests at test/playwright/unit/ai/services/github-workflow/: Rejected. The bugs are at the MCP-server-wiring layer, not the SDK-service-class layer. Sibling pattern places wiring tests under test/playwright/unit/ai/mcp/server/<name>/. Per-service unit tests exist already at the SDK layer; they're not the gap.

  • Trap: Treating CI-passing as boot-correctness. Rejection: PR #10997 had 39 unit tests passing while shipping 2 broken paths. Per-service tests don't exercise wiring; static diff review missed the path-resolution semantics. Empirical boot-test is the gate that closes this gap.

  • Trap: Adding tests AFTER bugs surface, not before. Rejection: True for #11103/#11104/#11106 — bugs already merged + fixed. But the discipline is: once a bug class is identified, capture it as a test BEFORE the next migration. Per @tobiu's "if a breaking import path keeps our CI tests green, we are missing tests" framing — this ticket converts that friction to gold for the next M-migration cycle.

Related

  • M6 migration PR #10997 (origin of both bugs)
  • PR #11104 (#11103) — first surgical fix (ToolService import)
  • PR #11106 (#11105) — second surgical fix (openapi.yaml piggyback)
  • #11107 — architectural follow-up (move toolService.mjs back to MCP server dirs; obsoletes the surgical fixes)
  • AGENTS.md §3.5 V-B-A core value (the empirical boot-test pattern is the V-B-A primitive applied to MCP-server-wiring)
  • @tobiu's M6 4-item retrospective (this session, 2026-05-10)
  • feedback_symmetric_spec_cleanup MEMORY.md entry (sibling pattern: cross-singleton mutation symmetric resets — same family of test-discipline)
  • feedback_run_related_tests_in_pr_review (sibling: empirical-checkout substrate-contention rationale; this ticket adds the substrate that makes empirical-checkout meaningful)

Origin Session ID: c2912891-b459-4a03-b2af-154d5e264df1 Retrieval Hint: "github-workflow MCP test gap", "wiring-layer aggregator test", "boot smoke test + lazy-load test + handshake test"