Context
PR #10997 (M6 github-workflow SDK migration, merged 2026-05-08) introduced TWO broken relative-import paths in ai/services/github-workflow/toolService.mjs that survived all 39 cited unit tests:
ToolService import (line 11): stale 3-dot path resolving to repo-root → caught + fixed in PR #11104 (#11103) merged 12:57Z 2026-05-10
openapi.yaml path (line 16): stale 1-up path resolving to non-existent ai/services/openapi.yaml → caught + fixed in PR #11106 (operator-approved piggyback under #11105 scope) currently APPROVED + CLEAN
Two co-resident bugs, same migration, same file. CI green throughout. The bugs only surfaced empirically when:
- Operator @tobiu detected
mcp__neo-mjs-github-workflow__* tools missing post-harness-restart
- I invoked
/self-repair Phase 1 step 4 (native terminal boot-test) — surfaced bug #1 in <30 seconds
- Operator surfaced bug #2 from his post-merge review of the substrate path-resolution
The Problem
test/playwright/unit/ai/services/github-workflow/ has 6 spec files for individual reusable services (DiscussionService, IssueService, IssueSyncer, LabelService, PullRequestService, SyncService.Stage2). None test the wiring layer (toolService.mjs aggregator) NOR the MCP server boot path.
Compare to sibling structure: test/playwright/unit/ai/mcp/server/ has BaseServer.spec.mjs (23KB) + per-server subdirs (memory-core has 8 spec files). github-workflow has NO subdir under test/playwright/unit/ai/mcp/server/.
This systematic test gap lets wiring bugs survive PR review:
- Per-service unit tests import service classes directly (IssueService, etc.) — they don't exercise the aggregator
- Static diff review can miss path-resolution semantics that fail only at module-load time
- Boot smoke test (just-running
node mcp-server.mjs) catches BOOT crashes but misses lazy-load failures (the openapi.yaml bug fell into THIS gap — ToolService loads openapi lazily on first listTools()/callTool())
@tobiu's M6 4-item retrospective (2026-05-10) framed it directly:
"if a breaking import path keeps our CI tests green, we are missing tests for gh workflow"
The Fix
Add 3 layers of test coverage to test/playwright/unit/ai/mcp/server/github-workflow/:
Boot smoke test (Boot.spec.mjs): spawn node ai/mcp/server/github-workflow/mcp-server.mjs as child process, assert it survives a 5-second alive-probe. Catches the ToolService import bug class (boot-time module-resolution failures).
Wiring test (ToolService.spec.mjs): import toolService.mjs, call listTools(), assert the returned list matches the serviceMapping keys + the openapi.yaml content. Catches the openapi path bug class (lazy-load module-resolution failures + serviceMapping/openapi drift).
Integration test (MCPHandshake.spec.mjs): spawn the MCP server, perform a JSON-RPC handshake via stdio, call a representative listTools + callTool round-trip. Catches downstream-of-wiring failures (e.g., openapi parse errors, serviceMapping handler binding failures).
Mirror sibling pattern: see test/playwright/unit/ai/mcp/server/memory-core/ for the convention. Same 3-layer shape would have caught BOTH PR #10997 bugs at PR-review time.
Acceptance Criteria
Out of Scope
- Test coverage for OTHER M6-migrated MCP servers (memory-core, knowledge-base, neural-link) — they have existing test directories + their toolService.mjs files weren't broken in the M6 migration. Consider parallel audit in a separate ticket if needed.
- E2E / whitebox tests (those run via
whitebox-e2e skill; this ticket is unit-test-substrate scope)
- Refactoring the
ToolService aggregator pattern (see #11107 for the architectural-move follow-up)
- CI gating policy changes (e.g., requiring all 3 spec files to pass before allowing merge)
Avoided Traps / Gold Standards Rejected
Decision Matrix
3-layer test coverage at test/playwright/unit/ai/mcp/server/github-workflow/ (Selected): Mirrors the sibling pattern (memory-core has 8 spec files, knowledge-base has its own subdir). Each layer catches a distinct bug class (boot vs lazy-load vs handshake). Lowest-risk addition; uses existing test harness conventions.
Boot smoke test only: Rejected. Catches the ToolService import bug class but misses the openapi lazy-load bug class. PR #11104 demonstrates the gap: my 3s alive-probe was a primitive boot smoke test that passed even though tool calls would fail.
Integration test only (MCP handshake): Rejected. Comprehensive but slow; high-friction for fast-CI cycles. Layered approach (boot + wiring + integration) lets fast unit-style probes catch the common wiring bugs without paying integration-test latency on every PR.
CI lint for relative-import-path-correctness: Rejected as substrate-quality CI scope-creep. Real bugs slip through ANY lint; empirical execution is the gold-standard verifier. Linting could be a follow-up but doesn't substitute for these 3 tests.
Add tests at test/playwright/unit/ai/services/github-workflow/: Rejected. The bugs are at the MCP-server-wiring layer, not the SDK-service-class layer. Sibling pattern places wiring tests under test/playwright/unit/ai/mcp/server/<name>/. Per-service unit tests exist already at the SDK layer; they're not the gap.
Trap: Treating CI-passing as boot-correctness. Rejection: PR #10997 had 39 unit tests passing while shipping 2 broken paths. Per-service tests don't exercise wiring; static diff review missed the path-resolution semantics. Empirical boot-test is the gate that closes this gap.
Trap: Adding tests AFTER bugs surface, not before. Rejection: True for #11103/#11104/#11106 — bugs already merged + fixed. But the discipline is: once a bug class is identified, capture it as a test BEFORE the next migration. Per @tobiu's "if a breaking import path keeps our CI tests green, we are missing tests" framing — this ticket converts that friction to gold for the next M-migration cycle.
Related
- M6 migration PR #10997 (origin of both bugs)
- PR #11104 (#11103) — first surgical fix (
ToolService import)
- PR #11106 (#11105) — second surgical fix (openapi.yaml piggyback)
- #11107 — architectural follow-up (move toolService.mjs back to MCP server dirs; obsoletes the surgical fixes)
- AGENTS.md §3.5 V-B-A core value (the empirical boot-test pattern is the V-B-A primitive applied to MCP-server-wiring)
- @tobiu's M6 4-item retrospective (this session, 2026-05-10)
feedback_symmetric_spec_cleanup MEMORY.md entry (sibling pattern: cross-singleton mutation symmetric resets — same family of test-discipline)
feedback_run_related_tests_in_pr_review (sibling: empirical-checkout substrate-contention rationale; this ticket adds the substrate that makes empirical-checkout meaningful)
Origin Session ID: c2912891-b459-4a03-b2af-154d5e264df1
Retrieval Hint: "github-workflow MCP test gap", "wiring-layer aggregator test", "boot smoke test + lazy-load test + handshake test"
Context
PR #10997 (M6 github-workflow SDK migration, merged 2026-05-08) introduced TWO broken relative-import paths in
ai/services/github-workflow/toolService.mjsthat survived all 39 cited unit tests:ToolServiceimport (line 11): stale 3-dot path resolving to repo-root → caught + fixed in PR #11104 (#11103) merged 12:57Z 2026-05-10openapi.yamlpath (line 16): stale 1-up path resolving to non-existentai/services/openapi.yaml→ caught + fixed in PR #11106 (operator-approved piggyback under #11105 scope) currently APPROVED + CLEANTwo co-resident bugs, same migration, same file. CI green throughout. The bugs only surfaced empirically when:
mcp__neo-mjs-github-workflow__*tools missing post-harness-restart/self-repairPhase 1 step 4 (native terminal boot-test) — surfaced bug #1 in <30 secondsThe Problem
test/playwright/unit/ai/services/github-workflow/has 6 spec files for individual reusable services (DiscussionService, IssueService, IssueSyncer, LabelService, PullRequestService, SyncService.Stage2). None test the wiring layer (toolService.mjsaggregator) NOR the MCP server boot path.Compare to sibling structure:
test/playwright/unit/ai/mcp/server/hasBaseServer.spec.mjs(23KB) + per-server subdirs (memory-core has 8 spec files). github-workflow has NO subdir undertest/playwright/unit/ai/mcp/server/.This systematic test gap lets wiring bugs survive PR review:
node mcp-server.mjs) catches BOOT crashes but misses lazy-load failures (the openapi.yaml bug fell into THIS gap —ToolServiceloads openapi lazily on firstlistTools()/callTool())@tobiu's M6 4-item retrospective (2026-05-10) framed it directly:
The Fix
Add 3 layers of test coverage to
test/playwright/unit/ai/mcp/server/github-workflow/:Boot smoke test (
Boot.spec.mjs): spawnnode ai/mcp/server/github-workflow/mcp-server.mjsas child process, assert it survives a 5-second alive-probe. Catches theToolServiceimport bug class (boot-time module-resolution failures).Wiring test (
ToolService.spec.mjs): importtoolService.mjs, calllistTools(), assert the returned list matches theserviceMappingkeys + theopenapi.yamlcontent. Catches the openapi path bug class (lazy-load module-resolution failures + serviceMapping/openapi drift).Integration test (
MCPHandshake.spec.mjs): spawn the MCP server, perform a JSON-RPC handshake via stdio, call a representativelistTools+callToolround-trip. Catches downstream-of-wiring failures (e.g., openapi parse errors, serviceMapping handler binding failures).Mirror sibling pattern: see
test/playwright/unit/ai/mcp/server/memory-core/for the convention. Same 3-layer shape would have caught BOTH PR #10997 bugs at PR-review time.Acceptance Criteria
test/playwright/unit/ai/mcp/server/github-workflow/Boot.spec.mjs— 5s alive-probe + clean exit on SIGTERMtest/playwright/unit/ai/mcp/server/github-workflow/ToolService.spec.mjs—listTools()returns expected tool names;openapi.yamlresolution matchesai/mcp/server/github-workflow/openapi.yamltest/playwright/unit/ai/mcp/server/github-workflow/MCPHandshake.spec.mjs— JSON-RPCtools/list+ at least onetools/callround-trip succeedsnpm run test-unit(no special harness configuration)Boot.spec.mjsfails. Temporarily revert PR #11106's openapi fix → confirmToolService.spec.mjsfails. Both bugs would have been caught.Out of Scope
whitebox-e2eskill; this ticket is unit-test-substrate scope)ToolServiceaggregator pattern (see #11107 for the architectural-move follow-up)Avoided Traps / Gold Standards Rejected
Decision Matrix
3-layer test coverage at
test/playwright/unit/ai/mcp/server/github-workflow/(Selected): Mirrors the sibling pattern (memory-core has 8 spec files, knowledge-base has its own subdir). Each layer catches a distinct bug class (boot vs lazy-load vs handshake). Lowest-risk addition; uses existing test harness conventions.Boot smoke test only: Rejected. Catches the
ToolServiceimport bug class but misses the openapi lazy-load bug class. PR #11104 demonstrates the gap: my 3s alive-probe was a primitive boot smoke test that passed even though tool calls would fail.Integration test only (MCP handshake): Rejected. Comprehensive but slow; high-friction for fast-CI cycles. Layered approach (boot + wiring + integration) lets fast unit-style probes catch the common wiring bugs without paying integration-test latency on every PR.
CI lint for relative-import-path-correctness: Rejected as substrate-quality CI scope-creep. Real bugs slip through ANY lint; empirical execution is the gold-standard verifier. Linting could be a follow-up but doesn't substitute for these 3 tests.
Add tests at
test/playwright/unit/ai/services/github-workflow/: Rejected. The bugs are at the MCP-server-wiring layer, not the SDK-service-class layer. Sibling pattern places wiring tests undertest/playwright/unit/ai/mcp/server/<name>/. Per-service unit tests exist already at the SDK layer; they're not the gap.Trap: Treating CI-passing as boot-correctness. Rejection: PR #10997 had 39 unit tests passing while shipping 2 broken paths. Per-service tests don't exercise wiring; static diff review missed the path-resolution semantics. Empirical boot-test is the gate that closes this gap.
Trap: Adding tests AFTER bugs surface, not before. Rejection: True for #11103/#11104/#11106 — bugs already merged + fixed. But the discipline is: once a bug class is identified, capture it as a test BEFORE the next migration. Per @tobiu's "if a breaking import path keeps our CI tests green, we are missing tests" framing — this ticket converts that friction to gold for the next M-migration cycle.
Related
ToolServiceimport)feedback_symmetric_spec_cleanupMEMORY.md entry (sibling pattern: cross-singleton mutation symmetric resets — same family of test-discipline)feedback_run_related_tests_in_pr_review(sibling: empirical-checkout substrate-contention rationale; this ticket adds the substrate that makes empirical-checkout meaningful)Origin Session ID: c2912891-b459-4a03-b2af-154d5e264df1 Retrieval Hint: "github-workflow MCP test gap", "wiring-layer aggregator test", "boot smoke test + lazy-load test + handshake test"