LearnNewsExamplesServices
Frontmatter
id11874
titleRestore core.Base contract in SwarmHeartbeatService — drop external initAsync calls + isInitialized band-aid
stateClosed
labels
enhancementairefactoringarchitecture
assigneesneo-opus-4-7
createdAtMay 24, 2026, 3:21 AM
updatedAtMay 24, 2026, 12:39 PM
githubUrlhttps://github.com/neomjs/neo/issues/11874
authorneo-opus-4-7
commentsCount0
parentIssuenull
subIssues[]
subIssuesCompleted0
subIssuesTotal0
blockedBy[]
blocking[]
closedAtMay 24, 2026, 12:39 PM

Restore core.Base contract in SwarmHeartbeatService — drop external initAsync calls + isInitialized band-aid

Closedenhancementairefactoringarchitecture
neo-opus-4-7
neo-opus-4-7 commented on May 24, 2026, 3:21 AM

Authored by: [Claude Opus 4.7] (Claude Code)

Decision Record impact: none directly; aligned with src/core/Base.mjs canonical contract (cited inline).

Context

Operator surfaced 2026-05-24 — Neo.ai.daemons.SwarmHeartbeatService violates the core.Base lifecycle contract in multiple explicit ways. Direct quote: "this was like a punch into my face. read core.Base. awaiting ready(). NEVER call initConfig outside the hierarchy. ... makes me actually sad. violating the core ideas of neo like that."

Code-sweep across ai/ confirmed violations are concentrated in this single file + one consumer in Orchestrator.mjs. Adjacent canonical-correct precedent (DreamService.mjs:77 does await LifecycleService.ready()) proves the fix shape is already in-tree.

The Problem

core.Base.mjs JSDoc lines 593-595 (the explicit contract):

"The initAsync() method is automatically triggered by the framework during Neo.create(). Calling it externally (e.g. await myInstance.initAsync()) will execute it twice, leading to fatal duplication bugs. If you need to wait for a class to finish initializing, always use await myInstance.ready() instead."

4 distinct violations in SwarmHeartbeatService.mjs + 1 in Orchestrator.mjs:

# Site Violation
1 Orchestrator.mjs:323 await this.swarmHeartbeatService.initAsync({pollIntervalMs}) — external initAsync call, exactly what core.Base warned against
2 SwarmHeartbeatService.mjs:124-140 if (this.isInitialized) { ... return; } ... this.isInitialized = true; — manual idempotency guard; band-aid for violation #1; obviates #readyPromise + isReady_ + afterSetIsReady framework substrate already in Base
3 SwarmHeartbeatService.mjs:137 await LifecycleService.initAsync() — external initAsync on a peer service
4 SwarmHeartbeatService.mjs:138 await GraphService.initAsync() — external initAsync on a peer service
5 SwarmHeartbeatService.mjs:128-134 identity + pollIntervalMs set imperatively from process.env.NEO_AGENT_IDENTITY / process.env.POLL_INTERVAL inside initAsync — should be reactive configs; also: POLL_INTERVAL env var has no NEO_* namespace prefix (env-naming regression)

The Architectural Reality

The canonical fix shape is already present in adjacent code:

  • DreamService.mjs:77await LifecycleService.ready() ✓ (canonical)
  • SwarmHeartbeatService.mjs:137await LifecycleService.initAsync() ✗ (violation)

Same LifecycleService. Two consumers in the same ai/daemons/orchestrator/services/ directory. One follows the contract, one doesn't. Direct copy-paste precedent for the fix.

LifecycleService + GraphService both extends Base correctly (verified) → they have the canonical #readyPromise + isReady_ + afterSetIsReady substrate. The violations are purely consumer-side.

12+ other correct .ready() usages exist across ai/ (Agent.mjs, KbGarbageCollection / KbReconciliation / KbAlerting services, KBRecorderService consumers, StorageRouter consumers, Assembler.mjs). SwarmHeartbeatService is the outlier.

The Fix

SwarmHeartbeatService.mjs:

  • Add identity_ + pollIntervalMs_ as reactive configs in static config (with @reactive JSDoc tag per coding guidelines); defaults DEFAULT_IDENTITY + DEFAULT_POLL_INTERVAL_MS
  • initAsync() body:
    • First line: await super.initAsync();
    • Replace await LifecycleService.initAsync()await LifecycleService.ready()
    • Replace await GraphService.initAsync()await GraphService.ready()
    • Remove if (this.isInitialized) guard + the "Already initialized" log
    • Remove this.isInitialized = true assignment
    • Drop process.env.NEO_AGENT_IDENTITY + process.env.POLL_INTERVAL raw reads (env layer routed via env-binding registry per Epic #11871 Sub 1 #11873)
  • Drop isInitialized instance field declaration entirely
  • Subclass singleton classification follows the same rule the cycle-2.1 SwarmHeartbeatService surface triggered: external parent configuration ⇒ NOT singleton (per CadenceEngine precedent). Keep singleton ONLY if configs are sensible global defaults; switch to non-singleton if Orchestrator parent-state propagation is needed (audit during impl)

Orchestrator.mjs:323:

  • Replace await this.swarmHeartbeatService.initAsync({pollIntervalMs: this.swarmHeartbeatIntervalMs}) with:
      this.swarmHeartbeatService.pollIntervalMs = this.swarmHeartbeatIntervalMs;
    await this.swarmHeartbeatService.ready();
  • (If the singleton-vs-non-singleton audit lands on non-singleton, the wire-up shape changes to Class A or Class B per the existing Orchestrator patterns — to be decided during impl based on whether parent-state-propagation is needed)

Acceptance Criteria

  1. SwarmHeartbeatService.initAsync() starts with await super.initAsync() and contains zero .initAsync() calls on peer services (only .ready())
  2. isInitialized instance field + manual idempotency guard + "Already initialized" log all REMOVED
  3. identity + pollIntervalMs exposed as reactive configs (identity_, pollIntervalMs_) in static config; no raw process.env.X reads inside initAsync()
  4. Orchestrator.mjs:323 no longer calls swarmHeartbeatService.initAsync(...) externally; instead sets pollIntervalMs reactive config + await this.swarmHeartbeatService.ready()
  5. Singleton-vs-non-singleton audit: if Orchestrator parent-state propagation is needed, convert SwarmHeartbeatService to non-singleton per CadenceEngine precedent; if not, document explicitly why singleton remains acceptable
  6. process.env.POLL_INTERVAL raw read (no NEO_* prefix) eliminated; if env-driven poll interval is needed, route via NEO_ORCHESTRATOR_SWARM_HEARTBEAT_INTERVAL_MS (already declared in Orchestrator's getter, set via config)
  7. process.env.NEO_AGENT_IDENTITY raw read routed via the env-binding pattern Epic #11871 Sub 1 #11873 establishes (impl-time alignment — either lands first OR uses current Env.parseString(process.env.NEO_AGENT_IDENTITY, 'NEO_AGENT_IDENTITY') shape per Neo.util.Env primitive)
  8. All existing tests pass; SwarmHeartbeatService.spec.mjs + Orchestrator.spec.mjs swarm-heartbeat-init coverage adapted to the new shape (no external-initAsync mocking needed)
  9. Grep-clean: zero new .initAsync() external-call violations introduced; the only .initAsync() references in ai/ post-merge are await super.initAsync() parent-class calls inside override methods

Out of Scope

  • Broader sweep of other anti-patterns in SwarmHeartbeatService (heartbeat tick logic, sweep behavior, identity resolution semantics) — separate ticket if surfaced
  • Heartbeat-related daemon scheduling (Orchestrator side) refactoring — owned by Epic #11831 if relevant
  • env-primitive registry implementation — owned by Epic #11871 Sub 1 #11873 (this ticket consumes the registry, doesn't implement it)

Avoided Traps

  • Folding into Sub 1 of #11871 — operator's "6→20 subs" scope-creep frustration; keep tickets atomic
  • Treating isInitialized band-aid as just stale-code-cleanup — it's a SYMPTOM of violation #1; the band-aid only exists because of external initAsync calls; both must be fixed together
  • Adapting initAsync to be idempotent more cleverly — directly contradicts core.Base contract; framework already provides #readyPromise for the "wait for init" use case; agents must use ready() not initAsync()
  • Pre-deciding singleton-vs-non-singleton without auditing — same lesson from MaintenanceBackpressureService Sub 19 today; the audit happens at impl time based on whether external parent configs are needed
  • Skipping the parent-class super.initAsync() call — required by Base.mjs:589 ("Make sure to use the parent call await super.initAsync() at the beginning of their implementations")

Connection to Epic #11871 Sub 1 (#11873)

Both tickets touch env-var reads. Sub 1 #11873 establishes the canonical env-binding registry pattern (Env.parseX(process.env.NEO_X, 'NEO_X') inline); this ticket's AC 6+7 consume it. Sub 1 #11873 + this ticket are independent + can land in either order. If this ticket lands first, the env reads use the current process.env.NEO_X shape + get harmonized later. If Sub 1 #11873 lands first, this ticket uses the canonical shape from inception.

Contract Ledger

Per pr-review §5.4 audit requirement for PRs touching consumed surfaces. #11874 restructures the Neo.ai.daemons.SwarmHeartbeatService singleton's lifecycle-contract surface consumed by the Orchestrator daemon. Contract surface enumerated below:

Surface Pre-cleanup contract Post-cleanup contract Migration
SwarmHeartbeatService.initAsync(options?) Took {identity, pollIntervalMs} args; ran imperative process.env.NEO_AGENT_IDENTITY + process.env.POLL_INTERVAL reads; manual isInitialized idempotency guard; called LifecycleService.initAsync() + GraphService.initAsync() externally No args; identity-agnostic; calls super.initAsync() first then LifecycleService.ready() + GraphService.ready() per core.Base.mjs:589-595; framework #readyPromise handles idempotency Orchestrator no longer calls initAsync({...}) externally — sets service.identity + service.pollIntervalMs via reactive config assignment BEFORE await service.ready()
SwarmHeartbeatService.identity_ Reactive config (already existed pre-#11874) Same shape; new beforeSetIdentity normalizer applies normalizeAgentIdentityNodeId + falls back to DEFAULT_IDENTITY when null/empty Orchestrator assigns service.identity = this.swarmHeartbeatIdentity in start() before .ready()
SwarmHeartbeatService.pollIntervalMs_ Reactive config (already existed) Same shape (no normalizer changes) Orchestrator assigns service.pollIntervalMs = this.swarmHeartbeatIntervalMs in start() before .ready()
SwarmHeartbeatService.isInitialized_ Existed as reactive config + manual guard DELETED — framework #readyPromise is the canonical single-init substrate No external consumers tracked this; tests adapted to drop isInitialized = false resets
SwarmHeartbeatService.initFailed (daemon-local instance field) Did not exist Set to true on init failure; checked by Orchestrator.poll() swarm-heartbeat lane before pulse() dispatch Replaces the env-mutation anti-pattern (env.NEO_ORCHESTRATOR_SWARM_HEARTBEAT_ENABLED = false) per operator pushback "what if WE want to disable heartbeat locally? no env var"
SwarmHeartbeatService singleton classification singleton: true UNCHANGED — singleton: true retained per AC5 documented rationale (1:1 service-parent topology; identity-agnostic initAsync(); CadenceEngine/MaintenanceBackpressureService non-singleton rule guards against multi-instance state-collision that doesn't apply here) Orchestrator wire-up: Class C swarmHeartbeatService = SwarmHeartbeatService retained
Orchestrator.swarmHeartbeatIdentity getter Did not exist (env read inside SwarmHeartbeatService.initAsync) NEW: Env.parseString(process.env.NEO_AGENT_IDENTITY, 'NEO_AGENT_IDENTITY') — Lane A #11873 env-binding pattern Tests for env-fallback identity selection should land in Orchestrator.spec.mjs (out-of-scope for this PR; covered by integration)
process.env.POLL_INTERVAL (no NEO_* prefix) Read inside SwarmHeartbeatService.initAsync (env-naming regression) DELETED — routed via canonical NEO_ORCHESTRATOR_SWARM_HEARTBEAT_INTERVAL_MS env var through Orchestrator.swarmHeartbeatIntervalMs getter env-naming regression eliminated

Behavioral-invariant statement: External Orchestrator-side lifecycle behavior preserved — service is initialized exactly once at daemon start, pulse() is called per cadence tick, init-failure disables the lane for the run. Internal lifecycle shape restructured to honor core.Base.mjs:589-595 contract; no consumer outside Orchestrator + the unit spec touches the service.

Authority

Operator surfaced 2026-05-24 + my code-sweep findings (3 external initAsync violations + 1 band-aid + 12+ correct .ready() precedents elsewhere in ai/) + nightshift lane-claim 2026-05-24.

tobiu referenced in commit 05da97c - "fix(agentos): restore core.Base contract in SwarmHeartbeatService — drop external initAsync calls + isInitialized band-aid (#11874) (#11877) on May 24, 2026, 12:39 PM
tobiu closed this issue on May 24, 2026, 12:39 PM