Frontmatter
id: 12068
title: Sub 2: 5-axis observability primitive + REM run/stage state model
state: Closed
labels: enhancement, ai, architecture, model-experience
assignees: neo-opus-ada
createdAt: May 27, 2026, 3:42 AM
updatedAt: May 28, 2026, 5:06 AM
githubUrl: https://github.com/neomjs/neo/issues/12068
author: neo-opus-ada
commentsCount: 2
parentIssue: 12065
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 28, 2026, 5:06 AM

Sub 2: 5-axis observability primitive + REM run/stage state model

Closed | Backlog/active-chunk-15 | enhancement, ai, architecture, model-experience
neo-opus-ada commented on May 27, 2026, 3:42 AM

Parent Epic

#12065 — Sub 2 of 9. Parallel to Sub 1.

Premise

Per Discussion #12062 §2.6: REM pipeline state is measurable along five distinct axes that can and do diverge silently: the Chroma summary count (axis A), graph SESSION nodes (axis B), per-session ENTITY-RELATION counts, topology-conflict counts, and the Chroma graphDigested:true flag. Currently only axis A is exposed via MCP. GPT live V-B-A on 2026-05-27 at ~01:19Z showed a 76× divergence between axis A and axis B (1,069 Chroma summaries versus only 14 graph SESSION nodes).
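To make the divergence figure concrete, the ratio can be recomputed from the two counts quoted above. This is only a sketch: `divergenceRatio` is a hypothetical helper, not something the issue or the codebase defines.

```typescript
// Hypothetical helper: how many times larger one axis count is than another.
// Returns Infinity when only the smaller axis is empty, so a total gap is not hidden.
function divergenceRatio(largerAxis: number, smallerAxis: number): number {
  if (largerAxis < 0 || smallerAxis < 0) {
    throw new RangeError("axis counts must be non-negative");
  }
  if (smallerAxis === 0) {
    return largerAxis === 0 ? 1 : Infinity;
  }
  return largerAxis / smallerAxis;
}

// Counts quoted in the 2026-05-27 measurement.
const chromaSummaries = 1069;  // axis A
const graphSessionNodes = 14;  // axis B

const axisAvsB = divergenceRatio(chromaSummaries, graphSessionNodes);
console.log("axis A vs axis B: " + Math.floor(axisAvsB) + "x"); // prints "axis A vs axis B: 76x"
```

The exact quotient is about 76.4, which the issue rounds down to 76×.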

Per Discussion #12062 §2.11: the current code uses a single graphDigested boolean as its completion gate. That is insufficient because TopologyInferenceEngine returns void, so a silent topology failure is invisible to the flag.

Prescription

Two coupled deliverables:

Part A — 5-axis observability primitive:

  • ChromaManager helpers: getUndigestedSessionCount(), getGraphDigestedCount()
  • GraphService helpers: getSessionNodeCount(), getSessionEntityCount(sessionId)
  • TopologyInferenceEngine: getTopologyConflictCount() (grep sandman_handoff.md for source-session entries)
  • MCP exposure: extend manage_database OR add new get_rem_pipeline_state tool
  • Per-cycle telemetry log emitting all 5 axis counts
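The per-cycle telemetry line from the list above could look roughly like the following. This is a sketch under assumptions: the `RemAxisSnapshot` shape, its field names, and the log format are all hypothetical, and the snapshot is passed in rather than fetched, because the signatures of the proposed helpers are not specified here.

```typescript
// Hypothetical snapshot shape: one field per axis named in the Premise.
interface RemAxisSnapshot {
  chromaSummaryCount: number;                           // Chroma summary count
  graphSessionNodeCount: number;                        // graph SESSION nodes
  sessionEntityCounts: { [sessionId: string]: number }; // per-session ENTITY-RELATION counts
  topologyConflictCount: number;                        // topology-conflict count
  graphDigestedCount: number;                           // sessions flagged graphDigested:true
}

// Formats all five axes into a single key=value log line (assumed format).
function formatAxisTelemetry(cycleId: string, snapshot: RemAxisSnapshot): string {
  let entityTotal = 0;
  for (const sessionId of Object.keys(snapshot.sessionEntityCounts)) {
    entityTotal += snapshot.sessionEntityCounts[sessionId] || 0;
  }
  return [
    "rem-cycle=" + cycleId,
    "chromaSummaries=" + snapshot.chromaSummaryCount,
    "graphSessionNodes=" + snapshot.graphSessionNodeCount,
    "entityRelations=" + entityTotal,
    "topologyConflicts=" + snapshot.topologyConflictCount,
    "graphDigested=" + snapshot.graphDigestedCount
  ].join(" ");
}

// Illustrative values only, not measurements.
const exampleSnapshot: RemAxisSnapshot = {
  chromaSummaryCount: 10,
  graphSessionNodeCount: 4,
  sessionEntityCounts: { "session-a": 3, "session-b": 5 },
  topologyConflictCount: 1,
  graphDigestedCount: 6
};

console.log(formatAxisTelemetry("cycle-1", exampleSnapshot));
```

Emitting every axis on one line means two counts that should track each other can be compared without joining separate log entries.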

Part B — REM run/stage state model:

Per Discussion #12062 §2.11 schema:

{
  runId, reason, startedAt, completedAt, outcome,
  failurePhase, failureReason,
  perSessionStates: [{sessionId, payloadSizeTokens, memorySessionIngest, triVector, topology, gapSession, graphDigestedFlag, failureReasons}],
  lastSuccessfulPhase, cycleScopePhases
}

Storage: JSONL at .neo-ai-data/rem-runs/<runId>.jsonl (append-only durable; queryable post-restart).
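As a rough illustration, the schema could be typed and serialized as shown below. The field names come from the schema above. Everything else is an assumption: the field types, the `PhaseState` values, the outcome strings, and the convention that each JSONL line is a full snapshot of the run with the last line winning. The actual append to disk (for example with Node's `fs.appendFile`) is left out so the sketch stays self-contained.

```typescript
type PhaseState = "pending" | "ok" | "failed" | "skipped"; // assumed values

interface RemSessionState {
  sessionId: string;
  payloadSizeTokens: number;
  memorySessionIngest: PhaseState;
  triVector: PhaseState;
  topology: PhaseState;
  gapSession: PhaseState;
  graphDigestedFlag: boolean;
  failureReasons: string[];
}

interface RemRunRecord {
  runId: string;
  reason: string;
  startedAt: string;            // assumed ISO 8601 timestamp
  completedAt: string | null;
  outcome: string | null;
  failurePhase: string | null;
  failureReason: string | null;
  perSessionStates: RemSessionState[];
  lastSuccessfulPhase: string | null;
  cycleScopePhases: string[];
}

// One record per line; JSON.stringify escapes embedded newlines.
function toJsonlLine(record: RemRunRecord): string {
  return JSON.stringify(record) + "\n";
}

// Reads the most recent snapshot from a run file's contents.
function latestRunRecord(jsonl: string): RemRunRecord | null {
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  const last = lines.pop();
  return last === undefined ? null : (JSON.parse(last) as RemRunRecord);
}

// Illustrative run: one snapshot at start, one at completion.
const startedRun: RemRunRecord = {
  runId: "run-001", reason: "scheduled", startedAt: "2026-05-27T01:19:00Z",
  completedAt: null, outcome: null, failurePhase: null, failureReason: null,
  perSessionStates: [], lastSuccessfulPhase: null,
  cycleScopePhases: ["memorySessionIngest", "triVector", "topology", "gapSession"]
};
const finishedRun: RemRunRecord = {
  ...startedRun,
  completedAt: "2026-05-27T01:25:00Z", outcome: "success", lastSuccessfulPhase: "gapSession"
};

const runFileContents = toJsonlLine(startedRun) + toJsonlLine(finishedRun);
const latestRun = latestRunRecord(runFileContents);
console.log(latestRun ? latestRun.outcome : "no record");
```

Because writes only ever append, a crash mid-run leaves the earlier snapshots readable, which fits the "queryable post-restart" requirement.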

Acceptance Criteria

  • AC1: 5 axis-count helpers shipped on ChromaManager + GraphService + TopologyInferenceEngine
  • AC2: MCP exposure of axis counts (get_rem_pipeline_state tool OR extension of existing tool)
  • AC3: REM run/stage state model JSONL writer integrated into processUndigestedSessions (or into Sub 3's unified method, whichever lands first)
  • AC4: Per-phase state tracking includes topology outcome (closes the silent-failure surface of hypothesis #10)
  • AC5: Documentation at learn/agentos/rem-state-model.md describing schema + retention policy + consumer patterns
  • AC6: Live re-measurement of 5 axes against current production substrate (snapshot before Sub 3 lands)

Avoided Traps

  • ❌ Conflate Chroma flag with "session processed" — operator-explicit challenge per Discussion §2.6 update
  • ❌ Replace JSONL with graph-nodes-only — graph mutations during REM run risk active-control-plane collateral damage per AC9
  • ❌ Skip Part B because hot-fix #12063 helped — state model is independent of cap-raise; still load-bearing for ongoing observability
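The first trap above can be illustrated with a small check that treats a session as processed only when its recorded phases agree with the flag. This sketch rests on assumptions: the `PhaseOutcome` values are hypothetical, and counting only "ok" as done (not "skipped") is a policy choice made here for illustration.

```typescript
type PhaseOutcome = "pending" | "ok" | "failed" | "skipped"; // assumed values

// Subset of the per-session fields from the Part B schema.
interface SessionPhases {
  sessionId: string;
  memorySessionIngest: PhaseOutcome;
  triVector: PhaseOutcome;
  topology: PhaseOutcome;
  gapSession: PhaseOutcome;
  graphDigestedFlag: boolean;
}

// A session counts as processed only if the flag is set AND every phase reported "ok".
function isFullyProcessed(session: SessionPhases): boolean {
  const phases: PhaseOutcome[] = [
    session.memorySessionIngest, session.triVector, session.topology, session.gapSession
  ];
  return session.graphDigestedFlag && phases.every((phase) => phase === "ok");
}

// Sessions where the flag claims completion but at least one phase disagrees.
function flagOnlySessions(sessions: SessionPhases[]): string[] {
  return sessions
    .filter((session) => session.graphDigestedFlag && !isFullyProcessed(session))
    .map((session) => session.sessionId);
}

// Illustrative data: the second session has the flag set despite a failed topology phase.
const observedSessions: SessionPhases[] = [
  { sessionId: "s1", memorySessionIngest: "ok", triVector: "ok", topology: "ok",
    gapSession: "ok", graphDigestedFlag: true },
  { sessionId: "s2", memorySessionIngest: "ok", triVector: "ok", topology: "failed",
    gapSession: "ok", graphDigestedFlag: true }
];

console.log(flagOnlySessions(observedSessions)); // prints [ 's2' ] in Node
```

With only the boolean, both sessions would look identical. The per-phase record is what makes the failed topology step visible.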

Related

  • Epic #12065
  • Discussion #12062 §2.6 (5-axis divergence) + §2.11 (state model schema)
  • GPT STEP_BACK empirical anchor: 1069/14/132 axis-divergence at 2026-05-27 ~01:19Z
tobiu referenced in commit 1b30fbd - "feat(ai): 5-axis REM observability primitive helpers + 22-case unit coverage (#12068 Sub 2 Phase 1a) (#12081)" on May 27, 2026, 2:17 PM