Problem (A2A Context — Claude Opus 4.6 via Antigravity)
The Golden Path synthesis (DreamService.synthesizeGoldenPath()) computes strategic priorities using a hybrid (semanticScore * 2.0) + (struct_score * 1.0) formula, but edge weights are static after computation. There is no feedback loop from task execution outcomes back into the scoring algorithm.
When an agent completes a Golden Path task:
- If successful: the ticket closes, the node leaves the OPEN filter, but the structural edges that led to its prioritization retain their weights unchanged
- If failed: the ticket remains open, the agent moves on, but no penalty signal adjusts the weights
This means the system cannot learn from its own execution history. A task that consistently fails keeps getting re-prioritized. A task category where the agent excels doesn't get proportionally reinforced.
Solution
Implement a RewardService in ai/mcp/server/memory-core/services/RewardService.mjs:
1. Outcome Capture (in Loop.reflect())
After each completed cycle, compute a quantitative outcome signal:
const signal = {
taskId : event.goldenPathNodeId,
outcome : 'success' | 'partial' | 'failure',
metrics : {
testsPassed : Number,
testsTotal : Number,
filesModified : Number,
commitSha : String|null,
executionTimeMs: Number
}
};
2. Edge Weight Mutation
Feed the signal back into GraphService:
- Success:
linkNodes(completedNodeId, relatedNodes, 'REWARD_POSITIVE', outcome.score) — strengthens the structural pathways that led to this task
- Failure: Apply decay multiplier (e.g., 0.7) to the inbound
GUIDES edges from the frontier to the failed node — deprioritizes repeated failure
- Partial: No weight change (neutral signal)
3. Golden Path Recalibration
The next synthesizeGoldenPath() run will automatically pick up the mutated edge weights via the existing struct_score SQL aggregation, naturally deprioritizing failed tasks and reinforcing successful patterns.
Architectural Context
ai/agent/Loop.mjs (L441-469): reflect() — the existing reflection hook where outcome capture should be injected
ai/mcp/server/memory-core/services/GraphService.mjs (L173-228): linkNodes() — the Hebbian edge weight accumulation that the reward signal should target
ai/daemons/DreamService.mjs (L929-988): The Golden Path SQL query that consumes struct_score — already wired to pick up edge weight changes
Avoided Pitfalls
- Do NOT create a separate database for reward signals — use the existing Native Edge Graph topology. The whole point is that rewards flow through the same edges the Golden Path traverses.
- Do NOT use semantic reward (LLM judging quality) — use quantitative metrics only. Semantic reward introduces hallucination risk into the feedback loop.
- Protect against reward hacking: cap maximum positive reinforcement per cycle (already handled by
Math.min(..., 5.0) in linkNodes)
- The
REWARD_POSITIVE edge type should NOT be decayed by decayGlobalTopology() — add it to the protected list alongside IMPLEMENTS, EXTENDS, SYSTEM_TENET
Verification
- Unit test:
test/playwright/unit/ai/RewardService.spec.mjs
- Assert: Success signal increases target edge weight
- Assert: Failure signal decays target edge weight
- Assert: Repeated failure eventually drops node below Golden Path threshold
Problem (A2A Context — Claude Opus 4.6 via Antigravity)
The Golden Path synthesis (
DreamService.synthesizeGoldenPath()) computes strategic priorities using a hybrid(semanticScore * 2.0) + (struct_score * 1.0)formula, but edge weights are static after computation. There is no feedback loop from task execution outcomes back into the scoring algorithm.When an agent completes a Golden Path task:
This means the system cannot learn from its own execution history. A task that consistently fails keeps getting re-prioritized. A task category where the agent excels doesn't get proportionally reinforced.
Solution
Implement a
RewardServiceinai/mcp/server/memory-core/services/RewardService.mjs:1. Outcome Capture (in
Loop.reflect())After each completed cycle, compute a quantitative outcome signal:
const signal = { taskId : event.goldenPathNodeId, // The graph node that spawned this task outcome : 'success' | 'partial' | 'failure', metrics : { testsPassed : Number, // If QA sub-agent ran tests testsTotal : Number, filesModified : Number, commitSha : String|null, executionTimeMs: Number } };2. Edge Weight Mutation
Feed the signal back into GraphService:
linkNodes(completedNodeId, relatedNodes, 'REWARD_POSITIVE', outcome.score)— strengthens the structural pathways that led to this taskGUIDESedges from the frontier to the failed node — deprioritizes repeated failure3. Golden Path Recalibration
The next
synthesizeGoldenPath()run will automatically pick up the mutated edge weights via the existingstruct_scoreSQL aggregation, naturally deprioritizing failed tasks and reinforcing successful patterns.Architectural Context
ai/agent/Loop.mjs(L441-469):reflect()— the existing reflection hook where outcome capture should be injectedai/mcp/server/memory-core/services/GraphService.mjs(L173-228):linkNodes()— the Hebbian edge weight accumulation that the reward signal should targetai/daemons/DreamService.mjs(L929-988): The Golden Path SQL query that consumesstruct_score— already wired to pick up edge weight changesAvoided Pitfalls
Math.min(..., 5.0)inlinkNodes)REWARD_POSITIVEedge type should NOT be decayed bydecayGlobalTopology()— add it to the protected list alongsideIMPLEMENTS,EXTENDS,SYSTEM_TENETVerification
test/playwright/unit/ai/RewardService.spec.mjs