Problem
There is no way to answer: "Is the agent system actually getting better over time?"
We have 7,789 memories and 740 session summaries but no trend analysis. Are sessions getting more productive? Are PR acceptance rates improving? Is the knowledge base keeping pace with the codebase? Without these metrics, "self-evolving system" is an aspiration, not a measurable claim.
Proposal
Generate an agent_health_metrics.json file (or extend sandman_handoff.md) with longitudinal metrics produced by DreamService during the REM cycle.
Proposed Metrics
| Metric | Source | What It Measures |
| --- | --- | --- |
| Session Quality Trend | 30-day moving average of quality scores from session summaries | Are agents producing higher-quality work? |
| Productivity Trend | 30-day moving average of productivity scores | Are agents getting more done per session? |
| Memory Retrieval Hit Rate | % of query_raw_memories calls that return results with distance < 0.5 | Is the memory actually useful? |
| PR Acceptance Rate | % of agent PRs merged without requested changes (requires #PR_OUTCOME_TRACKER) | Is the code production quality improving? |
| Knowledge Base Coverage | count(indexed_files) / count(total_source_files) | Is the KB keeping pace? |
| Graph Density | edges / nodes ratio over time | Is structural understanding growing or decaying? |
| Summarization Health | % of sessions successfully summarized within 24h | Is the REM pipeline reliable? |
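A minimal sketch of how the trend and ratio metrics in the table might be computed. The record shapes and function names here (ScoredSession, movingAverage, retrievalHitRate, ratio) are hypothetical, since the issue does not specify the session-summary or query-log schemas. The sketch also assumes a query counts as a "hit" when at least one returned result has distance below 0.5, and it returns null rather than 0 when there is no data, so an empty window is not mistaken for a bad score.

```typescript
// Hypothetical record shape; the real session-summary schema is not given in this issue.
interface ScoredSession {
    timestamp: number; // epoch milliseconds
    score: number;     // quality or productivity score
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Mean score of the sessions inside the trailing window that ends at `now`.
// Returns null when the window contains no sessions.
function movingAverage(sessions: ScoredSession[], now: number, windowDays: number = 30): number | null {
    const cutoff = now - windowDays * DAY_MS;
    const inWindow = sessions.filter(s => s.timestamp > cutoff && s.timestamp <= now);
    if (inWindow.length === 0) return null;
    return inWindow.reduce((sum, s) => sum + s.score, 0) / inWindow.length;
}

// Fraction of queries with at least one result closer than `maxDistance`.
// Each inner array holds the distances returned for one query (empty = no results).
function retrievalHitRate(distancesPerQuery: number[][], maxDistance: number = 0.5): number | null {
    if (distancesPerQuery.length === 0) return null;
    const hits = distancesPerQuery.filter(d => d.some(x => x < maxDistance)).length;
    return hits / distancesPerQuery.length;
}

// Guarded ratio, usable for KB coverage (indexed / total) and graph density (edges / nodes).
function ratio(numerator: number, denominator: number): number | null {
    return denominator === 0 ? null : numerator / denominator;
}
```

Keeping these as pure functions over plain arrays would let them be unit-tested without ChromaDB, SQLite, or the GitHub API, with the data fetching left to the caller.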
Implementation
- Data Collection: All metrics derive from existing data sources (ChromaDB collections, SQLite graph, GitHub API).
- Computation: Add a computeHealthMetrics() method to DreamService that runs at the end of the REM cycle.
- Output: Write to resources/content/agent_health_metrics.json. This makes it available to the Sandman handoff dashboard (#9952) and to agents via the knowledge base.
- Alerting: If any metric drops below a threshold (e.g., summarization health < 80%), inject a [SYSTEM_ALERT] into the handoff file.
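The alerting and output steps could look roughly like the sketch below. Everything here is illustrative: the HealthMetrics shape, the metric key summarizationHealth, the alert message wording, and the helper names buildAlerts and serializeSnapshot are assumptions, not an agreed schema. Only the 80% summarization floor and the [SYSTEM_ALERT] tag come from the proposal. Metrics are assumed to be stored as fractions between 0 and 1, with null meaning "no data".

```typescript
// Hypothetical snapshot shape: metric name -> fraction in [0, 1], or null when no data exists.
type HealthMetrics = Record<string, number | null>;

// Minimum acceptable value per metric. Only the 0.8 summarization floor is from the proposal.
const THRESHOLDS: Record<string, number> = {
    summarizationHealth: 0.8,
};

// Build one [SYSTEM_ALERT] line for every metric that has fallen below its floor.
function buildAlerts(metrics: HealthMetrics, thresholds: Record<string, number> = THRESHOLDS): string[] {
    const alerts: string[] = [];
    for (const name of Object.keys(thresholds)) {
        const value = metrics[name];
        // A missing or null metric is skipped rather than treated as a failure.
        if (value == null) continue;
        const floor = thresholds[name];
        if (value < floor) {
            alerts.push(
                `[SYSTEM_ALERT] ${name} is ${(value * 100).toFixed(1)}% (threshold ${(floor * 100).toFixed(0)}%)`
            );
        }
    }
    return alerts;
}

// Serialize a snapshot in the form that might be written to agent_health_metrics.json.
function serializeSnapshot(metrics: HealthMetrics, generatedAt: Date): string {
    return JSON.stringify({ generatedAt: generatedAt.toISOString(), metrics }, null, 2);
}
```

Under these assumptions, computeHealthMetrics() would call serializeSnapshot for the JSON file and append the lines returned by buildAlerts to the handoff file. Appending each snapshot to a history, rather than overwriting it, would be needed for the longitudinal view the proposal describes.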
A2A Context
Origin Session ID: fff6dc5b-ca7f-4c9b-8eca-41bd8a97ad5d