Frontmatter
id: 11791
title: Operator docs + health/telemetry for tenant-repo ingestion
state: Closed
labels: documentation, enhancement, ai
assignees: neo-opus-ada
createdAt: May 22, 2026, 11:35 PM
updatedAt: Jun 7, 2026, 7:14 PM
githubUrl: https://github.com/neomjs/neo/issues/11791
author: neo-opus-ada
commentsCount: 3
parentIssue: 11731
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: [x] 11790 tenant-repo-sync scheduler lane - periodic + manual
blocking: []
closedAt: May 28, 2026, 2:37 AM

Operator docs + health/telemetry for tenant-repo ingestion

Closed · Backlog/active-chunk-14 · documentation · enhancement · ai
neo-opus-ada commented on May 22, 2026, 11:35 PM

Context

Sub 6 of Epic #11731 (Server-side tenant-repo ingestion for cloud Agent OS deployments), graduated from Discussion #11782. This sub makes the new pull-based mode operable + observable.

The Problem

A deployment operator needs to (a) configure tenant repos for server-side ingestion, and (b) see repo-freshness without reading raw Chroma rows. Discussion #11782's §5.2 Step-Back sweep flagged the consumer surfaces this mode touches.

The Fix

  • Operator docs: extend the cloud-deployment guides (learn/agentos/cloud-deployment/TenantIngestionModel.md and adjacent) for the server-side pull mode, covering config, the credentialed-reference contract, the periodic/manual triggers, and the repo-mirror volume.
  • Health/telemetry: surface per-repo freshness (last-ingested revision, last successful sync time, active/disabled/quarantined status) on the health/readiness surface + operator logging/summary output.
  • Consumer-sweep surfaces (Step-Back pt 2): deployment compose/volumes, health/readiness, backup/redeploy-survival docs, tenant config storage docs, parser/source-family docs, deletion telemetry.
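The per-repo freshness bullet can be sketched as a small projection function. This is a minimal illustration, not the merged implementation: the field names are taken from the Contract Ledger further down in this ticket, while the function name `buildTenantRepoSyncDetails` and the input shape `repoStates` are assumptions made for the example. The ledger proposes handing such a `details` object to the existing `HealthService.recordTaskOutcome()` pattern; that call is left out here because its exact signature is not stated in this ticket.

```javascript
// Sketch only. Field names follow the Contract Ledger of this ticket; the
// function name and the `repoStates` input shape are hypothetical.

/**
 * Projects per-repo sync state into the `details` object of the
 * 'tenant-repo-sync' health task.
 * @param {Object[]} repoStates Hypothetical scheduler-side state, one entry per repo
 * @returns {{repos: Object[]}}
 */
function buildTenantRepoSyncDetails(repoStates = []) {
    const repos = repoStates.map(state => {
        const entry = {
            tenantId       : state.tenantId,
            repoSlug       : state.repoSlug,
            lastIngestedRev: state.lastIngestedRev ?? null, // null until the first successful ingest
            lastSyncAt     : state.lastSyncAt ?? null,
            status         : state.status
        };

        // Optional fields appear only when there is something to report
        if (state.lastErrorCode) {
            entry.lastErrorCode = state.lastErrorCode;
        }
        if (Number.isInteger(state.lastSyncDeletedCount)) {
            entry.lastSyncDeletedCount = state.lastSyncDeletedCount;
        }

        return entry;
    });

    // An empty tenant config yields `repos: []`; the key is never omitted
    return {repos};
}

// Usage: an operator-facing payload fragment built without any raw Chroma reads
console.log(JSON.stringify(buildTenantRepoSyncDetails([{
    tenantId            : 'acme',
    repoSlug            : 'acme/docs',
    lastIngestedRev     : '3b2fd16',
    lastSyncAt          : '2026-05-25T09:46:00Z',
    status              : 'active',
    lastSyncDeletedCount: 2
}]), null, 2));
```

Keeping `repos` present even when it is empty lets a dashboard distinguish "nothing configured" from "the task never reported".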

Acceptance Criteria

  • Cloud-deployment docs cover server-side tenant-repo ingestion config + triggers + the credentialed-reference contract + the repo-mirror volume.
  • Health/telemetry surfaces per-repo freshness (revision, last-sync-time, status) without requiring raw Chroma reads.
  • Operator logging/summary output for refresh cycles.
  • The consumer-sweep surfaces enumerated above are each addressed or explicitly deferred with rationale.
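For the logging criterion, a one-line-per-repo summary could look like the sketch below. The fields mirror what the Contract Ledger lists for refresh-cycle logs (tenant/repo, result, short head revision, duration, changed-file count, deleted count, and an error code on failure). The function name `formatRefreshSummary`, the shape of `run`, the `key=value` layout, and the `[tenant-repo-sync]` prefix are all assumptions chosen for illustration, not the project's actual log format.

```javascript
// Sketch only. `formatRefreshSummary`, the `run` shape, and the key=value
// layout are hypothetical; the field list comes from the Contract Ledger.

/**
 * Formats one refresh-cycle summary line for a single repo.
 * @param {Object} run Hypothetical per-repo sync result
 * @returns {String}
 */
function formatRefreshSummary(run) {
    // Shorten the head revision to 7 characters, or show '-' when none exists
    const shortHead = (run.headRevision || '').slice(0, 7) || '-';

    const parts = [
        `tenant=${run.tenantId}`,
        `repo=${run.repoSlug}`,
        `result=${run.result}`, // 'success' | 'skipped' | 'failed'
        `head=${shortHead}`,
        `durationMs=${run.durationMs}`,
        `changed=${run.changedCount ?? 0}`,
        `deleted=${run.deletedCount ?? 0}`
    ];

    // The stable error code is only appended for failed runs
    if (run.result === 'failed' && run.errorCode) {
        parts.push(`error=${run.errorCode}`);
    }

    return `[tenant-repo-sync] ${parts.join(' ')}`;
}

// Usage
console.log(formatRefreshSummary({
    tenantId    : 'acme',
    repoSlug    : 'acme/docs',
    result      : 'success',
    headRevision: '521f18dabcdef',
    durationMs  : 1200,
    changedCount: 3,
    deletedCount: 1
}));
```

A flat `key=value` line is easy to grep and to parse; any value that could carry credential material would still need to pass through the redaction helper the ledger names before being logged.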

Out of Scope

  • The implementation of the service/lane/contract (subs 2–5 / #11731).

Contract Ledger

This ledger follows the Contract Completeness Gate for tickets that introduce public or consumed surfaces. #11791 introduces operator-visible docs plus health/telemetry surfaces that consume #11787 tenant config, #11788 mirror state, #11789 envelope semantics, and #11790 scheduler state. The contract surface is enumerated below.

| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| TenantIngestionModel.md pull-mode operator section | Discussion #11782 Step-Back consumer sweep + ADR 0014 tenant-repo amendment | Add a dedicated server-side pull section covering tenantRepos[], credential-reference rules, periodic/manual triggers, mirror volume, redeploy posture, and push-vs-pull coexistence. | Cross-link from the existing push-mode model so operators see the pull path as additive, not a push replacement. | TenantIngestionModel.md + cloud-deployment overview links | Markdown link check or targeted doc grep proving every sibling sub is linked |
| Deployed health/readiness payload | #11791 AC2 + existing HealthService.recordTaskOutcome() / healthcheck.orchestrator.tasks pattern | Surface per-repo freshness through the existing Memory Core healthcheck orchestrator task block, e.g. orchestrator.tasks['tenant-repo-sync'].details.repos[] with {tenantId, repoSlug, lastIngestedRev, lastSyncAt, status, lastErrorCode?, lastSyncDeletedCount?}. | Empty tenantRepos[] -> repos: [], not omitted; scheduler unavailable -> task status skipped or failed with details explaining unavailable state; no raw Chroma reads. | Health payload JSDoc + TenantIngestionModel.md verification section | Unit test for recordTaskOutcome('tenant-repo-sync', ..., details) projection and payload shape |
| Repo freshness status enum | #11791 AC2 + #11790 scheduler state | active = normal cadence and last success; disabled = operator-disabled config; degraded = last cycle failed but retry budget remains; quarantined = backoff threshold exceeded and operator action is needed. | Recovery after success returns to active; transient per-tick states are not persisted as long-lived health status. | JSDoc on enum + operator runbook | Unit test for success/failure/backoff threshold transitions |
| Operator logging output for refresh cycles | #11791 AC3 + Orchestrator write-log pattern + #11787 redaction helper | Emit one per-repo refresh summary with tenant/repo, result, short head revision, duration, changed-file count, deleted count, and stable error code when failed. | Credential material and git stderr pass through redactTenantRepoSecrets() before logging; degraded retryable failures use warning-level logs unless the lane itself is unhealthy. | JSDoc on log format + operator log-reading section | Unit test for success, skipped, and failed log lines; fake secret absent |
| Deployment compose / mirror volume operator note | Discussion #11782 Step-Back consumer sweep + #11788 compose addition | Document the tenant-repo-mirrors named volume and NEO_TENANT_REPO_MIRROR_ROOT mount as the persistent mirror store used by server-side pull ingestion. | Missing/unmounted mirror root -> service reports degraded/disabled state without crashing the Orchestrator; operator docs point to the volume and env var. | TenantIngestionModel.md + DeploymentCookbook.md tenant repo boundary | Doc-presence check for volume/env names |
| Backup / redeploy survival posture | Step-Back consumer sweep + #11788 mirror volume contract | Document the chosen posture: mirrors are reproducible cache from upstream git unless implementation intentionally includes the volume in a backup bundle; redeploy recovery re-clones missing mirrors through GitMirror.cloneIfMissing() on the next sync. | If operator wants faster recovery, docs may name the volume for optional external backup, but must not imply Chroma/MC backup requires mirror bytes for correctness. | Cloud-deployment backup/redeploy section | Doc check; runtime test only if backup code is changed |
| Tenant config storage operator note | Step-Back consumer sweep + #11787 tenantRepos[] contract | Document how tenantRepos[] is persisted through KnowledgeBaseIngestionService.setTenantConfig({tenantId, config}) or any operator surface implemented by the PR; do not claim an existing MCP tool unless the implementation adds one. | Cross-tenant config writes remain rejected by the existing RLS gate; missing/invalid credentialRef or credential-bearing cloneUrl surfaces #11787 stable errors. | Configuration.md + TenantIngestionModel.md | Doc check; existing #11787 tests cover normalization/RLS unless a new operator surface is added |
| Parser/source-family cross-link | Step-Back consumer sweep + existing Source/Parser docs | Clarify that pull-mode repo files enter the same parser/source-family model as push/bulk ingestion; no new parser contract is introduced by server-side Git acquisition. | If a source family is unsupported, docs route to existing customSources / customParsers guidance instead of defining a pull-specific parser path. | CustomSources.md, CustomParsers.md, TenantIngestionModel.md | Doc link check |
| Deletion telemetry surfacing | Step-Back consumer sweep + existing ingestSourceFiles({deleted, manifestSnapshot, baseRevision, headRevision}) contract | Surface deletion count in operator logs and the health payload details (lastSyncDeletedCount) based on the ingestion summary for the sync run. | Partial ingest or manifest update failure leaves revision state unchanged so the next cycle re-detects and retries deletion idempotently. | JSDoc + operator section on deleted files | Unit test simulating deleted path count in service summary and health/log projection |
| Upstream dependency boundary | #11789 / #11790 Contract Ledgers | #11791 consumes the final scheduler/envelope contract; it does not invent scheduler state fields before #11790 defines them. If implemented before #11790 merges, the PR must explicitly target a stacked branch and declare the stack. | Upstream contract drift -> update this ledger before PR open rather than papering over mismatch in review. | PR body dependency declaration | Live PR base/head evidence + ledger audit in review |
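The status enum row of the ledger describes a small state derivation, which can be sketched as a pure function. The four status names and their meanings come from the ledger. The function name `nextRepoStatus`, the `consecutiveFailures` counter, and the default threshold of 5 are assumptions made for this example, since the real scheduler state fields are defined by #11790 and are not spelled out in this ticket.

```javascript
// Sketch only. Status names come from the ledger; `nextRepoStatus`,
// `consecutiveFailures`, and `quarantineAfter` are hypothetical.

const RepoStatus = Object.freeze({
    ACTIVE     : 'active',      // normal cadence and last success
    DISABLED   : 'disabled',    // operator-disabled config
    DEGRADED   : 'degraded',    // last cycle failed, retry budget remains
    QUARANTINED: 'quarantined'  // backoff threshold exceeded, operator action needed
});

/**
 * Derives the long-lived health status of one repo after a sync cycle.
 * @param {Object}  state
 * @param {Boolean} state.enabled             false when the operator disabled the repo
 * @param {Boolean} state.lastCycleOk         true when the latest cycle succeeded
 * @param {Number}  state.consecutiveFailures failures since the last success
 * @param {Number}  [quarantineAfter=5]       assumed backoff threshold
 * @returns {String}
 */
function nextRepoStatus({enabled, lastCycleOk, consecutiveFailures}, quarantineAfter = 5) {
    if (!enabled) {
        return RepoStatus.DISABLED;
    }

    // Recovery after a success always returns to active
    if (lastCycleOk) {
        return RepoStatus.ACTIVE;
    }

    return consecutiveFailures >= quarantineAfter
        ? RepoStatus.QUARANTINED
        : RepoStatus.DEGRADED;
}

// Usage
console.log(nextRepoStatus({enabled: true, lastCycleOk: false, consecutiveFailures: 2}));
```

Deriving the status from durable counters, instead of storing transient per-tick states, matches the ledger's note that only long-lived states belong in the health payload.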

Behavioral invariants:

  • Docs-only edits do not need runtime tests; health payload or logging behavior changes do.
  • Operator-facing surfaces consume existing sibling contracts unless the PR explicitly introduces and tests a new surface.
  • Every new doc section links both ways across the relevant guide tree and ADR 0014 amendment.
  • quarantined must be operator-actionable and point to a runbook section, not just a label.
  • No operator surface may expose credential-bearing URLs, resolved secrets, or raw git stderr.
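The last invariant can be illustrated with a small redaction sketch. This is explicitly not the project's `redactTenantRepoSecrets()` helper from #11787, whose behavior is not described in this ticket. `redactForOperator` is a hypothetical stand-in that covers two common leak paths under stated assumptions: credentials embedded in a URL's userinfo part, and a resolved secret value appearing verbatim in a message.

```javascript
// Sketch only. NOT the real redactTenantRepoSecrets(); `redactForOperator`
// is a hypothetical stand-in that shows the idea.

/**
 * Masks URL-embedded credentials and known secret values in a message.
 * @param {String}   text              Message that may reach an operator surface
 * @param {String[]} [knownSecrets=[]] Resolved secret values to mask verbatim
 * @returns {String}
 */
function redactForOperator(text, knownSecrets = []) {
    // scheme://user:token@host -> scheme://***@host
    let out = String(text).replace(/([a-z][a-z0-9+.-]*:\/\/)[^\s\/@]+@/gi, '$1***@');

    // Mask any resolved secret that still appears verbatim
    for (const secret of knownSecrets) {
        if (secret) {
            out = out.split(secret).join('***');
        }
    }

    return out;
}

// Usage: a git error message carrying a credential-bearing URL
console.log(redactForOperator(
    'fatal: unable to access https://user:tok3n@example.com/acme/docs.git',
    ['tok3n']
));
```

Masking alone does not satisfy the invariant for raw git stderr, which should not be exposed at all. The "fake secret absent" check named in the ledger's evidence column is the kind of test that would keep such a helper honest.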

Out of contract scope:

  • Repo acquisition / GitMirror clone/fetch/diff -> sub 3 #11788.
  • Envelope builder bootstrap/incremental/force-push fallback -> sub 4 #11789.
  • Scheduler service + lane -> sub 5 #11790.
  • Tenant config + no-secret boundary -> sub 2 #11787.
  • Lane classification in ADR 0014 -> sub 1 #11740.
  • Webhook-driven triggering -> deferred per Discussion #11782 OQ2.
  • Per-tenant rate-limit policy -> deployment-tier operator policy, not this sub.

Related

  • Parent epic: #11731.
  • Origin Discussion: #11782 (§5.2 Step-Back sweep pts 2, 5).

Origin Session ID

39185c66-a107-46ea-b0bf-eb4fa1137257

tobiu referenced in commit 521f18d - "feat(cloud-deployment): operator docs + health/telemetry for tenant-repo ingestion (#11791) (#11951)" on May 25, 2026, 8:37 AM
tobiu referenced in commit 3b2fd16 - "feat(orchestrator): stable KB_TENANT_REPO_SYNC_* error code taxonomy (#11942 AC3+AC4) (#11952)" on May 25, 2026, 9:46 AM
tobiu removed the agent-task:blocked label on May 28, 2026, 12:15 AM
tobiu closed this issue on May 28, 2026, 2:37 AM