Frontmatter
id: 11726
title: Tenant-repo ingestion operational model for cloud deployment
state: Closed
labels: enhancement, ai, architecture
assignees: neo-gpt
createdAt: May 21, 2026, 7:24 PM
updatedAt: Jun 7, 2026, 7:14 PM
githubUrl: https://github.com/neomjs/neo/issues/11726
author: neo-opus-ada
commentsCount: 0
parentIssue: 11720
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: [x] 11731 Server-side tenant-repo ingestion for cloud Agent OS deployments
closedAt: May 21, 2026, 11:57 PM

Tenant-repo ingestion operational model for cloud deployment

Closed · v13.0.0/archive-v13-0-0-chunk-12 · enhancement, ai, architecture
neo-opus-ada commented on May 21, 2026, 7:24 PM

Context

Sub E of Epic #11720 (Cloud Agent OS Deployment Readiness). MVP-critical. Independently startable (no hard D0 dependency).

The Problem

The operational model for an external tenant ingesting its repo content into the cloud-deployed KB is undefined. Four questions are open: how a tenant repo is identified (repoSlug, rootKind, sourcePath, plus operational branch metadata), how git credentials are handled, how per-file-type parsers are dispatched, and how ingested content is validated. #11624 delivered the cross-repo KB ingestion substrate; this sub defines the cloud-deployment operational model on top of it.

The Fix

Define the tenant-repo ingestion operational model:

  • Push-based ingestion is the default (ingest_source_files / ai:ingest-tenant).
  • Repo identity is repoSlug / rootKind / sourcePath plus operational branch metadata, all URL-safe and deterministic.
  • The git-credential handling boundary.
  • Per-file-type parser dispatch: default parsers, plus client-side parsers for custom / untrusted formats that emit parsed-chunk-v1.
  • Ingestion validation checks.

Server-side repo cloning is a separate high-blast-radius exploration (D3): flagged, out of scope here.
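The identity contract above can be sketched as a small normalization step. This is a minimal illustration, not the project's implementation: the helper names (toRepoSlug, normalizeSourcePath, buildRepoIdentity), the identityKey format, and the sample rootKind value are all assumptions. It also assumes branch is carried as operational metadata outside the identity tuple, which is one reading of "identity plus operational branch metadata".

```javascript
// Sketch only: helper names, the identityKey format, and sample values
// are hypothetical and not taken from the Neo codebase.

// Derive a deterministic, URL-safe slug from a human-entered repo name.
function toRepoSlug(repoName) {
    const slug = String(repoName)
        .trim()
        .toLowerCase()
        .replace(/\.git$/, '')        // drop a trailing ".git"
        .replace(/[^a-z0-9]+/g, '-')  // collapse every non-alphanumeric run
        .replace(/^-+|-+$/g, '');     // trim leading / trailing dashes

    if (!slug) {
        throw new Error('repoSlug is empty after normalization');
    }
    return slug;
}

// Normalize a repo-relative path and reject traversal segments.
function normalizeSourcePath(sourcePath) {
    if (typeof sourcePath !== 'string') {
        throw new Error('sourcePath must be a string');
    }
    const normalized = sourcePath
        .replace(/\\/g, '/')      // Windows separators to forward slashes
        .replace(/^(\.\/)+/, '')  // drop leading "./"
        .replace(/^\/+/, '');     // drop leading "/"

    if (normalized.split('/').includes('..')) {
        throw new Error('sourcePath must not contain ".." segments');
    }
    return normalized;
}

function buildRepoIdentity({tenantId, repoName, rootKind, sourcePath, branch}) {
    if (!tenantId) throw new Error('tenantId is required');
    if (!rootKind) throw new Error('rootKind is required'); // missing rootKind is rejected

    const identity = {
        tenantId,
        repoSlug  : toRepoSlug(repoName),
        rootKind,
        sourcePath: normalizeSourcePath(sourcePath)
    };

    return {
        ...identity,
        // The same inputs always serialize to the same key.
        identityKey: JSON.stringify(Object.values(identity)),
        // Assumption: branch is operational metadata, kept out of the key.
        metadata   : {branch: branch ?? null}
    };
}
```

Under these assumptions, calling buildRepoIdentity twice with the same inputs yields the same identityKey, and a name such as "Acme/Web_App.git" normalizes to the slug "acme-web-app".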

Contract Ledger

| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Tenant ingestion entrypoint | #11720 adoption ladder; #11624 cross-repo KB ingestion substrate; #11635 ai:ingest-tenant; KnowledgeBaseIngestionService.ingestSourceFiles | Push-based ingestion is the MVP default. External tenants push raw file deltas or parsed-chunk-v1 records through ingest_source_files for bounded interactive batches or npm run ai:ingest-tenant -- ... for bulk/backfill. | Server-side clone/pull is deferred to #11731 unless push ingestion is falsified as insufficient. Invalid or over-volume MCP payloads route to structured errors / bulk CLI. | Yes: Sub E operational model plus Sub F2 tutorial handoff. | L1 docs contract; add L2/L3 tests only if the implementation touches a real seam. |
| Repo identity: {tenantId, repoSlug, rootKind, sourcePath} plus operational branch metadata | #11718/#11720 lost-item audit; #11623 path-identity tuple; parsed-chunk-v1 schema; merged tenant-aware ingestion work | repoSlug, rootKind, sourcePath, and branch/run metadata are deterministic, URL-safe, secret-free, and stable per tenant repo/source root. Manifests, tombstones, retention/GC, telemetry, and source-family inventory stay isolated per {tenantId, repoSlug}. | Invalid, ambiguous, missing-rootKind, or secret-bearing repo identifiers are rejected or normalized without persisting secrets. Missing manifest data must not erase prior claimed state. | Yes: Sub E operational model; references for HookWiring/CustomParsers where relevant. | L1 docs contract; L2 focused unit coverage if parser/manifest/repoSlug code is changed. |
| Credential-bearing Git URL policy | #11720 lost-item audit; #11731 clone exploration boundary | MVP push ingestion does not require the KB server to store clone credentials. Credential-bearing Git URLs are either rejected for the MVP path or represented as deferred #11731 clone inputs; secrets are never persisted in repoSlug, logs, manifests, tutorial snippets, or graph-visible config. | If server-side clone becomes necessary later, #11731 must define the credential transport/storage contract before implementation. | Yes: Sub E operational model; Sub F2 must not teach secret-bearing examples. | L1 docs contract; redaction/rejection tests only if code changes touch URL handling. |
| Parser dispatch model | #11623 parser locality decision; parsed-chunk-v1 schema; #11624 substrate | Supported source families may use Neo-shipped/server-side parsing for raw deltas; custom or untrusted formats use client-side parsers emitting parsed-chunk-v1. The KB validates parsed chunks and owns embeddings. | Invalid parsed-chunk-v1 records are rejected; records carrying embeddings are rejected outside restore paths. Unknown file families require client-side parser output or explicit unsupported status. | Yes: Sub E model; feeds Sub F2 tutorial and source-family checklist. | L1 docs contract; cite existing schema/service tests, add coverage only if new behavior is introduced. |
| Tenant source-family inventory checklist | #11718 completeness audit; #11720 Sub E refinement | The operational model enumerates source families an external repo must classify before ingestion, including JS-global, test-suite, IDE/header/test-library equivalents, docs, configs, and custom parser-owned formats. | Unknown or unsupported source families are recorded as unsupported/client-parser-required rather than silently skipped. | Yes: checklist in Sub E output; Sub F2 consumes it. | L1 checklist; optional fixture proof if needed for tutorial executability. |
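The parser dispatch and validation rows of the ledger can be sketched as follows. This is a hedged illustration only. The extension-to-family registry, the route names, and the record fields (schema, content, embedding) are assumptions made for the example; the real parsed-chunk-v1 schema is defined elsewhere and is not reproduced here.

```javascript
// Sketch only: the registry contents, route names, and record field names
// are hypothetical and do not reflect the actual parsed-chunk-v1 schema.

// Illustrative registry: file extension -> source family parsed server-side.
const SERVER_PARSED_FAMILIES = new Map([
    ['.md',   'docs'],
    ['.json', 'configs']
]);

function extensionOf(sourcePath) {
    const name = sourcePath.split('/').pop();
    const dot  = name.lastIndexOf('.');
    return dot > 0 ? name.slice(dot).toLowerCase() : '';
}

// Decide who parses a file. Unknown families are recorded, never skipped.
function dispatchParser(sourcePath, clientParserFamilies = new Map()) {
    const ext = extensionOf(sourcePath);

    if (SERVER_PARSED_FAMILIES.has(ext)) {
        return {family: SERVER_PARSED_FAMILIES.get(ext), route: 'server-parser'};
    }
    if (clientParserFamilies.has(ext)) {
        // The tenant declared a client-side parser that emits parsed chunks.
        return {family: clientParserFamilies.get(ext), route: 'client-parser'};
    }
    return {family: 'unknown', route: 'client-parser-required'};
}

// Validate one pushed record. Embeddings are owned by the KB, so a record
// that carries one is rejected unless the call is on a restore path.
function validateParsedChunk(record, {restore = false} = {}) {
    const errors = [];

    if (record === null || typeof record !== 'object') {
        return {ok: false, errors: ['record must be an object']};
    }
    if (record.schema !== 'parsed-chunk-v1') {
        errors.push('unexpected schema marker');
    }
    if (typeof record.content !== 'string' || record.content.length === 0) {
        errors.push('content must be a non-empty string');
    }
    if ('embedding' in record && !restore) {
        errors.push('embeddings are only accepted on restore paths');
    }
    return {ok: errors.length === 0, errors};
}
```

Returning an explicit client-parser-required route, rather than dropping the file, mirrors the ledger's requirement that unknown source families are recorded instead of silently skipped.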

Acceptance Criteria

  • Push-based ingestion is documented as the operational default; the repoSlug / rootKind / sourcePath identity contract plus branch metadata is defined and deterministic.
  • Git-credential handling boundary specified, including the secret-free repoSlug / log / manifest / tutorial invariant.
  • Multi-repo-per-tenant semantics specified: manifests, tombstones, retention/GC, telemetry, and source-family inventory remain isolated per {tenantId, repoSlug}.
  • Per-file-type parser dispatch and the validation model are defined.
  • The operational model is documented for an external dev team (feeds Sub F2 tutorial).
  • A tenant source-family inventory checklist exists: the external repo source families (JS-global, test-suite, IDE/header/test-library equivalents, etc.) that the ingestion and parser-dispatch model must enumerate. [scope refinement: #11718 completeness audit]
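Two of the criteria above, the secret-free credential boundary and per-{tenantId, repoSlug} isolation, can be illustrated with a short sketch. The function names and the in-memory Map store are assumptions for the example, not the service's API. The credential check relies on the standard URL class and conservatively treats any userinfo in a parseable URL as credential-bearing; it does not inspect query-string tokens or scp-style remotes.

```javascript
// Sketch only: function names and the in-memory store are hypothetical.

// True when a parseable Git URL carries userinfo (user, password, or token).
function hasEmbeddedCredentials(gitUrl) {
    let parsed;
    try {
        parsed = new URL(gitUrl);
    } catch {
        return false; // not parseable as a URL, e.g. scp-style git@host:path
    }
    return parsed.username !== '' || parsed.password !== '';
}

// Strip userinfo before a URL reaches logs, manifests, or a repoSlug.
function redactGitUrl(gitUrl) {
    const parsed = new URL(gitUrl);
    parsed.username = '';
    parsed.password = '';
    return parsed.toString();
}

// Unambiguous key: serializing the pair avoids collisions that a plain
// "tenant/repo" string join could produce.
function isolationKey(tenantId, repoSlug) {
    return JSON.stringify([tenantId, repoSlug]);
}

// Store a manifest for exactly one {tenantId, repoSlug} pair.
function applyManifest(store, tenantId, repoSlug, manifest) {
    const key = isolationKey(tenantId, repoSlug);

    if (manifest == null) {
        // Missing manifest data must not erase prior claimed state.
        return store.get(key);
    }
    store.set(key, manifest);
    return manifest;
}
```

With a Map as the store, applying a manifest for one repo leaves every other tenant and repo entry untouched, and a push that arrives without manifest data returns the previously stored manifest instead of clearing it.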

Out of Scope

  • Server-side repo cloning (a D3 exploration — flagged here, not built).
  • The KB ingestion substrate itself (delivered by #11624).

Related

Parent #11720 · builds on #11624 (cross-repo KB ingestion) · Origin Discussion #11718 §5 Sub E · Post-MVP clone exploration #11731.

Origin Session ID

8e1dc8ca-b5a5-4479-b3cf-31918eb4a5b2

tobiu referenced in commit eef91e4 - "feat(agentos): add tenant ingestion model (#11726) (#11737)" on May 21, 2026, 11:57 PM
tobiu closed this issue on May 21, 2026, 11:57 PM
tobiu referenced in commit 7c08d75 - "feat(agentos): amend ADR 0014 for tenant-repo pull-ingestion lane (#11740) (#11794)" on May 23, 2026, 1:17 AM