| Contract | Source of authority | Expected behavior | Failure / fallback | Docs impact | Test evidence |
|---|---|---|---|---|---|
| Tenant ingestion entrypoint | #11720 adoption ladder; #11624 cross-repo KB ingestion substrate; #11635 ai:ingest-tenant; KnowledgeBaseIngestionService.ingestSourceFiles | Push-based ingestion is the MVP default. External tenants push raw file deltas or parsed-chunk-v1 records through ingest_source_files for bounded interactive batches or npm run ai:ingest-tenant -- ... for bulk/backfill. | Server-side clone/pull is deferred to #11731 unless push ingestion is falsified as insufficient. Invalid or over-volume MCP payloads route to structured errors / bulk CLI. | Yes: Sub E operational model plus Sub F2 tutorial handoff. | L1 docs contract; add L2/L3 tests only if the implementation touches a real seam. |
| Repo identity: {tenantId, repoSlug, rootKind, sourcePath} plus operational branch metadata | #11718/#11720 lost-item audit; #11623 path-identity tuple; parsed-chunk-v1 schema; merged tenant-aware ingestion work | repoSlug, rootKind, sourcePath, and branch/run metadata are deterministic, URL-safe, secret-free, and stable per tenant repo/source root. Manifests, tombstones, retention/GC, telemetry, and source-family inventory stay isolated per {tenantId, repoSlug}. | Invalid, ambiguous, missing-rootKind, or secret-bearing repo identifiers are rejected or normalized without persisting secrets. Missing manifest data must not erase prior claimed state. | Yes: Sub E operational model; references for HookWiring/CustomParsers where relevant. | L1 docs contract; L2 focused unit coverage if parser/manifest/repoSlug code is changed. |
| Credential-bearing Git URL policy | #11720 lost-item audit; #11731 clone exploration boundary | MVP push ingestion does not require the KB server to store clone credentials. Credential-bearing Git URLs are either rejected for the MVP path or represented as deferred #11731 clone inputs; secrets are never persisted in repoSlug, logs, manifests, tutorial snippets, or graph-visible config. | If server-side clone becomes necessary later, #11731 must define the credential transport/storage contract before implementation. | Yes: Sub E operational model; Sub F2 must not teach secret-bearing examples. | L1 docs contract; redaction/rejection tests only if code changes touch URL handling. |
| Parser dispatch model | #11623 parser locality decision; parsed-chunk-v1 schema; #11624 substrate | Supported source families may use Neo-shipped/server-side parsing for raw deltas; custom or untrusted formats use client-side parsers emitting parsed-chunk-v1. The KB validates parsed chunks and owns embeddings. | Invalid parsed-chunk-v1 records are rejected; records carrying embeddings are rejected outside restore paths. Unknown file families require client-side parser output or explicit unsupported status. | Yes: Sub E model; feeds Sub F2 tutorial and source-family checklist. | L1 docs contract; cite existing schema/service tests, add coverage only if new behavior is introduced. |
| Tenant source-family inventory checklist | #11718 completeness audit; #11720 Sub E refinement | The operational model enumerates source families an external repo must classify before ingestion, including JS-global, test-suite, IDE/header/test-library equivalents, docs, configs, and custom parser-owned formats. | Unknown or unsupported source families are recorded as unsupported/client-parser-required rather than silently skipped. | Yes: checklist in Sub E output; Sub F2 consumes it. | L1 checklist; optional fixture proof if needed for tutorial executability. |
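The repo identity and credential-policy rows above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the helper names (`hasEmbeddedCredentials`, `normalizeRepoSlug`, `buildRepoIdentity`), the slug normalization rule, and the example `rootKind` value are hypothetical and do not come from the KB codebase. The sketch rejects credential-bearing inputs outright, which is only one of the two options the ledger allows (reject or normalize).

```javascript
// Hypothetical helpers; none of these names exist in the KB codebase.

// True when a URL carries userinfo (user, password, or token) before the host.
// scp-style remotes such as git@host:path contain no "://" and are not matched.
function hasEmbeddedCredentials(url) {
    return /^[a-z][a-z0-9+.-]*:\/\/[^\/?#]*@/i.test(url);
}

// Deterministic, URL-safe slug: the same input always yields the same output.
// The normalization rule (lowercase, dashes for other characters) is an assumption.
function normalizeRepoSlug(raw) {
    if (typeof raw !== 'string' || raw.trim() === '') {
        throw new Error('repoSlug is required');
    }
    if (hasEmbeddedCredentials(raw)) {
        // Reject instead of stripping, so a secret can never reach logs or manifests.
        throw new Error('repoSlug must not come from a credential-bearing URL');
    }
    const slug = raw.trim().toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')   // collapse every run of other characters into one dash
        .replace(/^-+|-+$/g, '');      // trim leading and trailing dashes
    if (slug === '') {
        throw new Error('repoSlug has no URL-safe characters');
    }
    return slug;
}

// Builds the {tenantId, repoSlug, rootKind, sourcePath} tuple plus branch metadata.
function buildRepoIdentity({tenantId, repoSlug, rootKind, sourcePath, branch}) {
    for (const [name, value] of Object.entries({tenantId, rootKind, sourcePath})) {
        if (typeof value !== 'string' || value.trim() === '') {
            throw new Error(`${name} is required`);
        }
    }
    return {
        tenantId,
        repoSlug: normalizeRepoSlug(repoSlug),
        rootKind,
        sourcePath,
        branch: branch ?? null   // operational metadata, kept outside the identity tuple
    };
}

// Usage; 'src' is a placeholder rootKind value, not a documented one.
const identity = buildRepoIdentity({
    tenantId: 'tenant-a',
    repoSlug: 'Acme/Widgets.Repo',
    rootKind: 'src',
    sourcePath: 'src/app',
    branch: 'main'
});
console.log(identity.repoSlug); // acme-widgets-repo
```

Rejecting early keeps the "secrets are never persisted" invariant simple to check, because no later stage ever sees the secret-bearing string.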
Context
Sub E of Epic #11720 (Cloud Agent OS Deployment Readiness). MVP-critical. Independently startable (no hard D0 dependency).
The Problem
The operational model for an external tenant ingesting its repo content into the cloud-deployed KB is undefined: how a tenant repo is identified (repoSlug, rootKind, sourcePath, plus operational branch metadata), how git credentials are handled, how per-file-type parsers are dispatched, how ingested content is validated. #11624 delivered the cross-repo KB ingestion substrate; this sub defines the cloud-deployment operational model on top of it.
The Fix
Define the tenant-repo ingestion operational model: push-based ingestion is the default (ingest_source_files / ai:ingest-tenant); repoSlug / rootKind / sourcePath identity plus operational branch metadata (URL-safe / deterministic); the git-credential handling boundary; per-file-type parser dispatch (default + client-side parsers for custom / untrusted formats emitting parsed-chunk-v1); ingestion validation checks. Server-side repo cloning is a separate high-blast exploration (D3): flagged, out of scope here.
Contract Ledger
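The parser dispatch contract in the ledger (server-side parsing for supported families, client-side parsed-chunk-v1 for custom or untrusted formats, embeddings rejected outside restore paths, unknown families recorded rather than skipped) can be sketched roughly as below. This is only an illustration under stated assumptions: the record shape (`kind`, `sourceFamily`, `embedding`), the `restore` flag, the `dispatchRecord` name, and the contents of `SERVER_PARSED_FAMILIES` are all hypothetical, and the real parsed-chunk-v1 schema is not reproduced here.

```javascript
// Hypothetical placeholder set; which families are actually server-parsed is not stated in this issue.
const SERVER_PARSED_FAMILIES = new Set(['docs', 'configs']);

// Decides how one pushed record is handled. All field names are assumptions.
function dispatchRecord(record, {restore = false} = {}) {
    if (record.kind === 'parsed-chunk-v1') {
        // The KB owns embeddings, so client-supplied ones are only allowed on restore paths.
        if ('embedding' in record && !restore) {
            return {status: 'rejected', reason: 'embeddings are owned by the KB'};
        }
        return {status: 'accepted', parser: 'client'};
    }
    if (record.kind === 'raw-delta') {
        if (SERVER_PARSED_FAMILIES.has(record.sourceFamily)) {
            return {status: 'accepted', parser: 'server'};
        }
        // Unknown families are recorded explicitly instead of being silently skipped.
        return {status: 'unsupported', reason: 'client-parser-required'};
    }
    return {status: 'rejected', reason: 'unknown record kind'};
}

// Usage
console.log(dispatchRecord({kind: 'raw-delta', sourceFamily: 'docs'}).parser);       // server
console.log(dispatchRecord({kind: 'raw-delta', sourceFamily: 'custom-dsl'}).reason); // client-parser-required
console.log(dispatchRecord({kind: 'parsed-chunk-v1', embedding: [0.1]}).status);     // rejected
```

Returning a structured status for every branch, rather than throwing or dropping the record, mirrors the ledger's requirement that invalid payloads route to structured errors.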
Acceptance Criteria
repoSlug / rootKind / sourcePath identity contract plus branch metadata is defined + deterministic.
… repoSlug / log / manifest / tutorial invariant.
… {tenantId, repoSlug}.
Out of Scope
Related
Parent #11720 · builds on #11624 (cross-repo KB ingestion) · Origin Discussion #11718 §5 Sub E · Post-MVP clone exploration #11731.
Origin Session ID
8e1dc8ca-b5a5-4479-b3cf-31918eb4a5b2