Frontmatter
id: 11743
title: Specify repo-push receiver and auth flow for KB ingestion envelopes
state: Closed
labels: enhancement, ai, architecture
assignees: neo-gpt
createdAt: May 22, 2026, 4:19 AM
updatedAt: May 22, 2026, 10:58 AM
githubUrl: https://github.com/neomjs/neo/issues/11743
author: neo-gpt
commentsCount: 2
parentIssue: 11720
subIssues:
  - 11744 Gate ingest_source_files exposure by transport
subIssuesCompleted: 1
subIssuesTotal: 1
blockedBy: []
blocking: []
closedAt: May 22, 2026, 10:58 AM

Specify repo-push receiver and auth flow for KB ingestion envelopes

neo-gpt commented on May 22, 2026, 4:19 AM

Context

The operator raised a conceptual challenge during the active #11720 night shift: #11720/#11728 need a deployable tenant repo-push ingestion path, not just the existence of KnowledgeBaseIngestionService.ingestSourceFiles().

The original ticket framing incorrectly treated MCP as disqualified for the repo-push receiver. Live source review corrected that premise:

  • The KB MCP server has a real StreamableHTTP /mcp transport for cloud deployments.
  • The deployment profile includes OAuth/OIDC auth in front of that MCP endpoint.
  • The Neo MCP client supports remote sse / streamable-http transports.
  • Therefore, an MCP-client-over-StreamableHTTP repo-push path is viable in principle.
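For orientation, here is a minimal sketch of what such a call could look like on the wire, written in plain JavaScript rather than with the Neo MCP client. It assumes the MCP initialize handshake has already completed (so a session id may be available) and that the ingestion envelope is passed directly as the ingest_source_files tool arguments; buildToolCallRequest and the example URL are hypothetical. The JSON-RPC tools/call method, the dual Accept header, and the Mcp-Session-Id header follow the MCP Streamable HTTP transport; a real MCP client handles these details itself.

```javascript
// Hypothetical sketch: the POST an MCP client would send to call
// ingest_source_files over Streamable HTTP. Assumes initialize already ran
// and that the envelope is passed directly as the tool arguments.
function buildToolCallRequest({ url, token, sessionId, envelope, id = 1 }) {
  const headers = {
    'Content-Type': 'application/json',
    // Streamable HTTP servers may answer with plain JSON or an SSE stream.
    Accept: 'application/json, text/event-stream',
    // OAuth/OIDC bearer token checked by the auth layer in front of /mcp.
    Authorization: `Bearer ${token}`
  };
  if (sessionId) {
    // Session id returned by the server during initialize, if any.
    headers['Mcp-Session-Id'] = sessionId;
  }
  const body = {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'ingest_source_files', arguments: envelope }
  };
  return { url, method: 'POST', headers, body: JSON.stringify(body) };
}
```

The returned object could be handed to fetch(req.url, { method: req.method, headers: req.headers, body: req.body }) in Node 18+. The point is only to show which parts of the request carry the transport, the auth, and the envelope.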

The unresolved MVP gap is the deployable invocation/auth model: a concrete tenant hook/CI client, token acquisition and storage, OAuth audience/resource setup, volume policy, and failure signatures.
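One small piece of that auth model a push client could check before sending anything is whether its token actually targets the KB MCP resource. The sketch below is hypothetical (preflightToken is not an existing Neo function) and assumes JWT access tokens. It only decodes the claims without verifying the signature, so it can catch a wrong audience or an expired token early but never replaces server-side validation. Opaque tokens are passed through for the server to judge.

```javascript
// Hypothetical client-side preflight: decode (NOT verify) a JWT access token
// and check that it targets the expected KB MCP audience and is unexpired.
function preflightToken(token, expectedAudience, nowSeconds = Math.floor(Date.now() / 1000)) {
  if (!token) {
    return { ok: false, reason: 'missing-token' };
  }
  const parts = token.split('.');
  if (parts.length !== 3) {
    // Not a JWT: only the server can judge an opaque token.
    return { ok: true, reason: 'opaque-token' };
  }
  let claims;
  try {
    claims = JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
  } catch {
    return { ok: false, reason: 'malformed-token' };
  }
  // The aud claim may be a single string or an array of strings (RFC 7519).
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(expectedAudience)) {
    return { ok: false, reason: 'wrong-audience' };
  }
  if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) {
    return { ok: false, reason: 'expired' };
  }
  return { ok: true, reason: 'jwt-claims-ok' };
}
```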

Duplicate sweep:

  • ask_knowledge_base(query='ticket duplicate repo-push receiver auth flow KB ingestion envelopes MCP StreamableHTTP', type='ticket') surfaced #11726 and older related endpoint/auth issues, but no equivalent open ticket for the tenant-side repo-push invocation model.
  • rg "repo-push|StreamableHTTP|NEO_KB_MCP_URL|tenant push receiver|KB receiver|HTTP/streaming endpoint" resources/content/issues resources/content/discussions learn/agentos/cloud-deployment ai/services/knowledge-base ai/mcp/server/knowledge-base found #11626/#11635/#11679 deferrals and #11731's related clone path, not this implementation surface.
  • #11731 is related but not duplicate: it owns server-side ref-only webhook/clone/credential exploration, whereas this ticket keeps the MVP content-bearing push path where tenant hook/CI reads files and submits an ingestion envelope.

Architecture pre-flight routing: the selected discipline is ticket-create, for an actionable missing implementation surface under an already-graduated workstream. The nearest alternative was the Ideation Sandbox; it was rejected because the upstream Discussion #11623 has already graduated and the live source review narrowed the missing piece to concrete invocation/auth wiring, not abstract architecture. The blast radius is bounded: tenant-side CLI/hook, cloud deployment docs, and tests.

The Problem

#11624 delivered the ingestion service, contracts, tenant metadata, MCP facade, bulk CLI, docs, observability, and tests. It did not deliver a day-0 operator primitive that an external tenant repo hook or CI job can run against a deployed KB endpoint.

Current substrate answers two different questions:

  1. How does ingestion logic run once called? KnowledgeBaseIngestionService.ingestSourceFiles().
  2. Which current facades call it? MCP ingest_source_files and co-located CLI npm run ai:ingest-tenant.

It does not fully answer the deployment question:

A tenant repo push happened. Which runnable client does the hook/CI call, where does it get its token, what OAuth audience/resource must that token target, how is the envelope built, and how does the job fail when volume/auth/parser errors occur?

An envelope is data, not an invocation model. A service method is not an operator-facing trigger. The cloud deployment needs a tenant-side push client plus documented auth and failure semantics.

Without that surface, #11720's adoption ladder has a hidden gap at tenant KB ingestion and #11728's day-0 tutorial risks teaching tacit maintainer knowledge instead of a repeatable operator journey.

The Architectural Reality

Evidence verified before implementation:

| Surface | Current reality | Gap |
| --- | --- | --- |
| KnowledgeBaseIngestionService.ingestSourceFiles() | Service-layer method exists and validates/stamps/embeds payloads. | It is not a trigger; it needs a caller and auth context. |
| ai/mcp/server/knowledge-base/toolService.mjs | Maps ingest_source_files through an MCP facade with the #10572 work-volume gate; remote cloud transport is viable. | Tool existence alone is not a tenant hook/CI invocation model. |
| KB MCP StreamableHTTP /mcp endpoint | Deployed profile can expose MCP over HTTP behind OAuth/OIDC. | Tenant repo-push needs the token/audience/resource contract and client wrapper. |
| Neo MCP client | Supports remote sse / streamable-http transport strings. | Tenant hook/CI needs a small executable wrapper around that client. |
| buildScripts/ai/ingestTenant.mjs | CLI imports services directly and calls ingestSourceFiles({viaMcp:false}). | Runs as a co-located shell process for bulk/backfill; not the remote tenant push default. |
| learn/agentos/cloud-deployment/HookWiring.md | Documents MCP and bulk CLI facades. | Needs the concrete ai:kb-push-client path, token env contract, and failure signatures. |
| #11731 | Owns server-side clone/credential exploration. | Ref-only webhook payloads remain out of scope for this ticket. |

Critical boundary: if the receiver gets only {repo, ref, sha} metadata, the server must fetch/clone the repo and therefore needs credentials. That is #11731 territory. This ticket keeps the MVP receiver content-bearing: the tenant-side hook/CI reads changed files or emits parsed-chunk-v1, then sends the envelope to the cloud KB MCP endpoint.
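A minimal sketch of that boundary as a guard. classifyPayload and the content field names (files, chunks, tombstones, manifest) are hypothetical, loosely mirroring the raw files, parsed chunks, tombstones, and manifests named in the contract ledger; only the {repo, ref, sha} shape comes from the text above.

```javascript
// Hypothetical guard: accept content-bearing payloads for the MVP and defer
// ref-only payloads to #11731, since they would need server-side clone creds.
const CONTENT_FIELDS = ['files', 'chunks', 'tombstones', 'manifest'];

function classifyPayload(payload) {
  const hasContent = CONTENT_FIELDS.some((field) =>
    Array.isArray(payload[field]) ? payload[field].length > 0 : Boolean(payload[field])
  );
  if (hasContent) {
    return { accepted: true, kind: 'content-bearing' };
  }
  // Only repo/ref/sha metadata: the server would have to fetch the content.
  const refOnly = ['repo', 'ref', 'sha'].some((field) => field in payload);
  return {
    accepted: false,
    kind: refOnly ? 'ref-only' : 'empty',
    reason: refOnly
      ? 'Ref-only payloads need server-side clone credentials; see #11731.'
      : 'Payload carries no files, chunks, tombstones, or manifest.'
  };
}
```

A push that only deletes files still counts as content-bearing, because its tombstones carry everything the server needs without cloning.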

The Fix

Define and implement the repo-push invocation/auth model for content-bearing KB ingestion envelopes.

MVP shape:

  1. Add a tenant-side CLI wrapper, npm run ai:kb-push-client, that connects to the remote KB /mcp endpoint with the Neo MCP client over streamable-http or sse.
  2. The wrapper reads one JSON ingestion envelope from stdin or a file and calls ingest_source_files.
  3. The wrapper accepts NEO_KB_MCP_URL, NEO_KB_MCP_TRANSPORT, NEO_KB_INGEST_TOKEN, NEO_KB_TOKEN_ENV, NEO_KB_TENANT_ID, and NEO_KB_REPO_SLUG.
  4. The reference pre-push hook builds a content-bearing envelope and uses the remote MCP client when NEO_KB_MCP_URL is configured, while retaining the local bulk CLI fallback for co-located demos/imports.
  5. Preserve #10572 semantics: remote MCP calls remain volume-gated and must fail visibly with KB_INGEST_VOLUME_EXCEEDED instead of freezing an agent turn.
  6. Document OAuth/OIDC setup: the repo-push automation identity token is stored in the tenant hook/CI secret store, targets the KB public MCP resource/audience, and is not a Git credential.
  7. Update HookWiring.md, TenantIngestionModel.md, Configuration.md, examples, and #11728 expectations so repo-push ingestion uses the concrete client/auth model.
  8. Add tests proving the CLI parser, auth header wiring, envelope defaults, tool call, cleanup, and structured failure semantics without a live KB server.
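The env contract in items 3 and 4 could be resolved roughly as follows. This is a sketch under stated assumptions, not the shipped ai:kb-push-client: resolvePushConfig is hypothetical, NEO_KB_TOKEN_ENV is assumed to name another variable that holds the token (taking precedence over NEO_KB_INGEST_TOKEN when both are set), and streamable-http is assumed to be the default transport.

```javascript
// Hypothetical env-contract resolver for a tenant push client.
// Assumptions: NEO_KB_TOKEN_ENV names the variable holding the token and
// wins over NEO_KB_INGEST_TOKEN; transport defaults to streamable-http.
const TRANSPORTS = new Set(['streamable-http', 'sse']);

function resolvePushConfig(env) {
  if (!env.NEO_KB_MCP_URL) {
    // No remote endpoint: the reference hook keeps the co-located
    // bulk CLI fallback (npm run ai:ingest-tenant).
    return { route: 'local-bulk-cli' };
  }
  const transport = env.NEO_KB_MCP_TRANSPORT || 'streamable-http';
  if (!TRANSPORTS.has(transport)) {
    throw new Error(`Unsupported NEO_KB_MCP_TRANSPORT: ${transport}`);
  }
  const token = env.NEO_KB_TOKEN_ENV ? env[env.NEO_KB_TOKEN_ENV] : env.NEO_KB_INGEST_TOKEN;
  if (!token) {
    // Fail closed locally instead of sending an unauthenticated request.
    throw new Error('No ingest token: set NEO_KB_INGEST_TOKEN or NEO_KB_TOKEN_ENV');
  }
  return {
    route: 'remote-mcp',
    url: env.NEO_KB_MCP_URL,
    transport,
    headers: { Authorization: `Bearer ${token}` },
    // Envelope defaults only; server-authenticated context stays
    // authoritative for tenant identity.
    defaults: { tenantId: env.NEO_KB_TENANT_ID, repoSlug: env.NEO_KB_REPO_SLUG }
  };
}
```

Failing locally on a missing token keeps the hook from sending an unauthenticated request; the server still fails closed with HTTP 401 if a token is present but wrong.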

A non-MCP HTTP/queue receiver that shares KnowledgeBaseIngestionService remains a future alternative if the MCP-over-StreamableHTTP client path proves operationally awkward. It is not required for this MVP. Server-side ref-only webhook/clone ingestion remains #11731.

If the implementation adds a new .mjs file, run structural-pre-flight before authoring it.

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| Tenant repo-push client | #11720 adoption ladder; live MCP StreamableHTTP/OAuth source review; #11743 correction comments | ai:kb-push-client runs in tenant hook/CI, connects to remote KB /mcp, and invokes ingest_source_files with a content-bearing envelope. | If MCP-over-StreamableHTTP proves awkward, create a future non-MCP HTTP/queue receiver sharing KnowledgeBaseIngestionService. | HookWiring.md, TenantIngestionModel.md, Configuration.md | Unit test asserts CLI parsing, transport config, auth header, tool call, and cleanup. |
| Envelope contract | #11624 contracts; parsed-chunk-v1; deletion-signaling contract | Client sends raw files, parsed chunks, tombstones, manifests, and revision boundaries using the existing ingest_source_files envelope. | Ref-only webhook payloads are rejected/deferred to #11731 because they require server-side clone credentials. | Cloud-deployment docs + example hook | Tests for defaults and structured payload handling; hook syntax validation. |
| Auth / tenant stamping | #11631 write-side stamping; #11632 read-side isolation; #11726 credential boundary | Repo-push token is a KB MCP authorization credential stored in hook/CI secrets; OAuth audience/resource targets the KB public MCP endpoint. Server-authenticated context remains authoritative. | Missing/expired/wrong-audience token fails closed through HTTP 401 / auth middleware. | Security/configuration/hook docs | Docs identify token env, audience/resource, and failure signatures. |
| Volume policy | #10572 MCP gate; #11635 bulk mode | MCP client preserves mcpSyncMaxChunks refusal. Oversized incremental pushes fail visibly and instruct split-or-bulk. | Initial import/backfill uses ai:ingest-tenant on the deployment host. | Hook wiring + tutorial failure signatures | Unit tests cover KB_INGEST_VOLUME_EXCEEDED as failure; docs describe operator response. |
| Day-0 tutorial consumer | #11720 adoption ladder; #11728 | Fresh operator can wire a tenant hook/CI job using NEO_KB_MCP_URL + token env and run the push client. | #11728 provides live deployed proof after this PR lands. | New tutorial artifact | PR declares L2 evidence here; #11728 supplies L3 live proof. |
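The volume and auth rows imply that the push client must turn server responses into visible hook/CI failures. A hypothetical mapping is sketched below. classifyPushResult and the KB_AUTH_REJECTED, KB_HTTP_ERROR, and KB_INGEST_FAILED labels are invented for illustration, and the sketch assumes the client can see either the HTTP status from the auth middleware or a structured tool error whose code is KB_INGEST_VOLUME_EXCEEDED.

```javascript
// Hypothetical mapping from a push response to a hook/CI outcome.
// Signature labels other than KB_INGEST_VOLUME_EXCEEDED are illustrative.
function classifyPushResult({ httpStatus, toolError }) {
  if (httpStatus === 401) {
    // Missing, expired, or wrong-audience token: the server failed closed.
    return {
      exitCode: 1,
      signature: 'KB_AUTH_REJECTED',
      hint: 'Check token expiry and that its audience/resource is the KB MCP endpoint.'
    };
  }
  if (httpStatus !== undefined && (httpStatus < 200 || httpStatus >= 300)) {
    return { exitCode: 1, signature: 'KB_HTTP_ERROR', hint: `Unexpected HTTP status ${httpStatus}.` };
  }
  if (toolError && toolError.code === 'KB_INGEST_VOLUME_EXCEEDED') {
    // #10572 gate: fail the job visibly instead of retrying or hanging.
    return {
      exitCode: 1,
      signature: toolError.code,
      hint: 'Split the push into smaller envelopes, or run npm run ai:ingest-tenant on the deployment host.'
    };
  }
  if (toolError) {
    return { exitCode: 1, signature: toolError.code || 'KB_INGEST_FAILED', hint: toolError.message || '' };
  }
  return { exitCode: 0, signature: 'OK', hint: '' };
}
```

A hook could print the signature and hint, then set process.exitCode from exitCode, so a refused push shows up as a failed CI step rather than a silent no-op.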

Acceptance Criteria

  • A tenant-side repo-push invocation surface exists and can call the deployed KB MCP /mcp endpoint over streamable-http or sse.
  • The invocation surface reads a content-bearing ingestion envelope from stdin or file and calls ingest_source_files.
  • CLI/env contract documents NEO_KB_MCP_URL, NEO_KB_MCP_TRANSPORT, NEO_KB_INGEST_TOKEN, NEO_KB_TOKEN_ENV, NEO_KB_TENANT_ID, and NEO_KB_REPO_SLUG.
  • Reference hook/CI example builds an envelope with changed files, tombstones, baseRevision, headRevision, repoSlug, and tenant defaults.
  • OAuth/OIDC guidance explains token storage, audience/resource targeting, and the boundary between KB MCP authorization credentials and Git credentials.
  • Volume behavior preserves the MCP gate; KB_INGEST_VOLUME_EXCEEDED is surfaced as a hook/CI failure with split-or-bulk guidance.
  • Ref-only repo webhook payloads are explicitly rejected or deferred to #11731; no hidden clone credential requirement is introduced.
  • Security boundary is preserved: server-side authenticated context remains authoritative for tenant identity.
  • HookWiring.md, TenantIngestionModel.md, Configuration.md, and examples/cloud-deployment/ distinguish service method, MCP facade, tenant push client, and bulk CLI.
  • #11728 day-0 tutorial can consume this model for a live tenant-ingestion milestone, or explicitly marks live proof blocked until the PR lands.
  • Tests cover parser/env handling, auth header wiring, envelope defaults, MCP tool call, cleanup, and structured failure detection.
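The reference hook's envelope-building step can be sketched from git diff --name-status output, where each line is a status letter (with a similarity score for renames and copies) followed by tab-separated paths. buildEnvelope, the readFile callback, and the files/tombstones field names are hypothetical; baseRevision, headRevision, repoSlug, and the tenant default come from the acceptance criteria above.

```javascript
// Hypothetical reference-hook helper: turn `git diff --name-status` output
// into a content-bearing envelope (changed files + tombstones).
const CONTENT_STATUSES = new Set(['A', 'M', 'C', 'T']);

function buildEnvelope({ nameStatus, readFile, baseRevision, headRevision, repoSlug, tenantId }) {
  const files = [];
  const tombstones = [];
  for (const line of nameStatus.split('\n')) {
    if (!line.trim()) continue;
    const [status, ...paths] = line.split('\t');
    const kind = status[0];
    if (kind === 'D') {
      tombstones.push({ path: paths[0] });
    } else if (kind === 'R') {
      // Rename: the old path is deleted, the new path carries content.
      tombstones.push({ path: paths[0] });
      files.push({ path: paths[1], content: readFile(paths[1]) });
    } else if (CONTENT_STATUSES.has(kind)) {
      // Added, modified, copied, or type-changed: the last path is current.
      const path = paths[paths.length - 1];
      files.push({ path, content: readFile(path) });
    }
    // Other statuses (e.g. unmerged) are skipped in this sketch.
  }
  return { repoSlug, tenantId, baseRevision, headRevision, files, tombstones };
}
```

In a real hook, nameStatus would be the stdout of git diff --name-status <base> <head>, and readFile could wrap fs.readFileSync(path, 'utf8').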

Out of Scope

  • Server-side repo cloning, fetch credentials, credential vaulting, or Git provider app installation. Those belong to #11731.
  • Replacing KnowledgeBaseIngestionService or forking ingestion logic away from the existing service layer.
  • Client-side embeddings. The KB server still owns embeddings.
  • A production OAuth provider implementation. This ticket specifies the deployable token/audience/resource contract; deployment-specific identity-provider provisioning remains operator-owned.
  • A separate non-MCP HTTP/queue receiver. That remains a candidate future alternative if this MVP path proves operationally awkward.

Avoided Traps

| Trap | Why rejected |
| --- | --- |
| Claiming MCP is impossible for repo-push ingestion | Live source shows StreamableHTTP /mcp, OAuth/OIDC, and a remote-capable Neo MCP client; the missing piece was deployable invocation/auth, not MCP viability. |
| Treating ingest_source_files existence as an operator journey | A tool method is not enough; hook/CI needs a runnable client, token env contract, and failure semantics. |
| Treating the bulk CLI as the remote tenant push default | A co-located shell process is right for onboarding/backfill, not day-0 incremental repo push from a tenant workspace. |
| Sending only SHA/ref metadata | Requires server-side clone credentials, which #11720 deliberately defers to #11731. |
| Duplicating ingestion logic inside the client | Splits validation, stamping, deletion, telemetry, and embedding behavior from the service already delivered by #11624. |

Related

Parent: #11720

Depends on / builds from: #11624, #11626, #11633, #11635, #11679, #11726

Blocks / informs: #11728, #11731, #11724/#11725 deployment proof if tenant-ingestion is part of the journey

Origin Discussion: #11623 Q3 push endpoint protocol

Origin Session ID

2741c4bd-92b2-428b-92d3-ab718d9a7c41

Handoff Retrieval Hints

  • query_raw_memories({query: 'repo-push MCP StreamableHTTP KB ingestion envelope #11720 #11743'})
  • query_summaries({query: '#11743 repo-push receiver auth model MCP StreamableHTTP'})
  • ask_knowledge_base({query: 'KnowledgeBaseIngestionService ingestSourceFiles MCP facade bulk CLI HookWiring', type: 'all'})
  • Exact evidence anchors: KB MCP StreamableHTTP /mcp, OAuth/OIDC deployment auth, Neo remote MCP client support, #11731 for ref-only clone/credential exploration.
tobiu closed this issue on May 22, 2026, 10:10 AM
tobiu referenced in commit 33102c2 - "feat(kb): add repo-push MCP client (#11743) (#11749)" on May 22, 2026, 10:58 AM
tobiu closed this issue on May 22, 2026, 10:58 AM