Frontmatter
id: 11712
title: Add a server-stamped ingestedAt timestamp to tenant KB chunk metadata
state: Closed
labels: enhancement, ai, architecture
assignees: neo-opus-ada
createdAt: May 21, 2026, 9:42 AM
updatedAt: Jun 7, 2026, 7:14 PM
githubUrl: https://github.com/neomjs/neo/issues/11712
author: neo-opus-ada
commentsCount: 0
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: [x] 11641 Phase 4C — Stale-Chunk Garbage Collection Daemon: Orphan Detection + Retention Enforcement
closedAt: May 21, 2026, 11:33 AM

Add a server-stamped ingestedAt timestamp to tenant KB chunk metadata

Closed | v13.0.0/archive-v13-0-0-chunk-12 | enhancement, ai, architecture
neo-opus-ada commented on May 21, 2026, 9:42 AM

Context

A recurring substrate gap has surfaced across three Phase 4 / cloud-KB features: tenant KB chunks carry no ingest timestamp. A stamped chunk's metadata (VectorService.resolveTenantStamp → applyTenantStamp) is {tenantId, repoSlug, visibility, tenantConfigVersion, originAgentIdentity, ...parsed-chunk-v1 fields}; it contains no ingestedAt / createdAt. parsed-chunk-v1.schema.json has no timestamp field either.

Three features have now hit this gap:

  • #11640 (Phase 4B reconciliation daemon) — wanted a wall-clock orphan-grace window; had to use a tenantConfigVersion version-gap instead because no per-chunk timestamp exists.
  • #11711 (reconciliation V1.x) — manifest-grace reasoning inherits the same gap.
  • #11641 (Phase 4C GC daemon) — blocked: time-based + count-based retention policies ("retain last 90 days" / "retain last N chunks") have no substrate without a per-chunk timestamp; version-based-only retention would merely duplicate #11640.

Three independent consumers wanting the same one-field stamp is a clear substrate insufficiency.

The Problem

Retention, age-based GC, and time-windowed reconciliation all need to answer "when was this chunk ingested?" Today that question is unanswerable from chunk metadata. Each consuming feature has had to work around it (version-gap proxies) or defer scope. The fix is a single additive field, stamped server-side at embed time.

The Architectural Reality

  • ai/services/knowledge-base/VectorService.mjs: resolveTenantStamp(tenantContext) builds the server-derived tenant stamp; applyTenantStamp(chunk, stamp) spreads it onto the chunk ({...chunk, ...stamp, hash, id}). The stamp is the natural home for ingestedAt.
  • TENANT_GUARDED_FIELDS (VectorService.mjs:13) — the server-derived fields validated against client spoofing. ingestedAt is purely server-derived → belongs in this list.
  • createTenantAwareChunkId hashes {tenantId, repoSlug, hash, type, name, source} — it does not include the stamp's volatile fields, so adding ingestedAt does not perturb chunk IDs (a same-content re-push resolves to the same ID).
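
The ID-stability point in the last bullet can be illustrated with a minimal sketch. `sketchChunkId` is a hypothetical stand-in for createTenantAwareChunkId: the real function hashes the listed fields, whereas this one only serialises them, which is enough to show that a field outside the list cannot influence the result. The sample values are invented for illustration.

```javascript
// Fields the issue lists as inputs to createTenantAwareChunkId.
const ID_FIELDS = ['tenantId', 'repoSlug', 'hash', 'type', 'name', 'source'];

// Hypothetical stand-in: serialises the ID fields instead of hashing them.
function sketchChunkId(chunk) {
    return JSON.stringify(ID_FIELDS.map(field => chunk[field] ?? null));
}

// Invented sample chunk.
const base = {
    tenantId: 't1',
    repoSlug: 'acme/docs',
    hash    : 'abc123',
    type    : 'guide',
    name    : 'Intro',
    source  : 'intro.md'
};

// Same content, stamped at two different times.
const first  = {...base, ingestedAt: 1000};
const second = {...base, ingestedAt: 2000};

console.log(sketchChunkId(first) === sketchChunkId(second)); // true
```

Because ingestedAt is not among the ID inputs, two stamps of the same content map to the same ID, while a changed content hash would not.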

The Fix

Add ingestedAt (epoch ms, Date.now() at stamp time) to resolveTenantStamp's output and to TENANT_GUARDED_FIELDS. It then flows through applyTenantStamp onto every chunk's Chroma metadata, queryable by downstream retention / GC / reconciliation consumers.
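
A minimal sketch of the proposed change, under stated assumptions: the function names carry a `Sketch` suffix because they are simplified illustrations, not the code in VectorService.mjs. The shape of `tenantContext`, the injectable `now` clock, and the contents of the guarded list are assumptions, and the recomputation of `hash` and `id` that applyTenantStamp performs is omitted.

```javascript
// Assumed contents; the real TENANT_GUARDED_FIELDS lives in VectorService.mjs.
const TENANT_GUARDED_FIELDS_SKETCH = [
    'tenantId', 'repoSlug', 'visibility',
    'tenantConfigVersion', 'originAgentIdentity',
    'ingestedAt' // the new server-derived field
];

// Builds the server-derived stamp. `now` is injectable only to make the sketch testable.
function resolveTenantStampSketch(tenantContext, now = Date.now) {
    return {
        tenantId           : tenantContext.tenantId,
        repoSlug           : tenantContext.repoSlug,
        visibility         : tenantContext.visibility,
        tenantConfigVersion: tenantContext.tenantConfigVersion,
        originAgentIdentity: tenantContext.originAgentIdentity,
        ingestedAt         : now() // epoch ms
    };
}

// Spreads the stamp over the chunk; stamp fields win over client-supplied ones.
function applyTenantStampSketch(chunk, stamp) {
    return {...chunk, ...stamp};
}

const fixedClock = () => 1700000000000;
const stamp = resolveTenantStampSketch(
    {tenantId: 't1', repoSlug: 'acme/docs', visibility: 'private', tenantConfigVersion: 3, originAgentIdentity: 'agent-a'},
    fixedClock
);

// A client-supplied ingestedAt is overwritten by the server value.
const stamped = applyTenantStampSketch({name: 'Intro', ingestedAt: 1}, stamp);

console.log(stamped.ingestedAt); // 1700000000000
```

The sketch only shows the overwrite path. Whether a spoofed value is overwritten or rejected in practice depends on the existing spoofRejectionMode policy, which is not modelled here.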

Semantics — resolved (PR #11713 Cycle-1, @neo-gpt review): ingestedAt marks when a chunk row is actually embedded / upserted. embed()'s zero-change fast path dedupes already-known content-hash IDs, so a same-content re-push does not re-upsert and the chunk keeps its prior ingestedAt; a content change yields a new content-hash chunk row with its own fresh ingestedAt. This embed-time anchor (≈ "first-embedded-at for this content") is the correct retention semantics — re-pushing identical content must not reset its age.
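
The re-push semantics can be sketched with an in-memory store. `embedSketch` is a hypothetical simplification of embed(): a `Map` stands in for the vector collection, chunk IDs are assumed to be content-hash based, and the clock is injectable so the behaviour is deterministic.

```javascript
// Upserts only chunks whose content-hash id is not yet known.
// Returns the number of rows actually written.
function embedSketch(store, chunks, now = Date.now) {
    const stampTime = now();
    let upserted = 0;

    for (const chunk of chunks) {
        if (store.has(chunk.id)) {
            continue; // zero-change fast path: keep the existing row and its ingestedAt
        }

        store.set(chunk.id, {...chunk, ingestedAt: stampTime});
        upserted++;
    }

    return upserted;
}

const store = new Map();

embedSketch(store, [{id: 'hash-v1', text: 'hello'}], () => 1000);        // first push
embedSketch(store, [{id: 'hash-v1', text: 'hello'}], () => 2000);        // identical re-push
embedSketch(store, [{id: 'hash-v2', text: 'hello, world'}], () => 3000); // changed content

console.log(store.get('hash-v1').ingestedAt); // 1000
console.log(store.get('hash-v2').ingestedAt); // 3000
```

The identical re-push at t=2000 writes nothing, so the first row keeps its original age, while the changed content gets a new row with a fresh timestamp.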

Contract Ledger

| Target Surface | Source of Authority | Proposed Behavior | Fallback / Edge Case | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| VectorService.resolveTenantStamp → chunk metadata ingestedAt | this ticket; the #11641 / #11640 / #11711 consumers | resolveTenantStamp adds ingestedAt: Date.now() (epoch ms); applyTenantStamp spreads it onto every embedded chunk's metadata. It marks the actual embed/upsert: an unchanged re-push (zero-change fast path) retains the prior value. | A chunk embedded before this ticket has no ingestedAt. Consumers MUST treat a missing ingestedAt as "unknown age" and fail-safe (never expire / action a chunk with no timestamp), mirroring #11640's missing-tenantConfigVersion skip. | Yes (JSDoc) | Unit: resolveTenantStamp includes a finite ingestedAt; applyTenantStamp propagates it; a same-content re-push retains it |
| TENANT_GUARDED_FIELDS | #11631 write-side tenant stamping | ingestedAt is added: a client-supplied ingestedAt is rejected / overwritten by the server value (it is server-derived, never client-authored). | Per the existing spoofRejectionMode policy. | Yes (JSDoc) | Unit: a client-supplied ingestedAt is overwritten with the server value |
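
The fail-safe rule in the ledger's Fallback column can be sketched from a consumer's point of view. `isExpired` and the 90-day window are hypothetical; actual retention logic belongs to #11641 and is not described in this ticket.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical consumer-side check: a chunk counts as expired only when its age is known.
function isExpired(metadata, maxAgeMs, now = Date.now()) {
    const {ingestedAt} = metadata;

    if (!Number.isFinite(ingestedAt)) {
        return false; // missing or malformed timestamp: unknown age, never expire
    }

    return now - ingestedAt > maxAgeMs;
}

const now    = 1750000000000;
const maxAge = 90 * DAY_MS;

console.log(isExpired({ingestedAt: now - 91 * DAY_MS}, maxAge, now)); // true
console.log(isExpired({ingestedAt: now - 10 * DAY_MS}, maxAge, now)); // false
console.log(isExpired({}, maxAge, now));                              // false (pre-ticket chunk)
```

Using `Number.isFinite` rather than a truthiness check also treats non-numeric values as unknown age, which keeps the check on the conservative side.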

Acceptance Criteria

  • resolveTenantStamp stamps ingestedAt (epoch ms) on the tenant stamp.
  • ingestedAt propagates onto embedded chunk metadata via applyTenantStamp.
  • ingestedAt is in TENANT_GUARDED_FIELDS (server-derived; client-spoof-rejected).
  • Re-push semantics documented — ingestedAt is stamped at the actual embed / upsert; an unchanged same-content re-push (the zero-change fast path) retains the prior value.
  • Unit tests: stamp inclusion, propagation, guarded-field spoof rejection, same-content-re-push retention.

Out of Scope

  • Consuming ingestedAt for retention / GC — that is #11641 (which this ticket unblocks).
  • Backfilling ingestedAt onto chunks embedded before this ticket — consumers fail-safe on a missing value; a backfill (if ever needed) is a separate concern.

Related

  • #11641 — Phase 4C GC daemon (blocked by this ticket — time/count-based retention needs ingestedAt).
  • #11640 / #11711 — reconciliation; both worked around the absent timestamp via a version-gap.
  • #11631 — write-side tenant stamping (the TENANT_GUARDED_FIELDS precedent).
  • #11628 — Phase 4 epic (this is a Phase-2-substrate enabler for it).

Origin Session ID

470c38e7-1ffc-4851-867d-d30c1b6fbdb2

Handoff Retrieval Hints

  • The gap was surfaced during #11641 Phase 4C intake (the substrate V-B-A sweep) — see the #11641 intake-update comment.
  • query_raw_memories: "KB chunk ingestedAt stamp retention prerequisite"
tobiu referenced in commit d03179a - "feat(ai): stamp ingestedAt on tenant KB chunks (#11712) (#11713)" on May 21, 2026, 11:33 AM
tobiu closed this issue on May 21, 2026, 11:33 AM
tobiu referenced in commit 82ea006 - "feat(ai): KB garbage-collection daemon — Phase 4C (#11641) (#11715)" on May 21, 2026, 1:19 PM