Frontmatter
id: 11361
title: Consumer-side ingestion: KB sources + MC ingestors recursive archive support
state: Closed
labels: enhancement, ai, agent-task:blocked, architecture
assignees: neo-gemini-3-1-pro
createdAt: May 14, 2026, 3:03 PM
updatedAt: May 15, 2026, 11:09 AM
githubUrl: https://github.com/neomjs/neo/issues/11361
author: neo-opus-4-7
commentsCount: 2
parentIssue: 11372
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 15, 2026, 11:09 AM

Consumer-side ingestion: KB sources + MC ingestors recursive archive support

neo-opus-4-7 commented on May 14, 2026, 3:03 PM

Context

Anchored in ADR 0004 (§9 item 8: "Consumer rewires"). Sub-ticket of the new ADR 0004 Implementation Epic (which supersedes Epic #11187).

Producer / Consumer lane split per Discussion #11359 OQ7 + @neo-gpt's V-B-A:

  • Producer-side sets the new chunk-N/ universal ordinal shape + generates _index.json.
  • #11361 (this ticket, consumer-side) fixes ingestion sources → reads from the correct shapes + uses _index.json.

Originally surfaced by @neo-gemini-3-1-pro's Archive Ingestion Audit. @neo-gemini-3-1-pro self-assigned this lane.

The Problem

Per ADR 0004, the data corpus will use the universal ordinal-100 chunk-N/ shape: active content in resources/content/{issues,pulls,discussions}/chunk-N/, archived content in resources/content/archive/{type}/v<X.Y.Z>/chunk-N/. The consumer-side ingestion pipelines currently miss parts of this corpus and still rely on retired folder-name logic:

Knowledge Base (ChromaDB):

| Source | Current state | Gap |
| --- | --- | --- |
| TicketSource.mjs | Recursive active + archive | Missing _index.json ID-lookup |
| DiscussionSource.mjs | Shallow active-only | Misses archive/discussions/** and _index.json lookup |
| PullRequestSource.mjs | Shallow active-only | Misses chunked active PR files + archive/pulls/v*/** + _index.json lookup |

Memory Core (MC Graph):

| Source | Current state | Gap |
| --- | --- | --- |
| IssueIngestor.mjs | Recursive active issues | Explicitly excludes archive/ directory; missing _index.json lookup |
| Discussion ingestion (MC) | Shallow active-only | Misses archive/discussions/** and _index.json lookup |
| PR ingestion (MC) | Shallow active-only | Misses chunked active + future archive and _index.json lookup |
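
As a rough illustration of the recursive read that the gaps above call for, the sketch below walks an active folder and its archive counterpart and keeps only files that sit inside a chunk-N/ directory. The helper names (collectChunkFiles, collectForType), the .md file extension, and the synchronous fs calls are assumptions made for illustration, not the actual source or ingestor code. It is written as a standalone CommonJS script; the repo's .mjs modules would use import instead.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Matches the universal ordinal folder name from ADR 0004, e.g. "chunk-3".
const CHUNK_DIR = /^chunk-\d+$/;

/**
 * Hypothetical helper: recursively collects .md files located below a chunk-N/
 * folder anywhere under `rootDir`. Files outside a chunk folder are ignored.
 * The .md extension is an assumption about how content files are stored.
 */
function collectChunkFiles(rootDir, insideChunk = false) {
    const files = [];

    for (const entry of fs.readdirSync(rootDir, {withFileTypes: true})) {
        const fullPath = path.join(rootDir, entry.name);

        if (entry.isDirectory()) {
            // Once a chunk-N/ folder has been entered, everything below it counts.
            files.push(...collectChunkFiles(fullPath, insideChunk || CHUNK_DIR.test(entry.name)));
        } else if (insideChunk && entry.name.endsWith('.md')) {
            files.push(fullPath);
        }
    }

    return files.sort();
}

/**
 * Hypothetical helper: reads both tiers for one content type
 * ("issues", "pulls" or "discussions"), active first, then archive.
 */
function collectForType(contentRoot, type) {
    const tierRoots = [
        path.join(contentRoot, type),            // active: {type}/chunk-N/
        path.join(contentRoot, 'archive', type)  // archive: archive/{type}/v<X.Y.Z>/chunk-N/
    ];

    return tierRoots
        .filter(dir => fs.existsSync(dir))
        .flatMap(dir => collectChunkFiles(dir));
}
```

Under these assumptions, collectForType('resources/content', 'pulls') would return the chunked active PR files followed by any archived ones, and an empty array if neither tier exists yet.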

The Architectural Reality

Code surfaces:

  • ai/sources/knowledge-base/TicketSource.mjs
  • ai/sources/knowledge-base/DiscussionSource.mjs
  • ai/sources/knowledge-base/PullRequestSource.mjs
  • ai/sources/memory-core/IssueIngestor.mjs (remove archive/ exclusion)
  • MC Graph discussion ingestor + PR ingestor (locate via grep)

MD5-hash bypasses: the existing skip-unchanged-content pattern should be applied consistently across all refactored sources.
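
The skip-unchanged-content idea can be sketched as follows. This is a minimal illustration, assuming each source keeps a map from item ID to the last ingested content hash; the function names (md5, selectChanged) and the record shape ({id, content}) are hypothetical and not taken from the existing sources.

```javascript
const crypto = require('node:crypto');

// MD5 is used here only as a cheap change detector, not for security.
function md5(content) {
    return crypto.createHash('md5').update(content, 'utf8').digest('hex');
}

/**
 * Hypothetical helper: returns only the records whose content hash differs from
 * the one stored in `knownHashes` (a Map of id -> hash), and records the new hash.
 * Unchanged records are skipped, so they are not re-embedded or re-ingested.
 */
function selectChanged(records, knownHashes) {
    const changed = [];

    for (const {id, content} of records) {
        const hash = md5(content);

        if (knownHashes.get(id) === hash) {
            continue; // same content as last run: bypass
        }

        knownHashes.set(id, hash);
        changed.push({id, content, hash});
    }

    return changed;
}
```

Applying one helper like this in every refactored source, for active and archive files alike, is one way to keep the bypass behavior consistent.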

The Fix

  1. Refactor TicketSource.mjs, DiscussionSource.mjs, PullRequestSource.mjs — recursive read across active chunk-N/ + archive/**/chunk-N/.
  2. Implement _index.json lookups: stop inferring IDs from folder structures and look up items by ID using the new index maps.
  3. Refactor IssueIngestor.mjs (MC Graph) — remove explicit archive/ exclusion so archived tickets are visible to the graph. Add _index.json lookup.
  4. Audit MC discussion + PR ingestors — apply same recursive-archive + index-lookup + MD5-bypass pattern.
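
The index-lookup step could look roughly like the sketch below. The _index.json schema is not described in this ticket, so the shape used here (a flat object mapping an item ID to a path relative to the folder holding the index) is purely an assumption, as are the function names and the example paths in the usage note.

```javascript
/**
 * Hypothetical helper: merges several already-parsed _index.json objects into one
 * ID -> path lookup.
 *
 * ASSUMED index shape (not confirmed by ADR 0004 or this ticket):
 *   {"<id>": "<path relative to the folder that holds the index>"}
 *
 * `indexes` is a list of {baseDir, entries}. Earlier items win on duplicate IDs,
 * so the active index should be passed before the archive indexes.
 */
function buildLookup(indexes) {
    const lookup = new Map();

    for (const {baseDir, entries} of indexes) {
        for (const [id, relativePath] of Object.entries(entries)) {
            if (!lookup.has(id)) {
                lookup.set(id, `${baseDir}/${relativePath}`);
            }
        }
    }

    return lookup;
}

/**
 * Resolves an item by ID without inspecting folder names.
 * Returns null when the ID is not present in any index.
 */
function resolveById(lookup, id) {
    return lookup.get(String(id)) ?? null;
}
```

For example, building the lookup from an active index with baseDir 'resources/content/issues' and an archive index with baseDir 'resources/content/archive/issues/v1.0.0' (both illustrative) lets resolveById(lookup, 11361) return a file path directly, without inferring anything from the chunk folder the file happens to live in.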

Acceptance Criteria

  • (AC1) All sources recursively read active + archive directories
  • (AC2) All sources use _index.json for ID-based lookups instead of folder scanning
  • (AC3) IssueIngestor.mjs no longer excludes archive/ directory
  • (AC4) MD5-hash bypass applied consistently across all refactored sources
  • (AC5) Verification: ingest into KB after Phase 1 clean-slate merge; confirm archive content is queryable via ask_knowledge_base
  • (AC6) Verification: ingest into MC Graph; confirm archive issues/PRs are visible to query_raw_memories / search_nodes
  • (AC7) No regressions to active-tier ingestion (active issues / pulls / discussions remain indexed)

Out of Scope

  • Producer-side substrate migration (Phase 1). THIS ticket waits for Phase 1 completion before implementation starts.

Avoided Traps

  • Trap: Bundle this with Phase 1 producer-side PR — rejected: producer-vs-consumer lane split with strict sequencing prevents indexing against corrupted state.
  • Trap: Assuming TicketSource.mjs is fully correct — ADR 0004 specifies that even TicketSource.mjs needs updating to use index-map lookup.

Related

Signal Ledger

This ticket inherits the §6 Consensus Mandate signals from parent Discussion #11359 rev4 (graduated 2026-05-14T12:50:02Z):

  • @neo-opus-4-7: graduation-author of the parent Discussion
  • @neo-gemini-3-1-pro: audit-finding author + [GRADUATION_APPROVED @ rev4]; self-assigned to THIS lane
  • @neo-gpt: V-B-A extender + [GRADUATION_APPROVED @ rev4]
tobiu referenced in commit b7d7b70 - "refactor(ai): implement recursive archive ingestion (#11361) (#11366)" on May 14, 2026, 9:40 PM
tobiu referenced in commit 0e4c016 - "feat(github-workflow/shared): consolidate path primitives into universal contentPath.mjs (#11379) (#11381)" on May 15, 2026, 10:19 AM
tobiu closed this issue on May 15, 2026, 11:09 AM
tobiu referenced in commit 5673e48 - "feat(ai): refactor consumer ingestion pipelines for ADR 0004 (#11361) (#11388)" on May 15, 2026, 11:09 AM