Frontmatter
id: 10117
title: DevIndex: 354 users.jsonl entries absent from tracker.json
state: Closed
labels: bug, ai, architecture
assignees: neo-gpt
createdAt: Apr 20, 2026, 2:06 AM
updatedAt: May 19, 2026, 4:38 AM
githubUrl: https://github.com/neomjs/neo/issues/10117
author: tobiu
commentsCount: 2
parentIssue: null
subIssues: 11516 Repair DevIndex users/tracker reconciliation and rename recovery
subIssuesCompleted: 1
subIssuesTotal: 1
blockedBy: []
blocking: []
closedAt: May 19, 2026, 4:38 AM

DevIndex: 354 users.jsonl entries absent from tracker.json

Closed; labels: bug, ai, architecture
tobiu commented on Apr 20, 2026, 2:06 AM

Context

During post-merge observation of PRs #10114 / #10115 (DevIndex rate-limit pressure reduction), counts on the live data files surfaced an invariant violation: users.jsonl has entries not present in tracker.json. The gap was flagged (initially estimated ~351) and verified empirically — 354 orphans confirmed by parsing both files.

The Updater service drains work from tracker.json as its processing queue. Entries in users.jsonl without a corresponding tracker record are invisible to the Updater — they will never be re-fetched or have their profile refreshed. This is a slow-burn data-staleness bug: 354 users frozen at their last-known state with no recovery path through the normal sync pipeline.
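To make the staleness mechanism concrete, here is a minimal sketch. It assumes the tracker maps each login to a last-updated timestamp and that the queue is drained stalest-first; both the value shape and the `next_batch` helper are hypothetical and not taken from the Updater code. Only the fact that work is drawn from tracker keys matters for the argument.

```python
# Hypothetical model: tracker maps login -> last-updated timestamp (or None).
# The real tracker.json value shape may differ; this is an assumption.

def next_batch(tracker, size):
    """Return the `size` stalest logins, drawn only from tracker keys."""
    # Never-updated entries (None) sort first, then oldest timestamp first.
    ordered = sorted(
        tracker,
        key=lambda login: (tracker[login] is not None, tracker[login] or ""),
    )
    return ordered[:size]

tracker = {
    "alice": "2026-04-01T00:00:00Z",
    "bob": None,
    "carol": "2026-03-15T00:00:00Z",
}
users = {"alice", "bob", "carol", "dave"}  # "dave" has a profile but no tracker entry

# Drain the whole queue in batches: every tracker entry is eventually scheduled.
scheduled = set()
remaining = dict(tracker)
while remaining:
    for login in next_batch(remaining, size=2):
        scheduled.add(login)
        del remaining[login]

# The orphan is never visited, no matter how many rounds run.
never_refreshed = users - scheduled
print(never_refreshed)
```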

The Problem

Verified counts (2026-04-19 against the dev HEAD worktree copy of apps/devindex/resources/data/):

File                                       | Entries
users.jsonl                                | 50,000 (exactly at config.github.maxUsers cap)
tracker.json                               | 49,646 (0 pending, all active)
users.jsonl ∩ tracker.json                 | 49,646
users in users.jsonl without tracker entry | 354 (invariant violation)
tracker entries without user record        | 0 (this direction is clean)

Reproducer (offline, reads the committed files):

import json
with open('apps/devindex/resources/data/tracker.json') as f:
    tracker = json.load(f)
tracker_keys = set(k.lower() for k in tracker.keys())

users = []
with open('apps/devindex/resources/data/users.jsonl') as f:
    for raw in f:
        line = raw.strip().rstrip(',')
        if line and line not in ('[', ']'):
            users.append(json.loads(line))

user_logins = set(u['l'].lower() for u in users)
orphans = user_logins - tracker_keys
print(f'orphans: {len(orphans)}')
# Sample: 06kellyjac, a-h-mzd, aabed, aaorris, aaronabuusama, adjazzzz, adomenech73, aeroastro, ...

Samples skew early-alphabet in the sorted output (may be coincidence; not enough evidence to claim a pattern).
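The head of a sorted list is early-alphabet by construction, so the sample alone says nothing about a pattern. One way to check is to compare the first-character distribution of the orphans against the whole population. The sketch below uses toy data in place of the real sets; `first_char_histogram` and `skew_report` are illustrative names, not existing project helpers.

```python
from collections import Counter

def first_char_histogram(logins):
    """Relative frequency of each lowercase first character."""
    counts = Counter(login[0].lower() for login in logins if login)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def skew_report(orphans, population):
    """Per-character ratio of orphan share to population share.

    A ratio near 1.0 means orphans start with that character about as often
    as users in general; a ratio well above 1.0 suggests a real skew.
    """
    orphan_hist = first_char_histogram(orphans)
    pop_hist = first_char_histogram(population)
    return {
        ch: orphan_hist[ch] / pop_hist[ch]
        for ch in sorted(orphan_hist)
        if ch in pop_hist
    }

# Toy data standing in for the sets computed by the reproducer.
population = ["aabed", "aeroastro", "bob", "bea", "carol", "cy", "dave", "dan"]
orphans = ["aabed", "bob", "carol", "dave"]

report = skew_report(orphans, population)
print(report)
```

With only 354 orphans, per-character ratios will be noisy; grouping characters into a few buckets before comparing is likely more informative.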

The Architectural Reality

Three services maintain the tracker/users invariant:

  • Storage.updateUsers (apps/devindex/services/Storage.mjs:375): on prune-when-over-cap, removes pruned logins from tracker via updateTracker([{login, delete: true}, ...]) at line 417. Crucially, when adding NEW user records, it does not call updateTracker at all — the caller is expected to have already added the tracker entry (typically by the Spider discovering the user).
  • Updater.saveCheckpoint (per learn/guides/devindex/data-factory/Updater.md): atomically synchronizes new enriched profiles to users.jsonl AND updated timestamps to tracker.json — documented as atomic, but the actual code path involves two separate fs.writeFile calls via Storage, which are individually atomic (temp + rename) but not mutually transactional.
  • Cleanup (per learn/guides/devindex/data-factory/DataHygiene.md): threshold-prunes users.jsonl AND purges from tracker.json. Runs before major ops via Orchestrator invocation.
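The gap between "individually atomic" and "mutually transactional" can be reproduced in isolation. The sketch below is a Python stand-in, not the Storage.mjs code: `atomic_write`, `save_checkpoint`, and the users-then-tracker write order are assumptions chosen to mirror the description above, and the crash is simulated with an exception.

```python
import json
import os
import tempfile

def atomic_write(path, text):
    """Write via temp file + rename so readers never see a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # atomic when temp file and target share a filesystem

class SimulatedCrash(Exception):
    pass

def save_checkpoint(users_path, tracker_path, users, tracker, crash_between=False):
    """Hypothetical checkpoint: users file first, then tracker file."""
    atomic_write(users_path, "\n".join(json.dumps(u) for u in users))
    if crash_between:
        raise SimulatedCrash("interrupted after the users write")
    atomic_write(tracker_path, json.dumps(tracker))

with tempfile.TemporaryDirectory() as d:
    users_path = os.path.join(d, "users.jsonl")
    tracker_path = os.path.join(d, "tracker.json")

    # First checkpoint completes: both files agree.
    save_checkpoint(users_path, tracker_path, [{"l": "alice"}], {"alice": 1})

    # Second checkpoint is interrupted between the two writes.
    try:
        save_checkpoint(
            users_path, tracker_path,
            [{"l": "alice"}, {"l": "bob"}], {"alice": 1, "bob": 2},
            crash_between=True,
        )
    except SimulatedCrash:
        pass

    with open(users_path) as f:
        on_disk_users = {json.loads(line)["l"] for line in f if line.strip()}
    with open(tracker_path) as f:
        on_disk_tracker = set(json.load(f))
    orphans = on_disk_users - on_disk_tracker

print(orphans)
```

Neither file is ever corrupt on its own, yet the pair is inconsistent. Whether the real code writes in this order is exactly what the audit needs to establish.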

Candidate root-cause hypotheses (investigation must verify, NOT ratify):

  1. Updater saveCheckpoint partial-flush race: Updater writes users.jsonl first, then tracker.json. If the process is interrupted between writes (CI runner timeout, rate-limit-kill, node crash), users.jsonl has the new record but tracker.json does not. Next run would not know to re-add the tracker entry for an already-in-users user.
  2. Rename recovery (Updater §2 Rename Problem): when a user renames on GitHub, old login is deleted, new login is fetched and its data is merged into users.jsonl. If the new login's tracker entry isn't also inserted, the new login becomes an orphan.
  3. Manual / admin-script writes: any CI or maintenance script writing to users.jsonl without going through Storage.updateUsers + explicit updateTracker.
  4. Historical migration residue: an earlier data-shape migration may have left orphans that Cleanup has never caught (Cleanup's threshold-pruning only touches below-threshold users; above-threshold orphans survive indefinitely).
  5. Blocklist / opt-out partial delete: if a user is blocklisted after having a profile, Cleanup hard-deletes from both files. A partial delete in which the tracker delete succeeds but the users delete fails leaves a user record without a tracker entry, which is the shape we observe. The reverse (users delete succeeds, tracker delete fails) would leave a tracker entry without a user record, which we do not observe. Worth checking the actual ordering in Cleanup.mjs.
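For hypothesis 5, the direction of the gap depends only on which delete runs first. A small in-memory simulation, assuming a two-step delete where the second step never runs, shows the shape each ordering leaves behind. `partial_delete` is an illustrative helper and does not model Cleanup.mjs itself.

```python
def partial_delete(users, tracker, login, order):
    """Simulate a two-step delete where only the FIRST step succeeds."""
    users, tracker = set(users), set(tracker)  # work on copies
    if order == "tracker_first":
        tracker.discard(login)  # step 1 succeeds; the users delete never runs
    elif order == "users_first":
        users.discard(login)    # step 1 succeeds; the tracker delete never runs
    else:
        raise ValueError(f"unknown order: {order}")
    return {
        "user_without_tracker": users - tracker,
        "tracker_without_user": tracker - users,
    }

users = {"alice", "bob"}
tracker = {"alice", "bob"}

tracker_first = partial_delete(users, tracker, "bob", "tracker_first")
users_first = partial_delete(users, tracker, "bob", "users_first")

print("tracker first:", tracker_first)
print("users first:  ", users_first)
```

Only the tracker-first ordering produces a user record without a tracker entry, so that is the ordering to look for when reading Cleanup.mjs.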

The learn/guides/devindex/data-factory/DataHygiene.md guide describes an Allowlist Resurrection mechanism that injects missing tracker entries for allowlisted VIPs pre-pruning. If this mechanism ran for all users (not just allowlist), the invariant would self-heal. Currently scoped to allowlist only.
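The difference between the current and a widened scope can be sketched as a set computation. This is an assumption-laden illustration: `missing_tracker_entries` is a hypothetical helper, and the blocklist guard is added here on the assumption that opted-out users should stay out of the tracker. It does not reproduce the logic described in DataHygiene.md.

```python
def missing_tracker_entries(scope, tracker, blocklist=frozenset()):
    """Logins in `scope` that lack a tracker entry and are not blocklisted."""
    return sorted((set(scope) - set(tracker)) - set(blocklist))

users = {"alice", "bob", "carol", "dave"}
tracker = {"alice"}
allowlist = {"bob"}
blocklist = {"dave"}  # opted out: should stay out of the tracker

# Current behaviour as described: only allowlisted logins are resurrected.
allowlist_only = missing_tracker_entries(allowlist, tracker, blocklist)

# Widened scope: every user record is a candidate, minus the blocklist.
all_users = missing_tracker_entries(users, tracker, blocklist)

print(allowlist_only)
print(all_users)
```

The blocklist subtraction is the part that matters when widening: without it, a full-scope resurrection would re-add users that a legitimate delete path removed.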

Investigation Plan

Execute in order, stopping when root cause is pinned:

  1. Audit Storage.updateUsers callers. Every caller that adds a new user record — does it also call updateTracker for the add case? Specifically audit:
    • Updater.saveCheckpoint — new-record branch
    • Updater.#handleRenameRecovery — rename branch
    • Any other callers discovered by grep
  2. Check Cleanup deletion ordering. Does it delete tracker-first-then-user or user-first-then-tracker? A partial failure of the former ordering (tracker entry removed, user record left behind) produces exactly our observed gap shape.
  3. Trace a sample of the 354 orphans. Spot-check 3-5 logins in the sample against GitHub to see if they:
    • Look like likely renames (account exists under a different name)
    • Are below current threshold.tc
    • Appear in any recent Updater logs (if CI logs retained)
  4. Extend Cleanup resurrection to the full users.jsonl set. If the missing-entry pattern matches a class the existing Allowlist Resurrection already handles, widening the scope is a candidate fix. Verify before implementing.
  5. Document findings. File a follow-up fix ticket with the confirmed root cause. This ticket becomes the investigation reference.
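For step 3, picking the sample at random avoids tracing only the head of the sorted list. A minimal sketch, with a fixed seed so the same logins can be re-checked later; `sample_orphans` and the seed value are arbitrary choices for illustration, and the input here is the sample already listed in the reproducer rather than the full orphan set.

```python
import random

def sample_orphans(orphans, k=5, seed=10117):
    """Pick a reproducible random sample of orphaned logins."""
    pool = sorted(orphans)     # sort first so the seed fully determines the result
    rng = random.Random(seed)  # local generator; global random state is untouched
    return rng.sample(pool, min(k, len(pool)))

# Stand-in input: the logins quoted in the reproducer output.
known_orphans = {
    "06kellyjac", "a-h-mzd", "aabed", "aaorris",
    "aaronabuusama", "adjazzzz", "adomenech73", "aeroastro",
}

picked = sample_orphans(known_orphans, k=5)
print(picked)
```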

Acceptance Criteria

  • Storage.updateUsers callers audited; any new-record-without-tracker-add code path documented
  • Cleanup.mjs deletion order verified against hypothesis #5
  • Sample of 5+ orphaned logins traced against live GitHub + local state; findings recorded
  • Root cause hypothesis narrowed to 1–2 candidates with evidence
  • Follow-up fix ticket filed (OR this ticket pivoted to a fix if scope stays small)
  • If follow-up is non-trivial: consider one-time repair script + CI invariant assertion
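The CI invariant assertion mentioned in the last criterion could look roughly like the sketch below. It checks one direction only, on the assumption that pending tracker entries may legitimately exist before a user record does. `assert_users_tracked` is a hypothetical name; a real check would load the two data files the way the reproducer does.

```python
def assert_users_tracked(user_logins, tracker_keys, max_report=10):
    """Raise AssertionError if any user login lacks a tracker entry.

    Comparison is case-insensitive, matching the reproducer.
    """
    users = {login.lower() for login in user_logins}
    tracked = {key.lower() for key in tracker_keys}
    orphans = sorted(users - tracked)
    if orphans:
        preview = ", ".join(orphans[:max_report])
        raise AssertionError(
            f"{len(orphans)} users.jsonl entries absent from tracker.json: {preview}"
        )
    return True

# Consistent state: passes. The extra tracker entry models a pending user.
ok = assert_users_tracked({"Alice", "bob"}, {"alice", "bob", "carol"})

# Inconsistent state: "dave" has a user record but no tracker entry.
try:
    assert_users_tracked({"alice", "dave"}, {"alice"})
    message = ""
except AssertionError as err:
    message = str(err)

print(ok, message)
```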

Out of Scope

  • Implementing the fix — investigation first, fix in a separate ticket once cause is known
  • Tracker file format change — tracker.json structure is not the issue
  • config.github.maxUsers cap adjustment — orthogonal
  • Backpressure Valve tuning (config.spider.maxPendingUsers) — already working (0 pending)

Avoided Traps

  • Writing a one-time repair script without root cause: fixes symptom, risks masking an ongoing bug that will re-create orphans after repair
  • Broadening Allowlist Resurrection to all users blindly: might look like a fix but could paper over a legitimate delete path (e.g., opt-out) that should keep the user removed from tracker
  • Dismissing as "only 354 out of 50,000": users.jsonl-vs-tracker.json is a correctness invariant — the ratio is orthogonal to whether it's broken

Related

  • Surfaced during: #10113 / PR #10115 post-merge observation
  • Architectural context: learn/guides/devindex/data-factory/Storage.md, Updater.md, DataHygiene.md
  • Sibling services: apps/devindex/services/Storage.mjs, Updater.mjs, Cleanup.mjs, Spider.mjs

Origin Session ID

07f601dc-353a-44d2-a373-18da2a0d305a

tobiu added the bug label on Apr 20, 2026, 2:06 AM
tobiu added the ai label on Apr 20, 2026, 2:06 AM
tobiu added the architecture label on Apr 20, 2026, 2:06 AM
tobiu referenced in commit 5468a10 - "feat(ai): document withHeavyMaintenanceLease release-timing + reference-impl + spec (#11515) (#11518)" on May 17, 2026, 8:15 AM