Frontmatter
id: 11516
title: Repair DevIndex users/tracker reconciliation and rename recovery
state: Closed
labels: bug, ai, architecture
assignees: neo-gpt
createdAt: May 17, 2026, 3:34 AM
updatedAt: May 17, 2026, 9:40 PM
githubUrl: https://github.com/neomjs/neo/issues/11516
author: neo-gpt
commentsCount: 1
parentIssue: 10117
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: May 17, 2026, 9:40 PM

Repair DevIndex users/tracker reconciliation and rename recovery

neo-gpt commented on May 17, 2026, 3:34 AM

Context

Follow-up from #10117 intake/investigation on 2026-05-17. The original #10117 snapshot reported 354 users.jsonl records absent from tracker.json on 2026-04-19. Fresh V-B-A on current dev data shows the invariant is still broken and has worsened.

Current counts from apps/devindex/resources/data/:

| Metric | Current value |
| --- | --- |
| users.jsonl rich-user records | 50,000 |
| tracker.json entries | 49,387 |
| Rich-user records absent from tracker | 614 |
| Tracker entries absent from rich users | 1 |

Recent bot data-sync history confirms this is not self-healing:

bc8c7f25d 2026-05-15T23:26Z users=50000 tracker=49394 orphans=606 trackerOnly=0
8383fae69 2026-05-16T04:08Z users=50000 tracker=49393 orphans=607 trackerOnly=0
907240408 2026-05-16T22:18Z users=50000 tracker=49387 orphans=614 trackerOnly=1
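
The counts above amount to a two-way set difference between the logins in the two stores. The following is a minimal sketch of such an audit, not the project's actual tooling. The accessor callbacks and the `l` / `login` keys in the sample are hypothetical, since the ticket does not state the field names; logins are compared case-insensitively because GitHub treats them that way.

```javascript
// Parse JSONL text (one JSON object per line) into an array, skipping blank lines.
function parseJsonl(text) {
    return text
        .split('\n')
        .filter(line => line.trim() !== '')
        .map(line => JSON.parse(line));
}

// Compare the rich-user store and the tracker by login.
// `getUserLogin` and `getTrackerLogin` are hypothetical accessors that
// extract the login from one record of each store.
function auditInvariant(users, trackerEntries, getUserLogin, getTrackerLogin) {
    const normalize = login => String(login).toLowerCase();

    const userLogins    = new Set(users.map(user => normalize(getUserLogin(user))));
    const trackerLogins = new Set(trackerEntries.map(entry => normalize(getTrackerLogin(entry))));

    return {
        users      : userLogins.size,
        tracker    : trackerLogins.size,
        // Rich-user records that refresh scheduling cannot see
        orphans    : [...userLogins].filter(login => !trackerLogins.has(login)),
        // Tracker entries without a rich record
        trackerOnly: [...trackerLogins].filter(login => !userLogins.has(login))
    };
}

// Usage with inline sample data (the `l` and `login` keys are assumptions)
const sampleUsers   = parseJsonl('{"l":"Alice"}\n{"l":"bob"}\n');
const sampleTracker = [{login: 'alice'}, {login: 'dave'}];
const sampleReport  = auditInvariant(sampleUsers, sampleTracker, u => u.l, t => t.login);

console.log(sampleReport.orphans, sampleReport.trackerOnly); // [ 'bob' ] [ 'dave' ]
```

As a consistency check on the reported numbers: 50,000 rich users minus 614 orphans leaves 49,386 shared logins, and adding the 1 tracker-only entry gives the 49,387 tracker entries listed above.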

The Problem

tracker.json is the Updater scheduling index. A rich users.jsonl record absent from tracker.json is invisible to refresh scheduling, so the profile can remain stale forever. The drift is currently growing slowly across hourly data-sync commits.

The investigation also found a concrete rename / username-reuse hazard:

  • Rich record sample: 0xBigBoss has stored i: 95193764, tc: 17244, lu: 2026-05-08T15:18:19.485Z.
  • Live REST GET /users/0xBigBoss now returns a different account, id: 283503690, created 2026-05-11.
  • Live REST GET /user/95193764 resolves the stored user ID to alleneubank.

Therefore, a naive cleanup that simply re-adds every rich-user login to tracker.json would requeue stale logins and could overwrite an old identity with a new account that has reused the login. The repair must be identity-aware.

A second V-B-A catch: apps/devindex/services/GitHub.mjs#getLoginByDatabaseId() currently uses GraphQL user(databaseId: ...), but GitHub GraphQL rejects that argument:

Field 'user' is missing required arguments: login
Field 'user' doesn't accept argument 'databaseId'

That means the archived rename-recovery intent from #9137 is not actually reliable today.
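
Since the investigation above shows that REST `GET /user/{id}` does resolve a stored ID, a resolver could be built on that endpoint. The following is a minimal sketch under that assumption, not the code that landed in the repository. The function name `resolveLoginByDatabaseId`, the `token` and `fetchImpl` options, and the `User-Agent` value are illustrative. It returns null only for a 404 and throws on every other non-success status, so that transient API errors are not mistaken for deleted accounts.

```javascript
// Resolve a numeric GitHub account ID to its current login via REST.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function resolveLoginByDatabaseId(databaseId, {token, fetchImpl = fetch} = {}) {
    if (!Number.isInteger(databaseId) || databaseId <= 0) {
        throw new TypeError(`Invalid database id: ${databaseId}`);
    }

    const headers = {
        Accept      : 'application/vnd.github+json',
        'User-Agent': 'devindex-resolver-sketch' // hypothetical value
    };

    if (token) {
        headers.Authorization = `Bearer ${token}`;
    }

    const response = await fetchImpl(`https://api.github.com/user/${databaseId}`, {headers});

    // The account no longer resolves: report "missing" instead of failing
    if (response.status === 404) {
        return null;
    }

    // Rate limits, 5xx responses and similar stay loud so callers can retry
    if (!response.ok) {
        throw new Error(`GitHub API error ${response.status} for user id ${databaseId}`);
    }

    const body = await response.json();
    return body.login;
}
```

Called as `await resolveLoginByDatabaseId(95193764, {token})`, this would be expected to return the current login for the stored ID from the sample above, assuming the endpoint behaves as observed during the investigation.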

The Architectural Reality

Relevant current surfaces:

  • apps/devindex/services/Storage.mjs#updateUsers() writes rich records and prunes over-cap users, but does not add tracker entries for inserted rich records. It expects callers to own tracker updates.
  • apps/devindex/services/Updater.mjs#saveCheckpoint() currently persists in this order: Storage.updateUsers(results), Storage.updateTracker(indexUpdates), Storage.deleteUsers(prunedLogins), then failed-list updates. Interruptions between these operations can leave one store updated without the other.
  • apps/devindex/services/Updater.mjs rename recovery depends on GitHub.getLoginByDatabaseId() to map old stored user IDs to current logins.
  • apps/devindex/services/GitHub.mjs#getLoginByDatabaseId() uses a GraphQL signature that is invalid against current GitHub GraphQL.
  • apps/devindex/services/Cleanup.mjs only resurrects allowlisted users into tracker. It does not reconcile non-allowlisted rich-user records missing from tracker, and it must not do so blindly because of login reuse.
  • .github/workflows/data-sync-pipeline.yml runs devindex:spider and devindex:update; the data-sync bot commits both users.jsonl and tracker.json hourly.
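
Because `Storage.updateUsers()` leaves tracker updates to its callers, one cheap guard is to check, before anything is persisted, that every rich result in a checkpoint has a matching tracker update. The following is only a sketch of that idea: the function name and the `login` key on both record shapes are hypothetical, and the real `results` and `indexUpdates` structures may differ.

```javascript
// Return the logins of rich results that have no matching tracker update.
// Both `login` keys are assumed field names, used for illustration only.
function findUncoveredResults(results, indexUpdates) {
    const covered = new Set(indexUpdates.map(update => update.login.toLowerCase()));

    return results
        .map(result => result.login)
        .filter(login => !covered.has(login.toLowerCase()));
}

// Usage: a caller could refuse to checkpoint, or synthesize tracker entries,
// when this list is not empty.
const uncovered = findUncoveredResults(
    [{login: 'alice'}, {login: 'Bob'}],
    [{login: 'bob'}]
);

console.log(uncovered); // [ 'alice' ]
```

A guard like this addresses only missing tracker updates within one checkpoint. It does not protect against an interruption between the individual store writes.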

The Fix

Implement a bounded, identity-aware repair and prevention path:

  1. Replace or fix GitHub.getLoginByDatabaseId() with a resolver for integer database IDs that works against the current GitHub API. If no direct GraphQL lookup exists, use a safe REST fallback (GET /user/{id}) or an equivalent verified endpoint.
  2. Update rename recovery to use the fixed resolver and to avoid leaving stale old-login rich records behind when a user has renamed or when a login has been reused by a different account.
  3. Add a reconciliation step or script that audits rich-user records missing from tracker and repairs them safely:
    • If the stored user ID resolves to the same login, restore the tracker entry with the rich record's lu timestamp or a deliberate recheck timestamp.
    • If the stored user ID resolves to a different current login, treat it as a rename: migrate/queue the current login and remove or supersede the stale old-login rich record.
    • If the stored user ID no longer resolves, route to failed/penalty handling or prune according to existing data-minimization rules.
    • Never requeue a stale login when live login ownership differs from the stored user ID.
  4. Add focused unit coverage for the resolver and reconciliation behavior. Add an offline invariant assertion for users.jsonl vs tracker.json after repair, or document why it cannot be enforced in CI until the committed data is repaired.
  5. After code is fixed, perform the committed data repair in a dedicated data-sync-safe commit or document the operator/data-pipeline step required to perform it.
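
The decision rules in step 3 can be sketched as a pure classification function that runs once the live lookups are done. This is an illustrative sketch rather than the implemented reconciliation: the function name, parameter names, and action labels are hypothetical. `resolvedLogin` is the current login for the stored user ID (or null if it no longer resolves), and `loginOwnerId` is the ID of the account that currently owns the stored login (or null if nobody does).

```javascript
// Classify one rich-user record that is missing from the tracker.
// All names and action labels here are illustrative assumptions.
function classifyOrphan({storedLogin, storedId, resolvedLogin, loginOwnerId}) {
    // Stored ID no longer resolves: hand over to failed/penalty handling or pruning
    if (resolvedLogin === null) {
        return {action: 'fail-or-prune', login: storedLogin};
    }

    // Stored ID still maps to the stored login: safe to restore the tracker entry
    if (resolvedLogin.toLowerCase() === storedLogin.toLowerCase()) {
        return {action: 'restore-tracker', login: storedLogin};
    }

    // Stored ID maps to a different login: a rename. The old login is never requeued.
    return {
        action     : 'migrate-rename',
        oldLogin   : storedLogin,
        newLogin   : resolvedLogin,
        // True when a different account has since claimed the old login
        loginReused: loginOwnerId !== null && loginOwnerId !== storedId
    };
}

// Usage with the sample from this ticket
const bigBoss = classifyOrphan({
    storedLogin  : '0xBigBoss',
    storedId     : 95193764,
    resolvedLogin: 'alleneubank',
    loginOwnerId : 283503690
});

console.log(bigBoss.action, bigBoss.newLogin, bigBoss.loginReused); // migrate-rename alleneubank true
```

Keeping the classification free of I/O makes the same-login, renamed-login, missing-user, and username-reuse cases straightforward to cover with fixtures.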

Contract Ledger Matrix

| Target Surface | Source of Authority | Proposed Behavior | Fallback | Docs | Evidence |
| --- | --- | --- | --- | --- | --- |
| GitHub.getLoginByDatabaseId() | Current GitHub API behavior + archived rename-recovery intent #9137 | Resolves stored integer user IDs to the current login using a verified supported API path | Returns null only when the user cannot be resolved; transient API errors still throw | learn/guides/devindex/data-factory/GitHubAPI.md | Unit test or isolated API mock proving resolver handles success, missing user, and transient error |
| Updater rename recovery | #9137 + current #10117 drift evidence | Old login is removed only after current-login replacement is safely persisted or queued; username reuse does not overwrite the wrong identity | Failed/penalty path preserves retriable state without deleting valid rich history | learn/guides/devindex/data-factory/Updater.md | Unit test for rename and username-reuse scenario (0xBigBoss class) |
| Rich-store / tracker invariant | #10117 + current data counts | Rich users that should be refreshable have a tracker entry; tracker-only entries are resolved or pruned according to existing rules | Ambiguous identity records are quarantined/failed instead of blindly requeued | learn/guides/devindex/data-factory/DataHygiene.md | Offline count assertion shows no unsafe rich-user orphan drift after repair |
| Data-sync pipeline | .github/workflows/data-sync-pipeline.yml | Hourly bot runs do not reintroduce the invariant drift | If GitHub API resolver unavailable, data-sync fails loudly or records bounded failed state | Workflow logs + data files | Post-merge data-sync observation confirms orphan count does not grow |

Acceptance Criteria

  • GitHub.getLoginByDatabaseId() no longer uses invalid user(databaseId: ...) GraphQL and is covered by tests.
  • Rename recovery distinguishes same-login, renamed-login, missing-user, and username-reuse cases.
  • A reconciliation/repair path handles the existing rich-user orphan set without blindly requeuing stale logins.
  • The 0xBigBoss / alleneubank class is represented in test fixtures or documented as a verified manual sample.
  • A post-fix audit reports current users.jsonl, tracker.json, rich-user orphan count, tracker-only count, and sample classifications.
  • #10117 receives a closing investigation comment linking this ticket and recording whether #10117 should be closed after the repair path lands.

Out of Scope

  • Changing the maxUsers cap or contribution threshold.
  • Rewriting DevIndex storage formats.
  • Blind one-time JSON surgery that repairs counts without fixing the pipeline path.
  • Broad spider/updater performance tuning unrelated to the invariant.

Avoided Traps

  • Blind resurrection of every rich-user login into tracker. Rejected because username reuse can map the same login to a different GitHub account than the stored user ID.
  • Treating #10117 as stale because it was filed in April. Rejected: current data is worse, and recent bot commits show active drift.
  • Only repairing committed data. Rejected because hourly data-sync would continue recreating drift if the resolver/checkpoint/reconciliation path remains broken.

Related

  • Parent investigation: #10117
  • Historical rename-recovery intent: #9137
  • Safe purge / fallen-hero context: #9135
  • Current data-sync commits sampled: bc8c7f25d, 8383fae69, 907240408

Origin Session ID: c934160e-e886-455a-b41e-4bb2dd1f2732

Handoff Retrieval Hints:

  • DevIndex users.jsonl tracker.json orphan invariant 614
  • 0xBigBoss alleneubank username reuse database id 95193764
  • GitHub getLoginByDatabaseId user(databaseId) invalid GraphQL
  • Updater saveCheckpoint Storage.updateUsers updateTracker deleteUsers order

tobiu referenced in commit 5468a10 - "feat(ai): document withHeavyMaintenanceLease release-timing + reference-impl + spec (#11515) (#11518)" on May 17, 2026, 8:15 AM
tobiu referenced in commit ce7ccf3 - "fix(devindex): repair database-id login resolver (#11516) (#11517)" on May 17, 2026, 8:18 AM
tobiu referenced in commit d3f79dc - "fix(devindex): protect rename recovery before replacement fetch (#11516) (#11522)" on May 17, 2026, 8:19 AM
tobiu referenced in commit 75114c8 - "fix(devindex): reconcile rich-user tracker orphans (#11516) (#11545)" on May 17, 2026, 9:40 PM
tobiu closed this issue on May 17, 2026, 9:40 PM