Frontmatter
| id | 9224 |
| title | Implement maxUsers cap for DevIndex GitHub Spider |
| state | Closed |
| labels | enhancement, ai |
| assignees | tobiu |
| createdAt | Feb 21, 2026, 2:50 PM |
| updatedAt | Feb 21, 2026, 3:36 PM |
| githubUrl | https://github.com/neomjs/neo/issues/9224 |
| author | tobiu |
| commentsCount | 1 |
| parentIssue | null |
| subIssues | [] |
| subIssuesCompleted | 0 |
| subIssuesTotal | 0 |
| blockedBy | [] |
| blocking | [] |
| closedAt | Feb 21, 2026, 3:04 PM |
Implement maxUsers cap for DevIndex GitHub Spider
tobiu assigned to @tobiu on Feb 21, 2026, 3:04 PM

tobiu
Feb 21, 2026, 3:04 PM
Input from Gemini 3 Pro:
✦ I have implemented the `maxUsers` cap for the DevIndex GitHub spider and updater as requested.

Key Changes:
- Added `maxUsers: 50000` to the DevIndex GitHub configuration in `apps/devindex/services/config.mjs` and configured a path for `threshold.json`.
- Modified `Storage.mjs` to automatically extract the total contributions of the lowest user when the `users.jsonl` file hits the `maxUsers` ceiling, write this value to `threshold.json`, and slice the array to the top 50,000 entries. It also performs cleanup of `tracker.json` and `failed.json` for pruned users.
- Updated `Updater.mjs` to read from `threshold.json` via a new helper `getLowestContributionThreshold()`. This allows the Updater to evaluate candidates against a dynamically rising bar (`Math.max(config.github.minTotalContributions, threshold)`) without incurring the I/O cost of reading the massive 20MB `users.jsonl` database directly.

The task is now complete, and the changes have been pushed. Closing this issue.
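
For readers following along, the pruning and threshold behaviour described in this comment can be sketched roughly as below. This is a minimal illustration, not the code that was pushed: the function names `pruneToCap` and `effectiveMinimum`, the `{login, tc}` record shape, and the choice to use the lowest surviving entry as the threshold are all assumptions. File I/O for `users.jsonl`, `threshold.json`, `tracker.json`, and `failed.json` is left out.

```javascript
// Hypothetical sketch: names and the {login, tc} record shape are assumptions,
// not the actual Storage.mjs / Updater.mjs implementation.

/**
 * Keep only the top `maxUsers` entries by total contributions (tc).
 * Assumes maxUsers >= 1. Returns the kept users, the logins that were
 * pruned (candidates for tracker/failed-list cleanup), and the threshold.
 */
function pruneToCap(users, maxUsers) {
    // Sort a copy in descending order of tc so the input is not mutated.
    const sorted = [...users].sort((a, b) => b.tc - a.tc);

    if (sorted.length < maxUsers) {
        // Below the cap: nothing to prune and no dynamic threshold yet.
        return {kept: sorted, prunedLogins: [], threshold: 0};
    }

    const kept = sorted.slice(0, maxUsers);
    const prunedLogins = sorted.slice(maxUsers).map(user => user.login);
    // Assumption: the lowest surviving tc is the bar new candidates must clear.
    const threshold = kept[kept.length - 1].tc;

    return {kept, prunedLogins, threshold};
}

/** Effective bar: the static minimum or the dynamic threshold, whichever is higher. */
function effectiveMinimum(minTotalContributions, threshold) {
    return Math.max(minTotalContributions, threshold);
}
```

With a small cached `threshold` value, the candidate check reduces to one comparison against `effectiveMinimum(...)`, which is the point of avoiding a full read of the user database.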
tobiu closed this issue on Feb 21, 2026, 3:04 PM
The DevIndex user database (`users.jsonl`) has grown to ~20MB (44k users). We need to implement a user cap (`maxUsers`) to prevent unbounded growth.

The strategy is to maintain a maximum number of users by pruning those with the lowest total contributions when new, higher-contributing users are discovered.
Tasks:
- Add `maxUsers` to `apps/devindex/services/config.mjs` (e.g., default to 50,000).
- Spider (`Spider.mjs`): The Spider currently adds any valid discovered user to the Tracker (`tracker.json`). If we are at the `maxUsers` cap, the Spider should only add new candidates if they are likely to displace a bottom-tier user. However, since the Spider doesn't fetch full stats, it might need to add them anyway, leaving the evaluation to the Updater. Alternative: The Spider might need a lightweight check, or we just let the Updater handle all pruning.
- Updater (`Updater.mjs`): The Updater is where the actual evaluation happens. After fetching a user's stats, if the user meets the `minTotalContributions` threshold AND the total tracked users exceed `maxUsers`, the Updater must:
  - Compare the user's total contributions (`tc`) against the lowest `tc` in the current `users.jsonl`.
- Storage (`Storage.mjs`): Ensure the sorting and pruning logic is efficient given the file size.

Note: The exact division of labor between Spider (discovery) and Updater (evaluation) needs to be finalized during implementation.
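
One way to picture the Updater's evaluation step is the sketch below. It is a hedged illustration only: `evaluateCandidate`, the `{login, tc}` record shape, and the options object are hypothetical names, and it treats an index that has reached `maxUsers` as full, whereas the task list speaks of exceeding the cap. It finds the lowest `tc` in a single pass rather than sorting, which bears on the efficiency concern raised for Storage.

```javascript
// Hypothetical sketch; the function name and the {login, tc} fields are assumptions.

/**
 * Decide whether a freshly fetched candidate should enter the index.
 * `users` is assumed to be the current list of {login, tc} records,
 * and maxUsers is assumed to be >= 1.
 */
function evaluateCandidate(users, candidate, {maxUsers, minTotalContributions}) {
    // The static floor applies regardless of how full the index is.
    if (candidate.tc < minTotalContributions) {
        return {accepted: false, evicted: null};
    }

    // Below the cap there is room, so nobody has to be displaced.
    if (users.length < maxUsers) {
        return {accepted: true, evicted: null};
    }

    // At the cap: find the lowest tc in one pass (no full sort needed).
    let lowest = users[0];
    for (const user of users) {
        if (user.tc < lowest.tc) {
            lowest = user;
        }
    }

    // Admit the candidate only if it strictly beats the current bottom entry.
    if (candidate.tc > lowest.tc) {
        return {accepted: true, evicted: lowest.login};
    }
    return {accepted: false, evicted: null};
}
```

Requiring a strict improvement avoids churn when a candidate merely ties the bottom entry; whether ties should displace is a design choice the issue leaves open.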