Frontmatter
id: 9229
title: Optimize Spider Core Strategy with Power-Law Distribution
state: Closed
labels: enhancement, ai
assignees: tobiu
createdAt: Feb 21, 2026, 3:42 PM
updatedAt: Feb 21, 2026, 3:45 PM
githubUrl: https://github.com/neomjs/neo/issues/9229
author: tobiu
commentsCount: 1
parentIssue: null
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: Feb 21, 2026, 3:45 PM

Optimize Spider Core Strategy with Power-Law Distribution

Closed v12.0.0 enhancement ai
tobiu
tobiu commented on Feb 21, 2026, 3:42 PM

The DevIndex Spider's "Core: High Stars" strategy currently picks its repository star ranges uniformly at random between minStars (1000) and 20,000.

GitHub repository star counts follow a steep power-law distribution: far more repos exist in the 1k-5k range than in the 15k-20k range. A uniform random selection therefore wastes API quota on high-star slices that contain few or no repositories.

Tasks:

  1. Update apps/devindex/services/Spider.mjs to apply a power curve (e.g., Math.pow(Math.random(), 3)) to the random offset calculation in the pickStrategy method.
  2. Expected effect: the curve skews the Spider's discovery efforts heavily toward the denser, lower-star ranges where the vast majority of undiscovered, high-quality repositories reside, which should increase the number of candidates found per API call.
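
A minimal sketch of the offset calculation described in task 1. The helper name pickStarFloor, the constants, and the injectable rng parameter are hypothetical and only there to make the idea testable; the actual pickStrategy method in Spider.mjs may be structured differently.

```javascript
// Hypothetical constants taken from the numbers in this issue.
const MIN_STARS = 1000;               // minStars
const MAX_STARS = 20000;              // upper bound of the "Core: High Stars" range
const SPAN      = MAX_STARS - MIN_STARS; // 19000

/**
 * Returns the lower star bound for the next search slice.
 * @param {Function} rng      returns a float in [0, 1); defaults to Math.random
 * @param {Number}   exponent 1 keeps the old uniform behavior, 3 is the proposed curve
 * @returns {Number} an integer in [MIN_STARS, MAX_STARS)
 */
function pickStarFloor(rng = Math.random, exponent = 3) {
    // Raising a value in [0, 1) to a power > 1 pushes it toward 0,
    // so small offsets (low star counts) become more likely.
    const randomOffset = Math.floor(Math.pow(rng(), exponent) * SPAN);
    return MIN_STARS + randomOffset;
}

console.log(pickStarFloor());
```

With a fixed draw of 0.5, the uniform version (exponent 1) lands at 10,500 stars, while the cubed version lands at 3,375 stars.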
tobiu added the enhancement label on Feb 21, 2026, 3:42 PM
tobiu added the ai label on Feb 21, 2026, 3:42 PM
tobiu referenced in commit 419fd00 - "feat(devindex): Optimize Spider Core strategy with power-law distribution (#9229)" on Feb 21, 2026, 3:44 PM
tobiu assigned to @tobiu on Feb 21, 2026, 3:44 PM
tobiu
tobiu Feb 21, 2026, 3:44 PM

Input from Gemini 3 Pro:

✦ I have successfully implemented the power-law distribution for the Spider's Core strategy.

In apps/devindex/services/Spider.mjs, the randomOffset calculation was updated from:

    const randomOffset = Math.floor(Math.random() * 19000);

to:

    const randomOffset = Math.floor(Math.pow(Math.random(), 3) * 19000);

Because Math.random() is uniform on [0, 1), its cube is at most 0.125 exactly when the draw itself is at most 0.5. As a result, 50% of the generated offsets fall within the lowest 12.5% of the 19,000-star span (offsets below 2375). This heavily weights the Spider's searches toward the denser ranges containing the vast majority of repositories (1000 - 3375 stars), which should yield more candidates per API call, while still allowing occasional deep-dives into the rarefied 15,000+ star repos.
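
The percentages above can be checked in closed form. The sketch below assumes the offset is Math.pow(u, k) * 19000 with u uniform on [0, 1); the function name shareBelow and the constant OFFSET_SPAN are hypothetical and not part of Spider.mjs.

```javascript
const OFFSET_SPAN = 19000; // 20,000 - minStars (1000)

/**
 * Probability that a generated offset is below the given value.
 * For u uniform on [0, 1): P(u^k * span < x) = P(u < (x / span)^(1/k)) = (x / span)^(1/k)
 * @param {Number} offset   threshold within [0, OFFSET_SPAN]
 * @param {Number} exponent 1 = uniform, 3 = cubed draw
 */
function shareBelow(offset, exponent = 3) {
    return Math.pow(offset / OFFSET_SPAN, 1 / exponent);
}

const lowShare  = shareBelow(2375);                 // slices starting below 3375 stars
const highShare = 1 - shareBelow(15000 - 1000);     // slices starting at 15,000+ stars

console.log(lowShare.toFixed(3), highShare.toFixed(3));
```

Under these assumptions, about half of the slices start below 3375 stars and a bit under 10% start at 15,000 stars or more. With the old uniform draw (exponent 1) the same two shares would be 12.5% and roughly 26%.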

Changes have been pushed to dev. Closing the issue.

tobiu closed this issue on Feb 21, 2026, 3:45 PM