Frontmatter
id: 9269
title: Create DevIndex Guide: Data Scientists & Researchers
state: Closed
labels: documentation, enhancement, ai
assignees: tobiu
createdAt: Feb 23, 2026, 6:13 PM
updatedAt: Feb 23, 2026, 7:05 PM
githubUrl: https://github.com/neomjs/neo/issues/9269
author: tobiu
commentsCount: 2
parentIssue: 9267
subIssues: []
subIssuesCompleted: 0
subIssuesTotal: 0
blockedBy: []
blocking: []
closedAt: Feb 23, 2026, 7:05 PM

Create DevIndex Guide: Data Scientists & Researchers

Closed v12.0.0 documentation enhancement ai
tobiu commented on Feb 23, 2026, 6:13 PM

Objective: Create a guide for researchers on how to leverage the DevIndex dataset (users.jsonl).

Key Points:

  • Explain the rich data structure (the multi-year contribution matrices).
  • Detail how the heuristic badges are calculated mathematically.
  • Highlight specific research use cases: studying the impact of AI on commit ratios (e.g., changes post-2022) or analyzing the sustainability and "bus factor" of the open-source ecosystem.
  • Emphasize that this is a free, high-quality, open-data goldmine.
tobiu added the documentation label on Feb 23, 2026, 6:13 PM
tobiu added the enhancement label on Feb 23, 2026, 6:13 PM
tobiu added the ai label on Feb 23, 2026, 6:13 PM
tobiu added parent issue #9267 on Feb 23, 2026, 6:14 PM
tobiu assigned to @tobiu on Feb 23, 2026, 6:14 PM
tobiu Feb 23, 2026, 6:19 PM

Input from Gemini 3.1 Pro:

Note for future agent session: Before starting work on this guide, you MUST read the following existing documentation as critical context:

  • learn/guides/devindex/Backend.md (To understand the minified JSONL keys and streaming format)
  • learn/guides/devindex/data-factory/DataEnrichment.md (To understand the mathematical tier system for Location Normalization and heuristic formulas)
tobiu referenced in commit 4c34a89 - "docs: Create DevIndex Persona Guide for Data Scientists (#9269)" on Feb 23, 2026, 7:05 PM
tobiu Feb 23, 2026, 7:05 PM

Input from Gemini 3.1 Pro:

I've completed and committed the Persona Guide for Data Scientists & Researchers.

The guide covers:

  • How to access the users.jsonl static dataset and decode the minified keys (tc, fy, y, cy, py).
  • The value of the Multi-Year Contribution Matrices for longitudinal studies.
  • Detailed explanations of the "Cyborg Metrics" (hm.v, hm.a, hm.c) for algorithmic anomaly detection and segmentation (e.g., separating bots from humans).
  • Specific research use case examples, including studying Generative AI's impact on commit ratios, the "Bus Factor" in open source, developer burnout trajectories, and macro-economic geographic disparities.
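
As a starting point for the first item, here is a minimal Python sketch for exploring the dataset. It assumes users.jsonl is standard JSON Lines (one JSON object per line) and that a local copy is available; the helper names `iter_records`, `key_frequencies`, and `summarize_file` are hypothetical and not part of DevIndex. It deliberately does not guess what the minified keys mean; it only reports which top-level keys occur and how often, so they can be checked against Backend.md.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, assuming JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield json.loads(line)


def key_frequencies(records: Iterable[dict]) -> Counter:
    """Count how often each top-level key appears across all records."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


def summarize_file(path: str) -> Counter:
    """Stream a JSONL file from disk and return its top-level key counts.

    `path` is a hypothetical local copy of the dataset; the file is read
    line by line, so it never has to fit in memory as a whole.
    """
    with open(path, encoding="utf-8") as handle:
        return key_frequencies(iter_records(handle))
```

Calling `summarize_file("users.jsonl")` on a downloaded copy returns a `Counter`; its `most_common()` method lists the keys by frequency, which is a quick way to confirm which minified keys are present in every record before building a longitudinal analysis on them.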

Closing this ticket as the guide is now merged and linked in tree.json.

tobiu closed this issue on Feb 23, 2026, 7:05 PM