fork-intelligence
Fork Intelligence
Systematic methodology for discovering valuable work in GitHub fork ecosystems. Stars-only filtering misses 60-100% of substantive forks — this skill uses branch-level divergence analysis, upstream PR cross-referencing, and domain-specific heuristics to find what matters.
Validated empirically across 10 repositories spanning Python, Rust, TypeScript, C++/Python, and Node.js (tensortrade, backtesting.py, kokoro, pymoo, firecrawl, barter-rs, pueue, dukascopy-node, ArcticDB, flowsurface).
FIRST — TodoWrite Task Templates
MANDATORY: Select and load the appropriate template before any fork analysis.
Template A — Full Analysis (new repository)
1. Get upstream baseline (stars, forks, default branch, last push)
2. List all forks with pagination, note timestamp clusters
3. Filter to unique-timestamp forks (skip bulk mirrors)
4. Check default branch divergence (ahead_by/behind_by)
5. Check non-default branches for all forks with recent push or >1 branch
6. Evaluate commit content, author emails, tags/releases
7. Cross-reference upstream PR history from fork owners
8. Tier ranking and cross-fork convergence analysis
9. Produce report with actionable recommendations
Template B — Quick Scan (triage only)
1. Get upstream baseline
2. List forks, filter by timestamp clustering
3. Check default branch divergence only
4. Report forks with ahead_by > 0
Template C — Targeted Fork Evaluation (specific fork)
1. Compare fork vs upstream on all branches
2. Examine commit messages and changed files
3. Check for tags/releases, open issues, PRs
4. Assess cherry-pick viability
Signal Priority Order
Ranked by empirical reliability across 10 repositories. See signal-priority.md for details.
| Rank | Signal | Reliability | What It Catches |
|---|---|---|---|
| 1 | Branch-level divergence | Highest | Work on feature branches (50%+ of substantive forks) |
| 2 | Upstream PR cross-reference | High | Rebased/force-pushed work invisible to compare API |
| 3 | Tags/releases on fork | High | Independent maintenance intent |
| 4 | Commit email domains | High | Institutional contributors (`@man.com`, `@quantstack.net`) |
| 5 | Timestamp clustering | Medium | Eliminates 85%+ mirror noise |
| 6 | Cross-fork convergence | Medium | Reveals unmet upstream demand |
| 7 | Stars | Lowest | Often anti-correlated with actual value |
Pipeline — 7 Steps
Step 1: Upstream Baseline
```bash
UPSTREAM="OWNER/REPO"
gh api "repos/$UPSTREAM" --jq '{forks_count, pushed_at, default_branch, stargazers_count}'
```
Step 2: List All Forks + Timestamp Clustering
```bash
# List all forks with activity signals
gh api "repos/$UPSTREAM/forks" --paginate \
  --jq '.[] | {full_name, pushed_at, stargazers_count, default_branch}'
```
**Timestamp clustering**: Forks that share an exact `pushed_at` value with upstream are bulk mirrors, created through GitHub's fork mechanism and never touched afterward. Group forks by `pushed_at`; those with a unique timestamp warrant investigation. This step alone eliminates 85%+ of the noise.
```bash
# Filter to unique-timestamp forks (skip bulk mirrors)
gh api "repos/$UPSTREAM/forks" --paginate \
  --jq '.[] | {full_name, pushed_at, stargazers_count}' | \
  jq -s 'group_by(.pushed_at) | map(select(length == 1)) | flatten'
```
Step 3: Default Branch Divergence
```bash
BRANCH=$(gh api "repos/$UPSTREAM" --jq '.default_branch')
# For each candidate fork
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:$BRANCH" \
  --jq '{ahead_by, behind_by, status}'
```
The `status` field meanings:
- `identical` — pure mirror, skip
- `behind` — stale mirror, skip
- `diverged` — has original commits AND is behind (interesting)
- `ahead` — has original commits, up-to-date with upstream (rare, most valuable)
**Important**: Always compare from the upstream repo's perspective (`repos/UPSTREAM/compare/...`). The reverse direction (`repos/FORK/compare/...`) returns 404 for some repositories.
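For default-branch triage, the status values above reduce to a skip-or-investigate decision. The following is a minimal sketch, assuming a hypothetical helper named `triage_status` (it is not part of `gh` or the GitHub API). Forks it marks as `skip` may still carry work on other branches, which Step 4 covers.

```bash
# Hypothetical helper: map a compare-API status string to a triage action.
triage_status() {
  case "$1" in
    identical|behind) echo "skip" ;;         # pure or stale mirror
    diverged|ahead)   echo "investigate" ;;  # fork carries original commits
    *)                echo "unknown" ;;      # unexpected value; check manually
  esac
}

# Demo with literal status values (no API call is made here).
for s in identical behind diverged ahead; do
  printf '%s -> %s\n' "$s" "$(triage_status "$s")"
done
```

In practice the argument would come from the `status` field printed by the compare call above.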
Step 4: Non-Default Branch Analysis (CRITICAL)
This is the single biggest methodology improvement. Across all 10 repos tested, 50%+ of the most valuable fork work lived exclusively on feature branches.
Examples:
- flowsurface/aviu16: 7,000-line GPU shader heatmap only on `shader-heatmap`
- ArcticDB/DerThorsten: 147 commits across `conda_build`, `clang`, `apple_changes`
- pueue/FrancescElies: Duration display only on `cesc/duration`
- barter-rs: 6 of 12 top forks had work only on feature branches
```bash
# List branches on a fork
gh api "repos/FORK_OWNER/REPO/branches" --jq '.[].name' | head -20
# Check divergence on a specific branch
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:FEATURE_BRANCH" \
  --jq '{ahead_by, behind_by, status}'
```
**Heuristics for which forks need branch checks**:
- Any fork with `pushed_at` more recent than upstream but `ahead_by == 0` on default branch
- Any fork with more than 1 branch
- A branch count above 10 is a strong hint of non-trivial work (ArcticDB: Rohan-flutterint had 197 branches)
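The first two heuristics can be written as a small predicate. This is only a sketch: `needs_branch_check` and its argument order are hypothetical, and the inputs are assumed to come from the fork listing (`pushed_at`), the compare call (`ahead_by`), and the branches call (branch count).

```bash
# Hypothetical predicate: succeeds (exit 0) when a fork needs a branch check.
# Args: fork pushed_at, upstream pushed_at, ahead_by on default branch, branch count.
# Assumes GitHub-style UTC timestamps (e.g. 2024-05-01T12:00:00Z), which
# order correctly when compared as plain strings.
needs_branch_check() {
  local fork_pushed="$1" upstream_pushed="$2" ahead_by="$3" branch_count="$4"
  # Pushed after upstream, yet nothing ahead on the default branch
  if [ "$fork_pushed" \> "$upstream_pushed" ] && [ "$ahead_by" -eq 0 ]; then
    return 0
  fi
  # More than one branch
  [ "$branch_count" -gt 1 ]
}

# Demo with made-up values.
if needs_branch_check "2024-06-01T00:00:00Z" "2024-05-01T00:00:00Z" 0 1; then
  echo "check branches"
fi
```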
Step 5: Commit Content Evaluation
```bash
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:BRANCH" \
  --jq '.commits[] | {sha: .sha[:8], message: (.commit.message | split("\n")[0]), date: .commit.committer.date[:10], author: .commit.author.email}'
```
What to look for:
- Commit email domains reveal institutional contributors (`@man.com`, `@quantstack.net`)
- Subtract merge commits from the `ahead_by` count (e.g., akeda2/pueue showed 35 ahead but 28 were upstream merges)
- Build system changes (`CMakeLists.txt`, `Cargo.toml`, `pyproject.toml`) indicate platform enablement
- Protobuf schema changes indicate architectural-level features
- Test files alongside source changes signal production-intent work
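To make the email-domain signal easier to read across many commits, the author emails can be tallied by domain. A minimal sketch, assuming a hypothetical helper `tally_email_domains` that reads one address per line. The sample addresses are made up.

```bash
# Hypothetical helper: tally author email domains read from stdin.
# Output lines are "<count> <domain>", most frequent first.
tally_email_domains() {
  awk -F'@' 'NF == 2 { print tolower($2) }' \
    | sort | uniq -c | sort -k1,1nr -k2 \
    | awk '{ print $1, $2 }'
}

# Demo input; these addresses are placeholders.
printf '%s\n' \
  "alice@example.org" \
  "bob@example.org" \
  "carol@users.noreply.github.com" \
  | tally_email_domains
```

Against a real fork, the input could come from the compare call above with the filter `.commits[].commit.author.email`.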
Step 6: Fork-Specific Signals
```bash
# Tags/releases (strongest independent maintenance signal)
gh api "repos/FORK_OWNER/REPO/tags" --jq '.[].name' | head -10
gh api "repos/FORK_OWNER/REPO/releases" --jq '.[] | {tag_name, name, published_at}' | head -5
# Open issues on the fork (signals independent project maintenance)
# Note: this endpoint also returns pull requests, and only the first page is counted
gh api "repos/FORK_OWNER/REPO/issues?state=open" --jq 'length'
# Check if repo was renamed (strong divergence intent signal)
gh api "repos/FORK_OWNER/REPO" --jq '.name'
```
| Signal | Strength | Example |
| ------------------------- | ------------------------- | --------------------------------------- |
| Tags/releases on fork | Highest | pueue/freesrz93 had 6 releases |
| Open PRs against upstream | High | Formal proposals with review context |
| Open issues on the fork | High | Independent project maintenance |
| Repo renamed | Medium | flowsurface/sinaha81 became volume_flow |
| Build config changes | High (compiled languages) | Cargo.toml, CMakeLists.txt diff |
| Description changed | Weak | Many vanity renames with no code |
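The rename signal can often be read straight from the fork listing, since each entry's `full_name` carries the fork's current repository name. A small sketch, assuming a hypothetical predicate `is_renamed` that compares the name part of two `owner/name` strings:

```bash
# Hypothetical predicate: succeeds (exit 0) when the fork's repo name
# differs from upstream's. Both arguments are "owner/name" strings.
is_renamed() {
  local upstream_name="${1##*/}" fork_name="${2##*/}"
  [ "$upstream_name" != "$fork_name" ]
}

# Demo with placeholder names.
if is_renamed "OWNER/REPO" "FORK_OWNER/other-name"; then
  echo "renamed"
fi
```

The comparison is case-sensitive, which is an assumption of this sketch.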
Step 7: Cross-Fork Convergence + Upstream PR History
```bash
# Check upstream PRs from fork owners
gh api "repos/$UPSTREAM/pulls?state=all" --paginate \
  --jq '.[] | select(.head.repo.fork) | {number, title, state, user: .user.login}'
```
**Cross-fork convergence**: When multiple forks independently solve the same problem, it signals unmet upstream demand:
- firecrawl: 3 forks adopted Patchright for anti-detection
- flowsurface: 3 forks added technical indicators independently
- kokoro: 2 independent batched inference implementations
- barter-rs: 4 forks added Bybit support
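Once each substantive fork has been labelled by hand with a theme, convergence can be surfaced mechanically. A minimal sketch, assuming a hypothetical helper `convergent_themes` and a tab-separated `fork<TAB>theme` input. The labels are reviewer-assigned placeholders, not something the API provides.

```bash
# Hypothetical helper: read "fork<TAB>theme" lines from stdin and print themes
# found in at least two distinct forks, as "<fork count> <theme>".
convergent_themes() {
  sort -u \
    | awk -F'\t' '{ count[$2]++ }
        END { for (t in count) if (count[t] >= 2) print count[t], t }' \
    | sort -k1,1nr -k2
}

# Demo input; fork names and theme labels are placeholders.
printf '%s\t%s\n' \
  "fork-a/REPO" "exchange-connector" \
  "fork-b/REPO" "exchange-connector" \
  "fork-c/REPO" "docs-only" \
  | convergent_themes
```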
**Upstream PR cross-reference catches**:
- Rebased/force-pushed work invisible to compare API
- Work that was merged upstream (fork shows 0 ahead but was historically significant)
- Declined PRs with valuable code that the fork still maintains
---
Tier Classification
After running the pipeline, classify forks into tiers:
| Tier | Criteria | Action |
|---|---|---|
| Tier 1: Major Extensions | New features, architectural changes, >10 original commits | Deep evaluation, cherry-pick candidates |
| Tier 2: Targeted Features | Focused additions, bug fixes, 2-10 commits | Cherry-pick individual commits |
| Tier 3: Infrastructure | CI/CD, packaging, deployment, docs | Evaluate if relevant to your setup |
| Tier 4: Historical | Merged upstream or stale but once significant | Note for context, no action needed |
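The commit-count thresholds in the table can be applied mechanically once a reviewer has judged what kind of work a fork contains. The sketch below is only illustrative: `suggest_tier` and its category labels (`feature`, `infra`, `historical`) are hypothetical, and the commit count is assumed to exclude merge commits.

```bash
# Hypothetical helper: suggest a tier from a hand-assigned category and the
# number of original (non-merge) commits.
suggest_tier() {
  local category="$1" commits="$2"
  case "$category" in
    historical) echo "Tier 4" ;;
    infra)      echo "Tier 3" ;;
    feature)
      if [ "$commits" -gt 10 ]; then
        echo "Tier 1"
      elif [ "$commits" -ge 2 ]; then
        echo "Tier 2"
      else
        echo "unranked"  # a single commit falls outside the table's ranges
      fi ;;
    *) echo "unranked" ;;
  esac
}

suggest_tier feature 14
suggest_tier feature 4
suggest_tier infra 3
```

The result is a starting point; architectural changes can justify Tier 1 regardless of commit count.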
Domain-Specific Patterns
Different codebases exhibit different fork behaviors. See domain-patterns.md for full details.
| Domain | Key Pattern | Example |
|---|---|---|
| Scientific/ML | Researchers fork-implement-publish-vanish, zero social engagement | pymoo: 300-file fork with 0 stars |
| Trading/Finance | Exchange connectors dominate; best forks are private | barter-rs: 4 independent Bybit impls |
| Infrastructure/DevTools | Self-hosting/SaaS-removal is the dominant theme | firecrawl: devflowinc/firecrawl-simple (630 stars) |
| C++/Python Mixed | Feature work lives on branches; email domains reveal institutions | ArcticDB: @man.com, @quantstack.net |
| Node.js Libraries | Check npm publication as separate packages | dukascopy-node: kyo06 published |
| Rust CLI | Cargo.toml diff is reliable quick filter; "superset" forks add subcommands | pueue: freesrz93 added 7 subcommands |
Quick-Scan Pipeline (5-minute triage)
For rapid triage of any new repo:
```bash
UPSTREAM="OWNER/REPO"
BRANCH=$(gh api "repos/$UPSTREAM" --jq '.default_branch')
# 1. Baseline
gh api "repos/$UPSTREAM" --jq '{forks_count, pushed_at, stargazers_count}'
# 2. Forks with unique timestamps (skip mirrors)
gh api "repos/$UPSTREAM/forks" --paginate \
  --jq '.[] | {full_name, pushed_at, stargazers_count}' | \
  jq -s 'group_by(.pushed_at) | map(select(length == 1)) | flatten | sort_by(.pushed_at) | reverse'
# 3. Check ahead_by for each candidate
# (loop over candidates from step 2)
# 4. Check upstream PRs from fork authors
gh api "repos/$UPSTREAM/pulls?state=all" --paginate \
  --jq '.[] | select(.head.repo.fork) | {number, title, state, user: .user.login}'
```
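Item 3 is left as a loop in the block above. One possible shape for that loop is sketched here, assuming a hypothetical function `scan_candidates` that reads fork names (`owner/repo`, one per line) from stdin. It also assumes each fork's default branch has the same name as upstream's, which will not hold for every fork.

```bash
# Sketch of quick-scan item 3: print "<fork> <ahead_by> <behind_by> <status>"
# (tab-separated) for each candidate fork read from stdin.
scan_candidates() {
  local upstream="$1" branch="$2" fork owner result
  while IFS= read -r fork; do
    [ -z "$fork" ] && continue
    owner="${fork%%/*}"
    # If the compare call fails (e.g. 404), record it and keep going.
    result=$(gh api "repos/$upstream/compare/$branch...$owner:$branch" \
      --jq '[.ahead_by, .behind_by, .status] | @tsv' </dev/null 2>/dev/null) \
      || result="compare-failed"
    printf '%s\t%s\n' "$fork" "$result"
  done
}
```

A possible invocation, feeding it every fork name: `gh api "repos/$UPSTREAM/forks" --paginate --jq '.[].full_name' | scan_candidates "$UPSTREAM" "$BRANCH"`. Filtering to unique-timestamp forks first keeps the number of compare calls down.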
---
Known Limitations
| Limitation | Impact | Workaround |
|---|---|---|
| GitHub compare API 250-commit limit | Highly divergent forks may truncate | Use |
| Private forks invisible | Trading firms keep best work private | Accepted limitation |
| Force-pushed branches break compare API | Shows 0 ahead despite significant work | Cross-reference upstream PR history |
| Renamed forks may break API calls | Old URLs may 404 | Use |
| Rate limiting on large fork ecosystems | >1000 forks = many API calls | Use timestamp clustering to reduce calls by 85%+ |
| Maintainer dev forks look like independent work | Branch names 1:1 with upstream PRs | Cross-reference branch names against upstream PR branch names |
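For the rate-limiting row, it can help to check the remaining quota before starting a large scan. A minimal sketch, assuming a hypothetical guard function `require_quota`. The default threshold of 100 is an arbitrary illustration.

```bash
# Sketch: fail early when the remaining core API quota is below a threshold.
# Returns 0 if enough quota remains, 1 if not, 2 if the lookup itself failed.
require_quota() {
  local needed="${1:-100}" remaining
  remaining=$(gh api rate_limit --jq '.resources.core.remaining') || return 2
  if [ "$remaining" -lt "$needed" ]; then
    echo "only $remaining requests left; need $needed" >&2
    return 1
  fi
}
```

It could be called as `require_quota 500 || exit 1` ahead of the fork listing.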
Report Template
Use this structure for the final analysis report:
```markdown
Fork Analysis Report: OWNER/REPO
Repository: OWNER/REPO (N stars, M forks)
Analysis date: YYYY-MM-DD
Fork Landscape Summary
| Metric | Value |
|---|---|
| Total forks | N |
| Pure mirrors | N (X%) |
| Divergent forks (ahead on any branch) | N |
| Substantive forks (meaningful work) | N |
| Stars-only miss rate | X% |
Tiered Ranking
Tier 1: Major Extensions
(fork details with ahead_by, key features, files changed)
Tier 2: Targeted Features
...
Tier 3: Infrastructure/Packaging
...
Cross-Fork Convergence Patterns
(themes that multiple forks independently implemented)
Actionable Recommendations
- Cherry-pick candidates
- Feature inspiration
- Security fixes
```
---
Post-Change Checklist
After modifying THIS skill:
- YAML frontmatter valid (no colons in description)
- Trigger keywords current in description
- All `./references/` links resolve
- Pipeline steps numbered consistently
- Shell commands tested against a real repository
- Append changes to evolution-log.md