Systematic methodology for discovering valuable work in GitHub fork ecosystems. Stars-only filtering misses 60-100% of substantive forks — this skill uses branch-level divergence analysis, upstream PR cross-referencing, and domain-specific heuristics to find what matters.
MANDATORY: Select and load the appropriate template before any fork analysis.
Template A — Full Analysis (new repository)
1. Get upstream baseline (stars, forks, default branch, last push)
2. List all forks with pagination, note timestamp clusters
3. Filter to unique-timestamp forks (skip bulk mirrors)
4. Check default branch divergence (ahead_by/behind_by)
5. Check non-default branches for all forks with recent push or >1 branch
6. Evaluate commit content, author emails, tags/releases
7. Cross-reference upstream PR history from fork owners
8. Tier ranking and cross-fork convergence analysis
9. Produce report with actionable recommendations
Template B — Quick Scan (triage only)
1. Get upstream baseline
2. List forks, filter by timestamp clustering
3. Check default branch divergence only
4. Report forks with ahead_by > 0
Template C — Targeted Fork Evaluation (specific fork)
1. Compare fork vs upstream on all branches
2. Examine commit messages and changed files
3. Check for tags/releases, open issues, PRs
4. Assess cherry-pick viability
Signal Priority Order
Ranked by empirical reliability across 10 repositories. See signal-priority.md for details.
| Rank | Signal | Reliability | What It Catches |
|---|---|---|---|
| 1 | Branch-level divergence | Highest | Work on feature branches (50%+ of substantive forks) |
| 2 | Upstream PR cross-reference | High | Rebased/force-pushed work invisible to the compare API |
| 3 | Tags/releases on fork | High | Independent maintenance intent |
| 4 | Commit email domains | High | Institutional contributors (`@company.com`) |
| 5 | Timestamp clustering | Medium | Eliminates 85%+ mirror noise |
| 6 | Cross-fork convergence | Medium | Reveals unmet upstream demand |
| 7 | Stars | Lowest | Often anti-correlated with actual value |
Pipeline — 7 Steps
Step 1: Upstream Baseline
```bash
UPSTREAM="OWNER/REPO"
gh api "repos/$UPSTREAM" \
  --jq '{forks_count, pushed_at, default_branch, stargazers_count}'
```
Step 2: List All Forks + Timestamp Clustering
```bash
# List all forks with activity signals
gh api "repos/$UPSTREAM/forks" --paginate \
  --jq '.[] | {full_name, pushed_at, stargazers_count, default_branch}'
```
Timestamp clustering: Forks that share an exact `pushed_at` with upstream are bulk mirrors created by GitHub's fork mechanism and never touched since. Group forks by `pushed_at`; forks with unique timestamps warrant investigation. This alone eliminates 85%+ of the noise.
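The grouping can be sketched as a small awk filter. This is a minimal sketch. It assumes the Step 2 output was flattened to tab-separated `full_name`/`pushed_at` lines (for example with `--jq '.[] | [.full_name, .pushed_at] | @tsv'`) and that upstream's `pushed_at` from Step 1 is passed as the only argument. `unique_timestamp_forks` is a hypothetical helper name.

```bash
# unique_timestamp_forks (hypothetical helper): reads "full_name<TAB>pushed_at"
# lines on stdin and prints the forks worth investigating: those whose
# pushed_at matches neither upstream's nor any other fork's timestamp.
unique_timestamp_forks() {
  local upstream_pushed_at="$1"
  awk -F '\t' -v up="$upstream_pushed_at" '
    # Remember every row and count how many forks share each timestamp.
    { name[NR] = $1; ts[NR] = $2; count[$2]++ }
    END {
      for (i = 1; i <= NR; i++) {
        # Skip bulk mirrors: shared timestamps, or identical to upstream.
        if (count[ts[i]] == 1 && ts[i] != up) print name[i]
      }
    }'
}
```

Typical use: `gh api "repos/$UPSTREAM/forks" --paginate --jq '.[] | [.full_name, .pushed_at] | @tsv' | unique_timestamp_forks "$UPSTREAM_PUSHED_AT"`. Here `UPSTREAM_PUSHED_AT` is a hypothetical variable holding the Step 1 value.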
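Step 3: Default Branch Divergence
```bash
BRANCH=$(gh api "repos/$UPSTREAM" --jq '.default_branch')
# For each candidate fork (assumes the fork kept upstream's default branch name)
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:$BRANCH" \
  --jq '{ahead_by, behind_by, status}'
```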
The `status` field values mean:
- `identical`: pure mirror, skip
- `behind`: stale mirror, skip
- `diverged`: has original commits AND is behind upstream (interesting)
- `ahead`: has original commits and is up to date with upstream (rare, most valuable)
Important: Always compare from the upstream repo's perspective (`repos/UPSTREAM/compare/...`). The reverse direction (`repos/FORK/compare/...`) returns 404 for some repositories.
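The sketch below runs this comparison for one fork and applies the status rules above. It is a minimal sketch. It assumes `UPSTREAM` and `BRANCH` are set as in the earlier commands and that each fork kept upstream's default branch name. Both function names are hypothetical. The tab-separated output is a choice made here for easy piping and is not part of the original commands.

```bash
# triage_status (hypothetical helper): maps a compare-API status value to
# the action suggested in the list above.
triage_status() {
  case "$1" in
    identical|behind) echo "skip" ;;
    diverged|ahead)   echo "investigate" ;;
    *)                echo "unknown" ;;
  esac
}

# compare_default_branch (hypothetical helper): prints
# "fork_owner<TAB>ahead_by<TAB>behind_by<TAB>status<TAB>action" for one fork,
# comparing from the upstream repo's perspective as recommended above.
compare_default_branch() {
  local fork_owner="$1" row status
  row=$(gh api "repos/$UPSTREAM/compare/$BRANCH...$fork_owner:$BRANCH" \
          --jq '[.ahead_by, .behind_by, .status] | @tsv') || return 1
  status=$(cut -f3 <<<"$row")
  printf '%s\t%s\t%s\n' "$fork_owner" "$row" "$(triage_status "$status")"
}
```

Looping `compare_default_branch` over the output of the timestamp filter and keeping rows whose last column is `investigate` reproduces the Quick Scan template.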
Step 4: Non-Default Branch Analysis (CRITICAL)
This is the single biggest methodology improvement. Across all 10 repos tested, 50%+ of the most valuable fork work lived exclusively on feature branches.
Examples:
- flowsurface/aviu16: a 7,000-line GPU shader heatmap, only on `shader-heatmap`
- ArcticDB/DerThorsten: 147 commits across `conda_build`, `clang`, and `apple_changes`
- pueue/FrancescElies: duration display, only on `cesc/duration`
- barter-rs: 6 of the 12 top forks had work only on feature branches
```bash
# List branches on a fork
gh api "repos/FORK_OWNER/REPO/branches" --jq '.[].name' | head -20
# Check divergence on a specific branch
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:FEATURE_BRANCH" \
  --jq '{ahead_by, behind_by, status}'
```
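One way to automate the branch check is sketched below. `needs_branch_check` encodes the first two heuristics listed next: a recent push with no default-branch divergence, or more than one branch. `scan_fork_branches` compares every branch of a fork against upstream's default branch. Both are hypothetical helpers. They assume `UPSTREAM` and `BRANCH` are set as above, and that the `pushed_at` and `ahead_by` arguments come from Steps 2 and 3.

```bash
# needs_branch_check (hypothetical helper): succeeds if the fork was pushed
# after upstream yet shows ahead_by == 0 on its default branch, or if it has
# more than one branch.
# Arguments: fork_pushed_at upstream_pushed_at default_ahead_by branch_count
needs_branch_check() {
  local fork_pushed="$1" upstream_pushed="$2" ahead="$3" branches="$4"
  # API timestamps share one fixed ISO 8601 format, so string order is time order.
  if [[ "$fork_pushed" > "$upstream_pushed" ]] && (( ahead == 0 )); then
    return 0
  fi
  (( branches > 1 ))
}

# scan_fork_branches (hypothetical helper): compares every branch of
# FORK_OWNER/REPO with upstream's default branch and prints
# "branch<TAB>ahead_by<TAB>behind_by<TAB>status" for branches with ahead_by > 0.
scan_fork_branches() {
  local fork_owner="$1" repo="$2" name row
  while IFS= read -r name; do
    # </dev/null keeps the inner call from consuming the branch list on stdin.
    row=$(gh api "repos/$UPSTREAM/compare/$BRANCH...$fork_owner:$name" \
            --jq '[.ahead_by, .behind_by, .status] | @tsv' </dev/null) || continue
    # Keep only branches carrying commits that upstream lacks.
    if [[ "$(cut -f1 <<<"$row")" -gt 0 ]]; then
      printf '%s\t%s\n' "$name" "$row"
    fi
  done < <(gh api "repos/$fork_owner/$repo/branches" --paginate --jq '.[].name')
}
```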
Heuristics for which forks need branch checks:
- Any fork whose `pushed_at` is more recent than upstream's but shows `ahead_by == 0` on the default branch
- Any fork with more than one branch
- A branch count above 10 is a strong hint of non-trivial work (ArcticDB: Rohan-flutterint had 197 branches)
Step 5: Commit Content Evaluation
- Subtract merge commits from the `ahead_by` count (e.g., akeda2/pueue showed 35 ahead, but 28 of those were upstream merges)
- Build system changes (`CMakeLists.txt`, `Cargo.toml`, `pyproject.toml`) indicate platform enablement
- Protobuf schema changes indicate architecture-level features
- Test files alongside source changes signal production-intent work
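These checks can be partly automated. The sketch below assumes `UPSTREAM` and `BRANCH` are set as in Step 3. `own_commit_count`, `file_signals`, and `email_domains` are hypothetical helpers. The file-name patterns used to spot test files are illustrative assumptions, not a complete list. `email_domains` covers the author-email signal from the priority table. The compare API also caps the commit list it returns without paging parameters (250 commits, per GitHub's documentation), so very large forks need paginated calls.

```bash
# own_commit_count (hypothetical helper): counts commits on a fork branch that
# are not merge commits (merge commits have more than one parent), so
# upstream merges do not inflate ahead_by.
own_commit_count() {
  local fork_owner="$1" fork_branch="$2"
  gh api "repos/$UPSTREAM/compare/$BRANCH...$fork_owner:$fork_branch" \
    --jq '[.commits[] | select((.parents | length) == 1)] | length'
}

# file_signals (hypothetical helper): reads changed file paths on stdin and
# prints which file-based heuristics above fire.
file_signals() {
  awk '
    {
      n = split($0, parts, "/"); base = parts[n]
      if (base ~ /^(CMakeLists\.txt|Cargo\.toml|pyproject\.toml)$/) build = 1
      else if (base ~ /\.proto$/) proto = 1
      else if (("/" $0) ~ /\/tests?\// || base ~ /^test_/ || base ~ /_test\./ || base ~ /\.test\./) tests = 1
      else source = 1
    }
    END {
      if (build) print "build-system"
      if (proto) print "protobuf-schema"
      # Tests only count as a signal when they accompany source changes.
      if (tests && source) print "tests-with-source"
    }'
}

# email_domains (hypothetical helper): reads author emails on stdin and prints
# "count domain" pairs, most frequent first, so institutional domains stand out.
email_domains() {
  awk -F '@' 'NF == 2 { print tolower($2) }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

For example, pipe `gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:FEATURE_BRANCH" --jq '.files[].filename'` into `file_signals`. For author emails, run the same call with `--jq '.commits[].commit.author.email'` and pipe it into `email_domains`.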
Step 6: Fork-Specific Signals
```bash
# Tags/releases (strongest independent maintenance signal)
gh api "repos/FORK_OWNER/REPO/tags" --jq '.[].name' | head -10
gh api "repos/FORK_OWNER/REPO/releases" --jq '.[] | {tag_name, name, published_at}' | head -5
# Open issues on the fork (signals independent project maintenance).
# The issues endpoint also returns pull requests, so filter those out.
gh api "repos/FORK_OWNER/REPO/issues?state=open&per_page=100" \
  --jq '[.[] | select(.pull_request == null)] | length'
# Check if repo was renamed (divergence intent signal)
gh api "repos/FORK_OWNER/REPO" --jq '.name'
```
| Signal | Strength | Example |
|---|---|---|
| Tags/releases on fork | Highest | pueue/freesrz93 had 6 releases |
| Open PRs against upstream | High | Formal proposals with review context |
| Open issues on the fork | High | Independent project maintenance |
| Repo renamed | Medium | flowsurface/sinaha81 became volume_flow |
| Build config changes | High (compiled languages) | `Cargo.toml`, `CMakeLists.txt` diff |
| Description changed | Weak | Many vanity renames with no code |
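A compact per-fork summary of these signals might look like the sketch below. It makes several assumptions. `fork_signals` is a hypothetical helper that takes the fork's `full_name` from Step 2 plus upstream's repository name. It counts only the first page (up to 100 items) of each list. It detects a rename by comparing repository names locally instead of making an extra API call.

```bash
# fork_signals (hypothetical helper): prints "signal<TAB>value" lines for one fork.
#   $1: fork full_name from Step 2, e.g. "sinaha81/volume_flow"
#   $2: upstream repository name, e.g. "flowsurface"
# An empty count means the API call failed (for example, issues disabled).
fork_signals() {
  local fork="$1" upstream_name="$2"
  printf 'tags\t%s\n' "$(gh api "repos/$fork/tags?per_page=100" --jq 'length')"
  printf 'releases\t%s\n' "$(gh api "repos/$fork/releases?per_page=100" --jq 'length')"
  # Exclude pull requests, which the issues endpoint also returns.
  printf 'open_issues\t%s\n' "$(gh api "repos/$fork/issues?state=open&per_page=100" \
    --jq '[.[] | select(.pull_request == null)] | length')"
  # Renamed: the repo part of the fork's full_name differs from upstream's name.
  if [[ "${fork#*/}" != "$upstream_name" ]]; then
    printf 'renamed\t%s\n' "${fork#*/}"
  fi
}
```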
Step 7: Cross-Fork Convergence + Upstream PR History
```bash
# Check upstream PRs from fork owners
gh api "repos/$UPSTREAM/pulls?state=all" --paginate \
  --jq '.[] | select(.head.repo.fork) | {number, title, state, user: .user.login}'
```
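Cross-referencing can be sketched as a join between the Step 2 fork list and the upstream PR list, assuming both were saved to files first. `prs_from_fork_owners` is a hypothetical helper. The tab-separated PR format is also an assumption of this sketch; it comes from an `@tsv` variant of the filter above, shown in the comments.

```bash
# prs_from_fork_owners (hypothetical helper): prints the upstream PRs opened by
# owners of the forks under evaluation.
#   $1: forks file, one "owner/repo" per line (Step 2 full_name values)
#   $2: PRs file, "number<TAB>state<TAB>login<TAB>title" lines, e.g. from
#       --jq '.[] | select(.head.repo.fork) | [.number, .state, .user.login, .title] | @tsv'
prs_from_fork_owners() {
  awk -F '\t' '
    # First file: collect fork owners (the part before the slash).
    NR == FNR { split($1, parts, "/"); owner[parts[1]] = 1; next }
    # Second file: keep PRs whose author owns one of those forks.
    ($3 in owner) { print }
  ' "$1" "$2"
}
```

Closed or unmerged PRs that match are worth reading even when the fork's branches show no divergence. These matches are how rebased or force-pushed work surfaces.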
Cross-fork convergence: When multiple forks independently solve the same problem, it signals unmet upstream demand:
firecrawl: 3 forks adopted Patchright for anti-detection