
GitHub Research Skill


Trigger


Activate this skill when the user wants to:
  • "Find repos for [topic]", "GitHub research on [topic]"
  • "Analyze open-source code for [topic]"
  • "Find implementations of [paper/technique]"
  • "Which repos implement [algorithm]?"
  • Uses the `/github-research <deep-research-output-dir>` slash command

Overview


This skill systematically discovers, evaluates, and deeply analyzes GitHub repositories related to a research topic. It reads deep-research output (paper database, phase reports, code references) and produces an actionable integration blueprint for reusing open-source code.
Installation: `~/.claude/skills/github-research/` — scripts, references, and this skill definition.
Output: `./github-research-output/{slug}/`, relative to the current working directory.
Input: a deep-research output directory (containing `paper_db.jsonl`, phase reports, `code_repos.md`, etc.).

6-Phase Pipeline


Phase 1: Intake     → Extract refs, URLs, keywords from deep-research output
Phase 2: Discovery  → Multi-source broad GitHub search (50-200 repos)
Phase 3: Filtering  → Score & rank → select top 15-30 repos
Phase 4: Deep Dive  → Clone & deeply analyze top 8-15 repos (code reading)
Phase 5: Analysis   → Per-repo reports + cross-repo comparison
Phase 6: Blueprint  → Integration/reuse plan for research topic

Output Directory Structure


github-research-output/{slug}/
├── repo_db.jsonl                     # Master repo database
├── phase1_intake/
│   ├── extracted_refs.jsonl          # URLs, keywords, paper-repo links
│   └── intake_summary.md
├── phase2_discovery/
│   ├── search_results/               # Raw JSONL from each search
│   └── discovery_log.md
├── phase3_filtering/
│   ├── ranked_repos.jsonl            # Scored & ranked subset
│   └── filtering_report.md
├── phase4_deep_dive/
│   ├── repos/                        # Cloned repos (shallow)
│   ├── analyses/                     # Per-repo analysis .md files
│   └── deep_dive_summary.md
├── phase5_analysis/
│   ├── comparison_matrix.md          # Cross-repo comparison
│   ├── technique_map.md              # Paper concept → code mapping
│   └── analysis_report.md
└── phase6_blueprint/
    ├── integration_plan.md           # How to combine repos
    ├── reuse_catalog.md              # Reusable components catalog
    ├── final_report.md               # Complete compiled report
    └── blueprint_summary.md

Scripts Reference


All scripts are Python 3, stdlib-only, located in `~/.claude/skills/github-research/scripts/`.
| Script | Purpose | Key Flags |
| --- | --- | --- |
| `extract_research_refs.py` | Parse deep-research output for GitHub URLs, paper refs, keywords | `--research-dir`, `--output` |
| `search_github.py` | Search GitHub repos via `gh api` | `--query`, `--language`, `--min-stars`, `--sort`, `--max-results`, `--topic`, `--output` |
| `search_github_code.py` | Search GitHub code for implementations | `--query`, `--language`, `--filename`, `--max-results`, `--output` |
| `search_paperswithcode.py` | Search Papers With Code for paper→repo mappings | `--paper-title`, `--arxiv-id`, `--query`, `--output` |
| `repo_db.py` | JSONL repo database management | subcommands: `merge`, `filter`, `score`, `search`, `tag`, `stats`, `export`, `rank` |
| `repo_metadata.py` | Fetch detailed metadata via `gh api` | `--repos`, `--input`, `--output`, `--delay` |
| `clone_repo.py` | Shallow-clone repos for analysis | `--repo`, `--output-dir`, `--depth`, `--branch` |
| `analyze_repo_structure.py` | Map file tree, key files, LOC stats | `--repo-dir`, `--output` |
| `extract_dependencies.py` | Extract and parse dependency files | `--repo-dir`, `--output` |
| `find_implementations.py` | Search a cloned repo for specific code patterns | `--repo-dir`, `--patterns`, `--output` |
| `repo_readme_fetch.py` | Fetch README without cloning | `--repos`, `--input`, `--output`, `--max-chars` |
| `compare_repos.py` | Generate comparison matrix across repos | `--input`, `--output` |
| `compile_github_report.py` | Assemble final report from all phases | `--topic-dir` |


Phase 1: Intake


Goal: Extract all relevant references, URLs, and keywords from the deep-research output.

Steps


  1. Create the output directory structure:

     ```bash
     SLUG=$(echo "$TOPIC" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')
     mkdir -p github-research-output/$SLUG/{phase1_intake,phase2_discovery/search_results,phase3_filtering,phase4_deep_dive/{repos,analyses},phase5_analysis,phase6_blueprint}
     ```
  2. Extract references from the deep-research output:

     ```bash
     python ~/.claude/skills/github-research/scripts/extract_research_refs.py \
       --research-dir <deep-research-output-dir> \
       --output github-research-output/$SLUG/phase1_intake/extracted_refs.jsonl
     ```
  3. Review extracted refs: read the generated JSONL, noting:
     • GitHub URLs found directly in reports
     • Paper titles and arXiv IDs (for Papers With Code lookup)
     • Research keywords and themes (for GitHub search queries)
  4. Write the intake summary: create `phase1_intake/intake_summary.md` with:
     • Number of direct GitHub URLs found
     • Number of papers with potential code links
     • Key research themes extracted
     • Planned search queries for Phase 2
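The shell slug pipeline in step 1 can be mirrored in Python for reference; this is an illustrative equivalent of the `tr` chain, not part of the skill's scripts.

```python
import re

def slugify(topic: str) -> str:
    # Mirrors the shell pipeline: lowercase, spaces to hyphens,
    # then drop every character outside [a-z0-9-].
    s = topic.lower().replace(" ", "-")
    return re.sub(r"[^a-z0-9-]", "", s)

print(slugify("Multi-Agent LLM Coordination!"))  # multi-agent-llm-coordination
```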

Checkpoint


  • `extracted_refs.jsonl` exists with entries
  • `intake_summary.md` written
  • Search strategy documented


Phase 2: Discovery


Goal: Cast a wide net to find 50-200 candidate repos from multiple sources.

Steps


  1. Search by direct URLs: fetch metadata for any GitHub URLs from Phase 1:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_metadata.py \
       --repos owner1/name1 owner2/name2 ... \
       --output github-research-output/$SLUG/phase2_discovery/search_results/direct_urls.jsonl
     ```
  2. Search Papers With Code: for each paper with an arXiv ID:

     ```bash
     python ~/.claude/skills/github-research/scripts/search_paperswithcode.py \
       --arxiv-id 2401.12345 \
       --output github-research-output/$SLUG/phase2_discovery/search_results/pwc_2401.12345.jsonl
     ```
  3. Search GitHub by keywords (3-8 queries based on research themes):

     ```bash
     python ~/.claude/skills/github-research/scripts/search_github.py \
       --query "multi-agent LLM coordination" \
       --min-stars 10 --sort stars --max-results 50 \
       --output github-research-output/$SLUG/phase2_discovery/search_results/gh_query1.jsonl
     ```
  4. Search GitHub code (for specific implementations):

     ```bash
     python ~/.claude/skills/github-research/scripts/search_github_code.py \
       --query "class MultiAgentOrchestrator" \
       --language python --max-results 30 \
       --output github-research-output/$SLUG/phase2_discovery/search_results/code_query1.jsonl
     ```
  5. Fetch READMEs for repos that lack descriptions:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_readme_fetch.py \
       --input <repos.jsonl> \
       --output github-research-output/$SLUG/phase2_discovery/search_results/readmes.jsonl
     ```
  6. Merge all results into the master database:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py merge \
       --inputs github-research-output/$SLUG/phase2_discovery/search_results/*.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl
     ```
  7. Write the discovery log: create `phase2_discovery/discovery_log.md` with the search queries used, results per source, and total unique repos found.
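The merge step deduplicates across all search sources. A minimal sketch of that behavior (the authoritative logic lives in `repo_db.py merge`; the `repo_id` field name and first-occurrence-wins policy are assumptions here):

```python
import json
from pathlib import Path

def merge_jsonl(paths):
    """Merge JSONL search results, deduplicating by repo_id (owner/name).
    First occurrence wins; later duplicates are skipped."""
    seen, merged = set(), []
    for path in paths:
        for line in Path(path).read_text().splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            rid = rec["repo_id"]  # hypothetical field name for "owner/name"
            if rid not in seen:
                seen.add(rid)
                merged.append(rec)
    return merged
```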

Rate Limits


  • GitHub search API: 30 requests/minute (authenticated)
  • Papers With Code API: no strict limit, but be respectful (1 req/sec)
  • Add `--delay 1.0` to batch operations when needed
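For batch callers, spacing requests keeps you under the 30 searches/minute limit. A minimal limiter sketch (the scripts' `--delay` flag serves the same purpose; this class is illustrative, not part of the skill):

```python
import time

class RateLimiter:
    """Fixed-interval limiter: at most max_calls per period seconds."""
    def __init__(self, max_calls: int = 30, period: float = 60.0):
        self.min_interval = period / max_calls  # 2.0 s for GitHub search
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough that calls are spaced min_interval apart.
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()
```

Call `limiter.wait()` before each search request; the first call returns immediately.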

Checkpoint


  • `repo_db.jsonl` populated with 50-200 repos
  • `discovery_log.md` with search details


Phase 3: Filtering


Goal: Score and rank repos, select top 15-30 for deeper analysis.

Steps


  1. Enrich metadata for all repos:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_metadata.py \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl \
       --delay 0.5
     ```
  2. Score repos (quality + activity scores):

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py score \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl
     ```
  3. LLM relevance scoring: read through the top ~50 repos (by quality_score) and assign a `relevance_score` (0.0-1.0) based on:
     • Direct relevance to the research topic
     • Implementation completeness
     • Code quality signals (from README, description)

     Then update the relevance scores:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py tag \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --ids owner/name --tags "relevance:0.85"
     ```
  4. Compute composite scores and rank:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py score \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl
     python ~/.claude/skills/github-research/scripts/repo_db.py rank \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
       --by composite_score
     ```
  5. Select top repos: filter to the top 15-30:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py filter \
       --input github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
       --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
       --max-repos 30 --not-archived
     ```
  6. Write the filtering report: create `phase3_filtering/filtering_report.md` with:
     • Stats before/after filtering
     • Score distributions
     • The top 30 repos with scores and rationale

Scoring Formula


```
activity_score = sigmoid((days_since_push < 90) * 0.4 + has_recent_commits * 0.3 + open_issues_ratio * 0.3)
quality_score  = normalize(log(stars+1) * 0.3 + log(forks+1) * 0.2 + has_license * 0.15 + has_readme * 0.15 + not_archived * 0.2)
composite_score = relevance * 0.4 + quality * 0.35 + activity * 0.25
```
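The composite line of the formula transcribes directly to code; the `relevance`, `quality`, and `activity` inputs are assumed already normalized to [0, 1] by `repo_db.py score`:

```python
def composite_score(relevance: float, quality: float, activity: float) -> float:
    """Weighted blend used for ranking; all inputs expected in [0, 1]."""
    return relevance * 0.4 + quality * 0.35 + activity * 0.25

# A highly relevant but only moderately active repo still ranks well:
print(round(composite_score(0.9, 0.6, 0.3), 3))  # 0.645
```

Relevance carries the largest weight, so a low-starred but on-topic repo can outrank a popular, tangential one.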

Checkpoint


  • `ranked_repos.jsonl` with 15-30 repos
  • `filtering_report.md` with scoring details


Phase 4: Deep Dive


Goal: Clone and deeply analyze the top 8-15 repos.

Steps


  1. Select repos for deep dive: take the top 8-15 from the ranked list.
  2. Clone each repo (shallow):

     ```bash
     python ~/.claude/skills/github-research/scripts/clone_repo.py \
       --repo owner/name \
       --output-dir github-research-output/$SLUG/phase4_deep_dive/repos/
     ```
  3. Analyze structure for each cloned repo:

     ```bash
     python ~/.claude/skills/github-research/scripts/analyze_repo_structure.py \
       --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
       --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_structure.json
     ```
  4. Extract dependencies:

     ```bash
     python ~/.claude/skills/github-research/scripts/extract_dependencies.py \
       --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
       --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_deps.json
     ```
  5. Find implementations: search for key algorithms/concepts from the research:

     ```bash
     python ~/.claude/skills/github-research/scripts/find_implementations.py \
       --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
       --patterns "class Transformer" "def forward" "attention" \
       --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_impls.jsonl
     ```
  6. Deep code reading: for each repo, READ the key source files identified by the structure analysis. Write a per-repo analysis in `phase4_deep_dive/analyses/{name}_analysis.md` covering:
     • Architecture overview
     • Key algorithms implemented
     • Code quality assessment
     • API / interface design
     • Dependencies and requirements
     • Strengths and limitations
     • Reusability assessment (how easy it is to extract components)
  7. Write the deep-dive summary: `phase4_deep_dive/deep_dive_summary.md`
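The pattern search in step 5 amounts to a literal scan over the clone. A simplified sketch of what `find_implementations.py` does; the real script's matching rules and output format may differ:

```python
from pathlib import Path

def find_patterns(repo_dir, patterns, exts=(".py",)):
    """Scan source files for literal patterns, recording file/line hits."""
    hits = []
    for path in sorted(Path(repo_dir).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            for pat in patterns:
                if pat in line:
                    hits.append({"file": str(path), "line": lineno, "pattern": pat})
    return hits
```

Literal substring matching is deliberately dumb but predictable; the hits are a map for step 6's manual code reading, not a substitute for it.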

IMPORTANT: Actually Read Code


Do NOT just summarize READMEs. You must:
  • Read the main source files (entry points, core modules)
  • Understand the actual implementation approach
  • Identify specific functions/classes that implement research concepts
  • Note code patterns, design decisions, and trade-offs

Checkpoint


  • Repos cloned in `repos/`
  • Per-repo analysis files in `analyses/`
  • `deep_dive_summary.md` written


Phase 5: Analysis


Goal: Cross-repo comparison and technique-to-code mapping.

Steps


  1. Generate the comparison matrix:

     ```bash
     python ~/.claude/skills/github-research/scripts/compare_repos.py \
       --input github-research-output/$SLUG/phase4_deep_dive/analyses/ \
       --output github-research-output/$SLUG/phase5_analysis/comparison.json
     ```
  2. Write the comparison matrix: create `phase5_analysis/comparison_matrix.md` with:
     • A table comparing repos across dimensions (language, LOC, stars, framework, license, tests)
     • Dependency overlap analysis
     • Strengths/weaknesses per repo
  3. Write the technique map: create `phase5_analysis/technique_map.md` with:
     • A mapping from each paper concept / research technique → specific repo + file + function
     • Identified gaps (techniques with no implementation found)
     • Alternative implementations of the same concept
  4. Write the analysis report: `phase5_analysis/analysis_report.md` with:
     • An executive summary of findings
     • Key insights from code analysis
     • Recommendations for which repos to use for which purposes

Checkpoint


  • `comparison_matrix.md` with the repo comparison table
  • `technique_map.md` mapping concepts to code
  • `analysis_report.md` with findings


Phase 6: Blueprint


Goal: Produce an actionable integration and reuse plan.

Steps


  1. Write the integration plan: `phase6_blueprint/integration_plan.md` with:
     • Recommended architecture for combining repos
     • Step-by-step integration approach
     • Dependency resolution strategy
     • Potential conflicts and how to resolve them
  2. Write the reuse catalog: `phase6_blueprint/reuse_catalog.md` with:
     • For each reusable component: source repo, file path, function/class, what it does, how to extract it
     • License compatibility matrix
     • Effort estimates (easy/medium/hard to integrate)
  3. Compile the final report:

     ```bash
     python ~/.claude/skills/github-research/scripts/compile_github_report.py \
       --topic-dir github-research-output/$SLUG/
     ```
  4. Write the blueprint summary: `phase6_blueprint/blueprint_summary.md` with:
     • A one-page executive summary
     • The top 5 repos and why
     • Recommended next steps

Checkpoint


  • `integration_plan.md` complete
  • `reuse_catalog.md` with the component catalog
  • `final_report.md` compiled
  • `blueprint_summary.md` as the executive summary


Quality Conventions


  1. Repos are ranked by composite score: `relevance × 0.4 + quality × 0.35 + activity × 0.25`
  2. Deep dive requires reading actual code, not just READMEs
  3. The integration blueprint must map paper concepts → specific code files/functions
  4. Incremental saves: each phase writes to disk immediately
  5. Checkpoint recovery: any phase can be resumed by checking which outputs exist
  6. All scripts are stdlib-only Python; no pip installs needed
  7. The `gh` CLI is required for GitHub API access (must be authenticated)
  8. Deduplication by `repo_id` (owner/name) across all searches
  9. Rate-limit awareness: respect GitHub search API limits (30 req/min)
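The incremental-save and checkpoint-recovery conventions can be sketched as a resume check that inspects which phase outputs already exist. The marker file names below follow the output directory structure shown earlier; treating each summary/report file as the phase's completion marker is an assumption of this sketch.

```python
from pathlib import Path

# One completion-marker file per phase, per the output structure above.
PHASE_MARKERS = [
    ("phase1", "phase1_intake/intake_summary.md"),
    ("phase2", "phase2_discovery/discovery_log.md"),
    ("phase3", "phase3_filtering/filtering_report.md"),
    ("phase4", "phase4_deep_dive/deep_dive_summary.md"),
    ("phase5", "phase5_analysis/analysis_report.md"),
    ("phase6", "phase6_blueprint/blueprint_summary.md"),
]

def next_phase(topic_dir):
    """Return the first phase whose marker file is missing, or 'done'."""
    root = Path(topic_dir)
    for phase, marker in PHASE_MARKERS:
        if not (root / marker).exists():
            return phase
    return "done"
```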

Error Handling


  • If `gh` is not installed: warn the user and provide installation instructions
  • If a repo is archived/deleted: skip it gracefully and note it in the log
  • If a clone fails: skip it, note it in the log, and continue with the remaining repos
  • If the Papers With Code API is down: skip it and rely on GitHub search only
  • Always write partial progress to disk so work is not lost
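The first bullet can be implemented as a fail-fast preflight check before any phase runs; a minimal sketch (the function name is illustrative):

```python
import shutil

def check_gh() -> None:
    """Fail fast with install guidance when the gh CLI is missing."""
    if shutil.which("gh") is None:
        raise SystemExit(
            "gh CLI not found. Install it from https://cli.github.com/ "
            "and authenticate with `gh auth login` before running this skill."
        )
```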

References


  • See `references/phase-guide.md` for detailed phase execution guidance
  • Deep-research skill: `~/.claude/skills/deep-research/SKILL.md`
  • Paper database pattern: `~/.claude/skills/deep-research/scripts/paper_db.py`