
GitHub Research Skill


Trigger


Activate this skill when the user wants to:
  • "Find repos for [topic]", "GitHub research on [topic]"
  • "Analyze open-source code for [topic]"
  • "Find implementations of [paper/technique]"
  • "Which repos implement [algorithm]?"
  • Uses the `/github-research <deep-research-output-dir>` slash command

Overview


This skill systematically discovers, evaluates, and deeply analyzes GitHub repositories related to a research topic. It reads deep-research output (paper database, phase reports, code references) and produces an actionable integration blueprint for reusing open-source code.
Installation: `~/.claude/skills/github-research/` — scripts, references, and this skill definition.
Output: `./github-research-output/{slug}/`, relative to the current working directory.
Input: a deep-research output directory (containing `paper_db.jsonl`, phase reports, `code_repos.md`, etc.).

6-Phase Pipeline


Phase 1: Intake     → Extract refs, URLs, keywords from deep-research output
Phase 2: Discovery  → Multi-source broad GitHub search (50-200 repos)
Phase 3: Filtering  → Score & rank → select top 15-30 repos
Phase 4: Deep Dive  → Clone & deeply analyze top 8-15 repos (code reading)
Phase 5: Analysis   → Per-repo reports + cross-repo comparison
Phase 6: Blueprint  → Integration/reuse plan for research topic

Output Directory Structure


github-research-output/{slug}/
├── repo_db.jsonl                     # Master repo database
├── phase1_intake/
│   ├── extracted_refs.jsonl          # URLs, keywords, paper-repo links
│   └── intake_summary.md
├── phase2_discovery/
│   ├── search_results/               # Raw JSONL from each search
│   └── discovery_log.md
├── phase3_filtering/
│   ├── ranked_repos.jsonl            # Scored & ranked subset
│   └── filtering_report.md
├── phase4_deep_dive/
│   ├── repos/                        # Cloned repos (shallow)
│   ├── analyses/                     # Per-repo analysis .md files
│   └── deep_dive_summary.md
├── phase5_analysis/
│   ├── comparison_matrix.md          # Cross-repo comparison
│   ├── technique_map.md              # Paper concept → code mapping
│   └── analysis_report.md
└── phase6_blueprint/
    ├── integration_plan.md           # How to combine repos
    ├── reuse_catalog.md              # Reusable components catalog
    ├── final_report.md               # Complete compiled report
    └── blueprint_summary.md

Scripts Reference


All scripts are Python 3, stdlib-only, located in `~/.claude/skills/github-research/scripts/`.
| Script | Purpose | Key Flags |
| --- | --- | --- |
| `extract_research_refs.py` | Parse deep-research output for GitHub URLs, paper refs, keywords | `--research-dir`, `--output` |
| `search_github.py` | Search GitHub repos via `gh api` | `--query`, `--language`, `--min-stars`, `--sort`, `--max-results`, `--topic`, `--output` |
| `search_github_code.py` | Search GitHub code for implementations | `--query`, `--language`, `--filename`, `--max-results`, `--output` |
| `search_paperswithcode.py` | Search Papers With Code for paper→repo mappings | `--paper-title`, `--arxiv-id`, `--query`, `--output` |
| `repo_db.py` | JSONL repo database management | subcommands: `merge`, `filter`, `score`, `search`, `tag`, `stats`, `export`, `rank` |
| `repo_metadata.py` | Fetch detailed metadata via `gh api` | `--repos`, `--input`, `--output`, `--delay` |
| `clone_repo.py` | Shallow-clone repos for analysis | `--repo`, `--output-dir`, `--depth`, `--branch` |
| `analyze_repo_structure.py` | Map file tree, key files, LOC stats | `--repo-dir`, `--output` |
| `extract_dependencies.py` | Extract and parse dependency files | `--repo-dir`, `--output` |
| `find_implementations.py` | Search a cloned repo for specific code patterns | `--repo-dir`, `--patterns`, `--output` |
| `repo_readme_fetch.py` | Fetch README without cloning | `--repos`, `--input`, `--output`, `--max-chars` |
| `compare_repos.py` | Generate comparison matrix across repos | `--input`, `--output` |
| `compile_github_report.py` | Assemble final report from all phases | `--topic-dir` |


Phase 1: Intake


Goal: Extract all relevant references, URLs, and keywords from the deep-research output.

Steps


  1. Create the output directory structure:

     ```bash
     SLUG=$(echo "$TOPIC" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')
     mkdir -p github-research-output/$SLUG/{phase1_intake,phase2_discovery/search_results,phase3_filtering,phase4_deep_dive/{repos,analyses},phase5_analysis,phase6_blueprint}
     ```
  2. Extract references from the deep-research output:

     ```bash
     python ~/.claude/skills/github-research/scripts/extract_research_refs.py \
       --research-dir <deep-research-output-dir> \
       --output github-research-output/$SLUG/phase1_intake/extracted_refs.jsonl
     ```
  3. Review extracted refs: read the generated JSONL, noting:
     • GitHub URLs found directly in reports
     • Paper titles and arXiv IDs (for Papers With Code lookup)
     • Research keywords and themes (for GitHub search queries)
  4. Write the intake summary: create `phase1_intake/intake_summary.md` with:
     • Number of direct GitHub URLs found
     • Number of papers with potential code links
     • Key research themes extracted
     • Planned search queries for Phase 2
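The shell slug pipeline in step 1 can be mirrored in Python for reference; this is an illustrative equivalent of the `tr` chain, not part of the skill's scripts.

```python
import re

def slugify(topic: str) -> str:
    # Mirrors the shell pipeline: lowercase, spaces to hyphens,
    # then drop every character outside [a-z0-9-].
    s = topic.lower().replace(" ", "-")
    return re.sub(r"[^a-z0-9-]", "", s)

print(slugify("Multi-Agent LLM Coordination!"))  # multi-agent-llm-coordination
```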

Checkpoint


  • `extracted_refs.jsonl` exists with entries
  • `intake_summary.md` written
  • Search strategy documented


Phase 2: Discovery


Goal: Cast a wide net to find 50-200 candidate repos from multiple sources.

Steps


  1. Search by direct URLs: fetch metadata for any GitHub URLs from Phase 1:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_metadata.py \
       --repos owner1/name1 owner2/name2 ... \
       --output github-research-output/$SLUG/phase2_discovery/search_results/direct_urls.jsonl
     ```
  2. Search Papers With Code: for each paper with an arXiv ID:

     ```bash
     python ~/.claude/skills/github-research/scripts/search_paperswithcode.py \
       --arxiv-id 2401.12345 \
       --output github-research-output/$SLUG/phase2_discovery/search_results/pwc_2401.12345.jsonl
     ```
  3. Search GitHub by keywords (3-8 queries based on research themes):

     ```bash
     python ~/.claude/skills/github-research/scripts/search_github.py \
       --query "multi-agent LLM coordination" \
       --min-stars 10 --sort stars --max-results 50 \
       --output github-research-output/$SLUG/phase2_discovery/search_results/gh_query1.jsonl
     ```
  4. Search GitHub code (for specific implementations):

     ```bash
     python ~/.claude/skills/github-research/scripts/search_github_code.py \
       --query "class MultiAgentOrchestrator" \
       --language python --max-results 30 \
       --output github-research-output/$SLUG/phase2_discovery/search_results/code_query1.jsonl
     ```
  5. Fetch READMEs for repos that lack descriptions:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_readme_fetch.py \
       --input <repos.jsonl> \
       --output github-research-output/$SLUG/phase2_discovery/search_results/readmes.jsonl
     ```
  6. Merge all results into the master database:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py merge \
       --inputs github-research-output/$SLUG/phase2_discovery/search_results/*.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl
     ```
  7. Write the discovery log: create `phase2_discovery/discovery_log.md` with the search queries used, results per source, and total unique repos found.
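The merge step deduplicates across all search sources. A minimal sketch of that behavior (the authoritative logic lives in `repo_db.py merge`; the `repo_id` field name and first-occurrence-wins policy are assumptions here):

```python
import json
from pathlib import Path

def merge_jsonl(paths):
    """Merge JSONL search results, deduplicating by repo_id (owner/name).
    First occurrence wins; later duplicates are skipped."""
    seen, merged = set(), []
    for path in paths:
        for line in Path(path).read_text().splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            rid = rec["repo_id"]  # hypothetical field name for "owner/name"
            if rid not in seen:
                seen.add(rid)
                merged.append(rec)
    return merged
```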

Rate Limits


  • GitHub search API: 30 requests/minute (authenticated)
  • Papers With Code API: no strict limit, but be respectful (1 req/sec)
  • Add `--delay 1.0` to batch operations when needed
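For batch callers, spacing requests keeps you under the 30 searches/minute limit. A minimal limiter sketch (the scripts' `--delay` flag serves the same purpose; this class is illustrative, not part of the skill):

```python
import time

class RateLimiter:
    """Fixed-interval limiter: at most max_calls per period seconds."""
    def __init__(self, max_calls: int = 30, period: float = 60.0):
        self.min_interval = period / max_calls  # 2.0 s for GitHub search
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough that calls are spaced min_interval apart.
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()
```

Call `limiter.wait()` before each search request; the first call returns immediately.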

Checkpoint


  • `repo_db.jsonl` populated with 50-200 repos
  • `discovery_log.md` with search details


Phase 3: Filtering


Goal: Score and rank repos, select top 15-30 for deeper analysis.

Steps


  1. Enrich metadata for all repos:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_metadata.py \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl \
       --delay 0.5
     ```
  2. Score repos (quality + activity scores):

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py score \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl
     ```
  3. LLM relevance scoring: read through the top ~50 repos (by quality_score) and assign a `relevance_score` (0.0-1.0) based on:
     • Direct relevance to the research topic
     • Implementation completeness
     • Code quality signals (from README, description)

     Then update the relevance scores:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py tag \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --ids owner/name --tags "relevance:0.85"
     ```
  4. Compute composite scores and rank:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py score \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/repo_db.jsonl
     python ~/.claude/skills/github-research/scripts/repo_db.py rank \
       --input github-research-output/$SLUG/repo_db.jsonl \
       --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
       --by composite_score
     ```
  5. Select top repos: filter to the top 15-30:

     ```bash
     python ~/.claude/skills/github-research/scripts/repo_db.py filter \
       --input github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
       --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
       --max-repos 30 --not-archived
     ```
  6. Write the filtering report: create `phase3_filtering/filtering_report.md` with:
     • Stats before/after filtering
     • Score distributions
     • The top 30 repos with scores and rationale

Scoring Formula


```
activity_score = sigmoid((days_since_push < 90) * 0.4 + has_recent_commits * 0.3 + open_issues_ratio * 0.3)
quality_score  = normalize(log(stars+1) * 0.3 + log(forks+1) * 0.2 + has_license * 0.15 + has_readme * 0.15 + not_archived * 0.2)
composite_score = relevance * 0.4 + quality * 0.35 + activity * 0.25
```
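The composite line of the formula transcribes directly to code; the `relevance`, `quality`, and `activity` inputs are assumed already normalized to [0, 1] by `repo_db.py score`:

```python
def composite_score(relevance: float, quality: float, activity: float) -> float:
    """Weighted blend used for ranking; all inputs expected in [0, 1]."""
    return relevance * 0.4 + quality * 0.35 + activity * 0.25

# A highly relevant but only moderately active repo still ranks well:
print(round(composite_score(0.9, 0.6, 0.3), 3))  # 0.645
```

Relevance carries the largest weight, so a low-starred but on-topic repo can outrank a popular, tangential one.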

Checkpoint


  • `ranked_repos.jsonl` with 15-30 repos
  • `filtering_report.md` with scoring details


Phase 4: Deep Dive


Goal: Clone and deeply analyze the top 8-15 repos.

Steps


  1. Select repos for deep dive: take the top 8-15 from the ranked list.
  2. Clone each repo (shallow):

     ```bash
     python ~/.claude/skills/github-research/scripts/clone_repo.py \
       --repo owner/name \
       --output-dir github-research-output/$SLUG/phase4_deep_dive/repos/
     ```
  3. Analyze structure for each cloned repo:

     ```bash
     python ~/.claude/skills/github-research/scripts/analyze_repo_structure.py \
       --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
       --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_structure.json
     ```
  4. Extract dependencies:

     ```bash
     python ~/.claude/skills/github-research/scripts/extract_dependencies.py \
       --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
       --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_deps.json
     ```
  5. Find implementations: search for key algorithms/concepts from the research:

     ```bash
     python ~/.claude/skills/github-research/scripts/find_implementations.py \
       --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
       --patterns "class Transformer" "def forward" "attention" \
       --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_impls.jsonl
     ```
  6. Deep code reading: for each repo, READ the key source files identified by the structure analysis. Write a per-repo analysis in `phase4_deep_dive/analyses/{name}_analysis.md` covering:
     • Architecture overview
     • Key algorithms implemented
     • Code quality assessment
     • API / interface design
     • Dependencies and requirements
     • Strengths and limitations
     • Reusability assessment (how easy it is to extract components)
  7. Write the deep-dive summary: `phase4_deep_dive/deep_dive_summary.md`
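The pattern search in step 5 amounts to a literal scan over the clone. A simplified sketch of what `find_implementations.py` does; the real script's matching rules and output format may differ:

```python
from pathlib import Path

def find_patterns(repo_dir, patterns, exts=(".py",)):
    """Scan source files for literal patterns, recording file/line hits."""
    hits = []
    for path in sorted(Path(repo_dir).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            for pat in patterns:
                if pat in line:
                    hits.append({"file": str(path), "line": lineno, "pattern": pat})
    return hits
```

Literal substring matching is deliberately dumb but predictable; the hits are a map for step 6's manual code reading, not a substitute for it.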

IMPORTANT: Actually Read Code


Do NOT just summarize READMEs. You must:
  • Read the main source files (entry points, core modules)
  • Understand the actual implementation approach
  • Identify specific functions/classes that implement research concepts
  • Note code patterns, design decisions, and trade-offs

Checkpoint


  • Repos cloned in `repos/`
  • Per-repo analysis files in `analyses/`
  • `deep_dive_summary.md` written


Phase 5: Analysis


Goal: Cross-repo comparison and technique-to-code mapping.

Steps


  1. Generate the comparison matrix:

     ```bash
     python ~/.claude/skills/github-research/scripts/compare_repos.py \
       --input github-research-output/$SLUG/phase4_deep_dive/analyses/ \
       --output github-research-output/$SLUG/phase5_analysis/comparison.json
     ```
  2. Write the comparison matrix: create `phase5_analysis/comparison_matrix.md` with:
     • A table comparing repos across dimensions (language, LOC, stars, framework, license, tests)
     • Dependency overlap analysis
     • Strengths/weaknesses per repo
  3. Write the technique map: create `phase5_analysis/technique_map.md` with:
     • A mapping from each paper concept / research technique → specific repo + file + function
     • Identified gaps (techniques with no implementation found)
     • Alternative implementations of the same concept
  4. Write the analysis report: `phase5_analysis/analysis_report.md` with:
     • An executive summary of findings
     • Key insights from code analysis
     • Recommendations for which repos to use for which purposes

Checkpoint


  • `comparison_matrix.md` with the repo comparison table
  • `technique_map.md` mapping concepts to code
  • `analysis_report.md` with findings


Phase 6: Blueprint


Goal: Produce an actionable integration and reuse plan.

Steps


  1. Write the integration plan: `phase6_blueprint/integration_plan.md` with:
     • Recommended architecture for combining repos
     • Step-by-step integration approach
     • Dependency resolution strategy
     • Potential conflicts and how to resolve them
  2. Write the reuse catalog: `phase6_blueprint/reuse_catalog.md` with:
     • For each reusable component: source repo, file path, function/class, what it does, how to extract it
     • License compatibility matrix
     • Effort estimates (easy/medium/hard to integrate)
  3. Compile the final report:

     ```bash
     python ~/.claude/skills/github-research/scripts/compile_github_report.py \
       --topic-dir github-research-output/$SLUG/
     ```
  4. Write the blueprint summary: `phase6_blueprint/blueprint_summary.md` with:
     • A one-page executive summary
     • The top 5 repos and why
     • Recommended next steps

Checkpoint


  • `integration_plan.md` complete
  • `reuse_catalog.md` with the component catalog
  • `final_report.md` compiled
  • `blueprint_summary.md` as the executive summary


Quality Conventions


  1. Repos are ranked by composite score: `relevance × 0.4 + quality × 0.35 + activity × 0.25`
  2. Deep dive requires reading actual code, not just READMEs
  3. The integration blueprint must map paper concepts → specific code files/functions
  4. Incremental saves: each phase writes to disk immediately
  5. Checkpoint recovery: any phase can be resumed by checking which outputs exist
  6. All scripts are stdlib-only Python; no pip installs needed
  7. The `gh` CLI is required for GitHub API access (must be authenticated)
  8. Deduplication by `repo_id` (owner/name) across all searches
  9. Rate-limit awareness: respect GitHub search API limits (30 req/min)
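The incremental-save and checkpoint-recovery conventions can be sketched as a resume check that inspects which phase outputs already exist. The marker file names below follow the output directory structure shown earlier; treating each summary/report file as the phase's completion marker is an assumption of this sketch.

```python
from pathlib import Path

# One completion-marker file per phase, per the output structure above.
PHASE_MARKERS = [
    ("phase1", "phase1_intake/intake_summary.md"),
    ("phase2", "phase2_discovery/discovery_log.md"),
    ("phase3", "phase3_filtering/filtering_report.md"),
    ("phase4", "phase4_deep_dive/deep_dive_summary.md"),
    ("phase5", "phase5_analysis/analysis_report.md"),
    ("phase6", "phase6_blueprint/blueprint_summary.md"),
]

def next_phase(topic_dir):
    """Return the first phase whose marker file is missing, or 'done'."""
    root = Path(topic_dir)
    for phase, marker in PHASE_MARKERS:
        if not (root / marker).exists():
            return phase
    return "done"
```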

Error Handling


  • If `gh` is not installed: warn the user and provide installation instructions
  • If a repo is archived/deleted: skip it gracefully and note it in the log
  • If a clone fails: skip it, note it in the log, and continue with the remaining repos
  • If the Papers With Code API is down: skip it and rely on GitHub search only
  • Always write partial progress to disk so work is not lost
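The first bullet can be implemented as a fail-fast preflight check before any phase runs; a minimal sketch (the function name is illustrative):

```python
import shutil

def check_gh() -> None:
    """Fail fast with install guidance when the gh CLI is missing."""
    if shutil.which("gh") is None:
        raise SystemExit(
            "gh CLI not found. Install it from https://cli.github.com/ "
            "and authenticate with `gh auth login` before running this skill."
        )
```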

References


  • See `references/phase-guide.md` for detailed phase execution guidance
  • Deep-research skill: `~/.claude/skills/deep-research/SKILL.md`
  • Paper database pattern: `~/.claude/skills/deep-research/scripts/paper_db.py`