# GitHub Research Skill (`github-research`)
## Trigger
Activate this skill when the user wants to:

- "Find repos for [topic]", "GitHub research on [topic]"
- "Analyze open-source code for [topic]"
- "Find implementations of [paper/technique]"
- "Which repos implement [algorithm]?"
- Or uses the slash command `/github-research <deep-research-output-dir>`
## Overview
This skill systematically discovers, evaluates, and deeply analyzes GitHub repositories related to a research topic. It reads deep-research output (paper database, phase reports, code references) and produces an actionable integration blueprint for reusing open-source code.
Installation: `~/.claude/skills/github-research/` (scripts, references, and this skill definition).

Output: `./github-research-output/{slug}/`, relative to the current working directory.

Input: a deep-research output directory (containing `paper_db.jsonl`, phase reports, `code_repos.md`, etc.)

## 6-Phase Pipeline
Phase 1: Intake → Extract refs, URLs, keywords from deep-research output
Phase 2: Discovery → Multi-source broad GitHub search (50-200 repos)
Phase 3: Filtering → Score & rank → select top 15-30 repos
Phase 4: Deep Dive → Clone & deeply analyze top 8-15 repos (code reading)
Phase 5: Analysis → Per-repo reports + cross-repo comparison
Phase 6: Blueprint → Integration/reuse plan for the research topic

## Output Directory Structure
```
github-research-output/{slug}/
├── repo_db.jsonl               # Master repo database
├── phase1_intake/
│   ├── extracted_refs.jsonl    # URLs, keywords, paper-repo links
│   └── intake_summary.md
├── phase2_discovery/
│   ├── search_results/         # Raw JSONL from each search
│   └── discovery_log.md
├── phase3_filtering/
│   ├── ranked_repos.jsonl      # Scored & ranked subset
│   └── filtering_report.md
├── phase4_deep_dive/
│   ├── repos/                  # Cloned repos (shallow)
│   ├── analyses/               # Per-repo analysis .md files
│   └── deep_dive_summary.md
├── phase5_analysis/
│   ├── comparison_matrix.md    # Cross-repo comparison
│   ├── technique_map.md        # Paper concept → code mapping
│   └── analysis_report.md
└── phase6_blueprint/
    ├── integration_plan.md     # How to combine repos
    ├── reuse_catalog.md        # Reusable components catalog
    ├── final_report.md         # Complete compiled report
    └── blueprint_summary.md
```

## Scripts Reference
All scripts are Python 3, stdlib-only, located in `~/.claude/skills/github-research/scripts/`.

| Script | Purpose | Key Flags |
|---|---|---|
| `extract_research_refs.py` | Parse deep-research output for GitHub URLs, paper refs, keywords | `--research-dir`, `--output` |
| `search_github.py` | Search GitHub repos via the `gh` CLI | `--query`, `--min-stars`, `--sort`, `--max-results` |
| `search_github_code.py` | Search GitHub code for implementations | `--query`, `--language`, `--max-results` |
| `search_paperswithcode.py` | Search Papers With Code for paper→repo mappings | `--arxiv-id` |
| `repo_db.py` | JSONL repo database management | subcommands: `merge`, `score`, `rank`, `filter`, `tag` |
| `repo_metadata.py` | Fetch detailed metadata via the `gh` CLI | `--repos`/`--input`, `--delay` |
| `clone_repo.py` | Shallow-clone repos for analysis | `--repo`, `--output-dir` |
| `analyze_repo_structure.py` | Map file tree, key files, LOC stats | `--repo-dir`, `--output` |
| `extract_dependencies.py` | Extract and parse dependency files | `--repo-dir`, `--output` |
| `find_implementations.py` | Search cloned repo for specific code patterns | `--repo-dir`, `--patterns` |
| `repo_readme_fetch.py` | Fetch README without cloning | `--input`, `--output` |
| `compare_repos.py` | Generate comparison matrix across repos | `--input`, `--output` |
| `compile_github_report.py` | Assemble final report from all phases | `--topic-dir` |
## Phase 1: Intake

Goal: Extract all relevant references, URLs, and keywords from the deep-research output.
### Steps
1. Create the output directory structure:

   ```bash
   SLUG=$(echo "$TOPIC" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')
   mkdir -p github-research-output/$SLUG/{phase1_intake,phase2_discovery/search_results,phase3_filtering,phase4_deep_dive/{repos,analyses},phase5_analysis,phase6_blueprint}
   ```

2. Extract references from the deep-research output:

   ```bash
   python ~/.claude/skills/github-research/scripts/extract_research_refs.py \
     --research-dir <deep-research-output-dir> \
     --output github-research-output/$SLUG/phase1_intake/extracted_refs.jsonl
   ```

3. Review extracted refs: read the generated JSONL. Note:
   - GitHub URLs found directly in reports
   - Paper titles and arXiv IDs (for Papers With Code lookup)
   - Research keywords and themes (for GitHub search queries)

4. Write intake summary: create `phase1_intake/intake_summary.md` with:
   - Number of direct GitHub URLs found
   - Number of papers with potential code links
   - Key research themes extracted
   - Planned search queries for Phase 2
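The slug normalization in step 1 can be mirrored in stdlib Python when the shell pipeline is inconvenient. This is a sketch, not a helper the bundled scripts expose:

```python
import string

# Characters the bash pipeline keeps: tr -cd 'a-z0-9-'
ALLOWED = set(string.ascii_lowercase + string.digits + "-")

def slugify(topic: str) -> str:
    """Lowercase, map spaces to hyphens, drop anything outside [a-z0-9-]."""
    s = topic.lower().replace(" ", "-")
    return "".join(ch for ch in s if ch in ALLOWED)

print(slugify("Multi-Agent LLM Coordination!"))  # multi-agent-llm-coordination
```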
### Checkpoint

- `extracted_refs.jsonl` exists with entries
- `intake_summary.md` written
- Search strategy documented
## Phase 2: Discovery

Goal: Cast a wide net to find 50-200 candidate repos from multiple sources.
### Steps
1. Search by direct URLs: fetch metadata for any GitHub URLs from Phase 1:

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_metadata.py \
     --repos owner1/name1 owner2/name2 ... \
     --output github-research-output/$SLUG/phase2_discovery/search_results/direct_urls.jsonl
   ```

2. Search Papers With Code: for each paper with an arXiv ID:

   ```bash
   python ~/.claude/skills/github-research/scripts/search_paperswithcode.py \
     --arxiv-id 2401.12345 \
     --output github-research-output/$SLUG/phase2_discovery/search_results/pwc_2401.12345.jsonl
   ```

3. Search GitHub by keywords (3-8 queries based on research themes):

   ```bash
   python ~/.claude/skills/github-research/scripts/search_github.py \
     --query "multi-agent LLM coordination" \
     --min-stars 10 --sort stars --max-results 50 \
     --output github-research-output/$SLUG/phase2_discovery/search_results/gh_query1.jsonl
   ```

4. Search GitHub code (for specific implementations):

   ```bash
   python ~/.claude/skills/github-research/scripts/search_github_code.py \
     --query "class MultiAgentOrchestrator" \
     --language python --max-results 30 \
     --output github-research-output/$SLUG/phase2_discovery/search_results/code_query1.jsonl
   ```

5. Fetch READMEs for repos that lack descriptions:

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_readme_fetch.py \
     --input <repos.jsonl> \
     --output github-research-output/$SLUG/phase2_discovery/search_results/readmes.jsonl
   ```

6. Merge all results into the master database:

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_db.py merge \
     --inputs github-research-output/$SLUG/phase2_discovery/search_results/*.jsonl \
     --output github-research-output/$SLUG/repo_db.jsonl
   ```

7. Write discovery log: create `phase2_discovery/discovery_log.md` with the search queries used, results per source, and total unique repos found.
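The merge step relies on `repo_db.py merge` deduplicating records across search sources; the Quality Conventions section says dedup is by `repo_id` (owner/name). A minimal stdlib sketch of that merge logic, assuming the JSONL field is named `repo_id`:

```python
import json
from pathlib import Path

def merge_jsonl(inputs, output):
    """Merge JSONL repo records, keeping the first record seen per repo_id."""
    seen = {}
    for path in inputs:
        for line in Path(path).read_text().splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            rid = rec["repo_id"]          # assumed field name, e.g. "owner/name"
            seen.setdefault(rid, rec)     # first occurrence wins
    with open(output, "w") as f:
        for rec in seen.values():
            f.write(json.dumps(rec) + "\n")
    return len(seen)
```

The real script may prefer the richer record (e.g. the one with metadata) rather than the first seen; this sketch only shows the dedup-by-key shape.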
### Rate Limits

- GitHub search API: 30 requests/minute (authenticated)
- Papers With Code API: no strict limit, but be respectful (1 req/sec)
- Add `--delay 1.0` to batch operations when needed
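The bundled scripts handle pacing via `--delay`; for ad-hoc batch calls, a minimal client-side throttle matching these limits (e.g. 2.0 s intervals for 30 req/min) looks like:

```python
import time

class Throttle:
    """Enforce a minimum interval between successive calls."""
    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        # Sleep just long enough to keep calls min_interval apart.
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()
```

Usage: construct `Throttle(2.0)` and call `wait()` before each GitHub search request.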
### Checkpoint

- `repo_db.jsonl` populated with 50-200 repos
- `discovery_log.md` with search details
## Phase 3: Filtering

Goal: Score and rank repos, then select the top 15-30 for deeper analysis.
### Steps
1. Enrich metadata for all repos:

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_metadata.py \
     --input github-research-output/$SLUG/repo_db.jsonl \
     --output github-research-output/$SLUG/repo_db.jsonl \
     --delay 0.5
   ```

2. Score repos (quality + activity scores):

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_db.py score \
     --input github-research-output/$SLUG/repo_db.jsonl \
     --output github-research-output/$SLUG/repo_db.jsonl
   ```

3. LLM relevance scoring: read through the top ~50 repos (by quality_score) and assign a `relevance_score` (0.0-1.0) based on:
   - Direct relevance to the research topic
   - Implementation completeness
   - Code quality signals (from README, description)

   Then update the relevance scores:

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_db.py tag \
     --input github-research-output/$SLUG/repo_db.jsonl \
     --ids owner/name --tags "relevance:0.85"
   ```

4. Compute composite scores and rank:

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_db.py score \
     --input github-research-output/$SLUG/repo_db.jsonl \
     --output github-research-output/$SLUG/repo_db.jsonl
   python ~/.claude/skills/github-research/scripts/repo_db.py rank \
     --input github-research-output/$SLUG/repo_db.jsonl \
     --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
     --by composite_score
   ```

5. Select top repos: filter to the top 15-30:

   ```bash
   python ~/.claude/skills/github-research/scripts/repo_db.py filter \
     --input github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
     --output github-research-output/$SLUG/phase3_filtering/ranked_repos.jsonl \
     --max-repos 30 --not-archived
   ```

6. Write filtering report: create `phase3_filtering/filtering_report.md` with:
   - Stats before/after filtering
   - Score distributions
   - Top 30 repos with scores and rationale
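The `"relevance:0.85"` tag convention in the LLM scoring step implies each record carries a list of string tags that later gets folded back into a numeric field. A sketch of that parsing, with the `tags` field name assumed:

```python
def relevance_from_tags(record):
    """Extract the relevance score from a record's 'relevance:<x>' tag, if any."""
    for tag in record.get("tags", []):
        if tag.startswith("relevance:"):
            return float(tag.split(":", 1)[1])
    return None
```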
### Scoring Formula

```
activity_score  = sigmoid((days_since_push < 90) * 0.4 + has_recent_commits * 0.3 + open_issues_ratio * 0.3)
quality_score   = normalize(log(stars+1) * 0.3 + log(forks+1) * 0.2 + has_license * 0.15 + has_readme * 0.15 + not_archived * 0.2)
composite_score = relevance * 0.4 + quality * 0.35 + activity * 0.25
```

### Checkpoint
- `ranked_repos.jsonl` with 15-30 repos
- `filtering_report.md` with scoring details
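The scoring formulas in this phase translate directly to stdlib Python. `sigmoid` below is the standard logistic; `normalize` needs the whole population, so only the activity and composite steps are shown end to end:

```python
import math

def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def activity_score(days_since_push, has_recent_commits, open_issues_ratio):
    # Booleans coerce to 0/1, matching the formula's indicator terms.
    return sigmoid((days_since_push < 90) * 0.4
                   + has_recent_commits * 0.3
                   + open_issues_ratio * 0.3)

def composite_score(relevance, quality, activity):
    return relevance * 0.4 + quality * 0.35 + activity * 0.25
```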
## Phase 4: Deep Dive

Goal: Clone and deeply analyze the top 8-15 repos.
### Steps
1. Select repos for deep dive: take the top 8-15 from the ranked list.

2. Clone each repo (shallow):

   ```bash
   python ~/.claude/skills/github-research/scripts/clone_repo.py \
     --repo owner/name \
     --output-dir github-research-output/$SLUG/phase4_deep_dive/repos/
   ```

3. Analyze the structure of each cloned repo:

   ```bash
   python ~/.claude/skills/github-research/scripts/analyze_repo_structure.py \
     --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
     --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_structure.json
   ```

4. Extract dependencies:

   ```bash
   python ~/.claude/skills/github-research/scripts/extract_dependencies.py \
     --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
     --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_deps.json
   ```

5. Find implementations: search for key algorithms/concepts from the research:

   ```bash
   python ~/.claude/skills/github-research/scripts/find_implementations.py \
     --repo-dir github-research-output/$SLUG/phase4_deep_dive/repos/name/ \
     --patterns "class Transformer" "def forward" "attention" \
     --output github-research-output/$SLUG/phase4_deep_dive/analyses/name_impls.jsonl
   ```

6. Deep code reading: for each repo, READ the key source files identified by the structure analysis. Write a per-repo analysis in `phase4_deep_dive/analyses/{name}_analysis.md`:
   - Architecture overview
   - Key algorithms implemented
   - Code quality assessment
   - API / interface design
   - Dependencies and requirements
   - Strengths and limitations
   - Reusability assessment (how easy it is to extract components)

7. Write the deep dive summary: `phase4_deep_dive/deep_dive_summary.md`
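A stripped-down version of what `find_implementations.py` plausibly does is a line-wise literal-substring scan over source files; the real script's matching rules (regex, language filters) may differ:

```python
from pathlib import Path

def find_patterns(repo_dir, patterns, exts=(".py",)):
    """Scan source files for literal patterns; return JSONL-ready hit records."""
    hits = []
    for path in sorted(Path(repo_dir).rglob("*")):
        if path.suffix not in exts or not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for pat in patterns:
                if pat in line:
                    hits.append({"file": str(path), "line": lineno, "pattern": pat})
    return hits
```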
### IMPORTANT: Actually Read Code

Do NOT just summarize READMEs. You must:

- Read the main source files (entry points, core modules)
- Understand the actual implementation approach
- Identify the specific functions/classes that implement research concepts
- Note code patterns, design decisions, and trade-offs
### Checkpoint

- Repos cloned in `repos/`
- Per-repo analysis files in `analyses/`
- `deep_dive_summary.md` written
## Phase 5: Analysis

Goal: Cross-repo comparison and technique-to-code mapping.
### Steps
1. Generate the comparison matrix:

   ```bash
   python ~/.claude/skills/github-research/scripts/compare_repos.py \
     --input github-research-output/$SLUG/phase4_deep_dive/analyses/ \
     --output github-research-output/$SLUG/phase5_analysis/comparison.json
   ```

2. Write the comparison matrix: create `phase5_analysis/comparison_matrix.md` with:
   - A table comparing repos across dimensions (language, LOC, stars, framework, license, tests)
   - Dependency overlap analysis
   - Strengths/weaknesses per repo

3. Write the technique map: create `phase5_analysis/technique_map.md` with:
   - A mapping from each paper concept / research technique → specific repo + file + function
   - Identified gaps (techniques with no implementation found)
   - Alternative implementations of the same concept

4. Write the analysis report: `phase5_analysis/analysis_report.md` with:
   - Executive summary of findings
   - Key insights from the code analysis
   - Recommendations for which repos to use for which purposes
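Rendering the comparison table in `comparison_matrix.md` from per-repo dicts is mechanical; a sketch (the column names here are illustrative, not the script's actual schema):

```python
def comparison_matrix(rows, columns):
    """Render repo analysis dicts as a markdown comparison table."""
    lines = ["| " + " | ".join(columns) + " |",
             "|" + "---|" * len(columns)]
    for row in rows:
        # Missing dimensions render as "-" so the table stays rectangular.
        lines.append("| " + " | ".join(str(row.get(c, "-")) for c in columns) + " |")
    return "\n".join(lines)

print(comparison_matrix(
    [{"repo": "a/b", "stars": 120, "license": "MIT"}],
    ["repo", "stars", "license"]))
```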
### Checkpoint

- `comparison_matrix.md` with the repo comparison table
- `technique_map.md` mapping concepts to code
- `analysis_report.md` with findings
## Phase 6: Blueprint

Goal: Produce an actionable integration and reuse plan.
### Steps
1. Write the integration plan: `phase6_blueprint/integration_plan.md` with:
   - Recommended architecture for combining repos
   - Step-by-step integration approach
   - Dependency resolution strategy
   - Potential conflicts and how to resolve them

2. Write the reuse catalog: `phase6_blueprint/reuse_catalog.md` with:
   - For each reusable component: source repo, file path, function/class, what it does, how to extract it
   - License compatibility matrix
   - Effort estimates (easy/medium/hard to integrate)

3. Compile the final report:

   ```bash
   python ~/.claude/skills/github-research/scripts/compile_github_report.py \
     --topic-dir github-research-output/$SLUG/
   ```

4. Write the blueprint summary: `phase6_blueprint/blueprint_summary.md` with:
   - One-page executive summary
   - Top 5 repos and why
   - Recommended next steps
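`compile_github_report.py` presumably assembles `final_report.md` by concatenating the phase reports; a minimal sketch of that assembly (the exact file list and ordering are assumptions):

```python
from pathlib import Path

# Hypothetical report order; the real script may include more files.
PHASE_FILES = [
    "phase1_intake/intake_summary.md",
    "phase2_discovery/discovery_log.md",
    "phase3_filtering/filtering_report.md",
    "phase4_deep_dive/deep_dive_summary.md",
    "phase5_analysis/analysis_report.md",
    "phase6_blueprint/integration_plan.md",
]

def compile_report(topic_dir):
    """Concatenate whichever phase reports exist, skipping missing ones."""
    parts = []
    for rel in PHASE_FILES:
        path = Path(topic_dir) / rel
        if path.exists():
            parts.append(path.read_text())
    return "\n\n---\n\n".join(parts)
```

Skipping missing files (rather than failing) matches the skill's checkpoint-recovery convention.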
### Checkpoint

- `integration_plan.md` complete
- `reuse_catalog.md` with the component catalog
- `final_report.md` compiled
- `blueprint_summary.md` as executive summary
## Quality Conventions

- Repos are ranked by composite score: `relevance × 0.4 + quality × 0.35 + activity × 0.25`
- Deep dive requires reading actual code, not just READMEs
- The integration blueprint must map paper concepts → specific code files/functions
- Incremental saves: each phase writes to disk immediately
- Checkpoint recovery: can resume from any phase by checking which outputs exist
- All scripts are stdlib-only Python; no pip installs needed
- The `gh` CLI is required for GitHub API access (must be authenticated)
- Deduplication by `repo_id` (owner/name) across all searches
- Rate limit awareness: respect GitHub search API limits (30 req/min)
## Error Handling

- If `gh` is not installed: warn the user and provide installation instructions
- If a repo is archived/deleted: skip gracefully and note it in the log
- If a clone fails: skip it, note it in the log, and continue with the remaining repos
- If the Papers With Code API is down: skip it and rely on GitHub search only
- Always write partial progress to disk so work is not lost
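The `gh` preflight check maps directly onto `shutil.which`; a sketch of the warn-and-instruct behavior (the wording of the guidance is illustrative):

```python
import shutil

def check_gh():
    """Return True if the gh CLI is on PATH; otherwise print install guidance."""
    if shutil.which("gh"):
        return True
    print("warning: the GitHub CLI (gh) is not installed.")
    print("Install it from https://cli.github.com/ and run: gh auth login")
    return False
```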
## References

- See `references/phase-guide.md` for detailed phase execution guidance
- Deep-research skill: `~/.claude/skills/deep-research/SKILL.md`
- Paper database pattern: `~/.claude/skills/deep-research/scripts/paper_db.py`