github-explorer


GitHub Explorer — In-Depth Project Analysis

Philosophy: The README is just the facade; the real value lies in Issues, Commits, and community discussions.

Workflow

[Project Name] → [1. Locate Repo] → [2. Multi-Source Collection] → [3. Analysis and Judgment] → [4. Structured Output]

Phase 1: Locate the Repo

  • Use web_search with site:github.com <project_name> to confirm the full org/repo path
  • Use search-layer (Deep mode + intent awareness) to supplement community links and non-GitHub resources:

    ```bash
    python3 skills/search-layer/scripts/search.py \
      --queries "<project_name> review" "<project_name> evaluation user experience" \
      --mode deep --intent exploratory --num 5
    ```

  • Fetch basic repo information (README, Stars, Forks, License, recent updates) via the GitHub API rather than web_fetch (see the mandatory rule in Phase 2: github.com repo pages are client-side rendered)

Phase 2: Multi-Source Collection (Parallel)

⚠️ GitHub page-fetching rule (mandatory): GitHub repo pages are SPAs (client-side rendered), so web_fetch only retrieves the navigation-bar shell. Never use web_fetch on github.com repo pages; always use the GitHub API:
  • README:
    curl -s -H "Authorization: token {PAT}" -H "Accept: application/vnd.github.v3.raw" "https://api.github.com/repos/{owner}/{repo}/readme"
  • Repo metadata:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}"
  • Issues:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}/issues?state=all&sort=comments&per_page=10"
  • Commits:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}/commits?per_page=10"
  • File tree:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"
See TOOLS.md for the PAT.
Check the following sources as needed; collect if available, skip if not:

| Source | URL pattern | Collected content | Recommended tool |
| --- | --- | --- | --- |
| GitHub Repo | github.com/{org}/{repo} | README, About, Contributors | GitHub API (per the rule above) |
| GitHub Issues | github.com/{org}/{repo}/issues?q=sort:comments | Top 3-5 high-quality Issues | browser |
| Chinese communities | WeChat / Zhihu / Xiaohongshu | In-depth reviews, usage experience | content-extract |
| Technical blogs | Medium / Dev.to | Technical architecture analysis | web_fetch / content-extract |
| Discussion forums | V2EX / Reddit | User feedback, pain points | search-layer (Deep mode) |
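The curl calls above can equally be issued from Python. A minimal sketch using only the standard library; the helper names and the `GITHUB_PAT` environment variable are assumptions for illustration, not part of the skill:

```python
import json
import urllib.request

API = "https://api.github.com"

def build_request(path: str, token: str, raw: bool = False) -> urllib.request.Request:
    """Build an authenticated GitHub API request.

    raw=True asks for the raw file body (used for the /readme endpoint);
    otherwise the standard JSON media type is requested.
    """
    accept = "application/vnd.github.v3.raw" if raw else "application/vnd.github+json"
    return urllib.request.Request(
        API + path,
        headers={"Authorization": f"token {token}", "Accept": accept},
    )

def fetch_json(path: str, token: str):
    """Issue the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, token)) as resp:
        return json.loads(resp.read())

# Usage (requires network and a valid PAT):
#   meta = fetch_json("/repos/{owner}/{repo}", os.environ["GITHUB_PAT"])
```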

search-layer Invocation Conventions

search-layer v2 supports intent-aware scoring. Recommended usage in github-explorer scenarios:

| Scenario | Command | Notes |
| --- | --- | --- |
| Project research (default) | python3 skills/search-layer/scripts/search.py --queries "<project> review" "<project> evaluation" --mode deep --intent exploratory --num 5 | Parallel multi-query, sorted by authority |
| Latest updates | python3 skills/search-layer/scripts/search.py "<project> latest release" --mode deep --intent status --freshness pw --num 5 | Prioritizes freshness; filters to the past week |
| Competitor comparison | python3 skills/search-layer/scripts/search.py --queries "<project> vs <competitor>" "<project> alternatives" --mode deep --intent comparison --num 5 | Comparison intent; dual weighting of keywords and authority |
| Quick link lookup | python3 skills/search-layer/scripts/search.py "<project> official docs" --mode fast --intent resource --num 3 | Exact match, fastest |
| Community discussion | python3 skills/search-layer/scripts/search.py "<project> discussion experience" --mode deep --intent exploratory --domain-boost reddit.com,news.ycombinator.com --num 5 | Boosts community sites |

Intent type quick reference: factual / status / comparison / tutorial / exploratory / news / resource (resource lookup).
Without --intent, behavior is identical to v1 (no scoring; results in original order).
Degradation rules: if either Exa or Tavily returns 429/5xx, continue with the remaining sources; if the whole script fails, fall back to single-source web_search.
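The degradation rule above can be sketched as a small wrapper: try the search-layer script, and if the script fails as a whole, fall back to the single-source tool. The wrapper and the `web_search` callable are illustrative assumptions; only the script path comes from the skill (per-source 429/5xx handling happens inside the script itself):

```python
import subprocess

def run_search_layer(args: list) -> str:
    """Run the search-layer script; raise if it fails as a whole."""
    proc = subprocess.run(
        ["python3", "skills/search-layer/scripts/search.py", *args],
        capture_output=True, text=True, timeout=120,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or "search-layer failed")
    return proc.stdout

def search_with_fallback(args: list, web_search) -> str:
    """Fall back to the built-in single-source web_search when the script fails."""
    try:
        return run_search_layer(args)
    except Exception:
        # Reuse the plain query terms (dropping flags) for the fallback search.
        query = " ".join(a for a in args if not a.startswith("--"))
        return web_search(query)
```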

Extraction Upgrade and Degradation Protocol

You must upgrade from web_fetch to content-extract in the following situations:
  1. Domain restrictions: mp.weixin.qq.com, zhihu.com, xiaohongshu.com.
  2. Complex structure: the page contains many formulas (LaTeX) or complex tables, or the Markdown returned by web_fetch is extremely messy.
  3. Missing content: web_fetch returns empty content or a challenge page due to anti-crawling measures.
Invocation:

```bash
python3 skills/content-extract/scripts/content_extract.py --url <URL>
```

Internally, content-extract:
  • First checks the domain whitelist (WeChat / Zhihu, etc.) and goes straight to MinerU on a match
  • Otherwise probes with web_fetch first, falling back to MinerU-HTML on failure
  • Returns a unified JSON contract (with fields such as ok, markdown, sources)
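Conditions 1 and 3 can be reduced to a mechanical pre-check. A sketch: the domain list comes from condition 1, but the 200-character minimum body length and the function name are assumptions; condition 2 (messy Markdown) still needs judgment:

```python
from urllib.parse import urlparse

# Domains from condition 1; subdomains are matched as well.
RESTRICTED_DOMAINS = {"mp.weixin.qq.com", "zhihu.com", "xiaohongshu.com"}

def needs_content_extract(url, fetched_markdown) -> bool:
    """Return True when web_fetch output should be upgraded to content-extract."""
    host = urlparse(url).netloc.lower()
    if any(host == d or host.endswith("." + d) for d in RESTRICTED_DOMAINS):
        return True  # condition 1: restricted domain
    if not fetched_markdown or len(fetched_markdown.strip()) < 200:
        return True  # condition 3: empty body or challenge-page stub
    return False
```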

Phase 3: Analysis and Judgment

Make judgments based on the collected data:
  • Project phase: Early Experimentation / Rapid Growth / Mature & Stable / Maintenance Mode / Stagnant (based on commit frequency and content)
  • Criteria for selected Issues: many comments, maintainer participation, exposes architectural problems, or contains valuable technical discussion
  • Competitor identification: extract from the README's "Comparison"/"Alternatives" sections, Issue discussions, and web searches
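The project-phase call above becomes mechanical once commit dates are in hand. One possible heuristic; all thresholds are illustrative assumptions, not part of the skill:

```python
from datetime import datetime, timezone

def days_since(iso_ts: str) -> int:
    """Days elapsed since an ISO-8601 timestamp (as returned by the commits API)."""
    ts = datetime.fromisoformat(iso_ts.replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - ts).days

def judge_phase(repo_age_days: int, days_since_last_commit: int,
                commits_last_90d: int) -> str:
    """Map basic activity signals onto the five phases listed above."""
    if days_since_last_commit > 180:
        return "Stagnant"
    if repo_age_days < 180:
        return "Early Experimentation"
    if commits_last_90d >= 30:
        return "Rapid Growth"
    if commits_last_90d >= 5:
        return "Mature & Stable"
    return "Maintenance Mode"
```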

Phase 4: Structured Output

Output strictly according to the template below; every module must contain substantive content or be explicitly marked "Not found".

Formatting Rules (Mandatory)

  1. The title must link to the GitHub repository (format: # [Project Name](https://github.com/org/repo), so it is clickable)
  2. Uniform blank lines around titles (end of previous section → blank line → title → blank line → content, for clear visual separation)
  3. Telegram blank-line fix (mandatory): Telegram swallows blank lines after list items (lines starting with -). Solution: between the end of a list and the next title, insert a line containing a braille blank (U+2800), like this:
     - last list item
     ⠀
     **Next Title**
     This keeps the blank line before the title from being swallowed when Telegram renders the message.
  4. All titles bold (emoji + bold text)
  5. Competitor comparisons must include links (GitHub / official site / docs; at least one)
  6. Community buzz must be specific: quote concrete post/tweet/discussion summaries with original links. Do not write generalities like "highly praised" or "very popular"; write "user X said such-and-such" or "post Y discussed this specific problem"
  7. Information traceability: every piece of quoted external information should carry its original link so readers can trace it to the source

```markdown
# [{Project Name}]({GitHub Repo URL})

**🎯 One-Sentence Positioning**
{What it is, what problem it solves}

**⚙️ Core Mechanism**
{Technical principles/architecture, explained in plain language rather than copied from the README. Include the key tech stack.}

**📊 Project Health**
- Stars: {count} | Forks: {count} | License: {type}
- Team/author: {background}
- Commit trend: {recent activity + project-phase judgment}
- Recent activity: {overview of the last few important commits}

**🔥 Selected Issues**
{Top 3-5 high-quality Issues, each with title, link, and core discussion points. Note explicitly if none qualify.}

**✅ When to Use**
{When to use it, and which concrete problems it solves}

**⚠️ Limitations**
{When to avoid it; known issues}

**🆚 Competitor Comparison**
{Same-track projects and how they differ. Every competitor must carry a GitHub or official-site link, e.g.:}
- vs GraphRAG — difference description
- vs RAGFlow — difference description

**🌐 Knowledge Graph**
- DeepWiki: {link or "Not indexed"}
- Zread.ai: {link or "Not indexed"}

**🎬 Demo**
{Online demo link, or "None"}

**📄 Related Papers**
{arXiv link, or "None"}

**📰 Community Buzz**
X/Twitter
{Concrete tweet summaries + links, e.g.:}
- @user: "what they actually said..."
- thread: which concrete problem it discussed... {note "No relevant discussions found" if none}
Chinese communities
{Concrete post titles/summaries + links, e.g.:}
- Zhihu: post title — what it discussed
- V2EX: post title — what it discussed {note "No relevant discussions found" if none}

**💬 My Take**
{Subjective assessment: whether it is worth the time, what skill level it suits, and how to use it}
```

Execution Notes

  • Prefer web_search + web_fetch, with browser as a fallback
  • Search enhancement: project-research tasks default to search-layer v2 Deep mode + --intent exploratory (Brave + Exa + Tavily in parallel with deduplication + intent-aware scoring); a single-source failure does not block the main flow
  • Fetch degradation (mandatory): when web_fetch fails, returns 403 or an anti-crawling page, returns too little body text, or the source domain is high-risk (WeChat / Zhihu / Xiaohongshu), switch to content-extract (which internally falls back to MinerU-HTML) for cleaner Markdown plus traceable sources
  • Collect from different sources in parallel for efficiency
  • All links must be real and reachable; never fabricate URLs
  • Output in Chinese; keep technical terms in English
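The parallel-collection note can be sketched with a thread pool; failures from one source are recorded without blocking the others. The function name and the task callables are illustrative stand-ins for the actual tool invocations:

```python
from concurrent.futures import ThreadPoolExecutor

def collect_parallel(tasks: dict) -> dict:
    """tasks: {source_name: zero-arg callable}. One failure never blocks the rest."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:
                results[name] = f"FAILED: {exc}"
    return results
```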

⚠️ Output Self-Check List (Mandatory; Verify Every Item Before Sending)

Before sending a report, check every item below; send only when all pass:
  • Title link: # [Project Name](GitHub URL) format, clickable
  • Blank lines around titles: each bold title (**🎯 ...**) has one blank line before and after
  • Telegram blank lines: a braille-blank (U+2800) line sits between the end of each list block and the next title (prevents Telegram from swallowing blank lines)
  • Issue links: every selected Issue uses the full [#number title](full URL) format
  • Competitor links: every competitor carries [name](GitHub/official-site link)
  • Community-buzz links: every quote uses the [source: title](URL) format
  • No vague wording: the community-buzz section contains no generalities like "highly praised" or "very popular"
  • Traceability: every external quote carries its original link
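Several of these items are mechanical and can be pre-checked before sending. A sketch; the regexes, the vague-phrase list, and the function name are assumptions:

```python
import re

def check_report(md: str) -> list:
    """Return a list of checklist violations; empty means the mechanical checks pass."""
    problems = []
    # Title link: # [Project Name](GitHub URL)
    if not re.match(r"^#\s+\[[^\]]+\]\(https://github\.com/[^)]+\)", md):
        problems.append("title is not a clickable [Project Name](GitHub URL) heading")
    # Telegram blank-line fix: at least one braille blank (U+2800) present
    if "\u2800" not in md:
        problems.append("no braille-blank (U+2800) line found")
    # Vague community-buzz wording (the two phrases called out above)
    for vague in ("评价很高", "热度很高"):
        if vague in md:
            problems.append("vague community description: " + vague)
    return problems
```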

Dependencies

This Skill depends on the following OpenClaw tools and Skills:

| Dependency | Type | Purpose |
| --- | --- | --- |
| web_search | Built-in tool | Brave Search retrieval |
| web_fetch | Built-in tool | Web content fetching |
| browser | Built-in tool | Dynamic page rendering (fallback) |
| search-layer | Skill | Multi-source search + intent-aware scoring (Brave + Exa + Tavily + Grok); v2.1 supports --intent / --queries / --freshness |
| content-extract | Skill | High-fidelity content extraction (degradation path for anti-crawling sites) |