github-explorer


GitHub Explorer — In-Depth Project Analysis

Philosophy: The README is just the facade; the real value lies in Issues, Commits, and community discussions.

Workflow

[Project Name] → [1. Locate Repo] → [2. Multi-Source Collection] → [3. Analysis and Judgment] → [4. Structured Output]

Phase 1: Locate the Repo

  • Use web_search with site:github.com <project_name> to confirm the full org/repo path
  • Use search-layer (Deep mode + intent awareness) to supplement community links and non-GitHub resources:

    ```bash
    python3 skills/search-layer/scripts/search.py \
      --queries "<project_name> review" "<project_name> evaluation user experience" \
      --mode deep --intent exploratory --num 5
    ```

  • Fetch basic repo information (README, Stars, Forks, License, recent updates) via the GitHub API rather than web_fetch (see the mandatory rule in Phase 2: github.com repo pages are client-side rendered)

Phase 2: Multi-Source Collection (Parallel)

⚠️ GitHub page-fetching rule (mandatory): GitHub repo pages are SPAs (client-side rendered), so web_fetch only retrieves the navigation-bar shell. Never use web_fetch on github.com repo pages; always use the GitHub API:
  • README:
    curl -s -H "Authorization: token {PAT}" -H "Accept: application/vnd.github.v3.raw" "https://api.github.com/repos/{owner}/{repo}/readme"
  • Repo metadata:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}"
  • Issues:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}/issues?state=all&sort=comments&per_page=10"
  • Commits:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}/commits?per_page=10"
  • File tree:
    curl -s -H "Authorization: token {PAT}" "https://api.github.com/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"
See TOOLS.md for the PAT.
Check the following sources as needed; collect if available, skip if not:

| Source | URL pattern | Collected content | Recommended tool |
| --- | --- | --- | --- |
| GitHub Repo | github.com/{org}/{repo} | README, About, Contributors | GitHub API (per the rule above) |
| GitHub Issues | github.com/{org}/{repo}/issues?q=sort:comments | Top 3-5 high-quality Issues | browser |
| Chinese communities | WeChat / Zhihu / Xiaohongshu | In-depth reviews, usage experience | content-extract |
| Technical blogs | Medium / Dev.to | Technical architecture analysis | web_fetch / content-extract |
| Discussion forums | V2EX / Reddit | User feedback, pain points | search-layer (Deep mode) |
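The curl calls above can equally be issued from Python. A minimal sketch using only the standard library; the helper names and the `GITHUB_PAT` environment variable are assumptions for illustration, not part of the skill:

```python
import json
import urllib.request

API = "https://api.github.com"

def build_request(path: str, token: str, raw: bool = False) -> urllib.request.Request:
    """Build an authenticated GitHub API request.

    raw=True asks for the raw file body (used for the /readme endpoint);
    otherwise the standard JSON media type is requested.
    """
    accept = "application/vnd.github.v3.raw" if raw else "application/vnd.github+json"
    return urllib.request.Request(
        API + path,
        headers={"Authorization": f"token {token}", "Accept": accept},
    )

def fetch_json(path: str, token: str):
    """Issue the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, token)) as resp:
        return json.loads(resp.read())

# Usage (requires network and a valid PAT):
#   meta = fetch_json("/repos/{owner}/{repo}", os.environ["GITHUB_PAT"])
```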

search-layer Invocation Conventions

search-layer v2 supports intent-aware scoring. Recommended usage in github-explorer scenarios:

| Scenario | Command | Notes |
| --- | --- | --- |
| Project research (default) | python3 skills/search-layer/scripts/search.py --queries "<project> review" "<project> evaluation" --mode deep --intent exploratory --num 5 | Parallel multi-query, sorted by authority |
| Latest updates | python3 skills/search-layer/scripts/search.py "<project> latest release" --mode deep --intent status --freshness pw --num 5 | Prioritizes freshness; filters to the past week |
| Competitor comparison | python3 skills/search-layer/scripts/search.py --queries "<project> vs <competitor>" "<project> alternatives" --mode deep --intent comparison --num 5 | Comparison intent; dual weighting of keywords and authority |
| Quick link lookup | python3 skills/search-layer/scripts/search.py "<project> official docs" --mode fast --intent resource --num 3 | Exact match, fastest |
| Community discussion | python3 skills/search-layer/scripts/search.py "<project> discussion experience" --mode deep --intent exploratory --domain-boost reddit.com,news.ycombinator.com --num 5 | Boosts community sites |

Intent type quick reference: factual / status / comparison / tutorial / exploratory / news / resource (resource lookup).
Without --intent, behavior is identical to v1 (no scoring; results in original order).
Degradation rules: if either Exa or Tavily returns 429/5xx, continue with the remaining sources; if the whole script fails, fall back to single-source web_search.
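The degradation rule above can be sketched as a small wrapper: try the search-layer script, and if the script fails as a whole, fall back to the single-source tool. The wrapper and the `web_search` callable are illustrative assumptions; only the script path comes from the skill (per-source 429/5xx handling happens inside the script itself):

```python
import subprocess

def run_search_layer(args: list) -> str:
    """Run the search-layer script; raise if it fails as a whole."""
    proc = subprocess.run(
        ["python3", "skills/search-layer/scripts/search.py", *args],
        capture_output=True, text=True, timeout=120,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or "search-layer failed")
    return proc.stdout

def search_with_fallback(args: list, web_search) -> str:
    """Fall back to the built-in single-source web_search when the script fails."""
    try:
        return run_search_layer(args)
    except Exception:
        # Reuse the plain query terms (dropping flags) for the fallback search.
        query = " ".join(a for a in args if not a.startswith("--"))
        return web_search(query)
```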

Extraction Upgrade and Degradation Protocol

You must upgrade from web_fetch to content-extract in the following situations:
  1. Domain restrictions: mp.weixin.qq.com, zhihu.com, xiaohongshu.com.
  2. Complex structure: the page contains many formulas (LaTeX) or complex tables, or the Markdown returned by web_fetch is extremely messy.
  3. Missing content: web_fetch returns empty content or a challenge page due to anti-crawling measures.
Invocation:

```bash
python3 skills/content-extract/scripts/content_extract.py --url <URL>
```

Internally, content-extract:
  • First checks the domain whitelist (WeChat / Zhihu, etc.) and goes straight to MinerU on a match
  • Otherwise probes with web_fetch first, falling back to MinerU-HTML on failure
  • Returns a unified JSON contract (with fields such as ok, markdown, sources)
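Conditions 1 and 3 can be reduced to a mechanical pre-check. A sketch: the domain list comes from condition 1, but the 200-character minimum body length and the function name are assumptions; condition 2 (messy Markdown) still needs judgment:

```python
from urllib.parse import urlparse

# Domains from condition 1; subdomains are matched as well.
RESTRICTED_DOMAINS = {"mp.weixin.qq.com", "zhihu.com", "xiaohongshu.com"}

def needs_content_extract(url, fetched_markdown) -> bool:
    """Return True when web_fetch output should be upgraded to content-extract."""
    host = urlparse(url).netloc.lower()
    if any(host == d or host.endswith("." + d) for d in RESTRICTED_DOMAINS):
        return True  # condition 1: restricted domain
    if not fetched_markdown or len(fetched_markdown.strip()) < 200:
        return True  # condition 3: empty body or challenge-page stub
    return False
```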

Phase 3: Analysis and Judgment

Make judgments based on the collected data:
  • Project phase: Early Experimentation / Rapid Growth / Mature & Stable / Maintenance Mode / Stagnant (based on commit frequency and content)
  • Criteria for selected Issues: many comments, maintainer participation, exposes architectural problems, or contains valuable technical discussion
  • Competitor identification: extract from the README's "Comparison"/"Alternatives" sections, Issue discussions, and web searches
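The project-phase call above becomes mechanical once commit dates are in hand. One possible heuristic; all thresholds are illustrative assumptions, not part of the skill:

```python
from datetime import datetime, timezone

def days_since(iso_ts: str) -> int:
    """Days elapsed since an ISO-8601 timestamp (as returned by the commits API)."""
    ts = datetime.fromisoformat(iso_ts.replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - ts).days

def judge_phase(repo_age_days: int, days_since_last_commit: int,
                commits_last_90d: int) -> str:
    """Map basic activity signals onto the five phases listed above."""
    if days_since_last_commit > 180:
        return "Stagnant"
    if repo_age_days < 180:
        return "Early Experimentation"
    if commits_last_90d >= 30:
        return "Rapid Growth"
    if commits_last_90d >= 5:
        return "Mature & Stable"
    return "Maintenance Mode"
```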

Phase 4: Structured Output

Output strictly according to the template below; every module must contain substantive content or be explicitly marked "Not found".

Formatting Rules (Mandatory)

  1. The title must link to the GitHub repository (format: # [Project Name](https://github.com/org/repo), so it is clickable)
  2. Uniform blank lines around titles (end of previous section → blank line → title → blank line → content, for clear visual separation)
  3. Telegram blank-line fix (mandatory): Telegram swallows blank lines after list items (lines starting with -). Solution: between the end of a list and the next title, insert a line containing a braille blank (U+2800), like this:
     - last list item
     ⠀
     **Next Title**
     This keeps the blank line before the title from being swallowed when Telegram renders the message.
  4. All titles bold (emoji + bold text)
  5. Competitor comparisons must include links (GitHub / official site / docs; at least one)
  6. Community buzz must be specific: quote concrete post/tweet/discussion summaries with original links. Do not write generalities like "highly praised" or "very popular"; write "user X said such-and-such" or "post Y discussed this specific problem"
  7. Information traceability: every piece of quoted external information should carry its original link so readers can trace it to the source

```markdown
# [{Project Name}]({GitHub Repo URL})

**🎯 One-Sentence Positioning**
{What it is, what problem it solves}

**⚙️ Core Mechanism**
{Technical principles/architecture, explained in plain language rather than copied from the README. Include the key tech stack.}

**📊 Project Health**
- Stars: {count} | Forks: {count} | License: {type}
- Team/author: {background}
- Commit trend: {recent activity + project-phase judgment}
- Recent activity: {overview of the last few important commits}

**🔥 Selected Issues**
{Top 3-5 high-quality Issues, each with title, link, and core discussion points. Note explicitly if none qualify.}

**✅ When to Use**
{When to use it, and which concrete problems it solves}

**⚠️ Limitations**
{When to avoid it; known issues}

**🆚 Competitor Comparison**
{Same-track projects and how they differ. Every competitor must carry a GitHub or official-site link, e.g.:}
- vs GraphRAG — difference description
- vs RAGFlow — difference description

**🌐 Knowledge Graph**
- DeepWiki: {link or "Not indexed"}
- Zread.ai: {link or "Not indexed"}

**🎬 Demo**
{Online demo link, or "None"}

**📄 Related Papers**
{arXiv link, or "None"}

**📰 Community Buzz**
X/Twitter
{Concrete tweet summaries + links, e.g.:}
- @user: "what they actually said..."
- thread: which concrete problem it discussed... {note "No relevant discussions found" if none}
Chinese communities
{Concrete post titles/summaries + links, e.g.:}
- Zhihu: post title — what it discussed
- V2EX: post title — what it discussed {note "No relevant discussions found" if none}

**💬 My Take**
{Subjective assessment: whether it is worth the time, what skill level it suits, and how to use it}
```

Execution Notes

  • Prefer web_search + web_fetch, with browser as a fallback
  • Search enhancement: project-research tasks default to search-layer v2 Deep mode + --intent exploratory (Brave + Exa + Tavily in parallel with deduplication + intent-aware scoring); a single-source failure does not block the main flow
  • Fetch degradation (mandatory): when web_fetch fails, returns 403 or an anti-crawling page, returns too little body text, or the source domain is high-risk (WeChat / Zhihu / Xiaohongshu), switch to content-extract (which internally falls back to MinerU-HTML) for cleaner Markdown plus traceable sources
  • Collect from different sources in parallel for efficiency
  • All links must be real and reachable; never fabricate URLs
  • Output in Chinese; keep technical terms in English
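The parallel-collection note can be sketched with a thread pool; failures from one source are recorded without blocking the others. The function name and the task callables are illustrative stand-ins for the actual tool invocations:

```python
from concurrent.futures import ThreadPoolExecutor

def collect_parallel(tasks: dict) -> dict:
    """tasks: {source_name: zero-arg callable}. One failure never blocks the rest."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:
                results[name] = f"FAILED: {exc}"
    return results
```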

⚠️ Output Self-Check List (Mandatory; Verify Every Item Before Sending)

Before sending a report, check every item below; send only when all pass:
  • Title link: # [Project Name](GitHub URL) format, clickable
  • Blank lines around titles: each bold title (**🎯 ...**) has one blank line before and after
  • Telegram blank lines: a braille-blank (U+2800) line sits between the end of each list block and the next title (prevents Telegram from swallowing blank lines)
  • Issue links: every selected Issue uses the full [#number title](full URL) format
  • Competitor links: every competitor carries [name](GitHub/official-site link)
  • Community-buzz links: every quote uses the [source: title](URL) format
  • No vague wording: the community-buzz section contains no generalities like "highly praised" or "very popular"
  • Traceability: every external quote carries its original link
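Several of these items are mechanical and can be pre-checked before sending. A sketch; the regexes, the vague-phrase list, and the function name are assumptions:

```python
import re

def check_report(md: str) -> list:
    """Return a list of checklist violations; empty means the mechanical checks pass."""
    problems = []
    # Title link: # [Project Name](GitHub URL)
    if not re.match(r"^#\s+\[[^\]]+\]\(https://github\.com/[^)]+\)", md):
        problems.append("title is not a clickable [Project Name](GitHub URL) heading")
    # Telegram blank-line fix: at least one braille blank (U+2800) present
    if "\u2800" not in md:
        problems.append("no braille-blank (U+2800) line found")
    # Vague community-buzz wording (the two phrases called out above)
    for vague in ("评价很高", "热度很高"):
        if vague in md:
            problems.append("vague community description: " + vague)
    return problems
```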

Dependencies

This Skill depends on the following OpenClaw tools and Skills:

| Dependency | Type | Purpose |
| --- | --- | --- |
| web_search | Built-in tool | Brave Search retrieval |
| web_fetch | Built-in tool | Web content fetching |
| browser | Built-in tool | Dynamic page rendering (fallback) |
| search-layer | Skill | Multi-source search + intent-aware scoring (Brave + Exa + Tavily + Grok); v2.1 supports --intent / --queries / --freshness |
| content-extract | Skill | High-fidelity content extraction (degradation path for anti-crawling sites) |