github-repo-search

GitHub Open-Source Project Search Assistant

Purpose

This workflow starts from a user's natural-language request and proceeds through requirement elicitation, search-term decomposition, GitHub retrieval, filtering and classification, and in-depth reading to produce a structured set of recommendations.

The goal is not to "provide many links" but to deliver a list of candidate repositories that the user can understand, compare, decide between, and act on directly.

Scope of Application (V1.1)

Data source: public GitHub repositories.
No authorization by default (the user's token is not used).
Default hard filters: `stars >= 100`, `archived=false`, `is:public`.
Default output: a single ranked list (Top N), with each entry labeled by repository type.
Installation and deployment are out of scope by default (unless the user asks for them separately).

Quota Notes (Must Know)

Unauthenticated Core API: `60 requests/hour`.
Search API: `10 requests/minute` (counted separately from the Core quota).
Record the retrieval time and quota status in the report so that the results can be reproduced.
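
As a minimal sketch of how the quota status could be captured for the report, the snippet below reads GitHub's `GET /rate_limit` endpoint and keeps only the `core` and `search` buckets. The function names `fetch_rate_limit` and `summarize_quota` and the shape of the summary dictionary are illustrative assumptions, not part of the workflow.

```python
import json
import urllib.request
from datetime import datetime, timezone

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def fetch_rate_limit() -> dict:
    """Fetch the current quota without a token (unauthenticated request)."""
    req = urllib.request.Request(
        RATE_LIMIT_URL, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_quota(payload: dict, now: datetime) -> dict:
    """Reduce a /rate_limit payload to the fields worth quoting in a report."""
    summary = {"retrieved_at": now.astimezone(timezone.utc).isoformat()}
    for bucket in ("core", "search"):
        info = payload["resources"][bucket]
        summary[bucket] = {
            "limit": info["limit"],
            "remaining": info["remaining"],
            # "reset" is a Unix timestamp (seconds) in the API response.
            "resets_at": datetime.fromtimestamp(
                info["reset"], tz=timezone.utc
            ).isoformat(),
        }
    return summary
```

A typical call would be `summarize_quota(fetch_rate_limit(), datetime.now(timezone.utc))`, with the result pasted into the report next to the retrieval time.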

Workflow

Phase 1: Requirement Convergence (Mandatory, Cannot Be Skipped)

Hard gate: Phase 1 is a precondition for the entire process. However clear the user's description is, this phase must be completed and the user must explicitly confirm before Phase 2 begins. Do not infer the requirements from the user's initial description and start retrieving. Even if the user says "just search", first output the requirement summary for the user to confirm.

Step 1: Requirement Elicitation and Alignment

Goal: turn "I want to look at XX" into a retrieval target that can be executed, ranked, and explained.

Information that must be confirmed (minimum):

Topic (e.g. agent memory, RAG, browser automation)
Quantity (Top 10 / Top 20)
Minimum stars (default 100)
Sorting mode (choose exactly one): `Relevance first` / `Stars first` (default: `Relevance first`)

Target form (choose one or more):

Ready-to-use product
Framework to build on
Resource list / methodology

Recommended additional information (optional):

Preferred tech stack (Python/TS/Go, etc.)
Usage scenario (learning, production, benchmarking)
Exclusions (tutorial repositories, archived repositories, pure paper reproductions, etc.)
Deployment preference (local-first / cloud-first / hybrid)

Stage output (fixed format):

Core Requirements:
- Topic: xxx
- Quantity: Top N
- Minimum stars: >= 100
- Sorting mode: Relevance first / Stars first (default: Relevance first)
- Target form: xxx
- Preferences: xxx (optional)
- Exclusions: xxx (optional)

Confirm the information above with the user. Proceed to Phase 2 only after the user explicitly confirms; otherwise stay in this step and keep aligning.
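
One way to make the confirmation gate hard to bypass is to model the summary as a small record whose `confirmed` flag must be set before retrieval starts. This is only a sketch: the class `RequirementSummary`, the helper `ensure_confirmed`, and the field names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

SORT_MODES = ("Relevance first", "Stars first")


@dataclass
class RequirementSummary:
    topic: str
    target_forms: list[str]
    top_n: int = 10
    min_stars: int = 100
    sort_mode: str = "Relevance first"
    preferences: str = ""
    exclusions: str = ""
    confirmed: bool = False  # set only after the user explicitly agrees

    def __post_init__(self) -> None:
        if self.sort_mode not in SORT_MODES:
            raise ValueError(f"sort_mode must be one of {SORT_MODES}")
        if not self.target_forms:
            raise ValueError("at least one target form is required")

    def render(self) -> str:
        """Render the fixed-format summary shown to the user."""
        return "\n".join([
            "Core Requirements:",
            f"- Topic: {self.topic}",
            f"- Quantity: Top {self.top_n}",
            f"- Minimum stars: >= {self.min_stars}",
            f"- Sorting mode: {self.sort_mode}",
            f"- Target form: {', '.join(self.target_forms)}",
            f"- Preferences: {self.preferences or '(none)'}",
            f"- Exclusions: {self.exclusions or '(none)'}",
        ])


def ensure_confirmed(summary: RequirementSummary) -> RequirementSummary:
    """Gate for Phase 2: refuse to continue without explicit confirmation."""
    if not summary.confirmed:
        raise RuntimeError("Phase 1 not confirmed; keep aligning with the user")
    return summary
```

Calling `ensure_confirmed` at the top of the retrieval step turns "the user never confirmed" into an explicit error rather than a silent skip.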

Phase 2: Retrieval Execution (from here on the model runs the phases on its own, with no user involvement, until the report is delivered in Phase 4)

Step 2: Search-Term Decomposition (5-10 groups)

Goal: balance recall and relevance, so that a bare single-keyword search does not drift off topic.

Decomposition rules:

Each query combines the following dimensions:

Core term: the user's target term
Synonyms: alternative phrasings (e.g. long-term memory / stateful memory)
Scenario terms: coding, mcp, tool, platform, awesome, curated
Technical terms: agent, sdk, framework, database, os
Exclusions: do not pack many negative terms into the query; leave them to the later filtering stage

Output format:

Query-1: "xxx"
Purpose: high recall on the core topic

Query-2: "xxx"
Purpose: cover synonym blind spots
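
The decomposition can be sketched as a small generator of purpose-tagged queries. The names `QueryGroup`, `build_query_groups`, and `to_search_string` are hypothetical, and the way terms are paired here is one possible choice among many. The qualifiers `stars:>=100`, `archived:false`, and `is:public` are written in GitHub's search syntax and mirror the default hard filters.

```python
from dataclasses import dataclass

# Qualifiers in GitHub search syntax that mirror the default hard filters.
DEFAULT_QUALIFIERS = "stars:>=100 archived:false is:public"


@dataclass(frozen=True)
class QueryGroup:
    query: str
    purpose: str


def build_query_groups(core, synonyms, scenario_terms, technical_terms, limit=10):
    """Combine the four dimensions into at most `limit` purpose-tagged queries."""
    groups = [QueryGroup(f'"{core}"', "High recall of core topic")]
    groups += [QueryGroup(f'"{s}"', "Fill synonym blind spots") for s in synonyms]
    groups += [
        QueryGroup(f'"{core}" {t}', f"Narrow to scenario: {t}")
        for t in scenario_terms
    ]
    groups += [
        QueryGroup(f'"{core}" {t}', f"Narrow to technology: {t}")
        for t in technical_terms
    ]
    # Drop repeated query strings while keeping the original order.
    seen, unique = set(), []
    for g in groups:
        if g.query not in seen:
            seen.add(g.query)
            unique.append(g)
    return unique[:limit]


def to_search_string(group, qualifiers=DEFAULT_QUALIFIERS):
    """Append qualifiers to get the `q` parameter for the search endpoint."""
    return f"{group.query} {qualifiers}"
```

The function only caps the list at ten groups; supplying enough synonyms and scenario terms to reach the lower bound of five is left to the caller.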

Step 3: Execute Retrieval and Candidate Recall

Execution principles:

Run retrieval for every query group (30-50 results per group is recommended).
Merge the results into a candidate pool.
Deduplicate by `owner/repo`.
Record the retrieval time and API quota information.

Candidate pool fields (minimum):

`owner/repo`
`stars`
`description`
`repo_url`
`archived`
`language`
`updated_at`
`topics`
`license`
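
A sketch of the candidate pool, assuming the results come from GitHub's repository search endpoint, whose items carry `full_name`, `stargazers_count`, `html_url`, `archived`, `language`, `updated_at`, `topics`, and a `license` object. The `Candidate` class and the helper names are hypothetical, and keying the pool on the lower-cased `owner/repo` is a design choice made here, not something the workflow prescribes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    full_name: str          # owner/repo
    stars: int
    description: str
    repo_url: str
    archived: bool
    language: Optional[str]
    updated_at: str         # ISO 8601 timestamp as returned by the API
    topics: list[str]
    license: Optional[str]  # SPDX identifier, if one was detected


def to_candidate(item: dict) -> Candidate:
    """Map one item from a repository search response to a pool record."""
    lic = item.get("license") or {}
    return Candidate(
        full_name=item["full_name"],
        stars=item["stargazers_count"],
        description=item.get("description") or "",
        repo_url=item["html_url"],
        archived=item.get("archived", False),
        language=item.get("language"),
        updated_at=item["updated_at"],
        topics=list(item.get("topics") or []),
        license=lic.get("spdx_id"),
    )


def merge_and_dedupe(result_sets: Iterable[Iterable[dict]]) -> list[Candidate]:
    """Merge per-query results, keeping the first hit for each owner/repo."""
    pool: dict[str, Candidate] = {}
    for items in result_sets:
        for item in items:
            cand = to_candidate(item)
            # Repository names are compared case-insensitively.
            pool.setdefault(cand.full_name.lower(), cand)
    return list(pool.values())
```

Counting `sum(len(items) for items in result_sets)` before the merge and `len(pool)` after it gives the "original recall" and "after deduplication" figures for the report.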

Step 4: Deduplication and Hard Filtering

Default hard filters:

`stars >= 100`
`archived = false`
`is:public`

Optional hard filters (as needed):

`fork = false`
Specific language: `language:xxx`
Recency: updated within the last 6-12 months
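
The filters above can be expressed as a single predicate over a repository record. The sketch below assumes records shaped like GitHub search results (`stargazers_count`, `archived`, `private`, `fork`, `language`, `updated_at`); the function name `passes_hard_filters` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def passes_hard_filters(
    item: dict,
    *,
    min_stars: int = 100,
    exclude_forks: bool = False,
    language: Optional[str] = None,
    max_age_days: Optional[int] = None,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if a repository record survives the hard filters."""
    # Default filters: star threshold, not archived, public.
    if item["stargazers_count"] < min_stars:
        return False
    if item.get("archived", False) or item.get("private", False):
        return False
    # Optional filters, applied only when requested.
    if exclude_forks and item.get("fork", False):
        return False
    if language and (item.get("language") or "").lower() != language.lower():
        return False
    if max_age_days is not None:
        now = now or datetime.now(timezone.utc)
        updated = datetime.fromisoformat(item["updated_at"].replace("Z", "+00:00"))
        if now - updated > timedelta(days=max_age_days):
            return False
    return True
```

For a "last 12 months" recency rule, `max_age_days=365` is a reasonable approximation; the default filters alone apply when no keyword arguments are passed.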

Phase 3: Quality Refinement

Step 5: Noise Removal and Relevance Re-ranking

Goal: remove noise of the kind "matches memory but is not actually about agent memory".

Noise removal rules (examples):

General-purpose engineering repositories unrelated to the topic (even if they have many stars)
Accidental keyword hits (memory/agent appears only incidentally in the description)
Repositories with no substantive content, or otherwise anomalous repositories

Ranking principles (V1.1):

`stars` is no longer the primary sort key; it serves only as one of the recall thresholds. Recommended weights for the composite ranking:

Requirement relevance: 35%
Scenario fit: 30%
Activity (recency of updates): 15%
Engineering maturity (documentation / examples / maintainability): 15%
stars: 5%
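
The weights above amount to a weighted sum. A minimal sketch follows, assuming each dimension has already been scored on a 0 to 1 scale; how those per-dimension scores are produced is not specified by the workflow and is left open here. The names `WEIGHTS`, `composite_score`, and `rank` are hypothetical.

```python
WEIGHTS = {
    "relevance": 0.35,     # requirement relevance
    "scenario_fit": 0.30,  # scenario applicability
    "activity": 0.15,      # update recency
    "maturity": 0.15,      # docs / examples / maintainability
    "stars": 0.05,
}


def composite_score(signals: dict) -> float:
    """Weighted sum of per-dimension scores, each expected in [0, 1]."""
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError(f"{name} must be within [0, 1]")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def rank(candidates: list) -> list:
    """Sort (name, signals) pairs by composite score, best first."""
    return sorted(candidates, key=lambda c: composite_score(c[1]), reverse=True)
```

With these weights, a highly starred but weakly related repository (relevance 0.2, scenario fit 0.2, stars 1.0) scores below a focused one with few stars (relevance 0.9, scenario fit 0.8, stars 0.1) when activity and maturity are equal, which is the intended effect of demoting stars to 5%.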

Step 6: Repository Type Classification (Mandatory)

Goal: let the user see at a glance what role each repository plays, so that frameworks, applications, and directories are not lumped together.

Recommended type dictionary:

General framework layer
Application/product layer (ready to use)
Memory layer / context infrastructure
MCP service layer
Directory/list layer (awesome/curated)
Vertical scenario solution layer
Methodology/research layer

Step 7: In-depth Reading and Project Introduction Writing (Mandatory)

Goal: do not restate the repository's own description; write a detailed introduction that helps the user decide.

Minimum requirements for in-depth reading:

For each selected repository, review at least:

The core positioning section of the README
The quick-start and feature section headings
Recent maintenance signals (last update time, Issue/PR activity)

Writing requirements for the project introduction (fixed):

The "Project Introduction" must contain two parts, each written in detail:

What it is: its role and boundaries within a system architecture
Why it is recommended: its value for the user's current goal (not generic strengths)

Optional additions:

Typical applicable scenarios (1-2 items)
Limitations or unsuitable scenarios (1 item)

Phase 4: Delivery and Iteration

Step 8: Single-List Generation and Report Delivery (Final)

Delivery structure (fixed):

Requirement summary
Search-term list (5-10 groups, each with its purpose)
Filtering and re-ranking rules (stated explicitly)
Result overview (raw recall / after deduplication / after filtering)
Top N single list (table)
Conclusion and suggested next steps

Top N table fields (fixed):

Repository | Stars | Repository Type | Project Introduction (what it is + why it is recommended) | Additional Information | Link

Suggested content for "Additional Information":

Language / License / Last update time
Onboarding complexity (low/medium/high)
Risk notes (if any)
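
To keep the table columns fixed across reports, the rendering can be centralized in one helper. The sketch below emits a pipe-delimited table; that output format, the shortened column labels, and the function names `cell` and `render_table` are assumptions made for illustration.

```python
# Fixed column order for the Top N list (labels shortened here).
COLUMNS = [
    "Repository",
    "Stars",
    "Repository Type",
    "Project Introduction",
    "Additional Information",
    "Link",
]


def cell(value) -> str:
    """Escape pipes and flatten newlines so a value fits in one table cell."""
    return str(value).replace("|", "\\|").replace("\n", " ")


def render_table(rows: list) -> str:
    """Render rows (dicts keyed by the column labels) as a pipe table."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "| " + " | ".join("---" for _ in COLUMNS) + " |",
    ]
    for row in rows:
        # A missing key raises KeyError, so incomplete rows fail loudly.
        lines.append("| " + " | ".join(cell(row[c]) for c in COLUMNS) + " |")
    return "\n".join(lines)
```

Because every row is indexed by `COLUMNS`, a recommendation that lacks a type or an introduction cannot be rendered, which doubles as a lightweight pre-delivery check.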

Step 9: User Confirmation and Iteration (Optional)

Iteration triggers:

The user reports that the results are too broad, too narrow, not accurate enough, or not explained in enough detail.

Iteration actions:

Adjust the search terms (add scenario terms or synonyms)
Adjust the stars threshold (100 -> 200/500)
Add constraints (language / direction / update time)
Adjust type weights (e.g. prioritize the application layer or the framework layer)

Default Parameters (V1.1)

Minimum stars: `100`
Default output: `Top 10`
Default filter: `archived=false`
Classification required by default: yes
Default level of detail for project introductions: detailed (at least "what it is + why it is recommended")
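
These defaults can be held in a single immutable record so that an iteration round changes them explicitly rather than in scattered places. This is a sketch only; `SearchDefaults`, `DEFAULTS`, `with_overrides`, and the field names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchDefaults:
    min_stars: int = 100
    top_n: int = 10
    include_archived: bool = False       # corresponds to archived=false
    classification_required: bool = True
    detailed_introduction: bool = True   # "what it is" + "why recommended"


DEFAULTS = SearchDefaults()


def with_overrides(cfg: SearchDefaults, **overrides) -> SearchDefaults:
    """Return a modified copy; the original defaults stay untouched."""
    return replace(cfg, **overrides)
```

For example, `with_overrides(DEFAULTS, min_stars=500)` yields the stricter configuration for one round of iteration while `DEFAULTS` itself remains at 100.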

Quality Checklist (self-check before delivery)

Has requirement alignment been completed and the target form confirmed?
Are there 5-10 query groups, each with a stated purpose?
Have the retrieval time and quota status been recorded?
Have deduplication, hard filtering, and noise removal been carried out?
Has every repository been classified by repository type?
Does every recommendation have a detailed project introduction (not a single sentence)?
Is the report delivered with the fixed table fields?
Has installation and deployment been kept out of this process?