GitHub Open-Source Project Search Assistant

Purpose

Starting from users' natural language requirements, through demand mining, search term decomposition, GitHub retrieval, filtering and classification, in-depth interpretation, and finally produce structured recommendation results.

The goal is not to "provide many links", but to "deliver a list of candidate repositories that users can understand, compare, make decisions on, and take direct actions with."

Scope of Application (V1.1)

Data source: GitHub public repositories.
Default no authorization (does not use user Token).
Default hard filters:
```
stars >= 100
```
,
```
archived=false
```
,
```
is:public
```
.
Default output: Single list (Top N), with "repository ownership type" labeled within the list.
This process does not include installation and implementation by default (unless the user requests it separately).

Quota Notes (Must Know)

Unauthorized Core API:
```
60 requests/hour
```
.
Search API:
```
10 requests/minute
```
(independent of Core quota).
It is necessary to note the retrieval time and quota status in the report to avoid non-reproducible results.

Workflow

Phase 1: Demand Convergence (Mandatory, Cannot Be Skipped)

Hard Gatekeeping: Phase 1 is the precondition for the entire process. No matter how clear the user's demand description is, this phase must be completed and confirmed explicitly by the user before entering Phase 2. It is forbidden to directly infer the demand based on the user's initial description and start retrieval. Even if the user says "Just search directly", you must first output the demand summary for the user to confirm.

Step 1: Demand Mining and Alignment

Goal: Convert "I want to see XX" into an executable, sortable, and interpretable retrieval target.

Required Confirmation Information (Minimum):

Topic (e.g.: agent memory, RAG, browser automation)
Quantity (Top 10 / Top 20)
Minimum stars (default 100)
Sorting mode (must choose one):
```
Relevance first
```
/
```
Stars first
```
(default: Relevance first)

Target form (must choose one or multiple):

Ready-to-use product

Framework for secondary development

Resource list/methodology

Recommended Supplementary Information (Optional):

Preferred tech stack (Python/TS/Go, etc.)
Usage scenario (learning, production, benchmarking)
Exclusions (tutorial repositories, archived repositories, pure paper reproductions, etc.)
Deployment preference (local-first / cloud-first / hybrid)

Phase Output (Fixed Format):

text

Core Requirements:
- Topic: xxx
- Quantity: Top N
- Minimum stars: >= 100
- Sorting mode: Relevance first / Stars first (default: Relevance first)
- Target form: xxx
- Preferences: xxx (optional)
- Exclusions: xxx (optional)

Confirm the above information with the user. Only proceed to Phase 2 after the user confirms explicitly; otherwise, stay here to continue alignment.

Phase 2: Retrieval Execution (The following phases are executed independently by the model, no user intervention is required until Phase 4 report delivery)

Step 2: Search Term Decomposition (5-10 groups)

Goal: Balance "recall rate" and "relevance" to avoid off-topic results caused by only searching with single words.

Term Decomposition Rules:

Each query is a combination of the following dimensions:

Core term: User's target word
Synonyms: Alternative expressions (e.g. long-term memory / stateful memory)
Scenario terms: coding, mcp, tool, platform, awesome, curated
Technical terms: agent, sdk, framework, database, os
Exclusion approach: Do not write too many negative examples in the query; leave them to the subsequent filtering phase

Output Format:

text

Query-1: "xxx"
Purpose: High recall of core topic

Query-2: "xxx"
Purpose: Fill synonym blind spots

Step 3: Execute Retrieval and Candidate Recall

Execution Principles:

Execute retrieval for each query (recommend 30-50 results per group).
Merge results to form a candidate pool.
Deduplicate by
```
owner/repo
```
.
Record retrieval time and API quota information.

Candidate Pool Fields (Minimum):

```
owner/repo
```
```
stars
```
```
description
```
```
repo_url
```
```
archived
```
```
language
```
```
updated_at
```
```
topics
```
```
license
```

Step 4: Deduplication and Hard Filtering

Default Hard Filters:

```
stars >= 100
```
```
archived = false
```
```
is:public
```

Optional Hard Filters (As Needed):

```
fork = false
```
Specified language:
```
language:xxx
```
Update timeliness: Last 6-12 months

Phase 3: Quality Refinement

Step 5: Noise Removal and Relevance Re-ranking

Goal: Solve the noise problem of "hitting memory but not actually agent memory".

Noise Removal Rules (Examples):

General engineering repositories unrelated to the topic (even with high stars)
Repositories with accidental keyword hits (only occasionally mention memory/agent in description)
Repositories with no substantial content or abnormal status

Sorting Principles (V1.1):

star

is no longer the primary sorting factor, only one of the recall thresholds. Recommended comprehensive sorting weights:

Demand relevance: 35%
Scenario applicability: 30%
Activity (update timeliness): 15%
Engineering maturity (documentation/examples/maintainability): 15%
stars: 5%

Step 6: Repository Ownership Type Classification (Mandatory)

Goal: Allow users to immediately understand "what role this repository plays", avoiding mixing frameworks, applications, and directories together.

Recommended Type Dictionary:

General framework layer
Application product layer (ready-to-use)
Memory layer/context infrastructure
MCP service layer
Directory list layer (awesome/curated)
Vertical scenario solution layer
Methodology/research layer

Step 7: In-depth Reading and Project Introduction Writing (Mandatory)

Goal: Not a "repetition of repository introduction", but output detailed introductions that "have decision-making value for users".

Minimum Requirements for In-depth Reading:

For each selected repository, at least check:

Core positioning section of README
Quick start/function chapter titles
Recent maintenance signals (update time, Issue/PR activity)

Writing Requirements for Project Introduction (Fixed):

The "Project Introduction" must include two detailed parts:

What it is: Its role and boundaries in the system architecture
Why it is recommended: Its value for the user's current goal (not general advantages)

Optional supplements:

Typical applicable scenarios (1-2 items)
Limitations or unsuitable scenarios (1 item)

Phase 4: Delivery and Iteration

Step 8: Single List Generation and Report Delivery (Final)

Delivery Structure (Fixed):

Demand summary
List of search terms (5-10 groups + purposes)
Filtering and re-ranking rules (clearly stated)
Result overview (original recall / after deduplication / after filtering)
Top N single list (table)
Conclusion and next-step suggestions

Fixed Fields for Top N Table:

Repository	Stars	Repository Ownership Type	Project Introduction (What it is + Recommendation Reason)	Additional Information	Link

Recommended Content for "Additional Information":

Language / License / Last update time
Complexity of getting started (low/medium/high)
Risk warning (if any)

Step 9: User Confirmation and Iteration (Optional)

Iteration Trigger Conditions:

User feedback such as "Too broad/Too narrow/Not accurate enough/Explanations are not detailed enough".

Iteration Actions:

Adjust search terms (add scenario terms or synonyms)
Adjust stars threshold (100 -> 200/500)
Add restrictions (language/direction/update time)
Adjust type weights (e.g., prioritize application layer or framework layer)

Default Parameters (V1.1)

Minimum stars:
```
100
```
Default output:
```
Top 10
```
Default filter:
```
archived=false
```
Default mandatory classification: Yes
Default granularity of project introduction: Detailed (at least "What it is + Why it is recommended")

Quality Check List (Self-inspection before delivery)

Whether demand alignment is completed and "target form" is confirmed
Whether there are 5-10 groups of queries with clear purposes
Whether retrieval time and quota status are recorded
Whether deduplication, hard filtering and noise removal are executed
Whether repository ownership type classification is completed
Whether each recommendation has a detailed project introduction (not a single sentence)
Whether the fixed table fields are used for delivery
Whether installation and implementation are avoided in this process",

github-repo-search

NPX Install

Tags

SKILL.md Content (Chinese)