# HuggingFace Best Model Finder
Finds the best models for a task by querying official HF benchmark leaderboards, enriching
results with model size data, filtering for what fits on the user's device, and returning a
comparison table with benchmark scores.
## Step 1: Parse the request
Extract from the user's message:
- Task: what they want the model to do (coding, math/reasoning, chat, OCR, RAG/retrieval, speech recognition, image classification, multimodal, agents, etc.)
- Device: hardware constraints (MacBook M-series 8/16/32/64GB unified memory, RTX GPU with VRAM amount, CPU-only, cloud/no constraint, etc.)
If device is not mentioned, skip filtering entirely and return the highest-performing models regardless of size. If the task is genuinely ambiguous, ask one clarifying question.
### Device → max parameter budget
When a device is specified, extract its available memory (unified RAM for Apple Silicon, VRAM for discrete GPUs) and apply:
- fp16 max params (B) ≈ memory (GB) ÷ 2
- Q4 max params (B) ≈ memory (GB) × 2
Examples:
- 16GB → 8B fp16 / 32B Q4
- 24GB VRAM → 12B fp16 / 48B Q4
- 8GB → 4B fp16 / 16B Q4
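A quick shell sketch of this arithmetic (the memory value is illustrative):

```bash
# Rule-of-thumb budgets from above (integer GB in, B params out).
mem_gb=16                              # illustrative device memory
echo "fp16 budget: $((mem_gb / 2))B"   # → 8B
echo "Q4 budget:   $((mem_gb * 2))B"   # → 32B
```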
## Step 2: Find relevant benchmark datasets
Fetch the full list of official HF benchmarks:
```bash
curl -s -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/datasets?filter=benchmark:official&limit=500" | jq '[.[] | {id, tags, description}]'
```

Read the returned list and select the datasets most relevant to the user's task — match on dataset id, tags, and description. Use your judgment; don't limit yourself to 2-3. Aim for comprehensive coverage: if 5 benchmarks clearly cover the task, use all 5.
## Step 3: Fetch top models from leaderboards
For each selected benchmark dataset:
```bash
curl -s -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/datasets/<namespace>/<repo>/leaderboard" | jq '[.[:15] | .[] | {rank, modelId, value, verified}]'
```

Collect model IDs and scores across all benchmarks. If a leaderboard returns an error (404, 401, etc.), skip it and note it in the output.
## Step 4: Enrich with model metadata
For the top 10-15 candidate model IDs, fetch model metadata.

**REST API**

```bash
curl -s -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/models/org/model1" | jq '{safetensors, tags, cardData}'
```

**CLI (hf-cli)**

```bash
hf models info org/model1 --json | jq '{safetensors, tags, cardData}'
```
Extract from each response:
- **Parameters**: `safetensors.total` → convert to B (e.g., 7_241_748_480 → "7.2B")
- **License**: from model card tags (look for `license:apache-2.0`, `license:mit`, etc.)
- If `safetensors` is absent, parse size from the model name (look for "7b", "8b", "13b", "70b", "72b", etc.)
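A jq sketch of that extraction against one response (the model ID is a placeholder; it falls back to `0` / `"unknown"` when fields are absent):

```bash
# Sketch: reduce one model-info response to the fields used above.
# safetensors.total is a raw parameter count; convert to billions, 1 decimal.
curl -s -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
  "https://huggingface.co/api/models/org/model1" \
  | jq '{params_B: ((.safetensors.total // 0) / 1e9 * 10 | round / 10),
         license: ([.tags[]? | select(startswith("license:"))] | first // "unknown")}'
```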
---
## Step 5: Filter and rank
If a device was specified:
- Remove models exceeding the fp16 parameter budget for the device
- Flag models that fit only with Q4 quantization (multiply budget by ~4 for Q4 capacity)
- If a highly-ranked model is slightly over budget, keep it with a "needs Q4" note — don't silently drop it
If no device was mentioned: skip all size filtering — just rank by benchmark score.
Then: rank by benchmark score (descending), keep top 5-8 models.
Include proprietary models (GPT-4, Claude, Gemini) if they appear on leaderboards, but flag them as "API only / not self-hostable". If the user explicitly asked for local/open models only, exclude them.
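A minimal shell sketch of the size gate described above (integer parameter counts and an 8B fp16 budget assumed for illustration):

```bash
# Sketch: classify each candidate against the device budget from Step 1.
# Q4 capacity = 4x the fp16 budget, per the rules of thumb above.
fp16_budget=8                      # B params for a 16GB device (illustrative)
q4_budget=$((fp16_budget * 4))
check_fit() {                      # usage: check_fit <params_in_B> (integer)
  if   [ "$1" -le "$fp16_budget" ]; then echo "Yes (fp16)"
  elif [ "$1" -le "$q4_budget" ];   then echo "Q4 only"
  else                                   echo "Too large"
  fi
}
check_fit 7    # → Yes (fp16)
check_fit 13   # → Q4 only
check_fit 70   # → Too large
```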
## Step 6: Output
### Comparison table
```markdown
| # | Model | Params | [Benchmark 1] | [Benchmark 2] | License | On device |
|---|-------|--------|--------------|--------------|---------|-----------|
| ⭐1 | [org/name](https://huggingface.co/org/name) | 7B | 85.2% | — | Apache 2.0 | Yes (fp16) |
| 2 | [org/name](https://huggingface.co/org/name) | 13B | 83.1% | 71.5% | MIT | Q4 only |
| 3 | [org/name](https://huggingface.co/org/name) | 70B | 90.0% | 81.0% | Llama | Too large |
```

- Link model names to `https://huggingface.co/<model_id>`
- Use `—` for benchmarks where the model wasn't evaluated
- Star the top recommended pick with ⭐
- "On device" values: `Yes (fp16)`, `Q4 only`, `Too large`, `API only`
### Follow-up
After presenting the table, ask the user: "Would you like to run **[top recommended model]**?"
If they say yes, ask whether they'd prefer to:
- Run locally — ask about their device if not already known, then give appropriate setup instructions
- Run on HF Jobs — point them to the HF Jobs guide: https://huggingface.co/docs/huggingface_hub/en/guides/jobs
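For the HF Jobs route, a hypothetical invocation (the image, flavor, and script are illustrative; see the guide linked above for real options):

```bash
# Hypothetical sketch, assuming the `hf jobs run` command described in the
# HF Jobs guide; image, flavor, and command are placeholders.
hf jobs run --flavor a10g-small python:3.12 \
  python -c "print('hello from HF Jobs')"
```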
## Error handling
- Leaderboard not found: skip it and note "leaderboard unavailable" in the output
- Model missing from `hub_repo_details`: fall back to parsing the size from the model name
- No benchmarks found for the task: use the curated fallback table above, or try `hub_repo_search` with `filters=["<task>"]` sorted by `trendingScore`
- All leaderboards fail: fall back to `hub_repo_search` for popular models tagged with the task, noting that results are ranked by popularity rather than benchmark score (a REST sketch follows)
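A REST sketch of that popularity fallback, assuming the public models API's `filter`/`sort`/`limit` parameters behave as shown (the task tag is a placeholder):

```bash
# Fallback sketch: popularity search when every leaderboard fails.
# Results reflect trend, not benchmark score; say so in the output.
curl -s "https://huggingface.co/api/models?filter=text-generation&sort=trendingScore&limit=10" \
  | jq '[.[] | {id, likes, downloads}]'
```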