
# Hugging Face Local Models


Search the Hugging Face Hub for llama.cpp-compatible GGUF repos, choose the right quant, and launch the model with `llama-cli` or `llama-server`.

## Default Workflow


1. Search the Hub with `apps=llama.cpp`.
2. Open `https://huggingface.co/<repo>?local-app=llama.cpp`.
3. Prefer the exact HF local-app snippet and quant recommendation when it is visible.
4. Confirm exact `.gguf` filenames with `https://huggingface.co/api/models/<repo>/tree/main?recursive=true` (see the sketch after this list).
5. Launch with `llama-cli -hf <repo>:<QUANT>` or `llama-server -hf <repo>:<QUANT>`.
6. Fall back to `--hf-repo` plus `--hf-file` when the repo uses custom file naming.
7. Convert from Transformers weights only if the repo does not already expose GGUF files.
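For step 4, a minimal sketch of confirming filenames, assuming `curl` and `jq` are available; `<repo>` is a placeholder for the actual repo id:

```bash
# List every .gguf file in the repo with its size in bytes, smallest first.
# <repo> is a placeholder, e.g. unsloth/Qwen3.6-35B-A3B-GGUF.
curl -s "https://huggingface.co/api/models/<repo>/tree/main?recursive=true" \
  | jq -r '.[] | select(.path | endswith(".gguf")) | "\(.size)\t\(.path)"' \
  | sort -n
```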

## Quick Start


### Install llama.cpp


```bash
# Prebuilt packages: Homebrew on macOS, winget on Windows.
brew install llama.cpp
winget install llama.cpp
```

```bash
# Build from source (current llama.cpp builds with CMake).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
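Before moving on, it is worth confirming the binaries landed on `PATH`; both accept `--version` and print build info:

```bash
# Both binaries should report a build number after a successful install.
llama-cli --version
llama-server --version
```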

### Authenticate for gated repos


```bash
hf auth login
```
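For scripted setups, a non-interactive sketch, assuming a read-scoped token is exported as `HF_TOKEN` and that `hf auth login` accepts the same `--token` flag as the older `huggingface-cli login`:

```bash
# Log in without a prompt, then confirm which account is active.
hf auth login --token "$HF_TOKEN"
hf auth whoami
```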

### Search the Hub


```text
https://huggingface.co/models?apps=llama.cpp&sort=trending
https://huggingface.co/models?search=Qwen3.6&apps=llama.cpp&sort=trending
https://huggingface.co/models?search=<term>&apps=llama.cpp&num_parameters=min:0,max:24B&sort=trending
```
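For scripting, a rough API-side equivalent; `filter=gguf` matches the `gguf` library tag and only approximates the web UI's `apps=llama.cpp` filter, so treat the results as a starting point:

```bash
# Top matching GGUF repos by id (search term and limit are illustrative).
curl -s "https://huggingface.co/api/models?search=<term>&filter=gguf&limit=10" \
  | jq -r '.[].id'
```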

### Run directly from the Hub


```bash
llama-cli -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M
llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M
```
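Files fetched through `-hf` are cached, so repeat launches skip the download. The default cache path below and the `LLAMA_CACHE` override reflect llama.cpp's behavior on Linux/macOS; the exact location can vary by platform:

```bash
# Inspect the download cache (default location on Linux/macOS).
ls ~/.cache/llama.cpp
# Point the cache at a larger disk for big quants.
LLAMA_CACHE=/mnt/models llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M
```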

### Run an exact GGUF file


```bash
llama-server \
    --hf-repo unsloth/Qwen3.6-35B-A3B-GGUF \
    --hf-file Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \
    -c 4096
```
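The same explicit-file form with a couple of common tuning flags; the values are illustrative, not recommendations:

```bash
# -ngl 99 offloads up to 99 layers to the GPU; --port moves the HTTP listener.
llama-server \
    --hf-repo unsloth/Qwen3.6-35B-A3B-GGUF \
    --hf-file Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \
    -c 4096 -ngl 99 --port 8080
```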

### Convert only when no GGUF is available


```bash
# Download the original Transformers weights.
hf download <repo-without-gguf> --local-dir ./model-src
# Convert to a full-precision GGUF, then quantize it.
python convert_hf_to_gguf.py ./model-src \
    --outfile model-f16.gguf \
    --outtype f16
llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```
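`convert_hf_to_gguf.py` ships with the llama.cpp source tree and needs its Python dependencies installed first; a minimal sketch, assuming a source checkout in `./llama.cpp`:

```bash
# Install the converter's dependencies from the llama.cpp checkout.
pip install -r llama.cpp/requirements.txt
```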

### Smoke test a local server


```bash
llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a limerick about exception handling"}
    ]
  }'
```
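Large quants can take a while to load, so poll readiness before sending the chat request; `/health` is llama-server's readiness endpoint:

```bash
# Block until the server reports ready (curl -f fails on non-2xx responses).
until curl -sf http://localhost:8080/health > /dev/null; do sleep 1; done
echo "server is up"
```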

## Quant Choice


- Prefer the exact quant that HF marks as compatible on the `?local-app=llama.cpp` page.
- Keep repo-native labels such as `UD-Q4_K_M` instead of normalizing them.
- Default to `Q4_K_M` unless the repo page or hardware profile suggests otherwise.
- Prefer `Q5_K_M` or `Q6_K` for code or technical workloads when memory allows.
- Consider `Q3_K_M`, `Q4_K_S`, or repo-specific `IQ`/`UD-*` variants for tighter RAM or VRAM budgets (see the size estimate after this list).
- Treat `mmproj-*.gguf` files as projector weights, not the main checkpoint.
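A back-of-envelope size check helps when choosing between these tiers; the bits-per-weight figure is approximate (Q4_K_M lands near 4.8 bpw), and KV-cache overhead grows on top of it with context length:

```bash
# Rough file size for a 35B-parameter model at ~4.8 bits per weight: ~21 GB.
python3 -c 'print(f"{35e9 * 4.8 / 8 / 1e9:.0f} GB")'
```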

## Load References


- Read hub-discovery.md for URL-first workflows, model search, tree API extraction, and command reconstruction.
- Read quantization.md for format tables, model scaling, quality tradeoffs, and `imatrix`.
- Read hardware.md for Metal, CUDA, ROCm, or CPU build and acceleration details.

## Resources


- llama.cpp: https://github.com/ggml-org/llama.cpp
- Hugging Face GGUF + llama.cpp docs: https://huggingface.co/docs/hub/gguf-llamacpp
- Hugging Face Local Apps docs: https://huggingface.co/docs/hub/main/local-apps
- Hugging Face Local Agents docs: https://huggingface.co/docs/hub/agents-local
- GGUF converter Space: https://huggingface.co/spaces/ggml-org/gguf-my-repo