
llmfit Hardware Model Matcher

Skill by ara.so, part of the Daily 2026 Skills collection.

llmfit detects your system's RAM, CPU, and GPU, then scores hundreds of LLM models across quality, speed, fit, and context dimensions, telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, and supports multi-GPU setups, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).

Installation

macOS / Linux (Homebrew)

```sh
brew install llmfit
```

Quick install script

```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
```

Without sudo, installs to ~/.local/bin:

```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
```

Windows (Scoop)

```sh
scoop install llmfit
```

Docker / Podman

```sh
docker run ghcr.io/alexsjones/llmfit
```

With jq for scripting:

```sh
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
```

From source (Rust)

```sh
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
```

The binary is at target/release/llmfit.

---

Core Concepts

  • Fit tiers: perfect (runs great), good (runs well), marginal (runs but tight), too_tight (won't run)
  • Scoring dimensions: quality, speed (tok/s estimate), fit (memory headroom), context capacity
  • Run modes: GPU, CPU+GPU offload, CPU-only, MoE
  • Quantization: automatically selects the best quant (e.g. Q4_K_M, Q5_K_S, mlx-4bit) for your hardware
  • Providers: Ollama, llama.cpp, MLX, Docker Model Runner
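The fit tiers above come down to memory headroom. A minimal sketch, assuming made-up headroom thresholds (llmfit's real scoring also weighs quality, speed, and context, so treat this as illustration only):

```python
# Hypothetical fit-tier classifier; thresholds are illustrative
# assumptions, not llmfit's actual scoring rules.
def fit_tier(model_mem_gb: float, available_gb: float) -> str:
    """Classify how well a model fits into available memory."""
    if model_mem_gb > available_gb:
        return "too_tight"      # won't run at all
    headroom = (available_gb - model_mem_gb) / available_gb
    if headroom >= 0.40:
        return "perfect"        # runs great, lots of headroom
    if headroom >= 0.20:
        return "good"           # runs well
    return "marginal"           # runs but tight

print(fit_tier(4.2, 16.0))     # small model on a 16GB machine
print(fit_tier(70.0, 24.0))    # 70B-class weights on 24GB VRAM
```

The same headroom idea is what the `fit` scoring dimension captures: how much memory is left over after weights and KV cache are accounted for.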

Key Commands

Launch Interactive TUI

```sh
llmfit
```

CLI Table Output

```sh
llmfit --cli
```

Show System Hardware Detection

```sh
llmfit system
llmfit --json system   # JSON output
```

List All Models

```sh
llmfit list
```

Search Models

```sh
llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"
```

Fit Analysis

```sh
# All runnable models ranked by fit
llmfit fit

# Only perfect fits, top 5
llmfit fit --perfect -n 5

# JSON output
llmfit --json fit -n 10
```

Model Detail

```sh
llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"
```

Recommendations

```sh
# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5

# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5
```

Hardware Planning (invert: what hardware do I need?)

```sh
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
```

REST API Server (for cluster scheduling)

```sh
llmfit serve
llmfit serve --host 0.0.0.0 --port 8787
```

Hardware Overrides

When autodetection fails (VMs, broken nvidia-smi, passthrough setups):

```sh
# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json

# Megabytes
llmfit --memory=32000M

# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"
```

Accepted suffixes: `G`/`GB`/`GiB`, `M`/`MB`/`MiB`, `T`/`TB`/`TiB` (case-insensitive).
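Parsing those suffixes can be sketched as follows. This is an illustrative parser, not llmfit's implementation; in particular, treating `G`/`GB` as decimal and `GiB` as binary is an assumption:

```python
import re

# Illustrative size-string parser; a sketch, not llmfit's code.
# Assumption: bare G/M/T and GB/MB/TB are decimal, GiB/MiB/TiB binary.
_UNITS = {
    "M": 10**6,  "MB": 10**6,  "MIB": 2**20,
    "G": 10**9,  "GB": 10**9,  "GIB": 2**30,
    "T": 10**12, "TB": 10**12, "TIB": 2**40,
}

def parse_size(text: str) -> int:
    """Parse '24G', '32000M', '1GiB', ... into bytes (case-insensitive)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", text.strip())
    if not m:
        raise ValueError(f"bad size: {text!r}")
    value, unit = float(m.group(1)), m.group(2).upper()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(value * _UNITS[unit])

print(parse_size("24G"))    # bytes for --memory=24G
print(parse_size("1GiB"))   # binary gigabyte
```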

Context Length Cap

```sh
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5

# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json
```

---

REST API Reference

Start the server:

```sh
llmfit serve --host 0.0.0.0 --port 8787
```

Endpoints

  • Health check
  • Node hardware info
  • Full model list with filters
  • Top runnable models for this node (key scheduling endpoint)
  • Search by model name/provider

Query Parameters for /models and /models/top

| Param | Values | Description |
| --- | --- | --- |
| limit / n | integer | Max rows returned |
| min_fit | perfect\|good\|marginal\|too_tight | Minimum fit tier |
| perfect | true\|false | Force perfect-only |
| runtime | any\|mlx\|llamacpp | Filter by runtime |
| use_case | general\|coding\|reasoning\|chat\|multimodal\|embedding | Use case filter |
| provider | string | Substring match on provider |
| search | string | Free-text across name/provider/size/use-case |
| sort | score\|tps\|params\|mem\|ctx\|date\|use_case | Sort column |
| include_too_tight | true\|false | Include non-runnable models |
| max_context | integer | Per-request context cap |
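As a concrete example, a request URL for /models/top can be assembled from those parameters with Python's standard library (the host and /api/v1 prefix follow the Python REST client examples later in this document):

```python
from urllib.parse import urlencode

# Build a /models/top query from the parameters in the table above.
params = {
    "limit": 5,
    "min_fit": "good",
    "use_case": "coding",
    "sort": "score",
    "max_context": 8192,
}
url = "http://localhost:8787/api/v1/models/top?" + urlencode(params)
print(url)
```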

Scripting & Automation Examples

Bash: Get top coding models as JSON

```bash
#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 |
  jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'
```

Bash: Check if a specific model fits

```bash
#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
  echo "$MODEL will run well (fit: $FIT)"
else
  echo "$MODEL may not run well (fit: $FIT)"
fi
```

Bash: Auto-pull top Ollama model

```bash
#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"
```

Python: Query the REST API

```python
import requests

BASE_URL = "http://localhost:8787"

def get_system_info():
    resp = requests.get(f"{BASE_URL}/api/v1/system")
    return resp.json()

def get_top_models(use_case="coding", limit=5, min_fit="good"):
    params = {
        "use_case": use_case,
        "limit": limit,
        "min_fit": min_fit,
        "sort": "score"
    }
    resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
    return resp.json()

def search_models(query, runtime="any"):
    resp = requests.get(
        f"{BASE_URL}/api/v1/models/{query}",
        params={"runtime": runtime}
    )
    return resp.json()

# Example usage
system = get_system_info()
print(f"GPU: {system.get('gpu_name')} | VRAM: {system.get('vram_gb')}GB")

models = get_top_models(use_case="reasoning", limit=3)
for m in models.get("models", []):
    print(f"{m['name']}: score={m['score']}, fit={m['fit']}, quant={m['quantization']}")
```

Python: Hardware-aware model selector for agents

```python
import subprocess
import json

def get_best_model_for_task(use_case: str, min_fit: str = "good") -> dict | None:
    """Use llmfit to select the best model for a given task."""
    result = subprocess.run(
        ["llmfit", "recommend", "--json", "--use-case", use_case, "--limit", "1"],
        capture_output=True,
        text=True
    )
    data = json.loads(result.stdout)
    models = data.get("models", [])
    return models[0] if models else None

def plan_hardware_requirements(model_name: str, context: int = 4096) -> dict:
    """Get hardware requirements for running a specific model."""
    result = subprocess.run(
        ["llmfit", "plan", model_name, "--context", str(context), "--json"],
        capture_output=True,
        text=True
    )
    return json.loads(result.stdout)

# Select best coding model
best = get_best_model_for_task("coding")
if best:
    print(f"Best coding model: {best['name']}")
    print(f"  Quantization: {best['quantization']}")
    print(f"  Estimated tok/s: {best['tps']}")
    print(f"  Memory usage: {best['mem_pct']}%")

# Plan hardware for a specific model
plan = plan_hardware_requirements("Qwen/Qwen3-4B-MLX-4bit", context=8192)
print(f"Min VRAM needed: {plan['hardware']['min_vram_gb']}GB")
print(f"Recommended VRAM: {plan['hardware']['recommended_vram_gb']}GB")
```

Docker Compose: Node scheduler pattern

```yaml
version: "3.8"
services:
  llmfit-api:
    image: ghcr.io/alexsjones/llmfit
    command: serve --host 0.0.0.0 --port 8787
    ports:
      - "8787:8787"
    environment:
      - OLLAMA_CONTEXT_LENGTH=8192
    devices:
      - /dev/nvidia0:/dev/nvidia0  # pass GPU through
```

TUI Key Reference

| Key | Action |
| --- | --- |
| ↑/↓ or j/k | Navigate models |
| / | Search (name, provider, params, use case) |
| Esc / Enter | Exit search |
| Ctrl-U | Clear search |
| f | Cycle fit filter: All → Runnable → Perfect → Good → Marginal |
| a | Cycle availability: All → GGUF Avail → Installed |
| s | Cycle sort: Score → Params → Mem% → Ctx → Date → Use Case |
| t | Cycle color theme (auto-saved) |
| v | Visual mode (multi-select for comparison) |
| V | Select mode (column-based filtering) |
| p | Plan mode (what hardware is needed for this model?) |
| P | Provider filter popup |
| U | Use-case filter popup |
| C | Capability filter popup |
| m | Mark model for comparison |
| c | Compare view (marked vs selected) |
| d | Download model (via detected runtime) |
| r | Refresh installed models from runtimes |
| Enter | Toggle detail view |
| g / G | Jump to top/bottom |
| q | Quit |

Themes

`t` cycles: Default → Dracula → Solarized → Nord → Monokai → Gruvbox.
The theme is saved to `~/.config/llmfit/theme`.

GPU Detection Details

| GPU Vendor | Detection Method |
| --- | --- |
| NVIDIA | nvidia-smi (multi-GPU, aggregates VRAM) |
| AMD | rocm-smi |
| Intel Arc | sysfs (discrete) / lspci (integrated) |
| Apple Silicon | system_profiler (unified memory = VRAM) |
| Ascend | npu-smi |
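A rough version of that probing order can be scripted by checking which vendor tool is on PATH. This is a simplified sketch; llmfit's actual detection also inspects sysfs and lspci and parses each tool's output:

```python
import shutil

# Map each vendor's detection tool (from the table above) to a vendor.
# Simplified: presence on PATH stands in for full detection.
VENDOR_TOOLS = [
    ("nvidia-smi", "NVIDIA"),
    ("rocm-smi", "AMD"),
    ("system_profiler", "Apple Silicon"),
    ("npu-smi", "Ascend"),
]

def detect_gpu_vendor() -> str:
    """Return the first vendor whose tool is found, else 'unknown'."""
    for tool, vendor in VENDOR_TOOLS:
        if shutil.which(tool):
            return vendor
    return "unknown"

print(detect_gpu_vendor())
```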

Common Patterns

"What can I run on my 16GB M2 Mac?"

```sh
llmfit fit --perfect -n 10

# or interactively
llmfit   # press 'f' to filter to Perfect fit
```

"I have a 3090 (24GB VRAM), what coding models fit?"

```sh
llmfit recommend --json --use-case coding | jq '.models[]'

# or with manual override if detection fails
llmfit --memory=24G recommend --json --use-case coding
```

"Can Llama 70B run on my machine?"

```sh
llmfit info "Llama-3.1-70B"

# Plan what hardware you'd need
llmfit plan "Llama-3.1-70B" --context 4096 --json
```

"Show me only models already installed in Ollama"

```sh
llmfit   # press 'a' to cycle to Installed filter

# or
llmfit fit -n 20   # run, press 'i' in TUI for installed-first
```

"Script: find best model and start Ollama"

```bash
MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
ollama serve &
ollama run "$MODEL"
```

"API: poll node capabilities for cluster scheduler"

```bash
# Check node, get top 3 good+ models for reasoning
```

Troubleshooting

**GPU not detected / wrong VRAM reported**

```sh
# Verify detection
llmfit system

# Manual override
llmfit --memory=24G --cli
```

**`nvidia-smi` not found but you have an NVIDIA GPU**

```sh
# Install CUDA toolkit or nvidia-utils, then retry
# Or override manually:
llmfit --memory=8G fit --perfect
```

**Models show as too_tight but you have enough RAM**

```sh
# llmfit may be using context-inflated estimates; cap context
llmfit --max-context 2048 fit --perfect -n 10
```

**REST API: test endpoints**

```sh
# Spawn server and run validation suite
python3 scripts/test_api.py --spawn

# Test already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787
```

**Apple Silicon: VRAM shows as system RAM (expected)**

```sh
# This is correct: Apple Silicon uses unified memory,
# and llmfit accounts for this automatically
llmfit system   # should show backend: Metal
```

**Context length environment variable**

```sh
export OLLAMA_CONTEXT_LENGTH=4096
llmfit recommend --json  # uses 4096 as context cap
```