
Ollama Optimizer

Optimize Ollama configuration based on system hardware analysis.

Workflow

Phase 1: System Detection

Run the detection script to gather hardware information:
```bash
python3 scripts/detect_system.py
```
Parse the JSON output to identify:
  • OS and version
  • CPU model and core count
  • Total RAM / unified memory
  • GPU type, VRAM, and driver version
  • Current Ollama installation and environment variables
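The parsing step above can be sketched as follows. This is a minimal sketch, not the repository's actual code; the JSON field names (`os`, `cpu_cores`, `ram_gb`, `gpu`, `vram_gb`) are assumptions and should be matched to whatever `detect_system.py` actually emits.

```python
import json
import subprocess


def load_system_info() -> dict:
    """Run the detection script and parse its JSON output."""
    raw = subprocess.run(
        ["python3", "scripts/detect_system.py"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(raw)


def summarize(info: dict) -> dict:
    """Pull out the fields the workflow needs.

    Field names here are illustrative assumptions, not the
    script's documented schema.
    """
    return {
        "os": info.get("os"),
        "cpu_cores": info.get("cpu_cores"),
        "ram_gb": info.get("ram_gb"),
        "gpu": info.get("gpu"),
        "vram_gb": info.get("vram_gb"),
    }
```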

Phase 2: Analyze and Recommend

Based on detected hardware, determine the optimization profile:
Hardware Tier Classification:
| Tier | Criteria | Max Model | Key Optimizations |
|------|----------|-----------|-------------------|
| CPU-only | No GPU detected | 3B | `num_thread` tuning, Q4_K_M quant |
| Low VRAM | <6GB VRAM | 3B | Flash attention, KV cache q4_0 |
| Entry | 6-8GB VRAM | 8B | Flash attention, KV cache q8_0 |
| Prosumer | 10-12GB VRAM | 14B | Flash attention, full offload |
| Workstation | 16-24GB VRAM | 32B | Standard config, Q5_K_M option |
| High-end | 48GB+ VRAM | 70B+ | Multiple models, Q5/Q6 quants |
Apple Silicon Special Case:
  • Unified memory = shared CPU/GPU RAM
  • 8GB Mac → treat as 6GB VRAM tier
  • 16GB Mac → treat as 12GB VRAM tier
  • 32GB+ Mac → treat as workstation tier
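The tier table and the Apple Silicon mapping can be sketched as a small classifier. This is an illustrative sketch: how VRAM sizes that fall between the listed ranges (e.g. 9GB or 30GB) round down to the lower tier is this sketch's assumption, not part of the table.

```python
def effective_vram_gb(ram_gb: float, vram_gb: float, unified: bool) -> float:
    """Apple Silicon special case: map unified memory to a VRAM-tier equivalent."""
    if not unified:
        return vram_gb
    if ram_gb >= 32:
        return 16.0   # treat as workstation tier
    if ram_gb >= 16:
        return 12.0   # treat as 12GB VRAM tier
    return 6.0        # 8GB Macs: treat as 6GB VRAM tier


def classify_tier(vram_gb: float, has_gpu: bool) -> tuple[str, str]:
    """Return (tier, max model) per the classification table."""
    if not has_gpu:
        return ("CPU-only", "3B")
    if vram_gb < 6:
        return ("Low VRAM", "3B")
    if vram_gb < 10:
        return ("Entry", "8B")
    if vram_gb < 16:
        return ("Prosumer", "14B")
    if vram_gb < 48:
        return ("Workstation", "32B")
    return ("High-end", "70B+")
```

For example, a 16GB M-series Mac maps to 12GB effective VRAM and therefore lands in the Prosumer tier with a 14B ceiling.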

Phase 3: Generate Optimization Plan

Create a structured optimization guide with these sections:

1. System Overview

Present detected hardware specs and highlight constraints (e.g., "8GB unified memory limits to 7B models").

2. Dependency Assessment

List what's needed based on the platform:
  • macOS: Ollama only (Metal automatic)
  • Linux NVIDIA: Ollama + NVIDIA driver 450+
  • Linux AMD: Ollama + ROCm 5.0+
  • Windows: Ollama + NVIDIA driver 452+

3. Configuration Recommendations

Essential environment variables:

```bash
# Always recommended
export OLLAMA_FLASH_ATTENTION=1

# Memory-constrained systems (<12GB)
export OLLAMA_KV_CACHE_TYPE=q8_0  # or q4_0 for severe constraints
```
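Note that `export` only affects the current shell. On Linux, the Ollama server typically runs as a systemd service, so the variables must be set on the service itself. A sketch following the pattern described in Ollama's FAQ (run `sudo systemctl edit ollama` and add an override):

```
[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
```

Then apply with `sudo systemctl daemon-reload && sudo systemctl restart ollama`. On macOS, `launchctl setenv OLLAMA_FLASH_ATTENTION 1` followed by an app restart serves the same purpose.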

**Model selection guidance:**
- Recommend specific models from `ollama list` output
- Suggest appropriate quantization (Q4_K_M default, Q5_K_M if headroom exists)
- Warn if current models exceed hardware capacity

**Modelfile tuning (when needed):**

```
PARAMETER num_gpu <layers>     # Partial offload for limited VRAM
PARAMETER num_thread <cores>   # CPU threads (physical cores, not hyperthreads)
PARAMETER num_ctx <size>       # Reduce context for memory savings
```

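As a hedged illustration, a complete Modelfile for partially offloading an 8B model on a VRAM-limited card might look like the sketch below; the base model tag and all three numbers are placeholders, not tuned recommendations.

```
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_gpu 24      # offload 24 layers to GPU; the rest run on CPU
PARAMETER num_thread 8    # match the machine's physical core count
PARAMETER num_ctx 4096    # smaller context window to save memory
```

Build it with `ollama create llama3.1-tuned -f Modelfile`, then run `ollama run llama3.1-tuned` as usual.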

4. Execution Checklist

Provide copy-paste commands in order:
  1. Set environment variables
  2. Restart Ollama service
  3. Pull recommended models
  4. Test with `ollama run <model> --verbose`

5. Verification Commands

```bash
# Benchmark current performance
python3 scripts/benchmark_ollama.py --model <model>

# Check GPU memory usage (NVIDIA)
nvidia-smi

# Verify config is applied
ollama run <model> "test" --verbose 2>&1 | head -20
```

Reference Files

  • VRAM Requirements - Model sizing and quantization guide
  • Environment Variables - Complete env var reference
  • Platform-Specific Setup - OS-specific installation and configuration

Output Format

Generate an `ollama-optimization-guide.md` file in the current directory with:

```markdown
# Ollama Optimization Guide

Generated: <timestamp>
System: <OS> | <CPU> | <RAM>GB RAM | <GPU>

## System Overview
<hardware summary and constraints>

## Current Configuration
<existing Ollama setup and env vars>

## Recommendations

### Environment Variables
<shell commands to set vars>

### Model Selection
<recommended models with rationale>

### Performance Tuning
<Modelfile adjustments if needed>

## Execution Checklist
- <step 1>
- <step 2> ...

## Verification
<benchmark commands and expected results>

## Rollback
<commands to revert changes if needed>
```

Quick Optimization Commands


For users who want immediate results without full analysis:

macOS (Apple Silicon):

```bash
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama pull llama3.2:3b  # Safe for 8GB, fast
```

Linux/Windows with 8GB NVIDIA GPU:

```bash
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama pull llama3.1:8b-instruct-q4_K_M
```

CPU-only systems:

```bash
export CUDA_VISIBLE_DEVICES=-1
ollama pull llama3.2:3b
# Create Modelfile with: PARAMETER num_thread 4
```