ollama

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Chinese

Ollama makes running LLMs locally as easy as

docker run

. 2025 updates include Windows/AMD support, Multimodal input, and Tool Calling.

Ollama让在本地运行LLM变得像

docker run

一样简单。2025年更新包括Windows/AMD支持、**多模态（Multimodal）**输入以及工具调用功能。

Docker-like file to define a custom model (System prompt + Base model).

dockerfile

FROM llama3
SYSTEM You are Mario from Super Mario Bros.

类似Docker的文件，用于定义自定义模型（系统提示词 + 基础模型）。

dockerfile

FROM llama3
SYSTEM You are Mario from Super Mario Bros.

Ollama runs a local server (

localhost:11434

) compatible with OpenAI SDK.

Ollama会启动一个本地服务器（

localhost:11434

），兼容OpenAI SDK。

Do:

Use high-speed RAM: Local LLM speed depends on memory bandwidth.
Use Quantized Models:
```
q4_k_m
```
is the sweet spot for speed/quality balance.
Unload:
```
ollama stop
```
when done to free VRAM for games/rendering.

Don't:

Don't expect GPT-4 level: Smaller local models (8B) are smart but lack deep reasoning.

建议：

不建议：