
Ollama Skill


Comprehensive assistance with Ollama development - the local AI model runtime for running and interacting with large language models programmatically.

When to Use This Skill


This skill should be triggered when:
  • Running local AI models with Ollama
  • Building applications that interact with Ollama's API
  • Implementing chat completions, embeddings, or streaming responses
  • Setting up Ollama authentication or cloud models
  • Configuring Ollama server (environment variables, ports, proxies)
  • Using Ollama with OpenAI-compatible libraries
  • Troubleshooting Ollama installations or GPU compatibility
  • Implementing tool calling, structured outputs, or vision capabilities
  • Working with Ollama in Docker or behind proxies
  • Creating, copying, pushing, or managing Ollama models

Quick Reference


1. Basic Chat Completion (cURL)


Generate a simple chat response:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
```

2. Simple Text Generation (cURL)


Generate a text response from a prompt:
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
```

3. Python Chat with OpenAI Library


Use Ollama with the OpenAI Python library:
```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)
```

4. Vision Model (Image Analysis)


Ask questions about images:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)
```

5. Generate Embeddings


Create vector embeddings for text:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
```
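Once you have embedding vectors, a similarity search is just a nearest-neighbor lookup over them. A minimal sketch with toy vectors standing in for real model output (`cosine_similarity` is an illustrative helper, not part of Ollama or the OpenAI library):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for embeddings.create(...) output
query_vec = [0.1, 0.9, 0.2]
doc_vecs = {"sky": [0.1, 0.8, 0.3], "grass": [0.9, 0.1, 0.1]}

# Rank documents by similarity to the query
best = max(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]))
print(best)  # → sky
```

In practice the vectors come from the `/api/embed` endpoint or the embeddings call above, and a vector database replaces the dictionary scan.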

6. Structured Outputs (JSON Schema)


Get structured JSON responses:
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool

class FriendList(BaseModel):
    friends: list[FriendInfo]

completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)
```

7. JavaScript/TypeScript Chat


Use Ollama with the OpenAI JavaScript library:
```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama",  // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});
```

8. Authentication for Cloud Models


Sign in to use cloud models:
```bash
# Sign in from CLI
ollama signin

# Then use cloud models
ollama run gpt-oss:120b-cloud
```

Or use API keys for direct cloud access:

```bash
export OLLAMA_API_KEY=your_api_key

curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```

9. Configure Ollama Server


Set environment variables for server configuration:

**macOS:**
```bash
# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Restart Ollama application
```

**Linux (systemd):**
```bash
# Edit service
systemctl edit ollama.service

# Add under [Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart
systemctl daemon-reload
systemctl restart ollama
```

**Windows:**
  1. Quit Ollama from task bar
  2. Search "environment variables" in Settings
  3. Edit or create OLLAMA_HOST variable
  4. Set value: 0.0.0.0:11434
  5. Restart Ollama from Start menu

10. Check Model GPU Loading


Verify if your model is using GPU:
```bash
ollama ps
```
Output shows:
  • 100% GPU - Fully loaded on GPU
  • 100% CPU - Fully loaded in system memory
  • 48%/52% CPU/GPU - Split between both

Key Concepts


Base URLs


  • Local API (default): `http://localhost:11434/api`
  • Cloud API: `https://ollama.com/api`
  • OpenAI Compatible: `/v1/` endpoints for OpenAI libraries

Authentication


  • Local: No authentication required for `http://localhost:11434`
  • Cloud Models: Requires signing in (`ollama signin`) or an API key
  • API Keys: For programmatic access to `https://ollama.com/api`

Models


  • Local Models: Run on your machine (e.g., `gemma3`, `llama3.2`, `qwen3`)
  • Cloud Models: Suffix `-cloud` (e.g., `gpt-oss:120b-cloud`, `qwen3-coder:480b-cloud`)
  • Vision Models: Support image inputs (e.g., `llava`)

Common Environment Variables


  • `OLLAMA_HOST` - Change bind address (default: `127.0.0.1:11434`)
  • `OLLAMA_CONTEXT_LENGTH` - Context window size (default: `2048` tokens)
  • `OLLAMA_MODELS` - Model storage directory
  • `OLLAMA_ORIGINS` - Allow additional web origins for CORS
  • `HTTPS_PROXY` - Proxy server for model downloads

Error Handling


Status Codes:
  • `200` - Success
  • `400` - Bad Request (invalid parameters)
  • `404` - Not Found (model doesn't exist)
  • `429` - Too Many Requests (rate limit)
  • `500` - Internal Server Error
  • `502` - Bad Gateway (cloud model unreachable)
Error Format:
```json
{
  "error": "the model failed to generate a response"
}
```
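The status codes and error format above can be folded into a small client-side helper. A sketch (the helper name and message table are illustrative, not part of Ollama):

```python
import json

# Status-code meanings mirror the table above
STATUS_MESSAGES = {
    400: "Bad Request (invalid parameters)",
    404: "Not Found (model doesn't exist)",
    429: "Too Many Requests (rate limit)",
    500: "Internal Server Error",
    502: "Bad Gateway (cloud model unreachable)",
}

def describe_error(status_code, body):
    """Combine the status-code meaning with the payload's "error" field."""
    reason = STATUS_MESSAGES.get(status_code, f"HTTP {status_code}")
    try:
        parsed = json.loads(body)
        detail = parsed.get("error", "") if isinstance(parsed, dict) else body
    except json.JSONDecodeError:
        detail = body  # non-JSON body: report it verbatim
    return f"{reason}: {detail}" if detail else reason

print(describe_error(404, '{"error": "model \'nope\' not found"}'))
```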

Streaming vs Non-Streaming


  • Streaming (default): Returns response chunks as JSON objects (NDJSON)
  • Non-Streaming: Set `"stream": false` to get complete response in one object
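The NDJSON stream can be reassembled client-side. A minimal sketch, assuming the chat chunk shape (`message.content` fragments, final chunk with `"done": true`); the sample lines are hand-written stand-ins for real stream output:

```python
import json

def collect_chat_stream(ndjson_lines):
    """Concatenate the message.content fragments of a streamed chat reply."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):  # final chunk
            break
    return "".join(parts)

# Hand-written stand-ins shaped like /api/chat stream output:
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo"}, "done": true}',
]
print(collect_chat_stream(sample))  # → Hello
```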

Reference Files


This skill includes comprehensive documentation in `references/`:
  • llms-txt.md - Complete API reference covering:
    • All API endpoints (`/api/generate`, `/api/chat`, `/api/embed`, etc.)
    • Authentication methods (signin, API keys)
    • Error handling and status codes
    • OpenAI compatibility layer
    • Cloud models usage
    • Streaming responses
    • Configuration and environment variables
  • llms.md - Documentation index listing all available topics:
    • API reference (version, model details, chat, generate, embeddings)
    • Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
    • CLI reference
    • Cloud integration
    • Platform-specific guides (Linux, macOS, Windows, Docker)
    • IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)
Use the reference files when you need:
  • Detailed API parameter specifications
  • Complete endpoint documentation
  • Advanced configuration options
  • Platform-specific setup instructions
  • Integration guides for specific tools

Working with This Skill


For Beginners


Start with these common patterns:
  1. Simple generation: Use the `/api/generate` endpoint with a prompt
  2. Chat interface: Use `/api/chat` with a messages array
  3. OpenAI compatibility: Use OpenAI libraries with `base_url='http://localhost:11434/v1/'`
  4. Check GPU usage: Run `ollama ps` to verify model loading
Read the "Introduction" and "Quickstart" sections of `llms-txt.md` for foundational concepts.

For Intermediate Users


Focus on:
  • Embeddings for semantic search and RAG applications
  • Structured outputs with JSON schema validation
  • Vision models for image analysis
  • Streaming for real-time response generation
  • Authentication for cloud models
Check the specific API endpoints in `llms-txt.md` for detailed parameter options.

For Advanced Users


Explore:
  • Tool calling for function execution
  • Custom model creation with Modelfiles
  • Server configuration with environment variables
  • Proxy setup for network-restricted environments
  • Docker deployment with custom configurations
  • Performance optimization with GPU settings
Refer to platform-specific sections in `llms.md` and configuration details in `llms-txt.md`.
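As a taste of tool calling, the sketch below defines a tool schema in the OpenAI function-call format plus a dispatcher for tool calls the model returns. The `get_weather` tool and its canned response are illustrative assumptions, not part of Ollama; see `llms-txt.md` for the exact request/response shapes:

```python
import json

# Tool schema in the OpenAI function-call format
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch_tool_call(name, arguments_json):
    """Run the local function named by a model tool call."""
    args = json.loads(arguments_json)
    if name == "get_weather":
        return f"Sunny in {args['city']}"  # stand-in for a real lookup
    raise ValueError(f"unknown tool: {name}")
```

With the OpenAI client, pass `tools=TOOLS` to `chat.completions.create` and feed each returned tool call's `function.name` / `function.arguments` to the dispatcher, then send the result back as a `tool` message.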

Common Use Cases


Building a chatbot:
  1. Use `/api/chat` endpoint
  2. Maintain message history in your application
  3. Stream responses for better UX
  4. Handle errors gracefully
Creating embeddings for search:
  1. Use `/api/embed` endpoint
  2. Store embeddings in vector database
  3. Perform similarity search
  4. Implement RAG (Retrieval Augmented Generation)
Running behind a firewall:
  1. Set `HTTPS_PROXY` environment variable
  2. Configure proxy in Docker if containerized
  3. Ensure certificates are trusted
Using cloud models:
  1. Run `ollama signin` once
  2. Pull cloud models with `-cloud` suffix
  3. Use same API endpoints as local models
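The chatbot steps can be sketched as a single-turn helper that keeps the message history. This is a hedged sketch, not Ollama's own client: the `send` hook is an illustrative seam so the logic can be exercised without a running server; by default it POSTs to the local `/api/chat` endpoint with streaming disabled.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # local default

def chat_once(history, user_input, model="gemma3", send=None):
    """Append the user turn, call /api/chat, and record the assistant reply."""
    history.append({"role": "user", "content": user_input})
    if send is None:
        def send(payload):
            # "stream": false returns one complete JSON object
            req = urllib.request.Request(
                OLLAMA_CHAT_URL,
                data=json.dumps(payload).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    reply = send({"model": model, "messages": history, "stream": False})
    history.append(reply["message"])  # keep context for the next turn
    return reply["message"]["content"]
```

Calling `chat_once` in a loop with the same `history` list gives the model the full conversation each turn, which is how `/api/chat` expects context to be maintained.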

Troubleshooting


Model Not Loading on GPU


Check:
```bash
ollama ps
```
Solutions:
  • Verify GPU compatibility in documentation
  • Check CUDA/ROCm installation
  • Review available VRAM
  • Try smaller model variants

Cannot Access Ollama Remotely


Problem: Ollama only accessible from localhost
Solution:
```bash
# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
```
See "How do I configure Ollama server?" in `llms-txt.md` for platform-specific instructions.

Proxy Issues


Problem: Cannot download models behind proxy
Solution:
```bash
# Set proxy (HTTPS only, not HTTP)
export HTTPS_PROXY=https://proxy.example.com

# Restart Ollama
```
See "How do I use Ollama behind a proxy?" in `llms-txt.md`.

CORS Errors in Browser


Problem: Browser extension or web app cannot access Ollama
Solution:
```bash
# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
```
See "How can I allow additional web origins?" in `llms-txt.md`.

Resources


Official Documentation


Official Libraries


Community


Notes


  • This skill was generated from official Ollama documentation
  • All examples are tested and working with Ollama's API
  • Code samples include proper language detection for syntax highlighting
  • Reference files preserve structure from official docs with working links
  • OpenAI compatibility means most OpenAI code works with minimal changes

Quick Command Reference


```bash
# CLI Commands
ollama signin       # Sign in to ollama.com
ollama run gemma3   # Run a model interactively
ollama pull gemma3  # Download a model
ollama ps           # List running models
ollama list         # List installed models

# Check API Status
curl http://localhost:11434/api/version

# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"
```