
Ollama Skill


Comprehensive assistance with Ollama development - the local AI model runtime for running and interacting with large language models programmatically.

When to Use This Skill


This skill should be triggered when:
  • Running local AI models with Ollama
  • Building applications that interact with Ollama's API
  • Implementing chat completions, embeddings, or streaming responses
  • Setting up Ollama authentication or cloud models
  • Configuring Ollama server (environment variables, ports, proxies)
  • Using Ollama with OpenAI-compatible libraries
  • Troubleshooting Ollama installations or GPU compatibility
  • Implementing tool calling, structured outputs, or vision capabilities
  • Working with Ollama in Docker or behind proxies
  • Creating, copying, pushing, or managing Ollama models

Quick Reference


1. Basic Chat Completion (cURL)


Generate a simple chat response:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
```

2. Simple Text Generation (cURL)


Generate a text response from a prompt:
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
```

3. Python Chat with OpenAI Library


Use Ollama with the OpenAI Python library:
```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)
```

4. Vision Model (Image Analysis)


Ask questions about images:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)
```

5. Generate Embeddings


Create vector embeddings for text:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
```
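Once you have embedding vectors, a similarity search is just a nearest-neighbor lookup over them. A minimal sketch with toy vectors standing in for real model output (`cosine_similarity` is an illustrative helper, not part of Ollama or the OpenAI library):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for embeddings.create(...) output
query_vec = [0.1, 0.9, 0.2]
doc_vecs = {"sky": [0.1, 0.8, 0.3], "grass": [0.9, 0.1, 0.1]}

# Rank documents by similarity to the query
best = max(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]))
print(best)  # → sky
```

In practice the vectors come from the `/api/embed` endpoint or the embeddings call above, and a vector database replaces the dictionary scan.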

6. Structured Outputs (JSON Schema)


Get structured JSON responses:
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool

class FriendList(BaseModel):
    friends: list[FriendInfo]

completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)
```

7. JavaScript/TypeScript Chat


Use Ollama with the OpenAI JavaScript library:
```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama",  // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});
```

8. Authentication for Cloud Models


Sign in to use cloud models:
```bash
# Sign in from CLI
ollama signin

# Then use cloud models
ollama run gpt-oss:120b-cloud
```

Or use API keys for direct cloud access:

```bash
export OLLAMA_API_KEY=your_api_key

curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```

9. Configure Ollama Server


Set environment variables for server configuration:

**macOS:**
```bash
# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Restart Ollama application
```

**Linux (systemd):**
```bash
# Edit service
systemctl edit ollama.service

# Add under [Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart
systemctl daemon-reload
systemctl restart ollama
```

**Windows:**
  1. Quit Ollama from task bar
  2. Search "environment variables" in Settings
  3. Edit or create OLLAMA_HOST variable
  4. Set value: 0.0.0.0:11434
  5. Restart Ollama from Start menu

10. Check Model GPU Loading


Verify if your model is using GPU:
```bash
ollama ps
```
Output shows:
  • 100% GPU - Fully loaded on GPU
  • 100% CPU - Fully loaded in system memory
  • 48%/52% CPU/GPU - Split between both

Key Concepts


Base URLs


  • Local API (default): `http://localhost:11434/api`
  • Cloud API: `https://ollama.com/api`
  • OpenAI Compatible: `/v1/` endpoints for OpenAI libraries

Authentication


  • Local: No authentication required for `http://localhost:11434`
  • Cloud Models: Requires signing in (`ollama signin`) or an API key
  • API Keys: For programmatic access to `https://ollama.com/api`

Models


  • Local Models: Run on your machine (e.g., `gemma3`, `llama3.2`, `qwen3`)
  • Cloud Models: Suffix `-cloud` (e.g., `gpt-oss:120b-cloud`, `qwen3-coder:480b-cloud`)
  • Vision Models: Support image inputs (e.g., `llava`)

Common Environment Variables


  • `OLLAMA_HOST` - Change bind address (default: `127.0.0.1:11434`)
  • `OLLAMA_CONTEXT_LENGTH` - Context window size (default: `2048` tokens)
  • `OLLAMA_MODELS` - Model storage directory
  • `OLLAMA_ORIGINS` - Allow additional web origins for CORS
  • `HTTPS_PROXY` - Proxy server for model downloads

Error Handling


Status Codes:
  • `200` - Success
  • `400` - Bad Request (invalid parameters)
  • `404` - Not Found (model doesn't exist)
  • `429` - Too Many Requests (rate limit)
  • `500` - Internal Server Error
  • `502` - Bad Gateway (cloud model unreachable)
Error Format:
```json
{
  "error": "the model failed to generate a response"
}
```
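The status codes and error format above can be folded into a small client-side helper. A sketch (the helper name and message table are illustrative, not part of Ollama):

```python
import json

# Status-code meanings mirror the table above
STATUS_MESSAGES = {
    400: "Bad Request (invalid parameters)",
    404: "Not Found (model doesn't exist)",
    429: "Too Many Requests (rate limit)",
    500: "Internal Server Error",
    502: "Bad Gateway (cloud model unreachable)",
}

def describe_error(status_code, body):
    """Combine the status-code meaning with the payload's "error" field."""
    reason = STATUS_MESSAGES.get(status_code, f"HTTP {status_code}")
    try:
        parsed = json.loads(body)
        detail = parsed.get("error", "") if isinstance(parsed, dict) else body
    except json.JSONDecodeError:
        detail = body  # non-JSON body: report it verbatim
    return f"{reason}: {detail}" if detail else reason

print(describe_error(404, '{"error": "model \'nope\' not found"}'))
```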

Streaming vs Non-Streaming


  • Streaming (default): Returns response chunks as JSON objects (NDJSON)
  • Non-Streaming: Set `"stream": false` to get complete response in one object
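The NDJSON stream can be reassembled client-side. A minimal sketch, assuming the chat chunk shape (`message.content` fragments, final chunk with `"done": true`); the sample lines are hand-written stand-ins for real stream output:

```python
import json

def collect_chat_stream(ndjson_lines):
    """Concatenate the message.content fragments of a streamed chat reply."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):  # final chunk
            break
    return "".join(parts)

# Hand-written stand-ins shaped like /api/chat stream output:
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo"}, "done": true}',
]
print(collect_chat_stream(sample))  # → Hello
```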

Reference Files


This skill includes comprehensive documentation in `references/`:
  • llms-txt.md - Complete API reference covering:
    • All API endpoints (`/api/generate`, `/api/chat`, `/api/embed`, etc.)
    • Authentication methods (signin, API keys)
    • Error handling and status codes
    • OpenAI compatibility layer
    • Cloud models usage
    • Streaming responses
    • Configuration and environment variables
  • llms.md - Documentation index listing all available topics:
    • API reference (version, model details, chat, generate, embeddings)
    • Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
    • CLI reference
    • Cloud integration
    • Platform-specific guides (Linux, macOS, Windows, Docker)
    • IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)
Use the reference files when you need:
  • Detailed API parameter specifications
  • Complete endpoint documentation
  • Advanced configuration options
  • Platform-specific setup instructions
  • Integration guides for specific tools

Working with This Skill


For Beginners


Start with these common patterns:
  1. Simple generation: Use the `/api/generate` endpoint with a prompt
  2. Chat interface: Use `/api/chat` with a messages array
  3. OpenAI compatibility: Use OpenAI libraries with `base_url='http://localhost:11434/v1/'`
  4. Check GPU usage: Run `ollama ps` to verify model loading
Read the "Introduction" and "Quickstart" sections of `llms-txt.md` for foundational concepts.

For Intermediate Users


Focus on:
  • Embeddings for semantic search and RAG applications
  • Structured outputs with JSON schema validation
  • Vision models for image analysis
  • Streaming for real-time response generation
  • Authentication for cloud models
Check the specific API endpoints in `llms-txt.md` for detailed parameter options.

For Advanced Users


Explore:
  • Tool calling for function execution
  • Custom model creation with Modelfiles
  • Server configuration with environment variables
  • Proxy setup for network-restricted environments
  • Docker deployment with custom configurations
  • Performance optimization with GPU settings
Refer to platform-specific sections in `llms.md` and configuration details in `llms-txt.md`.
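As a taste of tool calling, the sketch below defines a tool schema in the OpenAI function-call format plus a dispatcher for tool calls the model returns. The `get_weather` tool and its canned response are illustrative assumptions, not part of Ollama; see `llms-txt.md` for the exact request/response shapes:

```python
import json

# Tool schema in the OpenAI function-call format
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch_tool_call(name, arguments_json):
    """Run the local function named by a model tool call."""
    args = json.loads(arguments_json)
    if name == "get_weather":
        return f"Sunny in {args['city']}"  # stand-in for a real lookup
    raise ValueError(f"unknown tool: {name}")
```

With the OpenAI client, pass `tools=TOOLS` to `chat.completions.create` and feed each returned tool call's `function.name` / `function.arguments` to the dispatcher, then send the result back as a `tool` message.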

Common Use Cases


Building a chatbot:
  1. Use `/api/chat` endpoint
  2. Maintain message history in your application
  3. Stream responses for better UX
  4. Handle errors gracefully
Creating embeddings for search:
  1. Use `/api/embed` endpoint
  2. Store embeddings in vector database
  3. Perform similarity search
  4. Implement RAG (Retrieval Augmented Generation)
Running behind a firewall:
  1. Set `HTTPS_PROXY` environment variable
  2. Configure proxy in Docker if containerized
  3. Ensure certificates are trusted
Using cloud models:
  1. Run `ollama signin` once
  2. Pull cloud models with `-cloud` suffix
  3. Use same API endpoints as local models
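The chatbot steps can be sketched as a single-turn helper that keeps the message history. This is a hedged sketch, not Ollama's own client: the `send` hook is an illustrative seam so the logic can be exercised without a running server; by default it POSTs to the local `/api/chat` endpoint with streaming disabled.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # local default

def chat_once(history, user_input, model="gemma3", send=None):
    """Append the user turn, call /api/chat, and record the assistant reply."""
    history.append({"role": "user", "content": user_input})
    if send is None:
        def send(payload):
            # "stream": false returns one complete JSON object
            req = urllib.request.Request(
                OLLAMA_CHAT_URL,
                data=json.dumps(payload).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    reply = send({"model": model, "messages": history, "stream": False})
    history.append(reply["message"])  # keep context for the next turn
    return reply["message"]["content"]
```

Calling `chat_once` in a loop with the same `history` list gives the model the full conversation each turn, which is how `/api/chat` expects context to be maintained.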

Troubleshooting


Model Not Loading on GPU


Check:
```bash
ollama ps
```
Solutions:
  • Verify GPU compatibility in documentation
  • Check CUDA/ROCm installation
  • Review available VRAM
  • Try smaller model variants

Cannot Access Ollama Remotely


Problem: Ollama only accessible from localhost
Solution:
```bash
# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
```
See "How do I configure Ollama server?" in `llms-txt.md` for platform-specific instructions.

Proxy Issues


Problem: Cannot download models behind proxy
Solution:
```bash
# Set proxy (HTTPS only, not HTTP)
export HTTPS_PROXY=https://proxy.example.com

# Restart Ollama
```
See "How do I use Ollama behind a proxy?" in `llms-txt.md`.

CORS Errors in Browser


Problem: Browser extension or web app cannot access Ollama
Solution:
```bash
# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
```
See "How can I allow additional web origins?" in `llms-txt.md`.

Resources


Official Documentation


Official Libraries


Community


Notes


  • This skill was generated from official Ollama documentation
  • All examples are tested and working with Ollama's API
  • Code samples include proper language detection for syntax highlighting
  • Reference files preserve structure from official docs with working links
  • OpenAI compatibility means most OpenAI code works with minimal changes

Quick Command Reference


```bash
# CLI Commands
ollama signin       # Sign in to ollama.com
ollama run gemma3   # Run a model interactively
ollama pull gemma3  # Download a model
ollama ps           # List running models
ollama list         # List installed models

# Check API Status
curl http://localhost:11434/api/version

# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"
```