# Ollama Skill

Comprehensive assistance with Ollama development - the local AI model runtime for running and interacting with large language models programmatically.
## When to Use This Skill
This skill should be triggered when:
- Running local AI models with Ollama
- Building applications that interact with Ollama's API
- Implementing chat completions, embeddings, or streaming responses
- Setting up Ollama authentication or cloud models
- Configuring Ollama server (environment variables, ports, proxies)
- Using Ollama with OpenAI-compatible libraries
- Troubleshooting Ollama installations or GPU compatibility
- Implementing tool calling, structured outputs, or vision capabilities
- Working with Ollama in Docker or behind proxies
- Creating, copying, pushing, or managing Ollama models
## Quick Reference
### 1. Basic Chat Completion (cURL)
Generate a simple chat response:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
```
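The same request can be made from Python with only the standard library. A minimal sketch, mirroring the cURL payload above; `"stream": false` is added so the server returns one JSON object instead of the default NDJSON chunks:

```python
import json
import urllib.request

def build_chat_payload(model: str, user_content: str, stream: bool = False) -> dict:
    """Assemble the /api/chat request body used in the cURL example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "stream": stream,
    }

def chat(base_url: str = "http://localhost:11434") -> str:
    payload = build_chat_payload("gemma3", "Why is the sky blue?")
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Non-streaming responses carry the reply under message.content
    return body["message"]["content"]

if __name__ == "__main__":
    print(chat())
```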
### 2. Simple Text Generation (cURL)
Generate a text response from a prompt:
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
```
### 3. Python Chat with OpenAI Library
Use Ollama with the OpenAI Python library:
```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)
```
### 4. Vision Model (Image Analysis)
Ask questions about images:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)
```
### 5. Generate Embeddings
Create vector embeddings for text:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
```
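Once you have embeddings, the usual next step is comparing them. A minimal cosine-similarity sketch in pure Python; the commented usage assumes the `embeddings` response object from the example above:

```python
import math

def cosine_similarity(vec_a: list[float], vec_b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)

# With the response above, the two sentences would be compared as:
# score = cosine_similarity(
#     embeddings.data[0].embedding,
#     embeddings.data[1].embedding,
# )
```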
### 6. Structured Outputs (JSON Schema)
Get structured JSON responses:
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool

class FriendList(BaseModel):
    friends: list[FriendInfo]

completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)
```
### 7. JavaScript/TypeScript Chat
Use Ollama with the OpenAI JavaScript library:
```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});
```
### 8. Authentication for Cloud Models
Sign in to use cloud models:
```bash
# Sign in from CLI
ollama signin

# Then use cloud models
ollama run gpt-oss:120b-cloud
```

Or use API keys for direct cloud access:

```bash
export OLLAMA_API_KEY=your_api_key
curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```

### 9. Configure Ollama Server
Set environment variables for server configuration:
**macOS:**

```bash
# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Restart the Ollama application
```

**Linux (systemd):**

```bash
# Edit the service
systemctl edit ollama.service

# Add under [Service]:
#   Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart
systemctl daemon-reload
systemctl restart ollama
```

**Windows:**
- Quit Ollama from the task bar
- Search "environment variables" in Settings
- Edit or create the `OLLAMA_HOST` variable
- Set the value: `0.0.0.0:11434`
- Restart Ollama from the Start menu

### 10. Check Model GPU Loading
Verify if your model is using GPU:
```bash
ollama ps
```

Output shows:
- `100% GPU` - Fully loaded on GPU
- `100% CPU` - Fully loaded in system memory
- `48%/52% CPU/GPU` - Split between both
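For scripting, the placement value can be classified mechanically. A small sketch, assuming the column values look exactly like the three strings above:

```python
def classify_processor(processor: str) -> str:
    """Map an `ollama ps` processor value to a coarse placement label."""
    value = processor.strip()
    if value == "100% GPU":
        return "gpu"
    if value == "100% CPU":
        return "cpu"
    if "CPU/GPU" in value:
        return "split"  # e.g. "48%/52% CPU/GPU"
    return "unknown"
```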
## Key Concepts
### Base URLs
- Local API (default): `http://localhost:11434/api`
- Cloud API: `https://ollama.com/api`
- OpenAI Compatible: `/v1/` endpoints for OpenAI libraries
### Authentication
- Local: No authentication required for `http://localhost:11434`
- Cloud Models: Requires signing in (`ollama signin`) or an API key
- API Keys: For programmatic access to `https://ollama.com/api`
### Models
- Local Models: Run on your machine (e.g., `gemma3`, `llama3.2`, `qwen3`)
- Cloud Models: Use the `-cloud` suffix (e.g., `gpt-oss:120b-cloud`, `qwen3-coder:480b-cloud`)
- Vision Models: Support image inputs (e.g., `llava`)
### Common Environment Variables
- `OLLAMA_HOST` - Change the bind address (default: `127.0.0.1:11434`)
- `OLLAMA_CONTEXT_LENGTH` - Context window size (default: `2048` tokens)
- `OLLAMA_MODELS` - Model storage directory
- `OLLAMA_ORIGINS` - Allow additional web origins for CORS
- `HTTPS_PROXY` - Proxy server for model downloads
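Client code often wants to honor `OLLAMA_HOST` rather than hard-coding the server address. A sketch of one way to do that; the fallback value matches the default bind address listed above:

```python
import os

def ollama_base_url() -> str:
    """Build the API base URL, honoring OLLAMA_HOST with the documented default."""
    host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
    # OLLAMA_HOST is usually host:port without a scheme; add one if missing.
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host
```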
### Error Handling
Status Codes:
- `200` - Success
- `400` - Bad Request (invalid parameters)
- `404` - Not Found (model doesn't exist)
- `429` - Too Many Requests (rate limit)
- `500` - Internal Server Error
- `502` - Bad Gateway (cloud model unreachable)

Error Format:

```json
{
  "error": "the model failed to generate a response"
}
```

### Streaming vs Non-Streaming
- Streaming (default): Returns response chunks as JSON objects (NDJSON)
- Non-Streaming: Set `"stream": false` to get the complete response in one object
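Consuming a streamed response means reading one JSON object per line. A minimal sketch of the parsing step, assuming the `/api/chat` chunk shape where each chunk carries `message.content` and the final chunk sets `"done": true`:

```python
import json
from typing import Iterable

def collect_stream(lines: Iterable[bytes]) -> str:
    """Concatenate the content of NDJSON chat chunks into the full reply."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip keep-alive blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

With the `requests` library, the `lines` argument would typically be `response.iter_lines()` on a streaming POST to `/api/chat`.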
## Reference Files
This skill includes comprehensive documentation in `references/`:

- `llms-txt.md` - Complete API reference covering:
  - All API endpoints (`/api/generate`, `/api/chat`, `/api/embed`, etc.)
  - Authentication methods (signin, API keys)
  - Error handling and status codes
  - OpenAI compatibility layer
  - Cloud models usage
  - Streaming responses
  - Configuration and environment variables
- `llms.md` - Documentation index listing all available topics:
  - API reference (version, model details, chat, generate, embeddings)
  - Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
  - CLI reference
  - Cloud integration
  - Platform-specific guides (Linux, macOS, Windows, Docker)
  - IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)

Use the reference files when you need:
- Detailed API parameter specifications
- Complete endpoint documentation
- Advanced configuration options
- Platform-specific setup instructions
- Integration guides for specific tools
## Working with This Skill
### For Beginners
Start with these common patterns:
- Simple generation: Use the `/api/generate` endpoint with a prompt
- Chat interface: Use `/api/chat` with a messages array
- OpenAI compatibility: Use OpenAI libraries with `base_url='http://localhost:11434/v1/'`
- Check GPU usage: Run `ollama ps` to verify model loading

Read the "Introduction" and "Quickstart" sections of `llms-txt.md` for foundational concepts.

### For Intermediate Users
Focus on:
- Embeddings for semantic search and RAG applications
- Structured outputs with JSON schema validation
- Vision models for image analysis
- Streaming for real-time response generation
- Authentication for cloud models

Check the specific API endpoints in `llms-txt.md` for detailed parameter options.

### For Advanced Users
Explore:
- Tool calling for function execution
- Custom model creation with Modelfiles
- Server configuration with environment variables
- Proxy setup for network-restricted environments
- Docker deployment with custom configurations
- Performance optimization with GPU settings

Refer to the platform-specific sections in `llms.md` and the configuration details in `llms-txt.md`.

### Common Use Cases
Building a chatbot:
- Use the `/api/chat` endpoint
- Maintain message history in your application
- Stream responses for better UX
- Handle errors gracefully

Creating embeddings for search:
- Use the `/api/embed` endpoint
- Store embeddings in a vector database
- Perform similarity search
- Implement RAG (Retrieval-Augmented Generation)

Running behind a firewall:
- Set the `HTTPS_PROXY` environment variable
- Configure the proxy in Docker if containerized
- Ensure certificates are trusted

Using cloud models:
- Run `ollama signin` once
- Pull cloud models with the `-cloud` suffix
- Use the same API endpoints as local models
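The chatbot steps above hinge on resending the full message history with every call. A minimal sketch of that bookkeeping, with the actual `/api/chat` request left out:

```python
def append_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Add one message to the running conversation (the list sent as `messages`)."""
    history.append({"role": role, "content": content})
    return history

def next_request_body(model: str, history: list[dict]) -> dict:
    """Build the /api/chat body for the next turn: model plus the entire history so far."""
    return {"model": model, "messages": list(history)}

# Each turn: append the user message, POST next_request_body(...) to /api/chat,
# then append the assistant's reply so the model sees it on the following turn.
```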
## Troubleshooting
### Model Not Loading on GPU
Check:

```bash
ollama ps
```

Solutions:
- Verify GPU compatibility in the documentation
- Check the CUDA/ROCm installation
- Review available VRAM
- Try smaller model variants
### Cannot Access Ollama Remotely
Problem: Ollama is only accessible from localhost

Solution:

```bash
# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
```

See "How do I configure Ollama server?" in `llms-txt.md` for platform-specific instructions.

### Proxy Issues
Problem: Cannot download models behind a proxy

Solution:

```bash
# Set the proxy (HTTPS only, not HTTP)
export HTTPS_PROXY=https://proxy.example.com

# Restart Ollama afterwards
```

See "How do I use Ollama behind a proxy?" in `llms-txt.md`.

### CORS Errors in Browser
Problem: A browser extension or web app cannot access Ollama

Solution:

```bash
# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
```

See "How can I allow additional web origins?" in `llms-txt.md`.

## Resources
### Official Documentation
- Main docs: https://docs.ollama.com
- API Reference: https://docs.ollama.com/api
- Model Library: https://ollama.com/models
### Official Libraries
- Python: https://github.com/ollama/ollama-python
- JavaScript: https://github.com/ollama/ollama-js
### Community
- GitHub: https://github.com/ollama/ollama
- Community Libraries: See GitHub README for full list
## Notes
- This skill was generated from official Ollama documentation
- All examples are tested and working with Ollama's API
- Code samples include proper language detection for syntax highlighting
- Reference files preserve structure from official docs with working links
- OpenAI compatibility means most OpenAI code works with minimal changes
## Quick Command Reference
```bash
# CLI Commands
ollama signin       # Sign in to ollama.com
ollama run gemma3   # Run a model interactively
ollama pull gemma3  # Download a model
ollama ps           # List running models
ollama list         # List installed models

# Check API Status
curl http://localhost:11434/api/version

# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"
```