ollama-local
# Ollama Local Inference
Run LLMs locally for cost savings, privacy, and offline development.
## Quick Start
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull deepseek-r1:70b     # Reasoning (GPT-4 level)
ollama pull qwen2.5-coder:32b   # Coding
ollama pull nomic-embed-text    # Embeddings

# Start server
ollama serve
```
## Recommended Models (M4 Max 256GB)

| Task | Model | Size | Notes |
|---|---|---|---|
| Reasoning | `deepseek-r1:70b` | ~42GB | GPT-4 level |
| Coding | `qwen2.5-coder:32b` | ~35GB | 73.7% Aider benchmark |
| Embeddings | `nomic-embed-text` | ~0.5GB | 768 dims, fast |
| General | `llama3.3:70b` | ~40GB | Good all-around |
## LangChain Integration

```python
from langchain_ollama import ChatOllama, OllamaEmbeddings

# Chat model
llm = ChatOllama(
    model="deepseek-r1:70b",
    base_url="http://localhost:11434",
    temperature=0.0,
    num_ctx=32768,     # Context window
    keep_alive="5m",   # Keep model loaded
)

# Embeddings
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434",
)

# Generate
response = await llm.ainvoke("Explain async/await")
vector = await embeddings.aembed_query("search text")
```
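The vector returned by `aembed_query` is a plain list of floats, so similarity between two embeddings is ordinary arithmetic. A minimal sketch with hand-written vectors (illustrative only; real `nomic-embed-text` output has 768 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors, not real embedding output
doc_vec = [0.1, 0.3, 0.5]
query_vec = [0.1, 0.3, 0.5]
print(round(cosine_similarity(doc_vec, query_vec), 4))  # identical vectors -> 1.0
```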
## Tool Calling with Ollama

```python
from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search the document database."""
    return f"Found results for: {query}"

# Bind tools
llm_with_tools = llm.bind_tools([search_docs])
response = await llm_with_tools.ainvoke("Search for Python patterns")
```
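When the model requests a tool, `response.tool_calls` holds dicts with `name` and `args` keys. The dispatch loop itself is plain Python and can be sketched (and unit-tested) without a live model; the `fake_tool_calls` payload below is illustrative, not real model output:

```python
def search_docs(query: str) -> str:
    """Plain-function stand-in for the @tool-decorated version."""
    return f"Found results for: {query}"

TOOLS = {"search_docs": search_docs}

def dispatch_tool_calls(tool_calls: list[dict]) -> list[str]:
    """Run each requested tool and collect its string result."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["name"]]       # look up the tool by name
        results.append(fn(**call["args"]))
    return results

# Illustrative payload shaped like response.tool_calls
fake_tool_calls = [{"name": "search_docs", "args": {"query": "Python patterns"}}]
print(dispatch_tool_calls(fake_tool_calls))  # ['Found results for: Python patterns']
```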
## Structured Output

```python
from pydantic import BaseModel, Field

class CodeAnalysis(BaseModel):
    language: str = Field(description="Programming language")
    complexity: int = Field(ge=1, le=10)
    issues: list[str] = Field(description="Found issues")

structured_llm = llm.with_structured_output(CodeAnalysis)
result = await structured_llm.ainvoke("Analyze this code: ...")
# result is a typed CodeAnalysis object
```
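Because `CodeAnalysis` enforces its constraints locally, the schema can be tested before wiring it to the model; for instance, the `ge=1, le=10` bound on `complexity` rejects out-of-range values. A quick sketch:

```python
from pydantic import BaseModel, Field, ValidationError

class CodeAnalysis(BaseModel):
    language: str = Field(description="Programming language")
    complexity: int = Field(ge=1, le=10)
    issues: list[str] = Field(description="Found issues")

ok = CodeAnalysis(language="python", complexity=3, issues=["unused import"])
print(ok.complexity)  # 3

try:
    CodeAnalysis(language="python", complexity=11, issues=[])
except ValidationError:
    print("rejected: complexity must be between 1 and 10")
```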
## Provider Factory Pattern

```python
import os

from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI

def get_llm_provider(task_type: str = "general"):
    """Auto-switch between Ollama and cloud APIs."""
    if os.getenv("OLLAMA_ENABLED") == "true":
        models = {
            "reasoning": "deepseek-r1:70b",
            "coding": "qwen2.5-coder:32b",
            "general": "llama3.3:70b",
        }
        return ChatOllama(
            model=models.get(task_type, "llama3.3:70b"),
            keep_alive="5m",
        )
    else:
        # Fall back to cloud API
        return ChatOpenAI(model="gpt-5.2")

# Usage
llm = get_llm_provider(task_type="coding")
```
## Environment Configuration

```bash
# .env.local
OLLAMA_ENABLED=true
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL_REASONING=deepseek-r1:70b
OLLAMA_MODEL_CODING=qwen2.5-coder:32b
OLLAMA_MODEL_EMBED=nomic-embed-text

# Performance tuning (Apple Silicon)
OLLAMA_MAX_LOADED_MODELS=3   # Keep 3 models in memory
OLLAMA_KEEP_ALIVE=5m         # 5 minute keep-alive
```
## CI Integration

```yaml
# GitHub Actions (self-hosted runner)
jobs:
  test:
    runs-on: self-hosted  # M4 Max runner
    env:
      OLLAMA_ENABLED: "true"
    steps:
      - name: Pre-warm models
        run: |
          curl -s http://localhost:11434/api/embeddings \
            -d '{"model":"nomic-embed-text","prompt":"warmup"}' > /dev/null
      - name: Run tests
        run: pytest tests/
```

## Cost Comparison
| Provider | Monthly Cost | Latency |
|---|---|---|
| Cloud APIs | ~$675/month | 200-500ms |
| Ollama Local | ~$50 (electricity) | 50-200ms |
| Savings | 93% | 2-3x faster |
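The 93% figure follows directly from the two cost rows; a quick check using the table's rough estimates:

```python
cloud_monthly = 675.0   # ~$675/month for cloud APIs
local_monthly = 50.0    # ~$50/month electricity for local inference
savings = (cloud_monthly - local_monthly) / cloud_monthly
print(f"{savings:.0%}")  # 93%
```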
## Best Practices

- DO use `keep_alive="5m"` in CI (avoid cold starts)
- DO pre-warm models before first call
- DO set `num_ctx=32768` on Apple Silicon
- DO use provider factory for cloud/local switching
- DON'T use `keep_alive=-1` (wastes memory)
- DON'T skip pre-warming in CI (30-60s cold start)
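The pre-warm step from the CI example can also run from Python at test-session start, e.g. in a pytest fixture. A stdlib-only sketch against Ollama's `/api/embeddings` endpoint that fails soft when the server is down; the helper name `prewarm` is ours:

```python
import json
import urllib.request

def prewarm(model: str = "nomic-embed-text",
            host: str = "http://localhost:11434") -> bool:
    """POST a tiny prompt so the model is loaded before the first real call."""
    payload = json.dumps({"model": model, "prompt": "warmup"}).encode()
    req = urllib.request.Request(
        f"{host}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60):
            return True
    except OSError:
        # Server not running or model unavailable; let tests surface the error
        return False
```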
## Troubleshooting

```bash
# Check if Ollama is running
curl http://localhost:11434

# List installed models
ollama list

# Check model memory usage
ollama ps

# Pull specific version
ollama pull deepseek-r1:70b-q4_K_M
```
## Related Skills

- `embeddings` - Embedding patterns (works with nomic-embed-text)
- `llm-evaluation` - Testing with local models
- `cost-optimization` - Broader cost strategies
## Capability Details

### setup

Keywords: setup, install, configure, ollama

Solves:
- Set up Ollama locally
- Configure for development
- Install models

### model-selection

Keywords: model, llama, mistral, qwen, selection

Solves:
- Choose appropriate model
- Compare model capabilities
- Balance speed vs quality

### provider-template

Keywords: provider, template, python, implementation

Solves:
- Ollama provider template
- Python implementation
- Drop-in LLM provider