OpenClaw Cost Optimization
Your OpenClaw setup is probably burning money on three things: using expensive models for trivial tasks, re-processing the same static files on every message, and loading context the agent never touches.
The default config optimizes for capability. That's fine when you're exploring. But once you're running 24/7, it's the difference between $1,500/month and $150/month or even less.
This skill audits your current setup and applies six optimizations in order of impact. Most take under 5 minutes. Combined, they cut costs by roughly 97% without reducing quality where it matters.
What Claude Gets Wrong Without This
Left to its own defaults, an OpenClaw agent will:
- Run Sonnet or Opus for "what's the weather?" — that's a Haiku job
- Load 50KB of context at session start when 8KB would do
- Send heartbeat checks to a paid API 96 times a day
- Re-process your SOUL.md and USER.md at full price on every single message
- Let a search loop burn $20 overnight with no guardrails
None of these are bugs. They're just defaults that weren't designed for 24/7 autonomous operation.
The Six Optimizations
Ordered by impact. Start at #1 and work down.
1. Route the Right Model to the Right Job
Impact: $40-300/month saved depending on usage
This is the single biggest lever. Your agent handles many different types of work — chat, heartbeats, coding, web crawling, image analysis — and each has a model that hits the sweet spot of quality vs cost. Using Opus for a heartbeat check is like hiring a surgeon to take your temperature.
The model routing map:
| Task | Best (no budget) | Budget alternative | Savings |
|---|---|---|---|
| Chat (brain) | Opus 4.6 | Kimi K2.5 | Massive — near-Opus intelligence, slightly less personality |
| Heartbeat checks | Haiku 4.5 | Haiku 4.5 (+ reduce frequency) | ~$50/month |
| Coding / overnight dev | Codex GPT-5.2 | Miniax 2.1 | ~$250/month on long coding runs |
| Web browsing / crawling | Opus 4.6 | DeepSeek V3 | Hundreds/month if you crawl a lot |
| Image understanding | Opus 4.6 | Gemini 2.5 Flash | Significant — Flash is very capable on vision |
| Routine tasks | Haiku 4.5 | Haiku 4.5 | 10x cheaper than Sonnet |
The key insight: You're not picking ONE cheap model. You're building a roster where each task gets the cheapest model that handles it well.
Default model — set your daily driver to Haiku or Kimi depending on how much personality matters:
```json5
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      }
    }
  }
}
```

Model aliases — define aliases in `agents.defaults.models` so cron jobs and agents can reference models by short name:
```json5
{
  "agents": {
    "defaults": {
      "models": {
        "anthropic/claude-haiku-4-5": { "alias": "haiku" },
        "anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
        "anthropic/claude-opus-4-6": { "alias": "opus" },
        "openai/codex-gpt-5.2": { "alias": "code" },
        "deepseek/deepseek-v3": { "alias": "crawl" },
        "google/gemini-2.5-flash": { "alias": "vision" }
      }
    }
  }
}
```

Heartbeat frequency matters too. Default is often every 10 minutes. If you're running Opus for heartbeats, that's ~$2/day (~$54/month) just on "anything need attention?" checks. Switch to Haiku AND reduce frequency to every hour:
```json5
{
  "heartbeat": {
    "every": "1h",
    "model": "anthropic/claude-haiku-4-5"
  }
}
```

That alone drops heartbeat costs to $0.01-0.10/day.
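Those estimates are easy to sanity-check. A rough back-of-the-envelope sketch — the per-million-token prices and the ~1K input tokens per check are illustrative assumptions, not quoted rates:

```python
def daily_heartbeat_cost(checks_per_day: int, tokens_per_check: int,
                         price_per_mtok: float) -> float:
    """Approximate daily heartbeat spend, counting input tokens only."""
    return checks_per_day * tokens_per_check * price_per_mtok / 1_000_000

# Illustrative prices ($/M input tokens) -- check your provider's current rates
OPUS, HAIKU = 15.00, 1.00

print(daily_heartbeat_cost(144, 1_000, OPUS))   # every 10 min on Opus: ~$2.16/day
print(daily_heartbeat_cost(24, 1_000, HAIKU))   # hourly on Haiku: ~$0.024/day
```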
Add to your agent's SOUL.md or system prompt:
```markdown
Model Selection

Default: Haiku for routine tasks.
Escalate to Sonnet/Opus only for: architecture decisions, code review, security analysis, complex reasoning.
For coding sessions: prefer Codex or a dedicated coding model.
For web crawling: use DeepSeek V3.
For image analysis: use Gemini Flash.
When uncertain, try the cheaper model first.
```
The model follows this as a behavioral constraint. No code changes needed.
---

2. Cache Stable Context
Impact: up to 90% discount on repeated tokens
Every message re-reads SOUL.md, USER.md, tool definitions, and reference docs. These files don't change between requests. Without caching, you're paying full price to re-process identical content dozens of times a day.
Anthropic's prompt caching charges 10% for cached tokens on reuse. For content you send repeatedly, that's a 90% cut on your most frequent cost.
What to cache (stable, rarely updated):
- SOUL.md — personality, rules
- USER.md — user profile, preferences
- TOOLS.md — tool definitions
- Reference docs, project specs
What NOT to cache (changes frequently):
- Daily memory files
- Conversation history
- Tool outputs, search results
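To put numbers on the discount, here is a minimal sketch of what one stable prefix costs per month with and without caching. The 10% read price mirrors the discount described above; the prefix size, message volume, and base price are illustrative assumptions:

```python
def monthly_prefix_cost(prefix_tokens: int, msgs_per_day: int,
                        price_per_mtok: float, cached: bool) -> float:
    """Monthly cost of re-sending one stable prefix with every message."""
    read_multiplier = 0.10 if cached else 1.0  # cached reads bill at ~10%
    tokens = prefix_tokens * msgs_per_day * 30
    return tokens * read_multiplier * price_per_mtok / 1_000_000

# 10K-token stable prefix, 200 msgs/day, $3/M input (illustrative)
print(monthly_prefix_cost(10_000, 200, 3.0, cached=False))  # $180.0/month
print(monthly_prefix_cost(10_000, 200, 3.0, cached=True))   # $18.0/month
```

This ignores the write premium on the first request of each cache window (Anthropic bills cache writes above the base input rate), which is one more reason to batch requests inside the window.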
Config — set `cacheRetention` per model in `agents.defaults.models`:

```json5
{
  "agents": {
    "defaults": {
      "models": {
        "anthropic/claude-opus-4-6": {
          "alias": "opus",
          "params": { "cacheRetention": "short" }
        }
      }
    }
  }
}
```

| Value | Cache Duration | Notes |
|---|---|---|
| `"none"` | No caching | Disables prompt caching |
| `"short"` | 5 minutes | Default for API key auth |
| `"long"` | 1 hour | Requires beta flag |

Note: OpenClaw automatically applies `"short"` (5-min cache) when using Anthropic API key authentication. You only need to set this explicitly if you want `"long"` or `"none"`.

Tips for maximizing cache hits:
- Batch related requests within the cache window (5 min for `short`, 1h for `long`)
- Align heartbeat frequency just under the cache TTL (e.g., every 4 min for `short`, every 55 min for `long`) to keep caches warm
- Don't edit SOUL.md mid-conversation — each change invalidates the cache
- Keep stable content at the top of your context hierarchy
3. Start Sessions Lean
Impact: ~$0.35 saved per session, $10-15/month with frequent use
Many setups eagerly load everything at startup: full memory archives, past conversations, every reference file. Most of it sits unused.
Load only what the agent needs to be itself and understand today:
| Load at startup | Skip at startup |
|---|---|
| SOUL.md | Full conversation history |
| USER.md | Past memory archives |
| IDENTITY.md | Research docs from old projects |
| Today's memory file | Other agents' files |
For everything else: pull on demand. When a topic comes up that needs historical context, use memory search:
"Search my memory for what we discussed about the marketing campaign"

Semantic search retrieves the relevant passages without loading the entire history. This is both cheaper and more effective — targeted retrieval beats brute-force loading.
Add to your agent's instructions:
```markdown
Session Startup

Load ONLY: SOUL.md, USER.md, IDENTITY.md, today's memory file.
For prior context: use memory_search() on demand. Don't pre-load history.
```
**Result:** Sessions start with ~8KB of context instead of 50KB+. The agent is faster, cheaper, and paradoxically better — less noise in the context means more focused responses.
---

4. Route Heartbeats to a Local Model
Impact: $5-15/month saved
Heartbeats are periodic checks — "anything need attention?" — that run every 15-60 minutes. They're simple tasks that don't need frontier reasoning. But by default, each one is a paid API call.
Route them to a free local model via Ollama instead.
Setup:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a lightweight model (2GB, handles heartbeat tasks easily)
ollama pull llama3.2:3b
```
**Config:**
```json5
{
"heartbeat": {
"model": "ollama/llama3.2:3b"
}
}
```

**Why it works:** Heartbeat tasks are classification problems — "is there something that needs attention?" A 3B parameter model running locally answers this with zero issues. The quality difference from Sonnet is nonexistent for yes/no monitoring, and the cost drops to zero on your API bill.
Requirements: A machine running Ollama with ~2GB free RAM. If OpenClaw already runs on a home server or VPS, you have this.
5. Set Rate Limits and Budget Caps
Impact: prevents $50-200/month in runaway costs
AI agents can loop. A search spawns follow-up searches. A debugging session fires dozens of API calls in seconds. Without guardrails, a single runaway interaction can burn through a day's budget in minutes.
Add to SOUL.md or system prompt:
```markdown
API Discipline

- 5 seconds minimum between consecutive API calls
- 10 seconds minimum between web searches
- Max 5 searches per batch, then pause 2 minutes
- If a task needs more than 10 API calls, ask before continuing
- On rate limit error (429): stop, wait 5 minutes, retry once
```
**Budget discipline via behavioral rules:**
OpenClaw doesn't have a native budget config key. Instead, enforce limits through your agent's instructions and external monitoring:
1. **Set Anthropic API spend alerts** — configure billing alerts directly in [Anthropic Console](https://console.anthropic.com/) or your provider dashboard
2. **Add behavioral limits to SOUL.md** (see above) — the agent will self-regulate
3. **Monitor with the CLI:**
```bash
openclaw cron list --all --json  # Check what's running and on which model
```
4. **Track via the Anthropic usage dashboard** — review daily/weekly spend patterns
Rate limits prevent waste during normal operation. Provider-level budget alerts catch edge cases. Together: predictable spending without limiting capability.
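These rules live in the prompt, so the model self-enforces them. If you also drive APIs from your own scripts, the same discipline can be enforced mechanically; this is a minimal sketch (the commented loop is illustrative — `run_search` is a placeholder, not an OpenClaw function):

```python
import time

class MinInterval:
    """Blocks until at least `seconds` have passed since the previous call."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last = float("-inf")

    def wait(self) -> None:
        gap = self._last + self.seconds - time.monotonic()
        if gap > 0:
            time.sleep(gap)
        self._last = time.monotonic()

api_gate = MinInterval(5.0)      # >= 5s between consecutive API calls
search_gate = MinInterval(10.0)  # >= 10s between web searches

# for query in queries:
#     search_gate.wait()
#     run_search(query)  # placeholder for your actual search call
```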
6. Trim Your Workspace Files
Impact: incremental but compounding — saves on every single request
SOUL.md, USER.md, and similar files load on every interaction. Every unnecessary line costs tokens not once, but on every message. A 200-line SOUL.md that could be 80 lines is silently 2-3x more expensive than it needs to be on your highest-frequency cost.
The audit question for each line: "Does the agent need this to do its job?"
What stays:
- Core personality (2-3 sentences, not paragraphs)
- Hard behavioral rules that change output
- Communication preferences
- Operational constraints
What goes:
- Backstory or lore (move to a reference doc, load on demand)
- Redundant restatements of the same rule
- Verbose explanations of things Claude already knows
- "Nice to have" context that doesn't affect responses
**Before (bloated, 85 tokens):**

```markdown
Communication Style

You should always communicate in a way that is clear, concise, and helpful.
When talking to the user, make sure your responses are well-structured and easy
to follow. Use bullet points when listing things. Don't write overly long
responses unless the topic genuinely requires depth. The user values efficiency.
```

**After (lean, 22 tokens):**

```markdown
Style

Direct and concise. Use structure for complex answers. Match depth to the question.
```
Same behavior. One-fourth the cost, multiplied by every message.
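The compounding is what makes even small trims worthwhile; a sketch of the arithmetic, where the message volume and price are illustrative assumptions:

```python
def trim_savings_per_month(tokens_removed: int, msgs_per_day: int,
                           price_per_mtok: float) -> float:
    """Monthly savings from deleting always-loaded tokens from workspace files."""
    return tokens_removed * msgs_per_day * 30 * price_per_mtok / 1_000_000

# Trimming 2,000 tokens of filler, 100 msgs/day, $1/M input (Haiku-ish, illustrative)
print(trim_savings_per_month(2_000, 100, 1.0))  # $6.0/month, before caching effects
```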
---

Combined Impact
| Optimization | Monthly Savings | Setup Time |
|---|---|---|
| Default to Haiku | $40-60 | 5 min |
| Prompt caching | $15-30 | 2 min |
| Lean startup context | $10-15 | 30 min |
| Local heartbeats | $5-15 | 15 min |
| Rate limits + budgets | $50-200 (prevention) | 10 min |
| Trim workspace files | $5-10 | 1 hour |
From $1,500+/month to $30-50/month.
Model routing and caching deliver the biggest absolute savings. Workspace trimming compounds over time because it reduces cost on every interaction.
How to Apply This
You don't need to hand-edit JSON files for most of these. Tell your OpenClaw agent:
"Switch your default model to Haiku. Only use Sonnet for complex reasoning tasks."
"Enable prompt caching — set cacheRetention to 'long' for my Anthropic models."
"Add API discipline rules to my SOUL.md — rate limits, search batching, and escalation thresholds."
"Audit my SOUL.md and USER.md — flag anything that doesn't directly affect your output quality."
OpenClaw can modify its own config in most cases. Just ask.
Verifying It Works
Verification Commands
Check actual model per cron job (the table view hides this — you MUST use JSON):

```bash
openclaw cron list --all --json
```

Look at each job's `payload.model` field. Don't assume from the agent assignment — cron jobs can override the agent's default model.

Check heartbeat status per agent (config is in `~/.openclaw/openclaw.json`):

```bash
cat ~/.openclaw/openclaw.json | python3 -c "
import json, sys
c = json.load(sys.stdin)
d = c['agents']['defaults'].get('heartbeat', {}).get('every', '')
print(f'Default: {d or \"(disabled)\"}')
for a in c['agents']['list']:
    e = a.get('heartbeat', {}).get('every', d)
    print(f'{a[\"id\"]}: {e or \"(disabled)\"}')
"
```

Agents without a `heartbeat` block inherit the default. An agent running on Opus with an inherited `every: "15m"` = your biggest cost line.

Check bootstrap context size:
```bash
wc -c SOUL.md USER.md AGENTS.md MEMORY.md
```
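To turn those byte counts into rough token counts, a common heuristic for English prose is about 4 bytes per token (an approximation, not a tokenizer):

```python
def approx_tokens(num_bytes: int) -> int:
    """Rough token estimate for English text: ~4 bytes per token."""
    return num_bytes // 4

print(approx_tokens(8_192))   # a lean ~8KB startup context: ~2048 tokens
print(approx_tokens(51_200))  # a bloated ~50KB one: ~12800 tokens
```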
What Good Looks Like
- Context size: Should be 2-8KB at session start, not 50KB+
- Default model: Should show Haiku, not Sonnet
- Cron jobs: Each job's `payload.model` should match its complexity (Haiku for checkers, Sonnet for structured tasks, Opus only for creative/complex work)
- Heartbeat: Should route to Ollama/local or Haiku at minimum
- Daily costs: Should drop to the $0.10-0.50 range within the first day

If costs haven't dropped, the most common issue is the system prompt not loading the model selection rules. Verify your SOUL.md changes are being picked up.
Common Audit Mistake
Don't infer cron job models from the agent assignment. The `openclaw cron list` table shows `Agent: sherlock` but the job itself may override the model in its payload. Always check `openclaw cron list --all --json` for the actual `payload.model` field.

Anti-Patterns
"I'll just use Haiku for everything, including complex tasks."
Don't. Haiku is fast and cheap but genuinely worse at multi-step reasoning, nuanced analysis, and creative work. The goal is right-sizing, not downgrading. Escalate when quality matters.
"I'll cache everything to save money."
Caching volatile content (daily notes, tool outputs) wastes the cache slot and provides no benefit. Cache only what's stable.
"I'll set the budget to $1/day to be safe."
Too aggressive. Your agent will hit the cap during legitimate work and stop mid-task. Start at $5/day, observe for a week, then adjust based on actual patterns.
"I trimmed SOUL.md to 10 lines and now the agent acts weird."
You cut too deep. Some personality and behavioral rules genuinely affect output quality. Trim the fat, keep the muscle. If behavior degrades, add back the specific rule that was controlling it.
Quality Check
Before sharing this config with others:
- Default model is Haiku (not Sonnet/Opus)
- Caching enabled for stable files
- Session startup loads only essentials
- Heartbeat routes to local model (or cheapest available)
- Budget caps set with warning threshold
- Workspace files audited for unnecessary content
- Agent behavior verified — no quality regression on important tasks