OpenClaw Cost Optimization
Your OpenClaw setup is probably burning money on three things: using expensive models for trivial tasks, re-processing the same static files on every message, and loading context the agent never touches.
The default config optimizes for capability. That's fine when you're exploring. But once you're running 24/7, it's the difference between $1,500/month and $150/month or even less.
This skill audits your current setup and applies six optimizations in order of impact. Most take under 5 minutes. Combined, they cut costs by roughly 97% without reducing quality where it matters.
What Claude Gets Wrong Without This
Left to its own defaults, an OpenClaw agent will:
- Run Sonnet or Opus for "what's the weather?" — that's a Haiku job
- Load 50KB of context at session start when 8KB would do
- Send heartbeat checks to a paid API 96 times a day
- Re-process your SOUL.md and USER.md at full price on every single message
- Let a search loop burn $20 overnight with no guardrails
None of these are bugs. They're just defaults that weren't designed for 24/7 autonomous operation.
The Six Optimizations
Ordered by impact. Start at #1 and work down.
1. Route the Right Model to the Right Job
Impact: $40-300/month saved depending on usage
This is the single biggest lever. Your agent handles many different types of work — chat, heartbeats, coding, web crawling, image analysis — and each has a model that hits the sweet spot of quality vs cost. Using Opus for a heartbeat check is like hiring a surgeon to take your temperature.
The model routing map:
| Task | Best (no budget) | Budget alternative | Savings |
|---|---|---|---|
| Chat (brain) | Opus 4.6 | Kimi K2.5 | Massive — near-Opus intelligence, slightly less personality |
| Heartbeat checks | Haiku 4.5 | Haiku 4.5 (+ reduce frequency) | ~$50/month |
| Coding / overnight dev | Codex GPT-5.2 | Miniax 2.1 | ~$250/month on long coding runs |
| Web browsing / crawling | Opus 4.6 | DeepSeek V3 | Hundreds/month if you crawl a lot |
| Image understanding | Opus 4.6 | Gemini 2.5 Flash | Significant — Flash is very capable on vision |
| Routine tasks | Haiku 4.5 | Haiku 4.5 | 10x cheaper than Sonnet |
The key insight: You're not picking ONE cheap model. You're building a roster where each task gets the cheapest model that handles it well.
Default model — set your daily driver to Haiku or Kimi depending on how much personality matters:
```json5
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      }
    }
  }
}
```

Model aliases — define aliases in `agents.defaults.models` so cron jobs and agents can reference models by short name:
```json5
{
  "agents": {
    "defaults": {
      "models": {
        "anthropic/claude-haiku-4-5": { "alias": "haiku" },
        "anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
        "anthropic/claude-opus-4-6": { "alias": "opus" },
        "openai/codex-gpt-5.2": { "alias": "code" },
        "deepseek/deepseek-v3": { "alias": "crawl" },
        "google/gemini-2.5-flash": { "alias": "vision" }
      }
    }
  }
}
```

Heartbeat frequency matters too. Default is often every 10 minutes. If you're running Opus for heartbeats, that's ~$2/day (~$54/month) just on "anything need attention?" checks. Switch to Haiku AND reduce frequency to every hour:
```json5
{
  "heartbeat": {
    "every": "1h",
    "model": "anthropic/claude-haiku-4-5"
  }
}
```

That alone drops heartbeat costs to $0.01-0.10/day.
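Those estimates are easy to sanity-check. A rough back-of-the-envelope sketch — the per-million-token prices and the ~1K input tokens per check are illustrative assumptions, not quoted rates:

```python
def daily_heartbeat_cost(checks_per_day: int, tokens_per_check: int,
                         price_per_mtok: float) -> float:
    """Approximate daily heartbeat spend, counting input tokens only."""
    return checks_per_day * tokens_per_check * price_per_mtok / 1_000_000

# Illustrative prices ($/M input tokens) -- check your provider's current rates
OPUS, HAIKU = 15.00, 1.00

print(daily_heartbeat_cost(144, 1_000, OPUS))   # every 10 min on Opus: ~$2.16/day
print(daily_heartbeat_cost(24, 1_000, HAIKU))   # hourly on Haiku: ~$0.024/day
```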
Add to your agent's SOUL.md or system prompt:
```markdown
Model Selection

Default: Haiku for routine tasks.
Escalate to Sonnet/Opus only for: architecture decisions, code review, security analysis, complex reasoning.
For coding sessions: prefer Codex or a dedicated coding model.
For web crawling: use DeepSeek V3.
For image analysis: use Gemini Flash.
When uncertain, try the cheaper model first.
```
The model follows this as a behavioral constraint. No code changes needed.
---

2. Cache Stable Context
Impact: up to 90% discount on repeated tokens
Every message re-reads SOUL.md, USER.md, tool definitions, and reference docs. These files don't change between requests. Without caching, you're paying full price to re-process identical content dozens of times a day.
Anthropic's prompt caching charges 10% for cached tokens on reuse. For content you send repeatedly, that's a 90% cut on your most frequent cost.
What to cache (stable, rarely updated):
- SOUL.md — personality, rules
- USER.md — user profile, preferences
- TOOLS.md — tool definitions
- Reference docs, project specs
What NOT to cache (changes frequently):
- Daily memory files
- Conversation history
- Tool outputs, search results
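To put numbers on the discount, here is a minimal sketch of what one stable prefix costs per month with and without caching. The 10% read price mirrors the discount described above; the prefix size, message volume, and base price are illustrative assumptions:

```python
def monthly_prefix_cost(prefix_tokens: int, msgs_per_day: int,
                        price_per_mtok: float, cached: bool) -> float:
    """Monthly cost of re-sending one stable prefix with every message."""
    read_multiplier = 0.10 if cached else 1.0  # cached reads bill at ~10%
    tokens = prefix_tokens * msgs_per_day * 30
    return tokens * read_multiplier * price_per_mtok / 1_000_000

# 10K-token stable prefix, 200 msgs/day, $3/M input (illustrative)
print(monthly_prefix_cost(10_000, 200, 3.0, cached=False))  # $180.0/month
print(monthly_prefix_cost(10_000, 200, 3.0, cached=True))   # $18.0/month
```

This ignores the write premium on the first request of each cache window (Anthropic bills cache writes above the base input rate), which is one more reason to batch requests inside the window.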
Config — set `cacheRetention` per model in `agents.defaults.models`:

```json5
{
  "agents": {
    "defaults": {
      "models": {
        "anthropic/claude-opus-4-6": {
          "alias": "opus",
          "params": { "cacheRetention": "short" }
        }
      }
    }
  }
}
```

| Value | Cache Duration | Notes |
|---|---|---|
| `"none"` | No caching | Disables prompt caching |
| `"short"` | 5 minutes | Default for API key auth |
| `"long"` | 1 hour | Requires beta flag |

Note: OpenClaw automatically applies `"short"` (5-min cache) when using Anthropic API key authentication. You only need to set this explicitly if you want `"long"` or `"none"`.

Tips for maximizing cache hits:
- Batch related requests within the cache window (5 min for `short`, 1h for `long`)
- Align heartbeat frequency just under the cache TTL (e.g., every 4 min for `short`, every 55 min for `long`) to keep caches warm
- Don't edit SOUL.md mid-conversation — each change invalidates the cache
- Keep stable content at the top of your context hierarchy
3. Start Sessions Lean
Impact: ~$0.35 saved per session, $10-15/month with frequent use
Many setups eagerly load everything at startup: full memory archives, past conversations, every reference file. Most of it sits unused.
Load only what the agent needs to be itself and understand today:
| Load at startup | Skip at startup |
|---|---|
| SOUL.md | Full conversation history |
| USER.md | Past memory archives |
| IDENTITY.md | Research docs from old projects |
| Today's memory file | Other agents' files |
For everything else: pull on demand. When a topic comes up that needs historical context, use memory search:
"Search my memory for what we discussed about the marketing campaign"

Semantic search retrieves the relevant passages without loading the entire history. This is both cheaper and more effective — targeted retrieval beats brute-force loading.
Add to your agent's instructions:
```markdown
Session Startup

Load ONLY: SOUL.md, USER.md, IDENTITY.md, today's memory file.
For prior context: use memory_search() on demand. Don't pre-load history.
```
**Result:** Sessions start with ~8KB of context instead of 50KB+. The agent is faster, cheaper, and paradoxically better — less noise in the context means more focused responses.
---

4. Route Heartbeats to a Local Model
Impact: $5-15/month saved
Heartbeats are periodic checks — "anything need attention?" — that run every 15-60 minutes. They're simple tasks that don't need frontier reasoning. But by default, each one is a paid API call.
Route them to a free local model via Ollama instead.
Setup:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a lightweight model (2GB, handles heartbeat tasks easily)
ollama pull llama3.2:3b
```
**Config:**
```json5
{
"heartbeat": {
"model": "ollama/llama3.2:3b"
}
}
```

**Why it works:** Heartbeat tasks are classification problems — "is there something that needs attention?" A 3B parameter model running locally answers this with zero issues. The quality difference from Sonnet is nonexistent for yes/no monitoring, and the cost drops to zero on your API bill.
Requirements: A machine running Ollama with ~2GB free RAM. If OpenClaw already runs on a home server or VPS, you have this.
5. Set Rate Limits and Budget Caps
Impact: prevents $50-200/month in runaway costs
AI agents can loop. A search spawns follow-up searches. A debugging session fires dozens of API calls in seconds. Without guardrails, a single runaway interaction can burn through a day's budget in minutes.
Add to SOUL.md or system prompt:
```markdown
API Discipline

- 5 seconds minimum between consecutive API calls
- 10 seconds minimum between web searches
- Max 5 searches per batch, then pause 2 minutes
- If a task needs more than 10 API calls, ask before continuing
- On rate limit error (429): stop, wait 5 minutes, retry once
```
**Budget discipline via behavioral rules:**
OpenClaw doesn't have a native budget config key. Instead, enforce limits through your agent's instructions and external monitoring:
1. **Set Anthropic API spend alerts** — configure billing alerts directly in [Anthropic Console](https://console.anthropic.com/) or your provider dashboard
2. **Add behavioral limits to SOUL.md** (see above) — the agent will self-regulate
3. **Monitor with the CLI:**
```bash
openclaw cron list --all --json  # Check what's running and on which model
```
4. **Track via the Anthropic usage dashboard** — review daily/weekly spend patterns
Rate limits prevent waste during normal operation. Provider-level budget alerts catch edge cases. Together: predictable spending without limiting capability.
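These rules live in the prompt, so the model self-enforces them. If you also drive APIs from your own scripts, the same discipline can be enforced mechanically; this is a minimal sketch (the commented loop is illustrative — `run_search` is a placeholder, not an OpenClaw function):

```python
import time

class MinInterval:
    """Blocks until at least `seconds` have passed since the previous call."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last = float("-inf")

    def wait(self) -> None:
        gap = self._last + self.seconds - time.monotonic()
        if gap > 0:
            time.sleep(gap)
        self._last = time.monotonic()

api_gate = MinInterval(5.0)      # >= 5s between consecutive API calls
search_gate = MinInterval(10.0)  # >= 10s between web searches

# for query in queries:
#     search_gate.wait()
#     run_search(query)  # placeholder for your actual search call
```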
6. Trim Your Workspace Files
Impact: incremental but compounding — saves on every single request
SOUL.md, USER.md, and similar files load on every interaction. Every unnecessary line costs tokens not once, but on every message. A 200-line SOUL.md that could be 80 lines is silently 2-3x more expensive than it needs to be on your highest-frequency cost.
The audit question for each line: "Does the agent need this to do its job?"
What stays:
- Core personality (2-3 sentences, not paragraphs)
- Hard behavioral rules that change output
- Communication preferences
- Operational constraints
What goes:
- Backstory or lore (move to a reference doc, load on demand)
- Redundant restatements of the same rule
- Verbose explanations of things Claude already knows
- "Nice to have" context that doesn't affect responses
**Before (bloated, 85 tokens):**

```markdown
Communication Style

You should always communicate in a way that is clear, concise, and helpful.
When talking to the user, make sure your responses are well-structured and easy
to follow. Use bullet points when listing things. Don't write overly long
responses unless the topic genuinely requires depth. The user values efficiency.
```

**After (lean, 22 tokens):**

```markdown
Style

Direct and concise. Use structure for complex answers. Match depth to the question.
```
Same behavior. One-fourth the cost, multiplied by every message.
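The compounding is what makes even small trims worthwhile; a sketch of the arithmetic, where the message volume and price are illustrative assumptions:

```python
def trim_savings_per_month(tokens_removed: int, msgs_per_day: int,
                           price_per_mtok: float) -> float:
    """Monthly savings from deleting always-loaded tokens from workspace files."""
    return tokens_removed * msgs_per_day * 30 * price_per_mtok / 1_000_000

# Trimming 2,000 tokens of filler, 100 msgs/day, $1/M input (Haiku-ish, illustrative)
print(trim_savings_per_month(2_000, 100, 1.0))  # $6.0/month, before caching effects
```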
---

Combined Impact
| Optimization | Monthly Savings | Setup Time |
|---|---|---|
| Default to Haiku | $40-60 | 5 min |
| Prompt caching | $15-30 | 2 min |
| Lean startup context | $10-15 | 30 min |
| Local heartbeats | $5-15 | 15 min |
| Rate limits + budgets | $50-200 (prevention) | 10 min |
| Trim workspace files | $5-10 | 1 hour |
From $1,500+/month to $30-50/month.
Model routing and caching deliver the biggest absolute savings. Workspace trimming compounds over time because it reduces cost on every interaction.
How to Apply This
You don't need to hand-edit JSON files for most of these. Tell your OpenClaw agent:
"Switch your default model to Haiku. Only use Sonnet for complex reasoning tasks."
"Enable prompt caching — set cacheRetention to 'long' for my Anthropic models."
"Add API discipline rules to my SOUL.md — rate limits, search batching, and escalation thresholds."
"Audit my SOUL.md and USER.md — flag anything that doesn't directly affect your output quality."
OpenClaw can modify its own config in most cases. Just ask.
Verifying It Works
Verification Commands
Check actual model per cron job (the table view hides this — you MUST use JSON):

```bash
openclaw cron list --all --json
```

Look at each job's `payload.model` field. Don't assume from the agent assignment — cron jobs can override the agent's default model.

Check heartbeat status per agent (config is in `~/.openclaw/openclaw.json`):

```bash
cat ~/.openclaw/openclaw.json | python3 -c "
import json, sys
c = json.load(sys.stdin)
d = c['agents']['defaults'].get('heartbeat', {}).get('every', '')
print(f'Default: {d or \"(disabled)\"}')
for a in c['agents']['list']:
    e = a.get('heartbeat', {}).get('every', d)
    print(f'{a[\"id\"]}: {e or \"(disabled)\"}')
"
```

Agents without a `heartbeat` block inherit the default. An agent running on Opus with an inherited `every: "15m"` = your biggest cost line.

Check bootstrap context size:
```bash
wc -c SOUL.md USER.md AGENTS.md MEMORY.md
```
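To turn those byte counts into rough token counts, a common heuristic for English prose is about 4 bytes per token (an approximation, not a tokenizer):

```python
def approx_tokens(num_bytes: int) -> int:
    """Rough token estimate for English text: ~4 bytes per token."""
    return num_bytes // 4

print(approx_tokens(8_192))   # a lean ~8KB startup context: ~2048 tokens
print(approx_tokens(51_200))  # a bloated ~50KB one: ~12800 tokens
```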
What Good Looks Like
- Context size: Should be 2-8KB at session start, not 50KB+
- Default model: Should show Haiku, not Sonnet
- Cron jobs: Each job's `payload.model` should match its complexity (Haiku for checkers, Sonnet for structured tasks, Opus only for creative/complex work)
- Heartbeat: Should route to Ollama/local or Haiku at minimum
- Daily costs: Should drop to the $0.10-0.50 range within the first day

If costs haven't dropped, the most common issue is the system prompt not loading the model selection rules. Verify your SOUL.md changes are being picked up.
Common Audit Mistake
Don't infer cron job models from the agent assignment. The `openclaw cron list` table shows `Agent: sherlock` but the job itself may override the model in its payload. Always check `openclaw cron list --all --json` for the actual `payload.model` field.

Anti-Patterns
"I'll just use Haiku for everything, including complex tasks."
Don't. Haiku is fast and cheap but genuinely worse at multi-step reasoning, nuanced analysis, and creative work. The goal is right-sizing, not downgrading. Escalate when quality matters.
"I'll cache everything to save money."
Caching volatile content (daily notes, tool outputs) wastes the cache slot and provides no benefit. Cache only what's stable.
"I'll set the budget to $1/day to be safe."
Too aggressive. Your agent will hit the cap during legitimate work and stop mid-task. Start at $5/day, observe for a week, then adjust based on actual patterns.
"I trimmed SOUL.md to 10 lines and now the agent acts weird."
You cut too deep. Some personality and behavioral rules genuinely affect output quality. Trim the fat, keep the muscle. If behavior degrades, add back the specific rule that was controlling it.
Quality Check
Before sharing this config with others:
- Default model is Haiku (not Sonnet/Opus)
- Caching enabled for stable files
- Session startup loads only essentials
- Heartbeat routes to local model (or cheapest available)
- Budget caps set with warning threshold
- Workspace files audited for unnecessary content
- Agent behavior verified — no quality regression on important tasks