eve-agent-optimisation

Eve Agent Optimisation

The goal: get the agent to its objective in the fewest tool calls, fewest tokens, shortest time. Find where it wastes effort and eliminate it.

Hard Rule: Recommend, Don't Change

Never change the harness, model, reasoning effort, or permission policy without asking the user first. These are cost and capability decisions that belong to the project owner. Diagnose, explain the tradeoff, and recommend — then wait for approval.

What You're Looking For

Analyse agent execution logs to identify:
  1. Wrong turns — agent tried an approach that couldn't work and had to backtrack.
  2. Blind alleys — agent spent tokens exploring something irrelevant to the goal.
  3. Unnecessary tool calls — agent read files it didn't need, ran commands that gave no useful information, or repeated calls with slight variations.
  4. Missing context — agent had to discover something through trial and error that should have been stated in the SKILL.md or job description.
  5. Wrong tool for the job — agent used a slow or fragile tool when a faster/native alternative exists (e.g., shelling out to `pdftotext` when the LLM reads PDFs natively).
  6. Excessive reading — agent read entire large files when it only needed a section, or read many files looking for something that could have been found with a targeted search.
  7. Verbose output — agent explained its reasoning at length when the task only needed a concise result.
  8. Retry loops — agent repeated the same failing operation, hoping for a different result.
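
Some of these patterns can be spotted mechanically once a log has been reduced to a list of tool calls. The following is a minimal sketch for the retry-loop case only. The `ToolCall` record shape is a hypothetical simplification, not the actual Eve log format, and the sample calls are made up for illustration.

```python
from typing import NamedTuple


class ToolCall(NamedTuple):
    """Hypothetical, simplified view of one logged tool call."""
    tool: str
    args: str
    ok: bool


def find_retry_loops(calls: list[ToolCall], min_repeats: int = 2) -> list[tuple[int, int]]:
    """Return (start_index, run_length) for runs of identical failing calls."""
    loops = []
    i = 0
    while i < len(calls):
        j = i
        # Extend the run while the next call repeats the same failing tool + args.
        while j + 1 < len(calls) and not calls[i].ok and calls[j + 1] == calls[i]:
            j += 1
        run = j - i + 1
        if run >= min_repeats and not calls[i].ok:
            loops.append((i, run))
        i = j + 1
    return loops


# Illustrative log: the same failing command is issued three times in a row.
log = [
    ToolCall("Read", "src/config/auth.ts", True),
    ToolCall("Bash", "npm test", False),
    ToolCall("Bash", "npm test", False),
    ToolCall("Bash", "npm test", False),
    ToolCall("Read", "package.json", True),
]
print(find_retry_loops(log))  # [(1, 3)]
```

Only exact repeats are caught here; retries "with slight variations" would need a looser comparison of the arguments.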

Diagnostic Workflow

Step 1: Get the Execution Record

```bash
eve job diagnose <job-id>          # Full timeline, routing, errors
eve job show <job-id> --verbose    # Phase, attempts, harness, agent
eve job receipt <job-id>           # Token usage + cost
```
Key numbers:
  • Input tokens — how much the agent read. High = reading too much.
  • Output tokens — how much it wrote. High = verbose or excessive reasoning.
  • Attempt count — more than 1 means the agent crashed or timed out.
  • Duration — compare against what a focused agent should take.
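
As a rough illustration, the four key numbers can be turned into a first-pass triage. This is only a sketch: the function name, its parameters, and both token budgets are assumptions chosen for the example, not values defined by Eve, and the numbers would have to be copied by hand from the receipt and job output.

```python
def triage(input_tokens: int, output_tokens: int, attempts: int,
           duration_s: float, expected_s: float,
           input_budget: int = 200_000, output_budget: int = 20_000) -> list[str]:
    """Map the key numbers to the findings listed above.

    The budgets are placeholder thresholds; tune them per task.
    """
    findings = []
    if input_tokens > input_budget:
        findings.append("input tokens high: agent may be reading too much")
    if output_tokens > output_budget:
        findings.append("output tokens high: verbose output or excessive reasoning")
    if attempts > 1:
        findings.append("more than one attempt: agent crashed or timed out")
    if duration_s > expected_s:
        findings.append("slower than a focused agent should take")
    return findings


# Made-up numbers for a single job.
for finding in triage(input_tokens=350_000, output_tokens=4_000,
                      attempts=2, duration_s=540, expected_s=300):
    print(finding)
```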

Step 2: Stream or Replay the Logs

```bash
eve job follow <job-id>            # Real-time (if still active)
eve job logs <job-id>              # Historical
```
Read the log sequentially. For each tool call, ask:
  • Did this advance the goal? If not, it's waste.
  • Could this have been avoided? If the SKILL.md had told the agent where to look, would it have skipped this?
  • Was this the right tool? Could a different approach have gotten the same information faster?
  • Was the scope right? Did the agent read an entire file when it needed 10 lines?

Step 3: Map the Critical Path

Identify the minimum set of tool calls needed to achieve the goal:
  1. What files actually mattered?
  2. What commands actually produced useful output?
  3. What decisions were correct on first attempt?
Everything else is waste. Quantify: how many tool calls were on the critical path vs total? What percentage of tokens were spent on productive work?
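
The quantification is simple arithmetic once each tool call has been labelled as on or off the critical path. A minimal sketch, assuming the counts have already been tallied by hand from the log (the function name and the sample figures are illustrative):

```python
def efficiency_summary(total_calls: int, productive_calls: int,
                       total_tokens: int, productive_tokens: int) -> dict:
    """Share of tool calls and tokens that were on the critical path."""
    def pct(part: int, whole: int) -> float:
        # Guard against an empty log so the division cannot fail.
        return round(100 * part / whole, 1) if whole else 0.0

    wasted_calls = total_calls - productive_calls
    return {
        "productive_calls_pct": pct(productive_calls, total_calls),
        "wasted_calls": wasted_calls,
        "wasted_calls_pct": pct(wasted_calls, total_calls),
        "productive_tokens_pct": pct(productive_tokens, total_tokens),
    }


# Example: 14 of 40 calls and 48k of 120k tokens were productive.
summary = efficiency_summary(40, 14, 120_000, 48_000)
print(summary)
```

With these sample figures, 35% of calls and 40% of tokens were productive, so 26 calls (65%) were waste.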

Step 4: Identify Root Causes

For each category of waste, trace back to the root cause:
| Waste | Root Cause | Fix |
| --- | --- | --- |
| Agent explored wrong files | SKILL.md doesn't say where to look | Add specific file paths or search patterns to SKILL.md |
| Agent tried wrong approach first | SKILL.md doesn't state the preferred approach | Add explicit instructions: "Do X, not Y" |
| Agent read files it didn't need | Job description too vague | Narrow the description; specify exact scope |
| Agent retried failing command | No error handling guidance | Add failure mode instructions to SKILL.md |
| Agent used wrong tool for file type | SKILL.md doesn't mention native capabilities | Add file-type routing: "PDFs: read natively. Images: view directly." |
| Agent read entire large file | No guidance on targeted reading | Add instructions: "Read only lines 1-50" or "Search for X" |
| Agent verbose in output | No output format specified | Specify exact format: JSON schema, attachment name, concise summary |
| Agent lacks context for decisions | Missing resource refs or env vars | Attach the right resources; ensure `with_apis` is configured |
| Agent re-discovers known facts | No persistent memory strategy | Use org docs, KV store, or attachments to carry forward knowledge |
| Agent slow due to provisioning | Too many resources, large clone, unnecessary toolchains | Trim resource refs, configure shallow clone, remove unused toolchains |

The Fix Is Almost Always the SKILL.md

The SKILL.md is the highest-leverage optimisation target. A precise SKILL.md eliminates entire categories of wasted tool calls.

Write for Efficiency

  1. State the goal in one sentence. The agent should know exactly what it's trying to achieve before doing anything.
  2. Name specific files and paths. "Check the auth config" wastes tool calls searching. "Read `src/config/auth.ts` lines 1-30" is one tool call.
  3. State the approach explicitly. "Use native PDF reading via the Read tool — do NOT shell out to conversion tools" prevents the agent from trying the wrong path.
  4. Specify what NOT to do. If there's a common wrong turn, block it. "Do not read the entire test suite; only read the failing test file."
  5. Define the output format. "Write a JSON attachment named `findings.json` with schema `{issues: [{file, line, severity, message}]}`." This eliminates formatting deliberation.
  6. Tell the agent what context it has. "The resource index at `.eve/resources/index.json` lists all attached documents with mime_type. Read it first to determine processing strategy."
  7. Provide decision trees for branches. Instead of "handle different file types appropriately":
    Check mime_type in resource index:
    - application/pdf → read natively, use page ranges for >10 pages
    - text/* → read directly
    - image/* → view directly (multimodal)
    - other → describe and note for human review
  8. Keep it short. Every word the agent reads consumes input tokens. Cut filler. Use tables and lists over prose.
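
The decision tree in item 7 is unambiguous enough to express as a function, which is a useful check that a SKILL.md branch leaves nothing for the agent to deliberate. A minimal sketch: the function name and the `page_count` parameter are hypothetical, and the structure of `.eve/resources/index.json` is not assumed beyond each entry carrying a `mime_type`.

```python
from typing import Optional


def processing_strategy(mime_type: str, page_count: Optional[int] = None) -> str:
    """Mirror the mime_type decision tree from item 7."""
    if mime_type == "application/pdf":
        # Long PDFs are read in page ranges rather than all at once.
        if page_count is not None and page_count > 10:
            return "read natively, using page ranges"
        return "read natively"
    if mime_type.startswith("text/"):
        return "read directly"
    if mime_type.startswith("image/"):
        return "view directly (multimodal)"
    return "describe and note for human review"


print(processing_strategy("application/pdf", page_count=42))
print(processing_strategy("text/markdown"))
print(processing_strategy("image/png"))
print(processing_strategy("application/zip"))
```

Every input lands in exactly one branch, including the fallback, which is the property to aim for when writing the equivalent prose in SKILL.md.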

Test the SKILL.md

After rewriting, run the same job again and compare:
  • Fewer tool calls?
  • Fewer tokens?
  • Faster completion?
  • Correct result on first attempt?
```bash
eve job compare <old-job-id> <new-job-id>   # Compare receipts
```
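
The output format of the comparison is not shown here, so the sketch below assumes the before and after figures have been copied by hand into two dictionaries. The key names and all numbers are made up for illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative = improvement)."""
    return round(100 * (new - old) / old, 1)


# Hypothetical metrics for the same job before and after the SKILL.md rewrite.
before = {"tool_calls": 40, "tokens": 120_000, "duration_s": 540}
after = {"tool_calls": 16, "tokens": 54_000, "duration_s": 270}

changes = {key: percent_change(before[key], after[key]) for key in before}
for key, change in changes.items():
    print(f"{key}: {before[key]} -> {after[key]} ({change:+.1f}%)")
```

A single rerun is a small sample; if results vary between runs, compare several before drawing conclusions.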

Beyond the SKILL.md

When SKILL.md changes aren't sufficient, look at these levers (all require user approval to change):

Harness and Model

If the agent is consistently:
  • Too slow for the task → recommend a faster model (e.g., sonnet → haiku).
  • Not capable enough → recommend a more capable model (e.g., sonnet → opus).
  • Using too many thinking tokens → recommend lower reasoning effort.
  • Not thinking enough → recommend higher reasoning effort.
Present the tradeoff (speed vs cost vs quality) and let the user decide.

Permission Policy

If the agent is blocked waiting for approvals on every file edit:
  • Recommend `yolo` for automated batch work.
  • Recommend `auto_edit` for supervised coding.
  • Explain the security implications.

Resource Refs

If provisioning is slow:
  • Remove resource refs the agent doesn't actually use.
  • Mark optional context as `required: false`.
  • Thread `mime_type` so the agent doesn't need to probe file types.

Git Controls

If the agent wastes time on git operations:
  • `commit: auto` + `push: on_success` eliminates manual git ceremony.
  • `create_branch: if_missing` avoids branch creation failures.
  • `ref_policy: auto` minimises clone scope.

Job Scope

If the agent is doing too much in one job:
  • Split into focused children via orchestration.
  • Each child gets a narrow scope and specialised SKILL.md.
  • Cheaper models for simpler children; capable models only where needed.

Team Coordination

If child agents duplicate work:
  • Ensure skills read `.eve/coordination-inbox.md` at startup.
  • Wire `depends_on` for sequential steps.
  • Use attachments (not prose) for passing data between jobs.

Optimisation Report Template

After analysing an agent's execution, present findings in this format:

Agent Optimisation Report: <job-id>

Goal: <what the agent was trying to do>
Result: <succeeded/failed> in <duration> using <tokens> tokens (<cost>)

Efficiency Score

  • Total tool calls: N
  • Productive tool calls: M (X%)
  • Wasted tool calls: N-M (Y%)

Waste Categories

  1. <category>: N calls, ~X tokens wasted
    • Example: <specific wasteful action from logs>
    • Fix: <specific SKILL.md or config change>

Recommended Changes

  • SKILL.md: <specific edit> — eliminates <category> waste
  • SKILL.md: <specific edit> — eliminates <category> waste
  • (Requires approval) Model: <current> → <recommended> — <reason>
  • (Requires approval) Reasoning: <current> → <recommended> — <reason>

Expected Improvement

  • Estimated tool calls: N → M
  • Estimated tokens: X → Y
  • Estimated time: A → B

Quick Reference: Common Waste Patterns

| Pattern | Signal in Logs | Fix |
| --- | --- | --- |
| File hunting | Multiple `Read` calls to different files | Name the target file in SKILL.md |
| Grep cascade | Multiple searches with different patterns | Provide the right search term |
| Trial and error | Tool call fails, agent retries with variation | Document the correct approach |
| Over-reading | Read tool on 5000+ line file | Specify line ranges or tell agent to search first |
| Unnecessary exploration | Agent reads README, CHANGELOG, etc. | Explicitly say what NOT to read |
| Format deliberation | Long assistant turns deciding output structure | Specify output format in SKILL.md |
| Redundant validation | Agent re-checks things it already confirmed | Structure the SKILL.md as a linear flow |
| Native capability miss | Shell out to CLI tool when LLM can process directly | State native capabilities explicitly |
| Context re-discovery | Agent re-learns project structure every run | Use org docs or KV store for persistent context |
| Approval blocking | Agent pauses waiting for permission | Recommend `yolo` or `auto_edit` to user |
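
Two of these signals, file hunting and over-reading, are easy to check for automatically. A minimal sketch, assuming each `Read` call has been reduced to a hypothetical `(path, lines_returned)` pair. The 5000-line cutoff comes from the table; the file-hunting threshold of five distinct files is an arbitrary assumption.

```python
def flag_read_patterns(reads: list[tuple[str, int]],
                       hunt_threshold: int = 5,
                       big_file_lines: int = 5000) -> list[str]:
    """Flag file hunting and over-reading in a list of (path, lines) reads."""
    flags = []
    distinct_paths = {path for path, _ in reads}
    if len(distinct_paths) >= hunt_threshold:
        flags.append(f"file hunting: {len(distinct_paths)} different files read")
    for path, lines in reads:
        if lines >= big_file_lines:
            flags.append(f"over-reading: {path} ({lines} lines)")
    return flags


# Illustrative reads from a single job.
reads = [
    ("README.md", 120),
    ("CHANGELOG.md", 900),
    ("src/a.ts", 40),
    ("src/b.ts", 6200),
    ("src/c.ts", 75),
]
for flag in flag_read_patterns(reads):
    print(flag)
```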

Related Skills

  • `eve-job-debugging` — CLI commands for monitoring and diagnosing jobs.
  • `eve-orchestration` — decomposing work into parallel children.
  • `eve-agent-memory` — storage primitives for persistence across jobs.
  • `eve-skill-distillation` — encoding learned patterns into reusable skills.
  • `eve-read-eve-docs` — platform reference docs (CLI, manifest, jobs, harnesses).