autoresearch-create

Autoresearch

Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.

Tools

  • `init_experiment`: configure session (name, metric, unit, direction). Call again to re-initialize with a new baseline when the optimization target changes.
  • `run_experiment`: runs command, times it, captures output.
  • `log_experiment`: records result. `keep` auto-commits. `discard` / `crash` / `checks_failed` → `git checkout -- .` to revert. Always include secondary `metrics` dict. Dashboard: ctrl+x.

Setup

  1. Ask (or infer): Goal, Command, Metric (+ direction), Files in scope, Constraints.
  2. `git checkout -b autoresearch/<goal>-<date>`
  3. Read the source files. Understand the workload deeply before writing anything.
  4. Write `autoresearch.md` and `autoresearch.sh` (see below). Commit both.
  5. `init_experiment` → run baseline → `log_experiment` → start looping immediately.
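
As a rough illustration of step 2, the branch name could be derived from the goal like this. This is a minimal sketch: the `branch_name` helper, the slug rules, and the `YYYY-MM-DD` date format are assumptions, since the source only specifies the `autoresearch/<goal>-<date>` shape.

```bash
#!/bin/bash
set -euo pipefail

# Build a branch name of the form autoresearch/<goal>-<date>.
# Hypothetical helper: slug rules and date format are assumptions.
branch_name() {
  local goal="$1"
  local day="${2:-$(date +%Y-%m-%d)}"
  local slug
  # Lowercase, then collapse every run of non-alphanumerics into one dash.
  slug="$(printf '%s' "$goal" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')"
  slug="${slug#-}"   # trim a leading dash
  slug="${slug%-}"   # trim a trailing dash
  printf 'autoresearch/%s-%s\n' "$slug" "$day"
}

branch_name "Faster Test Suite" "2025-01-31"
# prints: autoresearch/faster-test-suite-2025-01-31
```

The result can be passed straight to git, for example `git checkout -b "$(branch_name "Faster Test Suite")"`.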

autoresearch.md

This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent.

    # Autoresearch: <goal>

    ## Objective
    <Specific description of what we're optimizing and the workload.>

    ## Metrics
    - Primary: <name> (<unit>, lower/higher is better)
    - Secondary: <name>, <name>, ...

    ## How to Run
    `./autoresearch.sh` outputs `METRIC name=number` lines.

    ## Files in Scope
    <Every file the agent may modify, with a brief note on what it does.>

    ## Off Limits
    <What must NOT be touched.>

    ## Constraints
    <Hard rules: tests must pass, no new deps, etc.>

    ## What's Been Tried
    <Update this section as experiments accumulate. Note key wins, dead ends, and architectural insights so the agent doesn't repeat failed approaches.>

Update `autoresearch.md` periodically, especially the "What's Been Tried" section, so resuming agents have full context.

autoresearch.sh

Bash script (`set -euo pipefail`) that: pre-checks fast (syntax errors in <1s), runs the benchmark, outputs `METRIC name=number` lines. Keep it fast: every second is multiplied by hundreds of runs. Update it during the loop as needed.
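
A minimal sketch of such a script might look like the following. The `workload` function, the `CHECK_SCRIPT` variable, and the `total_s` metric name are placeholders chosen for illustration, not part of the tool; only the `METRIC name=number` output format comes from the description above.

```bash
#!/bin/bash
set -euo pipefail
export LC_ALL=C   # keep "." as the decimal separator in timings

# Placeholder workload (an assumption): swap in the real benchmark command.
workload() {
  seq 1 20000 | sort -rn > /dev/null
}

# Pre-check: `bash -n` parses a script without running it, so syntax
# errors surface quickly. CHECK_SCRIPT is a hypothetical variable.
precheck() {
  if [ -n "${CHECK_SCRIPT:-}" ]; then
    bash -n "$CHECK_SCRIPT"
  fi
}

# Print wall-clock seconds for a command using bash's `time` keyword.
time_seconds() {
  local TIMEFORMAT='%R'
  { time "$@" > /dev/null 2>&1; } 2>&1
}

# Emit one line in the METRIC name=number format.
emit_metric() {
  printf 'METRIC %s=%s\n' "$1" "$2"
}

precheck
emit_metric total_s "$(time_seconds workload)"
```

Timing with the shell's own `time` keyword avoids spawning an extra process per run, which matters when the script is executed hundreds of times.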

autoresearch.checks.sh (optional)

Bash script (`set -euo pipefail`) for backpressure/correctness checks: tests, types, lint, etc. Only create this file when the user's constraints require correctness validation (e.g., "tests must pass", "types must check").
When this file exists:
  • Runs automatically after every passing benchmark in `run_experiment`.
  • If checks fail, `run_experiment` reports it clearly. Log as `checks_failed`.
  • Its execution time does NOT affect the primary metric.
  • You cannot `keep` a result when checks have failed.
  • Has a separate timeout (default 300s, configurable via `checks_timeout_seconds`).
When this file does not exist, everything behaves exactly as before, with no changes to the loop.
Keep output minimal. Only the last 80 lines of checks output are fed back to the agent on failure. Suppress verbose progress/success output and let only errors through. This keeps context lean and helps the agent pinpoint what broke.

    #!/bin/bash
    set -euo pipefail
    # Example: run tests and typecheck; suppress success output, only show errors
    pnpm test --run --reporter=dot 2>&1 | tail -50
    pnpm typecheck 2>&1 | grep -i error || true
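
Another way to keep checks output lean is to buffer each step and print its output only when the step fails. The sketch below assumes a hypothetical `quiet` helper; it is not part of the tool, and the 80-line tail simply mirrors the limit mentioned above.

```bash
#!/bin/bash
set -euo pipefail

# Run a command silently; on failure, print only the tail of its output
# and propagate its exit status. The helper name is hypothetical.
quiet() {
  local log status
  log="$(mktemp)"
  if "$@" > "$log" 2>&1; then
    rm -f "$log"
    return 0
  else
    status=$?
    tail -n 80 "$log"
    rm -f "$log"
    return "$status"
  fi
}

quiet echo "noisy success output"   # prints nothing
quiet sh -c 'echo "boom"; exit 3' || echo "checks failed with status $?"
```

In a real checks script the `|| echo ...` would be dropped so that a failing step such as `quiet pnpm test --run` aborts the script with a non-zero status.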

Loop Rules

LOOP FOREVER. Never ask "should I continue?" The user expects autonomous work.
  • Primary metric is king. Improved → `keep`. Worse/equal → `discard`. Secondary metrics rarely affect this.
  • Simpler is better. Removing code for equal perf = keep. Ugly complexity for tiny gain = probably discard.
  • Don't thrash. Repeatedly reverting the same idea? Try something structurally different.
  • Crashes: fix if trivial, otherwise log and move on. Don't over-invest.
  • Think longer when stuck. Re-read source files, study the profiling data, reason about what the CPU is actually doing. The best ideas come from deep understanding, not from trying random variations.
  • Resuming: if `autoresearch.md` exists, read it + git log, continue looping.
NEVER STOP. The user may be away for hours. Keep going until interrupted.
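
The primary-metric rule can be sketched as a small comparison helper. This is an illustration only: `decide` is a hypothetical function, and it models just the "improved → keep, worse/equal → discard" rule. The "simpler is better" exception for equal performance remains a judgment call and is not encoded here.

```bash
#!/bin/bash
set -euo pipefail

# decide <direction> <best> <new> prints "keep" or "discard".
# direction is "lower" or "higher"; awk does the floating-point compare.
decide() {
  local direction="$1" best="$2" new="$3"
  awk -v d="$direction" -v b="$best" -v n="$new" 'BEGIN {
    b += 0; n += 0   # force numeric comparison
    improved = (d == "lower") ? (n < b) : (n > b)
    print (improved ? "keep" : "discard")
  }'
}

decide lower 12.40 11.95    # keep
decide lower 12.40 12.40    # discard (a tie is not an improvement)
decide higher 870 912       # keep
```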

Ideas Backlog

When you discover complex but promising optimizations that you won't pursue right now, append them as bullets to `autoresearch.ideas.md`. Don't let good ideas get lost.
On resume (context limit, crash), check `autoresearch.ideas.md`: prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary.
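
Maintaining the backlog from the shell could look roughly like this. The `add_idea` and `prune_idea` helpers and the example idea texts are hypothetical; the demo writes to a scratch file rather than the real `autoresearch.ideas.md`.

```bash
#!/bin/bash
set -euo pipefail

# In a real session this would be autoresearch.ideas.md; the demo uses
# a scratch file so an existing backlog is left untouched.
IDEAS_FILE="$(mktemp)"

# Append one idea as a markdown bullet.
add_idea() {
  printf -- '- %s\n' "$1" >> "$IDEAS_FILE"
}

# Drop every bullet containing the given text, e.g. once it has been tried.
prune_idea() {
  local tmp
  tmp="$(mktemp)"
  # grep -v exits 1 when no lines remain, which is acceptable here.
  grep -vF -- "$1" "$IDEAS_FILE" > "$tmp" || true
  mv "$tmp" "$IDEAS_FILE"
}

add_idea "Cache parsed config between runs"
add_idea "Replace regex tokenizer with a hand-written scanner"
prune_idea "regex tokenizer"
cat "$IDEAS_FILE"
# prints: - Cache parsed config between runs
```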

User Messages During Experiments

If the user sends a message while an experiment is running, finish the current `run_experiment` + `log_experiment` cycle first, then incorporate their feedback in the next iteration. Don't abandon a running experiment.