# /ar:setup — Create New Experiment


Set up a new autoresearch experiment with all required configuration.

## Usage


```bash
/ar:setup                                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators
```

## What It Does


### If arguments are provided


Pass them directly to the setup script:

```bash
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```

### If no arguments (interactive mode)


Collect each parameter one at a time:
  1. Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
  2. Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
  3. Target file — Ask: "Which file to optimize?" Verify it exists.
  4. Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
  5. Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
  6. Direction — Ask: "Is lower or higher better?"
  7. Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
  8. Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"
Then run `setup_experiment.py` with the collected parameters.
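The collected answers map one-to-one onto the script's flags. A minimal sketch of how an interactive session might assemble the final invocation (the answer values below are hypothetical examples, not defaults):

```python
# Hypothetical example: map interactively collected answers onto the
# setup_experiment.py flags. The answer values are illustrative only.
answers = {
    "domain": "engineering",
    "name": "api-speed",
    "target": "src/api.py",
    "eval_cmd": "pytest bench.py",
    "metric": "p50_ms",
    "direction": "lower",
}

cmd = [
    "python", "scripts/setup_experiment.py",
    "--domain", answers["domain"],
    "--name", answers["name"],
    "--target", answers["target"],
    "--eval", answers["eval_cmd"],
    "--metric", answers["metric"],
    "--direction", answers["direction"],
]
print(" ".join(cmd))
```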

## Listing


**Show existing experiments**

```bash
python {skill_path}/scripts/setup_experiment.py --list
```

**Show available evaluators**

```bash
python {skill_path}/scripts/setup_experiment.py --list-evaluators
```

## Built-in Evaluators


| Name | Metric | Use Case |
| --- | --- | --- |
| `benchmark_speed` | `p50_ms` (lower) | Function/API execution time |
| `benchmark_size` | `size_bytes` (lower) | File, bundle, Docker image size |
| `test_pass_rate` | `pass_rate` (higher) | Test suite pass percentage |
| `build_speed` | `build_seconds` (lower) | Build/compile/Docker build time |
| `memory_usage` | `peak_mb` (lower) | Peak memory during execution |
| `llm_judge_content` | `ctr_score` (higher) | Headlines, titles, descriptions |
| `llm_judge_prompt` | `quality_score` (higher) | System prompts, agent instructions |
| `llm_judge_copy` | `engagement_score` (higher) | Social posts, ad copy, emails |
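As an illustration of what a speed evaluator measures, here is a minimal sketch of a p50 timing harness. This is not the actual `benchmark_speed` implementation (whose details this document does not specify), just the general technique:

```python
import statistics
import time

def p50_ms(fn, runs=20):
    """Median (p50) wall-clock time of fn, in milliseconds, over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is robust to occasional slow outliers (GC pauses, cache misses).
    return statistics.median(samples)

# Lower is better: compare candidate implementations by their p50.
baseline = p50_ms(lambda: sum(range(10_000)))
```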

## After Setup


Report to the user:
  • Experiment path and branch name
  • Whether the eval command worked and the baseline metric
  • Suggest: "Run `/ar:run {domain}/{name}` to start iterating, or `/ar:loop {domain}/{name}` for autonomous mode."