# /ar:setup — Create New Experiment


Set up a new autoresearch experiment with all required configuration.

## Usage


```bash
/ar:setup                                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators
```

## What It Does


### If arguments are provided


Pass them directly to the setup script:

```bash
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```

### If no arguments (interactive mode)


Collect each parameter one at a time:
  1. Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
  2. Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
  3. Target file — Ask: "Which file to optimize?" Verify it exists.
  4. Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
  5. Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
  6. Direction — Ask: "Is lower or higher better?"
  7. Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
  8. Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"
Then run `setup_experiment.py` with the collected parameters.
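The collected answers map one-to-one onto the script's flags. A minimal sketch of how an interactive session might assemble the final invocation (the answer values below are hypothetical examples, not defaults):

```python
# Hypothetical example: map interactively collected answers onto the
# setup_experiment.py flags. The answer values are illustrative only.
answers = {
    "domain": "engineering",
    "name": "api-speed",
    "target": "src/api.py",
    "eval_cmd": "pytest bench.py",
    "metric": "p50_ms",
    "direction": "lower",
}

cmd = [
    "python", "scripts/setup_experiment.py",
    "--domain", answers["domain"],
    "--name", answers["name"],
    "--target", answers["target"],
    "--eval", answers["eval_cmd"],
    "--metric", answers["metric"],
    "--direction", answers["direction"],
]
print(" ".join(cmd))
```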

## Listing


**Show existing experiments**

```bash
python {skill_path}/scripts/setup_experiment.py --list
```

**Show available evaluators**

```bash
python {skill_path}/scripts/setup_experiment.py --list-evaluators
```

## Built-in Evaluators


| Name | Metric | Use Case |
| --- | --- | --- |
| `benchmark_speed` | `p50_ms` (lower) | Function/API execution time |
| `benchmark_size` | `size_bytes` (lower) | File, bundle, Docker image size |
| `test_pass_rate` | `pass_rate` (higher) | Test suite pass percentage |
| `build_speed` | `build_seconds` (lower) | Build/compile/Docker build time |
| `memory_usage` | `peak_mb` (lower) | Peak memory during execution |
| `llm_judge_content` | `ctr_score` (higher) | Headlines, titles, descriptions |
| `llm_judge_prompt` | `quality_score` (higher) | System prompts, agent instructions |
| `llm_judge_copy` | `engagement_score` (higher) | Social posts, ad copy, emails |
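As an illustration of what a speed evaluator measures, here is a minimal sketch of a p50 timing harness. This is not the actual `benchmark_speed` implementation (whose details this document does not specify), just the general technique:

```python
import statistics
import time

def p50_ms(fn, runs=20):
    """Median (p50) wall-clock time of fn, in milliseconds, over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is robust to occasional slow outliers (GC pauses, cache misses).
    return statistics.median(samples)

# Lower is better: compare candidate implementations by their p50.
baseline = p50_ms(lambda: sum(range(10_000)))
```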

## After Setup


Report to the user:
  • Experiment path and branch name
  • Whether the eval command worked and the baseline metric
  • Suggest: "Run `/ar:run {domain}/{name}` to start iterating, or `/ar:loop {domain}/{name}` for autonomous mode."