Autoresearch

Autonomous research orchestration for AI coding agents. You manage the full research lifecycle — from literature survey to published paper — by maintaining structured state, running a two-loop experiment-synthesis cycle, and routing to domain-specific skills for execution.

You are a research project manager, not a domain expert. You orchestrate; the domain skills execute.

This runs fully autonomously. Do not ask the user for permission or confirmation — use your best judgment and keep moving. Show the human your progress frequently through research presentations (HTML/PDF) so they can see what you're doing and redirect if needed. The human is asleep or busy; your job is to make as much research progress as possible on your own.

Getting Started

Users arrive in different states. Determine which and proceed:

User State	What to Do
Vague idea ("I want to explore X")	Brief discussion to clarify, then bootstrap
Clear research question	Bootstrap directly
Existing plan or proposal	Review plan, set up workspace, enter loops
Resuming (research-state.yaml exists)	Read state, continue from where you left off

If things are clear, don't over-discuss — proceed to full autoresearch. Most users want you to just start researching.

Step 0 — before anything else: Set up the agent continuity loop. See Agent Continuity. This is MANDATORY. Without it, the research stops after one cycle.

Initialize Workspace

Create this structure at the project root:

```
{project}/
├── research-state.yaml       # Central state tracking
├── research-log.md           # Decision timeline
├── findings.md               # Evolving narrative synthesis
├── literature/               # Papers, survey notes
├── src/                      # Reusable code (utils, plotting, shared modules)
├── data/                     # Raw result data (CSVs, JSONs, checkpoints)
├── experiments/              # Per-hypothesis work
│   └── {hypothesis-slug}/
│       ├── protocol.md       # What, why, and prediction
│       ├── code/             # Experiment-specific code
│       ├── results/          # Raw outputs, metrics, logs
│       └── analysis.md       # What we learned
├── to_human/                 # Progress presentations and reports for human review
└── paper/                    # Final paper (via ml-paper-writing)
```

`src/`: When you write useful code (plotting functions, data loaders, evaluation helpers), move it here so it can be reused across experiments. Don't duplicate code in every experiment directory.

`data/`: Save raw result data (metric CSVs, training logs, small outputs) here in a structured way. After a long research horizon, you'll need this data to replot, reanalyze, and write up the paper properly. Name files descriptively (e.g., `trajectory_H1_runs001-010.csv`). Large files such as model checkpoints should go to a separate storage path (e.g., `/data/`, cloud storage, or wherever the user's compute environment stores artifacts), not in the project directory.

Initialize `research-state.yaml`, `research-log.md`, and `findings.md` from `templates/`. Adapt the workspace as the project evolves; this is a starting point, not a rigid requirement.
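
The setup step can be scripted. Below is a minimal Python sketch. It assumes each template in `templates/` has the same name as the file it initializes; the real template names are not given here. The script creates the directories from the tree above and copies a template when one exists. Otherwise it writes an empty placeholder. It never overwrites an existing file, so it is safe to rerun when resuming.

```python
from pathlib import Path
import shutil

# Top-level directories from the workspace tree; experiment folders come later.
DIRS = ["literature", "src", "data", "experiments", "to_human", "paper"]
# Root-level state files, initialized from templates/ when a template exists.
STATE_FILES = ["research-state.yaml", "research-log.md", "findings.md"]


def init_workspace(project: Path, templates: Path) -> list[Path]:
    """Create the workspace skeleton and return the files that were newly created."""
    created = []
    for name in DIRS:
        (project / name).mkdir(parents=True, exist_ok=True)
    for name in STATE_FILES:
        target = project / name
        if target.exists():
            continue  # never clobber existing state when resuming
        source = templates / name  # assumption: template is named like its target
        if source.is_file():
            shutil.copyfile(source, target)
        else:
            target.touch()
        created.append(target)
    return created


def init_experiment(project: Path, slug: str) -> Path:
    """Create experiments/{slug}/ with code/, results/, protocol.md, and analysis.md."""
    exp = project / "experiments" / slug
    (exp / "code").mkdir(parents=True, exist_ok=True)
    (exp / "results").mkdir(exist_ok=True)
    for name in ("protocol.md", "analysis.md"):
        (exp / name).touch(exist_ok=True)
    return exp
```

Call `init_workspace(Path("."), Path("templates"))` once at bootstrap. Then call `init_experiment` each time a new hypothesis gets its own directory.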

The Two-Loop Architecture

This is the core engine. Everything else supports it.

```
BOOTSTRAP (once, lightweight)
  Scope question → search literature → form initial hypotheses

INNER LOOP (fast, autonomous, repeating)
  Pick hypothesis → experiment → measure → record → learn → next
  Goal: run constrained experiments with clear measurable outcomes

OUTER LOOP (periodic, reflective)
  Review results → find patterns → update findings.md →
  new hypotheses → decide direction
  Goal: synthesize understanding and find the story; this is where novelty comes from

FINALIZE (when concluding)
  Write paper via ml-paper-writing → final presentation → archive
```

The inner loop runs tight experiment cycles with clear measurable outcomes. This could be optimizing a benchmark (make val_loss go down) OR testing mechanistic hypotheses (does intervention X cause effect Y?). The outer loop steps back to ask: what do these results mean? What patterns emerge? What's the story? Research is open-ended — the two loops let you both optimize and discover.

There is no rigid boundary between the two loops. You decide when enough inner-loop results have accumulated to warrant reflection: typically every 5-10 experiments, when you notice a pattern, or when progress stalls. The agent's judgment drives the rhythm.
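
That judgment can be made explicit with a small trigger check that runs after each inner-loop experiment. This is a hypothetical heuristic, not part of the skill. The defaults `max_gap=8` and `patience=4` are assumptions chosen from the 5-10 range above. "Stalled" is taken to mean that the best metric is at least `patience` experiments old.

```python
def best_so_far_index(values: list[float], higher_is_better: bool) -> int:
    """Index of the best value (first occurrence wins on ties)."""
    best = 0
    for i, v in enumerate(values):
        if (v > values[best]) if higher_is_better else (v < values[best]):
            best = i
    return best


def should_reflect(
    metrics: list[float],
    last_reflection: int,
    higher_is_better: bool = True,
    max_gap: int = 8,
    patience: int = 4,
    pattern_noticed: bool = False,
) -> bool:
    """Decide whether to leave the inner loop for an outer-loop review.

    metrics: proxy-metric value of every inner-loop experiment, in order.
    last_reflection: number of experiments completed at the previous outer loop.
    """
    since = len(metrics) - last_reflection
    if since <= 0:
        return False  # nothing new since the last reflection
    if pattern_noticed or since >= max_gap:
        return True
    # Stall: no new best result in the last `patience` experiments.
    best = best_so_far_index(metrics, higher_is_better)
    return len(metrics) - 1 - best >= patience
```

The `pattern_noticed` flag keeps the agent's own judgment in charge. The counters act only as a backstop, so a long streak of experiments cannot go unexamined.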

Research is Non-Linear

The two-loop structure is a rhythm, not a railroad. At any point during research you can and should:

- Return to literature when results surprise you, assumptions break, or you need context for a new direction. Always save what you find to `literature/`.
- Brainstorm new ideas using the `21-research-ideation/` skills when you're stuck or when results open unexpected questions.
- Pivot the question entirely if experiments reveal that the original question was wrong or less interesting than what you found.

This is normal. Most real research projects loop back to literature 1-3 times and generate new hypotheses mid-stream. Don't treat bootstrap as the only time you read papers or brainstorm — do it whenever understanding would help.

Bootstrap: Literature and Hypotheses

Before entering the loops, understand the landscape. Keep this efficient — the goal is to start experimenting, not to produce an exhaustive survey.

1. Search literature for the research question. Use multiple sources; never stop at one:
   - Exa MCP (`web_search_exa`), if available: best for broad discovery and for finding relevant papers quickly
   - Semantic Scholar (`pip install semanticscholar`): best for ML/AI papers, citation graphs, and specific paper lookup. See the `20-ml-paper-writing` skill's `references/citation-workflow.md` for complete API code examples
   - arXiv (`pip install arxiv`): best for recent preprints and open-access papers
   - CrossRef: best for DOI lookup and BibTeX retrieval
   - Keep searching until you have good coverage. If one source comes up empty, try another with different keywords
2. Save everything to `literature/`. For every paper you find, save a summary there: title, authors, year, key findings, relevance to your question, and the URL/DOI. Create one file per paper and a running `literature/survey.md` with all summaries. This is your reference library; you and future sessions will need it throughout the project.
3. Identify gaps from the literature:
   - What's been tried? What hasn't? Where do existing methods break?
   - What do Discussion sections flag as future work?
4. Form initial hypotheses by invoking the `21-research-ideation/` skills:
   - `brainstorming-research-ideas` for a structured diverge-converge workflow
   - `creative-thinking-for-research` for deeper cognitive frameworks
   - Each hypothesis must be testable with a clear prediction
5. Define the evaluation:
   - Set the proxy metric and baseline before running experiments
   - The metric should be computable quickly (minutes, not hours)
   - Lock the evaluation criteria upfront to prevent unconscious metric gaming
6. Record the hypotheses and evaluation in `research-state.yaml`, and log the bootstrap in `research-log.md`.
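
The "one file per paper plus a running survey" convention from step 2 can be kept consistent with a small helper. The sketch below is a minimal version. Its note fields mirror the list above. The year-plus-slug filename and the markdown layout are assumptions. The optional `fetch_arxiv` helper uses the `arxiv` package's `Client`/`Search` interface (version 2.x). It imports the package lazily, so the rest of the code runs without it installed. Fetched entries leave findings and relevance as placeholders, to be filled in after actually reading each paper.

```python
import re
from pathlib import Path


def slugify(text: str, max_len: int = 60) -> str:
    """Lowercase, hyphen-separated filename stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def save_paper_note(lit_dir: Path, paper: dict) -> Path:
    """Write literature/{year}-{slug}.md and append the same summary to survey.md."""
    lit_dir.mkdir(parents=True, exist_ok=True)
    note = (
        f"## {paper['title']}\n"
        f"- Authors: {', '.join(paper['authors'])}\n"
        f"- Year: {paper['year']}\n"
        f"- URL/DOI: {paper['url']}\n"
        f"- Key findings: {paper['findings']}\n"
        f"- Relevance: {paper['relevance']}\n"
    )
    path = lit_dir / f"{paper['year']}-{slugify(paper['title'])}.md"
    is_new = not path.exists()
    path.write_text(note, encoding="utf-8")
    if is_new:  # append only on first save so survey.md has no duplicate entries
        with open(lit_dir / "survey.md", "a", encoding="utf-8") as survey:
            survey.write(note + "\n")
    return path


def fetch_arxiv(query: str, max_results: int = 10) -> list[dict]:
    """Search arXiv and return note dicts with placeholder findings/relevance."""
    import arxiv  # pip install arxiv (imported lazily; optional dependency)

    client = arxiv.Client()
    search = arxiv.Search(query=query, max_results=max_results)
    return [
        {
            "title": r.title,
            "authors": [a.name for a in r.authors],
            "year": r.published.year,
            "url": r.doi or r.entry_id,
            "findings": "TODO after reading",
            "relevance": "TODO after reading",
        }
        for r in client.results(search)
    ]
```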

The Inner Loop

Rapid iteration with clear measurable outcomes. Two flavors:

Optimization: make a metric go up/down (val_loss, accuracy, throughput). Think Karpathy's autoresearch.
Discovery: test mechanistic hypotheses about why something works. The metric is a measurement (does grokking happen faster? does entropy increase before forgetting?), not just a target to optimize.

1.  Pick the highest-priority untested hypothesis
2.  Write a protocol: what change, what prediction, why
    Lock it: commit the protocol to git BEFORE running (research(protocol): {hypothesis})
    This creates timestamped proof that your plan existed before you saw the results
3.  Run the experiment (invoke the relevant domain skill)
4.  Sanity check before trusting results:
    - Did training converge? No NaN/Inf?
    - Does baseline reproduce expected performance?
    - Data loading correct? (spot-check a few samples)
5.  Measure the proxy metric
6.  Record in experiments/{hypothesis-slug}/
    Label clearly: CONFIRMATORY (in your protocol) vs EXPLORATORY (discovered during execution)
7.  If positive: keep, note WHY it worked
8.  If negative: this is progress — note what it rules out and what it suggests
9.  Update research-state.yaml
10. If stuck: search literature or invoke ideation skills — don't just keep trying random things
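
Parts of step 4 can be automated. This sketch assumes two inputs are available as plain numbers: the training-loss curve and the metric of the reproduced baseline. The convergence rule (final loss below initial loss) and the 2% baseline tolerance are illustrative assumptions. Spot-checking data samples still needs a manual look.

```python
import math


def sanity_check(
    losses: list[float],
    baseline_metric: float,
    expected_baseline: float,
    rel_tol: float = 0.02,
) -> list[str]:
    """Return a list of problems; an empty list means the run passed these checks."""
    if not losses:
        return ["no training losses recorded"]
    problems = []
    if any(not math.isfinite(x) for x in losses):
        problems.append("NaN/Inf in training loss")
    elif losses[-1] >= losses[0]:
        problems.append("loss did not decrease: training may not have converged")
    if not math.isclose(baseline_metric, expected_baseline, rel_tol=rel_tol):
        problems.append(
            f"baseline {baseline_metric:.4f} differs from expected "
            f"{expected_baseline:.4f} by more than {rel_tol:.0%}"
        )
    return problems
```

A non-empty result means the proxy metric from step 5 should not be trusted yet. Record the problem in the experiment directory rather than silently re-running.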

Never stop. Even if something fails, find a path forward: debug, adjust, simplify, or pivot, but keep the research moving. The `/loop` and heartbeat mechanisms will keep you going; use that momentum.

Route to Domain Skills

When you need domain-specific execution, search the skills library:

Research Activity	Look In
Data preparation	`05-data-processing/`
Model training / fine-tuning	`01-model-architecture/` , `03-fine-tuning/` , `06-post-training/`
Distributed training	`08-distributed-training/`
Optimization (quantization, attention)	`10-optimization/`
Evaluation / benchmarks	`11-evaluation/`
Inference / serving	`12-inference-serving/`
Interpretability analysis	`04-mechanistic-interpretability/`
Experiment tracking (W&B, MLflow)	`13-mlops/`
Cloud compute	`09-infrastructure/`

Read the relevant SKILL.md before starting — it has workflows, common issues, and code examples. See references/skill-routing.md for a complete guide.

Track the Experiment Trajectory

Maintain a running record of measurable outcomes across experiments:

```json
{
  "experiment_id": "run_014",
  "hypothesis": "H3",
  "metric_value": 0.847,
  "baseline": 0.812,
  "delta": "+0.035",
  "wall_time_min": 23,
  "change_summary": "Added cosine annealing warmup schedule"
}
```

This trajectory produces the optimization plot (like Karpathy's progress chart) — include it in progress reports. Humans love seeing the upward curve.
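
One way to keep that record is to store one JSON object per line in a file under `data/`. The JSON Lines layout and the file location are assumptions; the field names follow the record above. In the sketch below, `append_record` fills in `delta` when it is missing. `best_so_far` computes the running-best curve that is usually drawn as the optimization plot. Plotting uses matplotlib, imported lazily and rendered to a file.

```python
import json
from pathlib import Path


def append_record(log_path: Path, record: dict) -> None:
    """Append one experiment record, computing `delta` if it is missing."""
    record = dict(record)
    record.setdefault("delta", f"{record['metric_value'] - record['baseline']:+.3f}")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_trajectory(log_path: Path) -> list[dict]:
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_so_far(values: list[float], higher_is_better: bool = True) -> list[float]:
    """Running best; use higher_is_better=False for metrics like val_loss."""
    pick = max if higher_is_better else min
    out: list[float] = []
    for v in values:
        out.append(v if not out else pick(out[-1], v))
    return out


def plot_trajectory(records: list[dict], out_png: Path, higher_is_better: bool = True) -> None:
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    values = [r["metric_value"] for r in records]
    xs = range(len(values))
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.scatter(xs, values, s=12, alpha=0.5, label="experiment")
    ax.step(xs, best_so_far(values, higher_is_better), where="post", label="best so far")
    ax.set_xlabel("experiment #")
    ax.set_ylabel("proxy metric")
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_png, dpi=150)
    plt.close(fig)
```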

The Outer Loop

Step back from individual experiments. Synthesize.

1. Review all results since last reflection
2. Cluster by type: what kinds of changes worked? Which didn't?
3. Ask WHY — identify the mechanism behind successes and failures
4. Update findings.md with current understanding
5. Search literature if results were surprising or assumptions need revisiting
6. Generate new hypotheses if warranted (invoke 21-research-ideation/ skills)
7. Decide direction (see criteria below)
8. Update research-state.yaml with new direction
9. Log the reflection in research-log.md
10. If there's something meaningful, generate a progress presentation
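
Step 2 can start mechanically once trajectory records exist. This sketch groups records by a caller-chosen key. The default is the `hypothesis` field of the trajectory record; a hand-assigned change category would work the same way. For each group, it reports how often the group beat its baseline. It assumes higher is better. It only shows where to look: the WHY in step 3 still needs reasoning.

```python
from collections import defaultdict
from statistics import mean


def summarize_by(records: list[dict], key: str = "hypothesis") -> dict[str, dict]:
    """Per-group run count, win rate against the baseline, and mean improvement."""
    groups: dict[str, list[float]] = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["metric_value"] - r["baseline"])
    return {
        name: {
            "runs": len(deltas),
            "win_rate": sum(d > 0 for d in deltas) / len(deltas),
            "mean_delta": mean(deltas),
        }
        for name, deltas in groups.items()
    }
```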

Deciding Direction

Don't just pick randomly — use these criteria:

DEEPEN — a supported result raises follow-up questions

Does the effect hold under different conditions? What's the mechanism?
Action: generate sub-hypotheses (H1.1, H1.2) → back to inner loop

BROADEN — current results are solid, but adjacent questions are untested

New questions emerged. The current contribution is clear but more is possible.
Action: generate new root hypotheses → back to inner loop

PIVOT — results invalidate key assumptions or something more interesting appeared

A core assumption was wrong, or an unexpected finding is more promising than the original question.
Action: return to literature with new questions → re-bootstrap

CONCLUDE — sufficient evidence for a contribution

At least one hypothesis is strongly supported (or a coherent set of negative results)
Key ablations completed, error analysis done
findings.md reads like a paper backbone — a human could write the abstract from it
No critical open questions that would change the story

Note: coherent negative results are a valid contribution. "X does NOT work because Y" is publishable if the reasoning is rigorous.
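
The criteria can also be written as an ordered checklist, so that each decision and its reason get logged the same way. This is a hypothetical helper. The boolean signals are judgments you make during the outer loop. The precedence (conclude, then pivot, then deepen, then broaden) is an assumption, and so is falling back to BROADEN when nothing else applies. The criteria above do not prescribe either.

```python
from dataclasses import dataclass


@dataclass
class OuterLoopSignals:
    strong_support: bool           # a strongly supported hypothesis, or coherent negatives
    ablations_done: bool           # key ablations and error analysis completed
    findings_abstract_ready: bool  # a human could write the abstract from findings.md
    critical_open_questions: bool  # open questions that could change the story
    assumption_broken: bool        # a core assumption turned out to be wrong
    better_direction_found: bool   # an unexpected finding beats the original question
    follow_ups_raised: bool        # a supported result raises "does it hold? why?" questions


def decide_direction(s: OuterLoopSignals) -> tuple[str, str]:
    """Return (direction, next action) following the criteria above."""
    if (s.strong_support and s.ablations_done and s.findings_abstract_ready
            and not s.critical_open_questions):
        return "CONCLUDE", "sufficient evidence for a contribution"
    if s.assumption_broken or s.better_direction_found:
        return "PIVOT", "return to literature with new questions, then re-bootstrap"
    if s.follow_ups_raised:
        return "DEEPEN", "generate sub-hypotheses (H1.1, H1.2), back to inner loop"
    return "BROADEN", "generate new root hypotheses, back to inner loop"
```

Writing the returned pair into research-log.md gives each reflection an explicit, reviewable rationale.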

findings.md Is Your Project Memory

This file serves two purposes: it's the research narrative for humans AND your accumulated knowledge base as an agent. Read it at the start of every session, /loop tick, or heartbeat to remember what you've learned.

After every outer loop, update it to answer:

What do we know so far? (Current Understanding)
What patterns explain our results? (Patterns and Insights)
What specific things did we learn not to repeat? (Lessons and Constraints)
What remains open? (Open Questions)

The "Lessons and Constraints" section is especially important — it captures specific actionable learnings like "weight decay > 0.1 diverges at this scale" or "baseline only reproduces with batch_size=64." This prevents the agent from repeating failed approaches across sessions.
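
A small structural check can run on every tick before findings.md is trusted as memory. This sketch assumes the four parts are markdown headings named exactly as in the parentheses above; the actual template in `templates/` may differ. It flags a section that is missing or has an empty body. It checks only structure, not whether the outer loop is really synthesizing.

```python
import re

REQUIRED_SECTIONS = (
    "Current Understanding",
    "Patterns and Insights",
    "Lessons and Constraints",
    "Open Questions",
)


def missing_sections(findings_text: str) -> list[str]:
    """Return required sections that have no heading or an empty body."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # A heading line such as '## Lessons and Constraints', then its body
        # up to the next heading line or the end of the file.
        m = re.search(
            rf"^#+\s*{re.escape(name)}\s*$\n(.*?)(?=^#|\Z)",
            findings_text,
            flags=re.MULTILINE | re.DOTALL,
        )
        if m is None or not m.group(1).strip():
            missing.append(name)
    return missing
```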

Quality test: After 30 inner loop experiments, a human should be able to read findings.md and write a paper abstract from it. If they can't, the outer loop isn't synthesizing — it's just logging.

Agent Continuity (MANDATORY — Set Up First)

Before doing anything else, set up the wall-clock loop. This is what keeps the research running continuously. Without it, the agent stops after one cycle.

Claude Code

First action — run this immediately:

```
/loop 20m Continue autoresearch. Read research-state.yaml and findings.md. Re-read the autoresearch SKILL.md occasionally to stay aligned. Step back and reflect holistically — is the research making real progress? Are you deepening understanding or just running experiments? If stalling, pivot or search literature for new ideas. Keep making research progress — never idle, never stop. Update findings.md, research-log.md, and research-state.yaml when there's new progress. Git commit periodically and clean up the repo if needed. Show the human your research progress with key plots and findings by preparing a report in to_human/ and opening the HTML/PDF. Only when you believe the research is truly complete, invoke the ml-paper-writing skill to write the paper.
```

This fires every 20 minutes regardless of what's happening. It's a rhythm that keeps you working — not a research phase boundary. If your previous work isn't done, just continue it.

OpenClaw

First action — set up a 20-minute cron job in the current session. You MUST do this before any research work.

Use the `cron.add` tool to create a recurring job bound to this chat session:

```json
{
  "name": "autoresearch-loop",
  "schedule": { "kind": "every", "everyMs": 1200000 },
  "sessionTarget": "current",
  "payload": {
    "kind": "agentTurn",
    "message": "Continue autoresearch. Read research-state.yaml and findings.md. Re-read the autoresearch SKILL.md occasionally to stay aligned. Step back and reflect holistically — is the research making real progress? Are you deepening understanding or just running experiments? If stalling, pivot or search literature for new ideas. Keep making research progress — never idle, never stop. Update findings.md, research-log.md, and research-state.yaml when there's new progress. Git commit periodically and clean up the repo if needed. Show the human your research progress with key plots and findings by preparing a PDF report in to_human/ and sending it to the user via Telegram, WhatsApp, or Slack. When you get an exciting result or interesting plot, text it to the user right away — don't wait for a full report. Only when you believe the research is truly complete, invoke the ml-paper-writing skill to write the paper."
  }
}
```

Key details:

- `sessionTarget: "current"` binds the cron job to this chat session so it maintains conversation context across ticks
- `everyMs: 1200000` = 20 minutes
- After creation, verify with `cron.list` that the job exists and is enabled
- If the cron fires while you're mid-experiment, just continue; the tick is a nudge, not a restart
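
The interval arithmetic is 20 min × 60 s × 1000 ms = 1,200,000 ms. A small Python helper can build the same job object for other intervals. It only reproduces the JSON shape shown above and does not call OpenClaw. How the object reaches `cron.add` depends on your tool setup. The message in the usage line is abbreviated.

```python
import json


def build_cron_job(message: str, minutes: int = 20, name: str = "autoresearch-loop") -> dict:
    """Return a job object with the same fields as the cron.add JSON above."""
    if minutes <= 0:
        raise ValueError("interval must be positive")
    return {
        "name": name,
        "schedule": {"kind": "every", "everyMs": minutes * 60 * 1000},
        "sessionTarget": "current",  # keep conversation context across ticks
        "payload": {"kind": "agentTurn", "message": message},
    }


if __name__ == "__main__":
    job = build_cron_job("Continue autoresearch. Read research-state.yaml and findings.md.")
    print(json.dumps(job, indent=2))
```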

What the Loop Does

The `/loop` command and the cron job are purely a wall-clock rhythm. They are completely separate from your research loops (inner/outer). On each tick:

- Read `research-state.yaml` and `findings.md` to remember where you are
- Check whether anything is broken (failed experiments, stalled training, errors)
- If on track → keep working on whatever you were doing
- If stuck or something's wrong → step back, diagnose, fix, then continue

Never idle. Always be making progress.
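
For the "is anything broken" check, two cheap signals are error text at the end of a run log and a log that has stopped changing. Below is a minimal sketch under three assumptions. Each run writes `*.log` files under `experiments/{hypothesis-slug}/results/`. A `DONE` file marks finished runs; this marker is a hypothetical convention. 30 minutes without a write counts as stalled, although long evaluation phases may legitimately be quiet. The error markers are also assumptions and should match what your training code actually prints.

```python
import time
from pathlib import Path
from typing import Optional

# Assumed error markers; adjust to your training code's actual output.
ERROR_MARKERS = ("Traceback (most recent call last)", "CUDA out of memory", "loss: nan")


def read_tail(path: Path, max_bytes: int = 20_000) -> str:
    """Last max_bytes of a log, so very large logs are not read in full."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end of the file
        f.seek(max(0, f.tell() - max_bytes))
        return f.read().decode("utf-8", errors="replace")


def check_runs(project: Path, max_idle_s: float = 1800.0, now: Optional[float] = None) -> dict:
    """Sort logs under experiments/*/results/ into 'errored' and 'stalled' lists."""
    now = time.time() if now is None else now
    report = {"errored": [], "stalled": []}
    for log in sorted(project.glob("experiments/*/results/*.log")):
        if (log.parent / "DONE").exists():  # hypothetical marker written when a run finishes
            continue
        if any(marker in read_tail(log) for marker in ERROR_MARKERS):
            report["errored"].append(str(log))
        elif now - log.stat().st_mtime > max_idle_s:
            report["stalled"].append(str(log))
    return report
```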

Progress Reporting

When you have something meaningful to share, create a research presentation — not just a status dashboard, but a compelling story.

When to report (your judgment):

After an outer loop that found a significant pattern
When the optimization trajectory shows clear progress (include the plot!)
After a pivot in direction
Before requesting human input on a decision
When concluding

What to include (adapt to what's compelling):

The research question and why it matters
Key results with visualizations (plots, metric tables)
The optimization trajectory chart (metric over experiments)
What was tried and why (selective, not exhaustive)
Current understanding (the findings narrative)
What's planned next

For Claude Code: generate HTML and `open` it. If the HTML fails to open or render, convert it to PDF as a fallback (use `weasyprint`, `playwright pdf`, or `wkhtmltopdf`). For OpenClaw: generate PDF directly.
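
A minimal Python version of that flow is sketched below. It assumes a report made of a title, a few text sections, and PNG plots already saved next to the HTML in `to_human/`. The file name `progress.html` and the layout are assumptions; references/progress-reporting.md has the actual template. The sketch writes self-contained HTML and opens it with the standard-library `webbrowser` module. When `want_pdf` is set (the OpenClaw path, or the fallback when HTML will not render), it produces a PDF with WeasyPrint's `HTML(...).write_pdf(...)` instead.

```python
import html
import webbrowser
from pathlib import Path


def write_report(out_dir: Path, title: str, sections: dict[str, str], plots: list[str]) -> Path:
    """Write out_dir/progress.html with escaped text sections and relative <img> plots."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = [f"<h1>{html.escape(title)}</h1>"]
    for heading, body in sections.items():
        parts.append(f"<h2>{html.escape(heading)}</h2><p>{html.escape(body)}</p>")
    for png in plots:  # paths relative to out_dir, e.g. "trajectory.png"
        parts.append(f'<img src="{html.escape(png)}" style="max-width:100%">')
    page = (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head><body>{''.join(parts)}</body></html>"
    )
    path = out_dir / "progress.html"
    path.write_text(page, encoding="utf-8")
    return path


def publish(report: Path, want_pdf: bool = False) -> Path:
    """Open the HTML report, or render it to PDF and return the PDF path."""
    if want_pdf:
        from weasyprint import HTML  # pip install weasyprint

        pdf = report.with_suffix(".pdf")
        HTML(filename=str(report)).write_pdf(str(pdf))  # relative images resolve from the file
        return pdf
    webbrowser.open(report.resolve().as_uri())
    return report
```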

See references/progress-reporting.md for template scaffolding and the optimization plot approach. Use the template as a starting point — be creative with what you show.

Git Protocol

Commit at natural research milestones:

When	Message Pattern
Workspace initialized	`research(init): {project} — {question}`
Experiment protocol locked	`research(protocol): {hypothesis}`
Significant results	`research(results): {hypothesis} — {outcome}`
Outer loop direction change	`research(reflect): {direction} — {reason}`
Paper draft complete	`research(paper): {title}`

Hard rule: Protocol commits MUST precede result commits. Never combine them. The git history is your lightweight pre-registration — it proves what you planned before you saw results. Don't commit after every experiment — commit when there's meaningful progress.
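
The table and the hard rule can be enforced with a thin wrapper around the `git` CLI. This sketch assumes the project root is a git repository and uses the protocol path from the workspace layout. The helper names are hypothetical. `commit_results` refuses to run until a matching protocol commit exists. Each helper stages and commits only its own files, so a protocol and its results never share a commit.

```python
import subprocess
from pathlib import Path

SEP = " \u2014 "  # the dash separator used in the message patterns above

TEMPLATES = {
    "init": "research(init): {project}" + SEP + "{question}",
    "protocol": "research(protocol): {hypothesis}",
    "results": "research(results): {hypothesis}" + SEP + "{outcome}",
    "reflect": "research(reflect): {direction}" + SEP + "{reason}",
    "paper": "research(paper): {title}",
}


def commit_message(kind: str, **fields: str) -> str:
    return TEMPLATES[kind].format(**fields)


def git(repo: Path, *args: str) -> subprocess.CompletedProcess:
    return subprocess.run(["git", "-C", str(repo), *args], capture_output=True, text=True)


def commit_paths(repo: Path, paths: list[Path], message: str) -> None:
    """Stage and commit exactly these paths (pathspec commit), nothing else."""
    rel = [str(p.relative_to(repo)) for p in paths]
    for step in (["add", "--", *rel], ["commit", "-m", message, "--", *rel]):
        result = git(repo, *step)
        if result.returncode != 0:
            raise RuntimeError(result.stderr.strip())


def lock_protocol(repo: Path, slug: str, hypothesis: str, protocol_text: str) -> None:
    path = repo / "experiments" / slug / "protocol.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(protocol_text, encoding="utf-8")
    commit_paths(repo, [path], commit_message("protocol", hypothesis=hypothesis))


def commit_results(repo: Path, hypothesis: str, outcome: str, paths: list[Path]) -> None:
    log = git(repo, "log", "--format=%s")  # non-zero exit on a repo with no commits yet
    subjects = log.stdout.splitlines() if log.returncode == 0 else []
    if commit_message("protocol", hypothesis=hypothesis) not in subjects:
        raise RuntimeError(f"no protocol commit for {hypothesis}; lock the protocol first")
    commit_paths(repo, paths, commit_message("results", hypothesis=hypothesis, outcome=outcome))
```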

Concluding: Paper Writing

When the outer loop decides to CONCLUDE:

1. Ensure findings.md has a clear, well-supported narrative
2. Study 2-3 top related papers to learn their format, style, and section structure
3. Invoke the `20-ml-paper-writing` skill; it has LaTeX templates for NeurIPS, ICML, ICLR, ACL, AAAI, COLM, and systems venues
4. Feed it the accumulated literature, experimental results, and findings
5. Follow its citation verification workflow; never hallucinate references
6. Generate a final comprehensive research presentation

Proceed autonomously through the writing process. If the ml-paper-writing skill suggests human collaboration points, adapt and keep going — produce the best draft you can. The human will review and provide feedback.

Research Discipline

Principles to enforce continuously — not tied to any specific phase:

Lock before you run: Commit your experiment protocol to git before executing. This proves your plan existed before you saw results. Never combine protocol + results in one commit.
Confirmatory vs exploratory: Results matching your locked protocol are confirmatory. Everything else is exploratory — interesting but requiring more skepticism.
Negative results are progress: A refuted hypothesis tells you something. Log what it rules out and what it suggests. Don't treat it as failure.
Sanity check before analysis: Verify training converged, baselines reproduce, and data is correct before trusting your primary metric.
Return to literature when confused: Don't guess — search. If results surprise you or assumptions break, go find papers. Use Exa MCP for discovery, Semantic Scholar for specific ML/AI paper lookup, arXiv for preprints.
Never stop: Don't wait for human approval on routine decisions. If a skill or tool suggests collaboration, adapt and keep going. Find the best path forward autonomously. The human will see your progress reports and can redirect if needed.
Use whatever compute is available: Adapt to the user's environment — local GPU, cluster job submission, cloud instances, or just CPU. If no GPU is available, use CPU and adjust experiment scale accordingly. Don't block on compute availability.
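
A common way to adapt to the available hardware is sketched below, assuming a PyTorch-based project; the skill does not fix a framework. The sketch prefers CUDA, then Apple's MPS backend, then CPU. On CPU it shrinks the run instead of blocking. The config keys and the scale-down factors (batch size ÷4, steps ÷10) are illustrative assumptions.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: fall back to CPU-only tooling
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def scale_config(config: dict, device: str) -> dict:
    """Shrink a hypothetical experiment config for CPU runs instead of waiting for a GPU."""
    scaled = dict(config)
    if device == "cpu":
        scaled["batch_size"] = max(1, config["batch_size"] // 4)
        scaled["max_steps"] = max(1, config["max_steps"] // 10)
    return scaled
```

Record the device and the scaled config in the experiment's protocol, so CPU-scale results are not compared directly with full-scale ones.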

Quality Standards

Good agent behavior:

Hypotheses have mechanistic reasoning ("X because Y, predicting Z"), not just "try X"
findings.md builds a coherent narrative, not a flat list of results
Negative results are recorded with what they rule out
The agent updates its model when experiments contradict expectations
Progress reports tell a research story with compelling visualizations

Bad agent behavior:

Pure hyperparameter sweeps without interpretation
findings.md is just experiment logs copy-pasted
Agent never revisits its assumptions after failures
Optimizing metrics without understanding why changes work

When to Use vs Alternatives

Use autoresearch when:

You have a research question explorable through experiments
There's a measurable proxy metric for inner loop optimization
The real contribution requires synthesis beyond the metric
You want continuous autonomous research operation

Use individual domain skills instead when:

You have a specific one-off task (train a model, run eval, write a paper)
No iterative experimentation needed

Common Issues

Inner loop stalls (no metric improvement): Run an outer loop. Is the metric the right one? Is the search space exhausted? Consider broadening or pivoting, and search the literature for new approaches.

Stuck and not making progress: Don't keep trying random changes. Step back: search the literature for related work, invoke the `21-research-ideation/` brainstorming skills, or run an outer loop reflection. Being stuck means you need new information or a new perspective, not more experiments.

Results contradict baseline expectations: Investigate; don't ignore the discrepancy. Return to the literature: your protocol might have an error, the published baseline may be wrong, or your conditions may differ. Update findings.md with what you learn.

Agent loses context between ticks: Ensure research-state.yaml and findings.md are updated after every action. These files are your memory across sessions.

Can't find relevant papers: Try multiple approaches in order: Exa MCP for broad search, Semantic Scholar for specific ML/AI paper lookup (`pip install semanticscholar`), and arXiv for preprints (`pip install arxiv`). Check the `20-ml-paper-writing` skill's `references/citation-workflow.md` for complete API code. Note: Google Scholar has no official API; use Semantic Scholar for programmatic search instead.

No GPU available: Use CPU and scale experiments down. Many research tasks (analysis, interpretability, small model training) run fine on CPU. Adjust the experiment design to fit available compute rather than blocking.

Experiments take longer than the `/loop` interval: This is normal. On the next tick, check whether the run has finished. If not, keep waiting or do something else useful (update notes, search papers). Adjust the interval if needed.

Not sure when to conclude: Ask three questions. Do you have a strongly supported finding? Can you explain WHY it works? Would findings.md make a convincing paper abstract? If the answer to all three is yes, conclude.
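
When experiments outlast the `/loop` interval, a run can be launched in the background so that each tick only checks a status file. Below is a POSIX-only sketch that assumes `sh` is available. The `exit_code` file is a hypothetical convention. The shell wrapper writes it only when the command ends on its own. A killed wrapper leaves the run looking "running", so pair this with a check for logs that stop updating.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional, Tuple


def launch(cmd: list[str], run_dir: Path) -> subprocess.Popen:
    """Start cmd in the background; its exit status is written to run_dir/exit_code."""
    run_dir.mkdir(parents=True, exist_ok=True)
    exit_file = run_dir / "exit_code"
    exit_file.unlink(missing_ok=True)  # clear the status from any previous run
    script = f"{shlex.join(cmd)}; echo $? > {shlex.quote(str(exit_file))}"
    with open(run_dir / "run.log", "ab") as log:
        proc = subprocess.Popen(
            ["sh", "-c", script],
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the launching shell's process group
        )
    (run_dir / "pid").write_text(str(proc.pid))
    return proc


def status(run_dir: Path) -> Tuple[str, Optional[int]]:
    """('finished', exit code) once exit_code has content, else ('running', None)."""
    exit_file = run_dir / "exit_code"
    if exit_file.exists():
        text = exit_file.read_text().strip()
        if text:  # the file may exist but still be empty for a moment
            return "finished", int(text)
    return "running", None
```

On each tick, `status(run_dir)` answers "did it finish?" without blocking. If the answer is no, the agent is free to update notes or search papers in the meantime.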

Advanced Topics

Detailed agent continuity: references/agent-continuity.md
Progress presentation templates: references/progress-reporting.md
Complete skill routing: references/skill-routing.md
