experiment-report-writer

Experiment Report Writer

Turn experiment evidence into a clear research report that a reader can evaluate without rerunning the experiment.

Use this skill to write a standalone document, a section for a paper or lab note, a mentor-facing update, or a presentation-ready experiment summary.

Pair this skill with `research-project-memory` when completed results should update project claims, evidence, risks, actions, figures, or worktree decisions.

Skill Directory Layout

```text
<installed-skill-dir>/
├── SKILL.md
└── templates/
    └── experiment-report.md
```

Progressive Loading

Use `templates/experiment-report.md` as the default Markdown skeleton when saving a report.
If the user only wants a draft in chat, follow the same section order without needing to read or copy the template verbatim.

Core Principles

Ground every claim in evidence: configs, commands, logs, metrics, tables, figures, commit hashes, or user-provided notes.
Separate observed results from interpretation. Do not present a hypothesis as a measured fact.
Make the report reproducible enough that another researcher can identify what was run.
Explain why the experiment matters before listing numbers.
Compare against the right reference point: baseline, previous run, ablation control, expected behavior, or published number.
Preserve uncertainty. If evidence is missing, mark it as missing and ask for the smallest useful clarification.
Write for the intended audience. A lab notebook can be dense; a mentor update should emphasize decisions, evidence, and next steps.

Step 1 - Classify the Report

Identify the report mode:

`single-experiment`: one run or one controlled comparison
`ablation-report`: several variants testing one factor
`batch-summary`: many related runs from a sweep or experiment batch
`mentor-update`: concise progress report with decision-oriented discussion
`paper-section`: polished text intended to become part of a paper

Also identify:

audience
output format: Markdown, LaTeX, slide outline, or chat draft
save path, if the user wants a file
expected length
whether figures, tables, configs, logs, or notebooks are available

If the user gives no format, default to Markdown. If they ask for a file and no path is given, use:

```text
docs/reports/experiment_report_YYYY-MM-DD.md
```
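
The default path can be generated rather than typed by hand. The following is a minimal sketch, assuming the date in the filename is the day the report is written; the function name `default_report_path` and its `root` parameter are illustrative and not part of the skill.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def default_report_path(today: Optional[date] = None, root: str = "docs/reports") -> Path:
    """Return the fallback report path for a given day (defaults to today)."""
    day = today or date.today()
    # date.isoformat() yields YYYY-MM-DD, matching the filename pattern.
    return Path(root) / f"experiment_report_{day.isoformat()}.md"


# Example with a fixed date so the result is deterministic.
print(default_report_path(date(2025, 1, 31)).as_posix())
```

Passing the date explicitly keeps the helper testable; in normal use it can be called with no arguments.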

Step 2 - Gather Evidence

Prefer primary evidence over memory.

Look for:

experiment commands or scripts
config files and parameter overrides
random seeds and number of runs
dataset name, split, preprocessing, and sample count
model, method variant, checkpoint, or algorithm version
hardware and runtime if relevant
metrics, logs, result tables, figures, and failure cases
git commit hash or code version, when available

Useful local checks include:

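```bash
# Record the short commit hash of the code that produced the results
git rev-parse --short HEAD
# List candidate configs, metric tables, and notes up to three levels deep
find . -maxdepth 3 -type f \( -name "*.yaml" -o -name "*.yml" -o -name "*.json" -o -name "*.csv" -o -name "*.md" \)
# List candidate figures up to four levels deep
find . -maxdepth 4 -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.pdf" -o -name "*.svg" \)
```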

If the user only provides informal notes, use them but label missing reproducibility details explicitly.
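
One way to label missing reproducibility details is to check the gathered evidence against a fixed list of fields. This is a rough sketch only: the field names in `REPRO_FIELDS` are an illustrative subset of the list above, and the sample notes are invented.

```python
# Illustrative field names; adapt them to the project.
REPRO_FIELDS = (
    "command", "config", "seed", "num_runs",
    "dataset", "split", "model", "code_version",
)


def label_missing(evidence: dict) -> dict:
    """Return every expected field, marking absent or empty ones as MISSING."""
    labeled = {}
    for name in REPRO_FIELDS:
        value = evidence.get(name)
        # Only None and "" count as missing, so a seed of 0 is preserved.
        labeled[name] = "MISSING" if value is None or value == "" else value
    return labeled


# Invented informal notes, used only to show the labeling.
notes = {"command": "python train.py", "seed": 0, "dataset": "my-dataset"}
for name, value in label_missing(notes).items():
    print(f"{name}: {value}")
```

Fields printed as `MISSING` are the ones to flag in the report or to ask the user about.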

Step 3 - Extract the Experiment Story

Before drafting, organize the experiment into:

question: what was this experiment trying to learn?
motivation: why does the question matter?
hypothesis: what did we expect and why?
method: what changed compared with the baseline?
controls: what stayed fixed?
measurement: which metrics answer the question?
outcome: what happened?
interpretation: what does the outcome suggest?
decision: what should happen next?

For ablations or sweeps, make the independent variable explicit and keep the comparison fair.
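
A fair comparison can be checked mechanically when configs are available as flat dictionaries. The sketch below is not part of the skill: it assumes flat key/value configs, and the helper name `unfair_differences` and the sample settings are hypothetical.

```python
_ABSENT = object()  # sentinel so a key missing on one side counts as a difference


def unfair_differences(baseline: dict, variant: dict, independent_variable: str) -> set:
    """Return config keys that differ besides the declared independent variable."""
    keys = baseline.keys() | variant.keys()
    differing = {
        key for key in keys
        if baseline.get(key, _ABSENT) != variant.get(key, _ABSENT)
    }
    return differing - {independent_variable}


# Invented configs: the variant changes dropout (intended) and batch_size (a confound).
baseline = {"lr": 0.001, "batch_size": 64, "dropout": 0.1}
variant = {"lr": 0.001, "batch_size": 128, "dropout": 0.3}
print(unfair_differences(baseline, variant, "dropout"))
```

An empty result means the variant differs from the baseline only in the factor under test; anything else belongs in the report as a confound.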

Required Report Structure

Use these sections unless the user requests a different format:

```markdown
# [Experiment Report Title]

## Summary

## 1. Experiment Motivation

## 2. Experiment Setup

## 3. Core Algorithm or Method

## 4. Metrics

## 5. Results

## 6. How to Read the Figures

## 7. Interpretation

## 8. Conclusion and Discussion

## 9. Limitations and Caveats

## 10. Next Steps

## Reproducibility Notes
```

If there is no core algorithm, write "Not applicable" and briefly explain whether the experiment changes data, hyperparameters, evaluation, infrastructure, or analysis instead.

If there are no figures, omit "How to Read the Figures" or replace it with "How to Read the Tables" when tables are the main evidence.
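
A draft can be checked against the required structure before it is saved. The following is a minimal sketch, assuming the report uses `#`-style Markdown headings whose text matches the section names exactly; section 6 is left out of the check because it may be omitted or renamed, and `missing_sections` is an illustrative name.

```python
REQUIRED_SECTIONS = [
    "Summary",
    "1. Experiment Motivation",
    "2. Experiment Setup",
    "3. Core Algorithm or Method",
    "4. Metrics",
    "5. Results",
    "7. Interpretation",
    "8. Conclusion and Discussion",
    "9. Limitations and Caveats",
    "10. Next Steps",
    "Reproducibility Notes",
]


def missing_sections(report_text: str) -> list:
    """Return required section names that have no matching Markdown heading."""
    headings = {
        line.lstrip("#").strip()
        for line in report_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]


draft = "# Example Report\n\n## Summary\n\n## 1. Experiment Motivation\n"
print(missing_sections(draft))
```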


Section Guidance

Summary

Write 3-6 bullets covering:

experiment question
most important setup details
headline result
interpretation
recommended next step

1. Experiment Motivation

Explain the research or engineering reason for the experiment:

problem being tested
expected mechanism
why the result would affect the project
what decision the experiment supports

2. Experiment Setup

Include enough detail to reproduce or audit the run:

dataset, split, preprocessing
baseline and compared variants
key hyperparameters and parameter changes
training/evaluation command, config file, or run ID
random seed and number of trials
hardware, runtime, and code version when relevant

Use a table for parameters when there are more than five important settings.

3. Core Algorithm or Method

Describe the algorithm only at the level needed to understand the experiment:

what input it consumes
what output it produces
key steps or objective
what is new or different from the baseline
complexity, assumptions, or implementation details that affect interpretation

Do not over-explain standard background unless the audience needs it.

4. Metrics

For each metric, explain:

definition
direction: higher is better, lower is better, or target range
unit
aggregation: mean, median, best checkpoint, final epoch, confidence interval, or standard deviation
why it is relevant to the experiment question

Flag metrics that can conflict with each other.
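
The metric attributes above can be kept next to the numbers so that direction, unit, and aggregation are never dropped from a summary line. This sketch assumes the aggregation is the mean with sample standard deviation across seeds; `MetricSpec`, `summarize`, and the values are illustrative.

```python
import statistics
from dataclasses import dataclass


@dataclass
class MetricSpec:
    name: str
    direction: str  # "higher is better", "lower is better", or a target range
    unit: str


def summarize(spec: MetricSpec, values: list) -> str:
    """Format mean and sample standard deviation, or flag a single run."""
    mean = statistics.mean(values)
    if len(values) > 1:
        spread = f"± {statistics.stdev(values):.3f}"
    else:
        # stdev is undefined for one value, so say so instead of printing 0.
        spread = "(single run, no spread estimate)"
    return f"{spec.name}: {mean:.3f} {spread} {spec.unit}, n={len(values)}, {spec.direction}"


# Placeholder values, not real results.
accuracy = MetricSpec("accuracy", "higher is better", "fraction")
print(summarize(accuracy, [0.5, 0.75, 1.0]))
print(summarize(accuracy, [0.9]))
```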

5. Results

Present results before interpretation.

Use:

tables for exact numeric comparisons
figures for trends, distributions, or qualitative examples
short text for the main deltas

Always identify the baseline and report absolute values plus meaningful deltas when possible.
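
Reporting absolute values together with deltas can be automated when results are available as one number per variant. The sketch below builds a Markdown table under that assumption; `results_table` is a hypothetical helper and the numbers are placeholders.

```python
def results_table(results: dict, baseline: str, metric: str) -> str:
    """Render a Markdown table of absolute values and signed deltas vs the baseline."""
    base_value = results[baseline]
    lines = [
        f"| Variant | {metric} | Delta vs {baseline} |",
        "| --- | --- | --- |",
    ]
    for name, value in results.items():
        # The baseline row is labeled instead of showing a zero delta.
        delta = "baseline" if name == baseline else f"{value - base_value:+.3f}"
        lines.append(f"| {name} | {value:.3f} | {delta} |")
    return "\n".join(lines)


# Placeholder numbers, not real results.
print(results_table({"baseline": 0.5, "variant_a": 0.75, "variant_b": 0.25}, "baseline", "accuracy"))
```

The explicit sign on each delta makes regressions visible at a glance; whether a positive delta is good still depends on the metric's direction.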

6. How to Read the Figures

For every figure, explain:

what the figure is meant to show
x-axis: variable, unit, and scale
y-axis: metric, unit, and direction
legend: method names, groups, colors, markers, or line styles
error bars or shaded regions, if present
whether points are individual runs, averages, checkpoints, epochs, or samples
the main visual pattern the reader should notice

If an axis is log-scaled, normalized, clipped, or unitless, say so explicitly.

7. Interpretation

Connect the observed results back to the motivation:

whether the hypothesis was supported
what changed relative to the baseline
likely explanation
alternative explanations
surprising or negative results
whether the evidence is strong enough to act on

Use cautious wording when there is only one seed, weak statistical evidence, or missing controls.

8. Conclusion and Discussion

State the practical conclusion:

what we learned
what decision this supports
whether to keep, reject, or further test the method
how the result affects the broader project

9. Limitations and Caveats

Include risks that could change the conclusion:

small number of seeds
narrow dataset or subset
missing baseline
unstable training
possible implementation bug
metric mismatch
data leakage or evaluation contamination risk
hardware/runtime constraints

10. Next Steps

Recommend concrete follow-ups:

one immediate verification step
one high-value extension
one cleanup or documentation task when needed

Tie each next step to the uncertainty it resolves.

Project Memory Writeback

If the project uses `research-project-memory`, write back the result after the report is drafted:

`memory/evidence-board.md`: completed `EVD-###` summary, source paths, linked claim IDs, limitations, and certainty
`memory/claim-board.md`: mark claims as supported, weakened, revised, unsupported, or cut based on the observed result
`memory/risk-board.md`: close mitigated risks or add new risks exposed by the result
`memory/action-board.md`: next steps from the report, including rerun, write, revise-method, park, or kill decisions
`memory/current-status.md`: latest reliable experiment state and next session entry point
worktree `.agent/worktree-status.md`: latest result and exit condition if the experiment belongs to a worktree

Do not write an interpretation as a measured fact. Use `observed` for metrics from logs/tables and `inferred` for explanations.
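
The observed/inferred split can be enforced in the entry itself. The sketch below shows one possible shape for an evidence-board entry; the Markdown layout, the `EvidenceEntry` class, the `CLM-001` claim ID, and the sample content are all assumptions, since the board format is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceEntry:
    evidence_id: str   # follows the EVD-### pattern
    summary: str
    source_paths: list
    claim_ids: list    # hypothetical claim ID scheme
    observed: list = field(default_factory=list)  # metrics from logs/tables
    inferred: list = field(default_factory=list)  # explanations, not measurements

    def to_markdown(self) -> str:
        """Render the entry with every statement tagged observed or inferred."""
        lines = [
            f"### {self.evidence_id}: {self.summary}",
            f"- sources: {', '.join(self.source_paths)}",
            f"- claims: {', '.join(self.claim_ids)}",
        ]
        lines += [f"- observed: {item}" for item in self.observed]
        lines += [f"- inferred: {item}" for item in self.inferred]
        return "\n".join(lines)


# Invented content, used only to show the tagging.
entry = EvidenceEntry(
    evidence_id="EVD-001",
    summary="placeholder summary",
    source_paths=["runs/example/metrics.csv"],
    claim_ids=["CLM-001"],
    observed=["validation accuracy rose from 0.500 to 0.750"],
    inferred=["the gain may come from the larger batch size"],
)
print(entry.to_markdown())
```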

Output Quality Checklist

Before finalizing, check that:

the report states the experiment question and decision context
all key parameters and baselines are named
metrics include direction and units
results are separated from interpretation
every figure/table has reading guidance
missing evidence is labeled instead of invented
conclusions do not overclaim beyond the data
next steps are actionable
project memory is updated when present and relevant
