create-runbook
Runbook Creation Wizard
You are guiding a user through creating their first runbook. A runbook is a structured markdown document that tells a coding agent (Claude Code, Cursor, Codex, etc.) how to accomplish a complex, multi-step task end-to-end — with built-in evaluation loops, iteration, and quality gates.
Follow these steps IN ORDER. Be friendly and concise. At each decision point, use AskUserQuestion to let the user choose.
Cross-Agent Compatibility
This skill uses `AskUserQuestion` for interactive choices. If you are running in an environment where `AskUserQuestion` is not available (e.g., Codex CLI, Gemini CLI, Cursor), replace each `AskUserQuestion` call with a direct question to the user in your text output. Ask the user to reply with their choice. The wizard flow is the same — only the interaction mechanism differs.

For non-interactive / batch execution (e.g., Codex with `--quiet`), the user should pass the required context as the skill argument:

```
create-runbook "NL-to-SQL regression evaluator, programmatic evaluation, save to ./RUNBOOK.md"
```

Parse the argument to extract: task description, evaluation pattern (programmatic/rubric), and file path. Skip the AskUserQuestion steps and proceed directly to scaffolding.
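As a sketch, the argument could be split with standard tools (the variable names and the comma-separated layout are assumptions; the argument is free-form text, so treat this as a heuristic):

```shell
ARG='NL-to-SQL regression evaluator, programmatic evaluation, save to ./RUNBOOK.md'

TASK=$(echo "$ARG" | cut -d',' -f1)                                # text before the first comma
PATTERN=$(echo "$ARG" | grep -oE 'programmatic|rubric' | head -1)  # evaluation pattern keyword
TARGET=$(echo "$ARG" | grep -oE '[^ ]+\.md' | tail -1)             # last .md path mentioned

echo "task=$TASK"
echo "pattern=$PATTERN"
echo "path=$TARGET"
```

In practice you can simply read the same three pieces out of the argument directly; the point is that all three must be present before skipping the interactive steps.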
Step 1: Orientation
First, check that the user has Jetty set up:
```bash
test -f ~/.config/jetty/token && echo "JETTY_OK" || echo "NO_TOKEN"
```

If `NO_TOKEN`, tell the user:

> "You'll need a Jetty account first. Run `/jetty-setup` to get started, then come back here."
End the skill.
If `JETTY_OK`, briefly explain what they're about to build:

**What's a runbook?**
| | Skill | Workflow | Runbook |
|---|---|---|---|
| Format | Markdown (SKILL.md) | JSON (step configs) | Markdown (RUNBOOK.md) |
| Executed by | Coding agent | Jetty engine | Coding agent, calling workflows/APIs |
| Complexity | Single tool or short procedure | Fixed pipeline | Multi-phase process with judgment |
| Iteration | None — one-shot | None — runs to completion | Built-in: evaluate → refine → re-evaluate |

A skill says *"here's how to call the Jetty API."* A runbook says *"here's how to pull data, process it, evaluate the results, iterate until they're good enough, and produce a report — and here's how to know when you're done."*

Let's build one.
Step 2: Gather Context
2a: Task Description
Use AskUserQuestion:
- Header: "Task"
- Question: "What task do you want to automate? Describe it in a sentence or two — what goes in, what processing happens, and what comes out."
- Options:
- "I'll describe it" / "Let me type a description" (user types in the text field)
- "Show me examples first" / "Show example runbook tasks before I decide"
If "Show me examples first", display these real-world examples:
Example runbook tasks:
- NL-to-SQL Regression — Pull failed queries from Langfuse, replay them against the NL-to-SQL API, execute on Snowflake, evaluate pass/fail, produce a regression report
- PDF-to-Metadata Conversion — Extract metadata from academic PDFs, generate Croissant JSON-LD, validate against the schema, iterate on errors
- Branded Social Graphics — Parse a text script, generate an AI image via Jetty workflow, compose HTML with text overlays, judge against a brand rubric, iterate
- Clinical Training Content — Parse competency documents, generate training scenarios with rubric-scored quality, produce learning plans
- Data Extraction Pipeline — Extract structured data from documents into multiple formats, validate schema compliance, produce quality report
Then re-ask the question (same AskUserQuestion, minus the "Show me examples" option).
Save the user's task description for use in all subsequent steps.
2b: Evaluation Pattern
Use AskUserQuestion:
- Header: "Evaluation"
- Question: "How will you know when the output is good enough?"
- Options:
- "Programmatic checks" / "I can validate with code, a schema, tests, or an API (objective pass/fail)"
- "Quality rubric" / "I need to score against multiple criteria (subjective quality on a 1-5 scale)"
- "Help me decide" / "Not sure which fits my task"
If "Help me decide", explain:
Programmatic is right when:
- Your output is structured data (JSON, CSV, SQL)
- You can validate with a schema, test suite, or API call
- Pass/fail is objective — it either works or it doesn't
- Examples: schema validation, SQL execution, test suites, API response checks
Rubric is right when:
- Your output is creative or complex (text, images, reports, designs)
- Quality is subjective across multiple dimensions
- You need a numeric score to track improvement
- Examples: content quality, brand compliance, UX evaluation, report comprehensiveness
Then re-ask (same question, minus "Help me decide").
Save the chosen evaluation pattern: `programmatic` or `rubric`.
2c: File Location
Use AskUserQuestion:
- Header: "Location"
- Question: "Where should I create the RUNBOOK.md file?"
- Options:
- "Here" / "Create ./RUNBOOK.md in the current directory"
- "Custom path" / "Let me specify where to put it" (user types a path)
Save the target file path.
2d: Agent Runtime & Snapshot
Use AskUserQuestion:
- Header: "Agent Runtime"
- Question: "Which agent will run this runbook on Jetty?"
- Options:
- "Claude Code (Anthropic)" / "Uses claude-sonnet-4-6 — strong at reasoning and tool use. Requires an Anthropic API key."
- "Codex (OpenAI)" / "Uses gpt-5.4 — strong at code generation. Requires an OpenAI API key."
- "Gemini CLI (Google)" / "Uses gemini-3.1-pro-preview — free tier available. Requires a Google AI API key."
Save the agent and model choice. The mapping is:
- Claude Code → agent: `claude-code`, model: `claude-sonnet-4-6`
- Codex → agent: `codex`, model: `gpt-5.4`
- Gemini CLI → agent: `gemini-cli`, model: `gemini-3.1-pro-preview`
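As a sketch, the mapping could be applied with a simple case statement (the `AGENT_CHOICE` variable is illustrative; in practice you just carry the two values forward into Step 3):

```shell
# Map the user's runtime choice to the frontmatter agent/model values
AGENT_CHOICE="Claude Code"   # illustrative: whatever the user selected in 2d

case "$AGENT_CHOICE" in
  "Claude Code") AGENT="claude-code"; MODEL="claude-sonnet-4-6" ;;
  "Codex")       AGENT="codex";       MODEL="gpt-5.4" ;;
  "Gemini CLI")  AGENT="gemini-cli";  MODEL="gemini-3.1-pro-preview" ;;
esac

echo "agent: $AGENT"
echo "model: $MODEL"
```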
Then ask about the sandbox:
Use AskUserQuestion:
- Header: "Sandbox"
- Question: "Will your runbook need a web browser (Playwright)?\nExamples: taking screenshots, web scraping, OAuth flows, testing web UIs"
- Options:
- "Yes, I need a browser" / "Use prism-playwright snapshot (Python 3.12, uv, Playwright + Chromium pre-installed)"
- "No browser needed" / "Use python312-uv snapshot (lighter, faster startup)"
Save the snapshot choice. These values will be written into the runbook frontmatter in Step 3.
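Combined with the agent choice above, the resulting frontmatter might look like this (a sketch: `version` and `evaluation` are the fields the Step 5 validator requires; the values shown are one possible combination, not defaults):

```yaml
---
version: "1.0"
evaluation: programmatic
agent: claude-code
model: claude-sonnet-4-6
snapshot: python312-uv
---
```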
Step 3: Scaffold the Runbook
Read the appropriate starter template based on the evaluation pattern chosen in Step 2b:
- Programmatic: Read `templates/programmatic.md` from the skill's directory
- Rubric: Read `templates/rubric.md` from the skill's directory
To find the templates, locate this skill's directory:
```bash
find ~/.claude -path "*/create-runbook/templates/programmatic.md" 2>/dev/null | head -1
```

If not found there, also check the working directory:

```bash
find . -path "*/create-runbook/templates/programmatic.md" 2>/dev/null | head -1
```

Read the template using the Read tool.
Now customize the template using the task description from Step 2a:
- Title: Replace `{Task Name}` with a concise name derived from the task description
- Objective: Write a 2-5 sentence objective based on what the user described — input, processing, output
- Output manifest: Propose specific output files based on the task (replace `{primary_output}` with a real filename like `results.csv`, `output.json`, `report.html`, etc.)
- Parameters: Propose parameters based on inputs mentioned in the task description. Always keep `{{results_dir}}`.
- Agent/model/snapshot: Write the choices from Step 2d into the frontmatter fields
- Steps 2-3: Rename and briefly describe the processing steps based on the task

Leave `{TODO: ...}` markers, `{How to fix it}`, and similar placeholders in sections that require detailed domain input from the user (evaluation criteria, common fixes, tips, dependencies).

Write the customized runbook to the path chosen in Step 2c using the Write tool.
Tell the user:
> "I've created a runbook scaffold at `{path}`. It has the full structure with your task details filled in and placeholder markers where I need your input. Let's walk through each section."
Step 4: Customize Sections
Walk through each section that needs user input. For each, show the user what's currently in the runbook and ask for their refinement. Use the Edit tool to apply changes.
4a: Review Objective
Show the user the Objective section you drafted. Use AskUserQuestion:
- Header: "Objective"
- Question: "Here's the objective I drafted:\n\n{show the objective text}\n\nDoes this capture your task accurately?"
- Options:
- "Looks good" / "Move on to the next section"
- "Needs changes" / "Let me refine it" (user types corrections)
If they want changes, apply via Edit and move on.
4b: Output Files
Show the proposed output manifest. Use AskUserQuestion:
- Header: "Output Files"
- Question: "These are the files the runbook will produce:\n\n{list the files}\n\nDoes this look right?"
- Options:
- "Looks good" / "This manifest is correct"
- "Add a file" / "I need an additional output file"
- "Change a file" / "One of these needs to be different"
Apply changes via Edit. Ensure `validation_report.json` and `summary.md` always remain in the manifest.
4c: Parameters
Show proposed parameters. Use AskUserQuestion:
- Header: "Parameters"
- Question: "These are the configurable inputs:\n\n{list parameters}\n\nAnything to add or change?"
- Options:
- "Looks good" / "These parameters are sufficient"
- "Add more" / "I need additional parameters" (user describes them)
Apply changes via Edit.
4d: Dependencies
Use AskUserQuestion:
- Header: "Dependencies"
- Question: "What does your runbook need beyond the base environment?"
- Options:
- "Jetty workflows" / "I'll call Jetty workflows as sub-steps"
- "External APIs" / "I call non-Jetty APIs (REST, GraphQL, etc.)"
- "Python/Node packages" / "I need specific libraries installed"
- "None" / "No special dependencies — just standard tools"
For each selected category, ask a follow-up for specifics (workflow names, API URLs, package names). Populate the Dependencies table and the Step 1 setup script via Edit.
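The populated Dependencies table might look like this (columns and entries are illustrative, drawn from the example tasks earlier; the actual table layout comes from the template):

```
| Dependency        | Type           | Details                              |
|-------------------|----------------|--------------------------------------|
| langfuse          | Python package | installed in the Step 1 setup script |
| NL-to-SQL API     | External API   | base URL supplied as a parameter     |
| image-gen flow    | Jetty workflow | called as a sub-step                 |
```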
4e: Secrets (Optional)
Use AskUserQuestion:
- Header: "Secrets"
- Question: "Does this runbook need any API keys, tokens, or other credentials?"
- Options:
- "Yes" / "I need to declare secrets for API keys or credentials"
- "No" / "No sensitive parameters needed"
If yes, for each secret collect via AskUserQuestion:
- Logical name (e.g., `OPENAI_API_KEY`)
- Collection environment variable name (usually same as logical name)
- Description
- Required or optional
Populate the `secrets` block in frontmatter with the collected values. For example:

```yaml
secrets:
  OPENAI_API_KEY:
    env: OPENAI_API_KEY
    description: "OpenAI API key for embeddings"
    required: true
```

Also add a verification block in Step 1 (Environment Setup) that checks each required secret is available as an environment variable.
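A minimal sketch of that verification block (bash-specific, since it uses `${!name}` indirection; `OPENAI_API_KEY` stands in for whatever secrets were declared):

```shell
# Guard helper: returns non-zero if any named secret is missing from the environment
check_secrets() {
  local missing=0
  for secret in "$@"; do
    # ${!secret} is bash indirection: the value of the variable whose name is in $secret
    if [ -z "${!secret}" ]; then
      echo "ERROR: required secret $secret is not set"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# In Step 1, call it before any processing step runs, e.g.:
if check_secrets OPENAI_API_KEY; then
  echo "SECRETS_OK"
else
  echo "Declare the missing secrets before re-running the runbook."
fi
```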
4f: Processing Steps
Based on the task description, propose a sequence of processing steps. Show the user your proposed outline. Use AskUserQuestion:
- Header: "Processing Steps"
- Question: "Here's the step sequence I'm proposing:\n\n{numbered list of steps}\n\nWant to adjust?"
- Options:
- "Looks good" / "This sequence works"
- "Add a step" / "I need an additional step"
- "Change order" / "The steps need reordering"
- "Remove a step" / "One of these isn't needed"
Apply changes via Edit. For each confirmed step, write a skeleton with:
- Step name as header
- 2-3 sentence description of what to do
- Placeholder for API calls or code snippets: `{TODO: add API call examples and expected response format}`
- Placeholder for error handling: `{TODO: add error handling for common failures}`
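A confirmed step's skeleton might look like this (the step name and description are illustrative; the `{TODO: ...}` lines are the literal placeholders named above):

```
Step 2: Replay failed queries
Replay each failed query against the API and capture the raw responses.
Write one response file per query under {{results_dir}}/raw/.
{TODO: add API call examples and expected response format}
{TODO: add error handling for common failures}
```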
4g: Evaluation Criteria
This is the most important section. Branch based on the evaluation pattern:
For programmatic:
Use AskUserQuestion:
- Header: "Pass/Fail Criteria"
- Question: "Define what PASS, PARTIAL, and FAIL mean for your outputs. What makes an output correct? What makes it partially correct? What's a failure?"
- Options:
- "I'll define them" / "Let me describe each status" (user types)
- "Use defaults" / "Keep the template defaults and I'll refine later"
If they define criteria, update the status table via Edit.
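For the programmatic path, a filled-in status table might read (definitions are illustrative):

```
| Status  | Meaning                                                  |
|---------|----------------------------------------------------------|
| PASS    | Output validates and every check succeeds                |
| PARTIAL | Output produced, but a non-critical check fails          |
| FAIL    | Output missing, malformed, or a critical check fails     |
```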
For rubric:
Use AskUserQuestion:
- Header: "Rubric Criteria"
- Question: "What criteria matter for your output quality? Name 3-7 dimensions you'd score on a 1-5 scale.\n\nExamples: accuracy, completeness, clarity, brand compliance, technical correctness, creativity, formatting"
- Options:
- "I'll list them" / "Let me name my criteria" (user types)
- "Use 5 defaults" / "Start with generic criteria and I'll customize later"
If they provide criteria, build the rubric table with rows for each. For each criterion, ask (in a single AskUserQuestion):
- Header: "Rubric Details"
- Question: "For each criterion, briefly describe what 5 (excellent) and 1 (poor) look like. Or just list the criteria names and I'll draft reasonable descriptions.\n\n{list their criteria}"
- Options:
- "I'll describe them" / "Let me define the scale for each" (user types)
- "You draft them" / "Write reasonable descriptions and I'll review"
Update the rubric table via Edit.
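If the user asks you to draft the scale descriptions, the resulting rubric table might look like this (criteria and wording are illustrative):

```
| Criterion    | 5 (excellent)                              | 1 (poor)                        |
|--------------|--------------------------------------------|---------------------------------|
| Accuracy     | Every claim is verifiably correct          | Multiple factual errors         |
| Completeness | Covers all required elements               | Major required elements missing |
| Clarity      | Understandable on first read by non-expert | Confusing or contradictory      |
```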
4h: Common Fixes (optional)
Use AskUserQuestion:
- Header: "Common Fixes"
- Question: "Do you know the typical failure modes for this task? If so, describe them and I'll build a fix table. If not, you can fill this in after your first few runs."
- Options:
- "I know some" / "Let me describe common issues" (user types)
- "Skip for now" / "I'll fill this in after running the runbook"
If they provide issues, populate the Common Fixes table via Edit. If skipped, leave the placeholder rows.
4i: Tips (optional)
Use AskUserQuestion:
- Header: "Tips"
- Question: "Any domain-specific gotchas, API quirks, or hard-won lessons you want to capture? These help the agent avoid known pitfalls."
- Options:
- "Yes" / "I have some tips to add" (user types)
- "Skip" / "Nothing comes to mind — I'll add tips later"
If they provide tips, update the Tips section via Edit.
Step 5: Validate the Runbook
Run structural validation checks on the completed runbook. Write a validation script to a temp file and execute it:
```bash
cat > /tmp/validate_runbook.sh << 'VALIDATE_EOF'
#!/bin/bash
FILE="$1"
ERRORS=0
WARNINGS=0
echo "=== RUNBOOK VALIDATION: $FILE ==="

# Check frontmatter
if head -5 "$FILE" | grep -q "^---"; then
  VERSION=$(grep "^version:" "$FILE" | head -1 | sed 's/version: *//' | tr -d '"')
  EVAL=$(grep "^evaluation:" "$FILE" | head -1 | sed 's/evaluation: *//' | tr -d '"')
  if [ -n "$VERSION" ] && [ -n "$EVAL" ]; then
    echo "PASS: Frontmatter (version: $VERSION, evaluation: $EVAL)"
  else
    echo "ERROR: Frontmatter missing version or evaluation field"
    ERRORS=$((ERRORS+1))
  fi
  if [ "$EVAL" != "programmatic" ] && [ "$EVAL" != "rubric" ]; then
    echo "ERROR: evaluation must be 'programmatic' or 'rubric', got '$EVAL'"
    ERRORS=$((ERRORS+1))
  fi
else
  echo "ERROR: No YAML frontmatter found"
  ERRORS=$((ERRORS+1))
fi

# Check required sections
for section in "## Objective" "## REQUIRED OUTPUT FILES" "## Final Checklist"; do
  if grep -q "$section" "$FILE"; then
    echo "PASS: '$section' section found"
  else
    echo "ERROR: '$section' section missing"
    ERRORS=$((ERRORS+1))
  fi
done

# Check validation_report.json in manifest
if grep -q "validation_report.json" "$FILE"; then
  echo "PASS: validation_report.json in output manifest"
else
  echo "ERROR: validation_report.json not found in output manifest"
  ERRORS=$((ERRORS+1))
fi

# Check summary.md in manifest
if grep -q "summary.md" "$FILE"; then
  echo "PASS: summary.md in output manifest"
else
  echo "WARN: summary.md not found in output manifest (recommended)"
  WARNINGS=$((WARNINGS+1))
fi

# Check for evaluation step
if grep -q "Evaluate" "$FILE" || grep -q "Rubric" "$FILE"; then
  echo "PASS: Evaluation step found"
else
  echo "ERROR: No evaluation step found"
  ERRORS=$((ERRORS+1))
fi

# Check for iteration with max rounds
if grep -qiE "max [0-9]+ round|iterate.*max|up to [0-9]+" "$FILE"; then
  echo "PASS: Iteration step with bounded rounds found"
else
  echo "ERROR: No bounded iteration step found (must specify max rounds)"
  ERRORS=$((ERRORS+1))
fi

# Check for verification script
if grep -q "FINAL OUTPUT VERIFICATION" "$FILE" || grep -q "Verification Script" "$FILE"; then
  echo "PASS: Verification script found"
else
  echo "ERROR: No verification script in Final Checklist"
  ERRORS=$((ERRORS+1))
fi

# Check Parameters section if template vars exist
VARS=$(grep -oE '{{[a-z_]+}}' "$FILE" | sort -u | tr -d '{}')
if [ -n "$VARS" ]; then
  if grep -q "## Parameters" "$FILE"; then
    echo "PASS: Parameters section found"
    # Warn if a template variable never appears on a parameter declaration line
    for var in $VARS; do
      if ! grep -E "Template Variable|Parameter" "$FILE" | grep -q "$var"; then
        echo "WARN: template variable {{$var}} not documented in Parameters section"
        WARNINGS=$((WARNINGS+1))
      fi
    done
  else
    echo "ERROR: Template variables found but no Parameters section"
    ERRORS=$((ERRORS+1))
  fi
fi

# Check for Dependencies section
if grep -q "## Dependencies" "$FILE"; then
  echo "PASS: Dependencies section found"
else
  echo "WARN: No Dependencies section (add if runbook uses external APIs/workflows)"
  WARNINGS=$((WARNINGS+1))
fi

# Check for Tips section
if grep -q "## Tips" "$FILE"; then
  echo "PASS: Tips section found"
else
  echo "WARN: No Tips section (recommended for domain-specific guidance)"
  WARNINGS=$((WARNINGS+1))
fi

# Check for remaining TODO markers
TODO_COUNT=$(grep -o "{TODO:" "$FILE" 2>/dev/null | wc -l | tr -d ' ')
if [ "$TODO_COUNT" -gt 0 ]; then
  echo "WARN: $TODO_COUNT {TODO:} markers remain — fill these in before running"
  WARNINGS=$((WARNINGS+1))
fi

echo ""
if [ $ERRORS -eq 0 ]; then
  echo "Result: VALID ($WARNINGS warning(s))"
else
  echo "Result: INVALID ($ERRORS error(s), $WARNINGS warning(s))"
fi
VALIDATE_EOF
chmod +x /tmp/validate_runbook.sh
bash /tmp/validate_runbook.sh "THE_RUNBOOK_PATH"
```

Replace `THE_RUNBOOK_PATH` with the actual path from Step 2c.
**If there are errors**, tell the user what needs to be fixed and guide them through the fixes using Edit. Re-run validation after fixes.
**If valid**, tell the user:
> "Your runbook passes structural validation! {N warnings if any — mention them briefly.}"

---

Step 6: Optional Dry Run
Use AskUserQuestion:
- Header: "Test"
- Question: "Want me to do a dry run? I'll read through your runbook step by step and list what each step would do — flagging any missing credentials, unavailable APIs, or potential issues — without actually executing anything."
- Options:
- "Yes, dry run" / "Walk through the plan without executing"
- "Skip" / "I'll test it myself later"
If "Yes, dry run":
Read the completed runbook with the Read tool. Then produce a walkthrough:
- List all parameters and whether they have values or need to be provided at runtime
- For each step, describe what the agent would do:
- Which APIs or services it would call
- What data it would process
- What files it would write
- Flag potential issues:
- Parameters without defaults that need values
- External APIs or credentials referenced
- Jetty workflows that need to exist
- Packages that need to be installed
- Estimate the rough scope (number of API calls, expected outputs)
Present this as a formatted summary to the user. If the runbook has a `{{results_dir}}`, create the results directory and write the walkthrough to `{results_dir}/plan.md`:

```bash
mkdir -p ./results
```

Write `./results/plan.md` with the walkthrough using the Write tool.
Step 7: Next Steps
Tell the user:
**Your runbook is ready!** Here's how to use it:

**Run it locally:** Open the runbook in a new conversation and tell the agent to follow it:

> "Follow the runbook in ./RUNBOOK.md. Use these parameters: results_dir=./results, {other params}..."

**Run it on Jetty (recommended):** Use the chat-completions endpoint with a `jetty` block — this is the single API call that configures everything: which agent runs it, which collection it belongs to, and what files to upload into the sandbox.

```bash
curl -X POST "https://flows-api.jetty.io/v1/chat/completions" \
  -H "Authorization: Bearer $JETTY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{model from frontmatter}",
    "messages": [
      {"role": "system", "content": "<contents of your RUNBOOK.md>"},
      {"role": "user", "content": "Execute the runbook."}
    ],
    "stream": true,
    "jetty": {
      "runbook": true,
      "collection": "{your-collection}",
      "task": "{task-name}",
      "agent": "{agent from frontmatter}",
      "file_paths": []
    }
  }'
```

The `jetty` block fields map directly to your runbook's frontmatter:

| Frontmatter field | `jetty` block field | Purpose |
|---|---|---|
| `agent` | `jetty.agent` | Which agent CLI runs the runbook (`claude-code`, `codex`, `gemini-cli`) |
| `model` | `model` (top-level) | Which LLM the agent uses |
| `snapshot` | (auto-selected) | Sandbox environment — set via collection config |
| — | `jetty.collection` | Namespace that holds your env vars and secrets |
| — | `jetty.task` | Task name for grouping trajectories |
| — | `jetty.file_paths` | Files to upload into the sandbox workspace |

Or use `/jetty run runbook` to have the agent build this request for you interactively.

**Iterate on the runbook:** After your first few runs, come back and:
- Add entries to the Common Fixes table based on failures you observe
- Add Tips for gotchas the agent encountered
- Tighten evaluation criteria as your quality bar becomes clearer
- Bump the version when you make structural changes
**Re-validate after changes:** Run `/create-runbook` again on an existing RUNBOOK.md to re-validate it, or run the validation script from Step 5 manually.
Important Notes
- Always keep `validation_report.json` in the output manifest. This is the standardized machine-readable results filename across all Jetty runbooks. Never use `results.json`, `scores.json`, or other variants.
- The `{{results_dir}}` parameter defaults to `/app/results` when running on Jetty and `./results` when running locally.
- Bound iteration. Every iteration loop must specify a maximum round count (typically 3). Without bounds, the agent may loop indefinitely.
- Use imperative language in the output manifest and final checklist. Agents tend to wrap up early when they encounter errors — strong language like "Do NOT finish until all items pass" overrides this.
- Don't over-specify intermediate steps. The agent should have room to adapt. Specify what each step must produce, not every line of code.
- Don't mix evaluation patterns. Programmatic validation for structured output, rubric scoring for creative output. Don't rubric-score a JSON file or schema-validate a social graphic.
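The imperative-language guidance above can be made concrete with a Final Checklist sketch (items are illustrative; the section name and the "Do NOT finish" phrasing follow the points above):

```
## Final Checklist
Do NOT finish until every item below passes:
- [ ] {{results_dir}}/validation_report.json exists and is valid JSON
- [ ] {{results_dir}}/summary.md exists and summarizes every round
- [ ] The evaluation loop ran at most 3 rounds
```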