midscene-yaml-generator
Midscene YAML Generator
Typical Workflow
User Requirement → [Generator] Generate YAML
→ [Generator] Auto dry-run validation
→ Validation failed? → [Generator] Auto-fix
→ [Runner] Execute
→ Execution failed? → [Runner] Analyze + Fix YAML → Re-execute
→ Success → Display report summary
Trigger Conditions
Use this when users describe a browser automation requirement (in natural language) and need to generate a Midscene YAML file.
Common trigger phrases (translated from the Chinese originals):
- "Generate a YAML to..."
- "Help me write an automation script..."
- "Create a Midscene test case..."
- "I want to automate the XXX operation..."
- "Convert this requirement to YAML..."
- "Write a Midscene config file..."
English trigger phrases:
- "Generate a YAML for..."
- "Write an automation script to..."
- "Create a test case for..."
- "Automate the login flow"
- "Convert this requirement to YAML"
- "Write a Midscene config file for..."
Workflow
Step 1: Analyze Requirement Complexity
Determine the required mode based on the user's description:
Choose Native Mode when the requirement only involves:
- Opening web pages / launching applications
- Basic interactions such as clicking, hovering, typing, scrolling, and keyboard operations
- AI automatic planning and execution (`ai`)
- Data extraction (`aiQuery`)
- Validation assertions (`aiAssert`)
- Wait conditions (`aiWaitFor`)
- Tool operations (`sleep`, `javascript`, `recordToReport`)
- Platform-specific operations (`runAdbShell`, `runWdaRequest`, `launch`)
Choose Extended Mode — When the requirement involves any of the following:
- Conditional judgment ("If...then...")
- Loop operations ("Repeat", "Traverse", "Pagination")
- Variables and dynamic data ("Define variables", "Parameterization")
- External API calls ("Call API")
- Error handling and retries ("If failed...", "Retry")
- Parallel tasks ("Do...simultaneously")
- Data transformation processing ("Filter", "Sort", "Map")
- Import and reuse sub-flows ("Reuse", "Import")
Rule of thumb: start with Native mode, and switch to Extended mode when you find you need `if`, `for`, or variables.
Step 2: Determine Target Platform
Determine platform configuration based on user description:
| User Description | Platform | YAML Configuration |
|---|---|---|
| "Open web page/website/URL" | Web | |
| "Test Android app" | Android | |
| "Test iOS app" | iOS | |
| "Desktop automation" | Computer | |
Additional Web Platform Configuration Options:
- `headless: true/false`: Run in headless mode (default false)
- `viewportWidth` / `viewportHeight`: Viewport size (default 1280×720)
- `userAgent`: Custom User-Agent
- `deviceScaleFactor`: Device pixel ratio (e.g., set to 2 for Retina screens)
- `waitForNetworkIdle`: Network idle wait configuration; supports `true` or the object format `{ timeout: 2000, continueOnNetworkIdleError: true }`
- `cookie`: Path to a Cookie JSON file (enables login-free session recovery)
- `bridgeMode`: Bridge mode: `false` (default) | `'newTabWithUrl'` | `'currentTab'`; reuses a logged-in desktop browser
- `chromeArgs`: Array of custom Chrome launch arguments (e.g., `['--disable-gpu', '--proxy-server=...']`)
- `serve`: Local static file directory; starts the built-in server
- `acceptInsecureCerts`: Ignore HTTPS certificate errors (default false)
- `forceSameTabNavigation`: Restrict navigation to the current tab (default true)
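As a rough illustration of how these options fit together, the sketch below combines several of them in one `web` block. The URL, the option values, and the task wording are placeholders chosen for this example, not recommended defaults.

```yaml
# Sketch only: the URL and all values below are illustrative placeholders.
web:
  url: "https://example.com/dashboard"
  headless: true                 # run without a visible browser window
  viewportWidth: 1440
  viewportHeight: 900
  deviceScaleFactor: 2           # e.g. Retina-like rendering
  waitForNetworkIdle:
    timeout: 2000
    continueOnNetworkIdleError: true
  acceptInsecureCerts: false
  forceSameTabNavigation: true

tasks:
  - name: "Check the dashboard heading"
    flow:
      - aiWaitFor: "The dashboard page has finished loading"
        timeout: 10000
      - aiAssert: "The page shows a dashboard heading"
```

Only the options that differ from the listed defaults really need to be written out; the others appear here to show where they sit in the file.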
Step 3: Natural Language → YAML Conversion
Action Selection Priority (Important)
- Prefer `ai:`: describe the entire intent in natural language and let the AI plan and execute the multi-step operation automatically. This suits most scenarios and has the highest success rate
- When precise control is needed: use specific actions such as `aiTap` and `aiInput` (e.g., filling in specific form fields)
- When data extraction is needed: you must use `aiQuery` (`ai:` cannot return structured data)
- When state validation is needed: use `aiAssert` or `aiWaitFor`
Rule of thumb: if the user's requirement can be described in a single natural-language sentence, prefer one `ai:` step over splitting it into multiple `aiInput` + `aiTap` steps.
Golden path, a minimal working example:
```yaml
web:
  url: "https://www.baidu.com"
tasks:
  - name: "Search for Midscene"
    flow:
      - ai: "Enter Midscene in the search box and click search"
      - sleep: 3000
      - aiAssert: "The page displays search results"
```
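The priority order above might translate into a flow like the following sketch, which uses `ai:` for the loosely specified part, `aiInput` and `aiTap` where a specific field matters, `aiQuery` for structured data, and `aiAssert` for validation. The site, field names, and product category are hypothetical.

```yaml
# Sketch only: the URL and all element descriptions are made up for illustration.
web:
  url: "https://example.com/products"
tasks:
  - name: "Filter and extract products"
    flow:
      # Whole intent in one sentence: let the AI plan the steps
      - ai: "Open the category menu and choose Laptops"
      # Precise control over one specific form field
      - aiInput: "Minimum price field"
        value: "500"
      - aiTap: "Apply filters button"
      - aiWaitFor: "The product list has finished refreshing"
        timeout: 10000
      # Structured data requires aiQuery
      - aiQuery:
          query: "Extract product names and prices as an array of {name, price}"
          name: "laptops"
      # State validation
      - aiAssert: "At least one product is listed"
```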
Native Mode YAML Format Specification (Important)
Native mode supports two formats for action parameters:
Flat format (recommended, concise): the action keyword is followed by a string value, and additional parameters are written as sibling keys.
```yaml
- aiInput: "Search box"
  value: "Keyword"
- aiWaitFor: "Page loaded completely"
  timeout: 10000
- aiTap: "Button description"
  deepThink: true
- aiAssert: "Page contains expected content"
  errorMessage: "Content validation failed"
```
Nested format (also valid, suitable for complex parameters):
```yaml
- aiInput:
    locator: "Search box"
    value: "Keyword"
- aiQuery:
    query: "Extract product list"
    name: "products"
```
Use the following mapping rule table to convert user requirements to YAML:
Native Action Mapping
| Natural Language Pattern | YAML Mapping | Description |
|---|---|---|
| "Open/access/enter XXX website" | | Platform configuration |
| "Automatically plan and execute XXX" | | AI automatically breaks down into multi-step execution; optional |
| "Click/press/select XXX" | | Short form |
| "Hover/move over XXX" | | Trigger dropdown menu or tooltip |
| "Enter YYY in XXX" | | Flat sibling format; optional |
| "Press XXX key on keyboard" | | Supports key combinations like "Control+A"; |
| "Scroll down/up/left/right" | | Flat sibling format; optional |
| "Wait for XXX to appear" | | Optional timeout (in milliseconds) |
| "Check/verify/confirm XXX" | | Optional errorMessage |
| "Get/extract/read XXX" | | name is used to store the result |
| "Pause/wait N seconds" | | Parameter is in milliseconds |
| "Execute JS code" | | Execute JavaScript directly |
| "Take screenshot and record to report" | | Take screenshot and record description to report |
| "Double-click XXX" | | Double-click operation; optional |
| "Right-click XXX" | | Right-click operation; optional |
| "Locate XXX element" | | Locate element, store result in variable (referencable in Extended mode) |
| "Is XXX true?" | | Returns boolean value; optional |
| "Get the number of XXX" | | Returns number; optional |
| "Get text of XXX" | | Returns string; optional |
| "Ask AI about XXX" | | Free-form question, returns text answer |
| "Drag A to B" | | Flat format; or nested |
| "Clear XXX input box" | | Clear input box content |
| "Execute ADB command" | | Android platform only |
| "Execute WDA request" | | iOS platform only |
| "Launch app" | | Mobile app launch |
Extended Control Flow Mapping
| Natural Language Pattern | YAML Mapping |
|---|---|
| "Define variable XXX as YYY" | |
| "Use environment variable XXX" | |
| "If XXX then YYY else ZZZ" | |
| "Repeat N times" | |
| "Execute for each XXX" | |
| "Continue doing YYY while XXX" | |
| "Do A first, do B if it fails" | |
| "Do A and B simultaneously" | |
| "Call XXX API" | |
| "Execute Shell command" | |
| "Import/reuse XXX flow" | |
| "Filter/sort/map data" | |
Step 4: Select Template Starting Point
Refer to the template files in the `templates/` directory and find the template closest to the user's requirement as the starting point:
Native Templates:
- `templates/native/web-basic.yaml`: Basic web operations
- `templates/native/web-login.yaml`: Login flow
- `templates/native/web-data-extract.yaml`: Data extraction
- `templates/native/web-search.yaml`: Web search flow
- `templates/native/web-file-upload.yaml`: File upload form
- `templates/native/web-multi-tab.yaml`: Multi-tab operations
- `templates/native/deep-think-locator.yaml`: Image-assisted location (deepThink/xpath)
- `templates/native/android-app.yaml`: Android testing
- `templates/native/ios-app.yaml`: iOS testing
- `templates/native/computer-desktop.yaml`: Desktop app automation
Extended Templates:
- `templates/extended/web-conditional-flow.yaml`: Conditional branching
- `templates/extended/web-pagination-loop.yaml`: Pagination loop
- `templates/extended/web-data-pipeline.yaml`: Data pipeline
- `templates/extended/multi-step-with-retry.yaml`: Multi-step with retry
- `templates/extended/api-integration-test.yaml`: API integration
- `templates/extended/e2e-workflow.yaml`: End-to-end complete workflow
- `templates/extended/reusable-sub-flows.yaml`: Sub-flow reuse (import/use)
- `templates/extended/responsive-test.yaml`: Multi-viewport responsive testing
- `templates/extended/web-auth-flow.yaml`: OAuth/login authentication flow (using variables and environment references)
Template Selection Decision:
| Requirement Feature | Recommended Template |
|---|---|
| Simple page operations (open, click, input) | |
| Login / Form filling | |
| Data collection / Information extraction | |
| Search + Result validation | |
| File upload / Attachment submission | |
| OAuth/Third-party authentication login | |
| Desktop app automation (non-browser) | |
| Conditional judgment needed (If logged in then...) | |
| Pagination / List traversal needed | |
| Data filtering / Sorting / Aggregation | |
| Retry on failure needed | |
| External API call needed | |
| Complete business flow (multi-step + variables + export) | |
| Sub-flow reuse / Modularization | |
| Multi-screen size responsive validation | |
| Complex element location / deepThink | |
| Multi-tab operations | |
Step 5: Generate YAML
Generate the YAML content from the template and the conversion rules, paying attention to the following points:
- File header: add comments stating the requirement source and the generation time
- `engine` field: Extended mode must explicitly declare `engine: extended`
- `features` list: in Extended mode, declare the features used (e.g., `features: [logic, variables, loop]`); it can be omitted in Native mode
- `agent` configuration (optional): `testId` identifies the test, `groupName` / `groupDescription` are used for report classification, and `cache: true` caches AI results to speed up repeated runs
- `aiActContext` (optional): provides additional context for the AI agent (such as the language of a multilingual website or special domain terms); set it in `agent: { aiActContext: "Description" }`
- `continueOnError` (optional): to continue executing subsequent tasks after a task fails, set `continueOnError: true`
- `output` export (optional): export results such as those of `aiQuery` to a JSON file for use in subsequent processes
Output Format
```yaml
# Auto-generated by Midscene YAML Generator
# Requirement Description: [Original user requirement]
# Generation Time: [timestamp]

engine: native|extended
features: [...]  # Extended mode only

# Optional: agent configuration
# agent:
#   testId: "test-001"
#   groupName: "Automation Testing Group"
#   groupDescription: "Description"
#   cache: true

[platform_config]

tasks:
  - name: "[Task Name]"
    continueOnError: true  # Optional: Continue on failure
    flow:
      [Generated steps]
    output:  # Optional: Export data
      filePath: "./midscene-output/data.json"
      dataName: "variableName"
```
Step 6: Validate and Output
- Output the file to the `./midscene-output/` directory
- Call the validator to confirm the YAML is valid:
  ```bash
  node scripts/midscene-run.js <file> --dry-run
  ```
- If validation fails, analyze the cause of the error and fix it automatically
- After validation passes, tell the user they can execute the file with the Runner:
  ```bash
  node scripts/midscene-run.js <file>
  ```
Best Practices for Writing AI Instructions
When generating YAML, the quality of the AI instructions (the parameters of `aiTap`, `aiAssert`, and so on) directly affects the execution success rate. Follow these principles:
Description Precision
- Poor: `aiTap: "Button"` (there may be multiple buttons on the page)
- Good: `aiTap: "Blue login button at the top right corner of the page"` (position + color + function)
- Better: `aiTap: "Button with text 'Login Now' in the navigation bar"` (precise down to the text content)
Location Strategy Priority
- Natural language description (preferred): highly readable and adapts to page changes
- deepThink mode: enable it when a complex page contains multiple similar elements; the AI performs a deeper analysis, which is more accurate but takes longer
- Image-assisted location (image prompting): when a text description is insufficient, annotated screenshots can help the AI understand the target element (the official `locate.images` capability)
- xpath selector (last resort): for cases where natural language cannot locate the element precisely. Note: xpath only applies to the Web platform; Android/iOS should use natural language descriptions
```yaml
# Prefer natural language
- aiTap: "Edit button in the third row of the product list"

# Enable deepThink for complex scenarios (many similar elements, or inaccurate location)
- aiTap: "Edit icon in the third row of data"
  deepThink: true

# Last resort: use xpath (Web platform only)
- aiTap: ""
  xpath: "//table/tbody/tr[3]//button[@class='edit']"
```
Image-assisted Location (locate object)
When the natural language description is not precise enough, reference images can be provided via the `locate` object:
```yaml
# Use images to help the AI identify the target element
- aiTap:
    locate:
      prompt: "Icon button similar to the reference image"
      images:
        - name: "target-icon"
          url: "https://example.com/icon.png"
      convertHttpImage2Base64: true

# Simplified form: provide the images directly in the images option
- aiTap: "Icon button similar to the reference image"
  images:
    - "./images/target-icon.png"
```
aiQuery Result Formatting
Clearly specify the expected data structure in `query`:
```yaml
- aiQuery:
    query: >
      Extract all product information on the page and return it as an array.
      Each element should contain the following fields:
      - name: Product name (string)
      - price: Price (number)
      - inStock: In stock (boolean)
    name: "productList"
```
Wait Strategy
Add `aiWaitFor` after key operations to ensure the page state is ready:
```yaml
- aiTap: "Submit button"
- aiWaitFor: "Submit success prompt appears, or page redirects to result page"
  timeout: 10000
```
Data Transformation Operation Reference
Operations supported by `data_transform` in Extended mode:
| Operation | Description | Key Parameters |
|---|---|---|
| Filter by condition | |
| Sort | |
| Map/Transform | |
| Aggregation calculation | |
| Deduplicate | |
| Extract subset | |
| Flatten nested array | |
| Group by field | |
Two formats: the flat format `{source, operation, name}` is suitable for single-step operations; the nested format `{input, operations:[], output}` supports chained multi-step operations. Both formats support all 8 operations.
Platform-Specific Notes
Web Platform
- `url` must include the full protocol (`https://`)
- Use `aiWaitFor` to wait for the page to finish loading before operating on it
- Ensure input boxes are interactive before form operations
Android Platform
- Configure `deviceId` (the ADB device ID, e.g., `emulator-5554`)
- Use `launch: "com.example.app"` to launch the app (as an action step in the flow)
- Use `runAdbShell` to execute ADB commands
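A minimal sketch of an Android script built from the notes above. Two things are assumptions rather than confirmed here: that the platform block is the top-level key `android:`, and that `runAdbShell` takes the shell command as a plain string. The package name and screen descriptions are placeholders.

```yaml
# Sketch only. Assumed: top-level key `android:` and string form of runAdbShell.
android:
  deviceId: "emulator-5554"        # ADB device ID

tasks:
  - name: "Launch the app and check the home screen"
    flow:
      # Assumed form: reset app data before the run
      - runAdbShell: "pm clear com.example.app"
      - launch: "com.example.app"
      - aiWaitFor: "The app home screen is visible"
        timeout: 10000
      - aiAssert: "The home screen shows a search bar at the top"
```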
iOS Platform
- Configure `wdaPort` (WebDriverAgent port, default 8100) and `wdaHost` (default localhost)
- Use `launch: "com.example.app"` to launch the app (as an action step in the flow)
- Use `runWdaRequest` to send WebDriverAgent requests
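The iOS notes could be put together along these lines. The top-level key `ios:` is an assumption, and the bundle identifier and screen descriptions are placeholders; `runWdaRequest` is left out because its parameter shape is not described here.

```yaml
# Sketch only. Assumed: top-level key `ios:`. Values shown are the stated defaults.
ios:
  wdaPort: 8100
  wdaHost: "localhost"

tasks:
  - name: "Launch the app and open settings"
    flow:
      - launch: "com.example.app"
      - aiWaitFor: "The app main screen is visible"
        timeout: 10000
      - aiTap: "Settings icon in the bottom tab bar"
      - aiAssert: "The settings screen is displayed"
```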
Computer Platform
- For general desktop automation scenarios
Common Anti-patterns
Avoid the following common mistakes when generating YAML:
- Unnecessary use of the nested object format: the flat format (`aiInput: "Search box"` + `value: "Keyword"`) is recommended because it is more concise and readable. The nested format (`aiInput: { locator: "Search box", value: "Keyword" }`) is valid in both modes but is usually needed only for complex parameters such as `locate` image positioning
- Missing `engine: extended` in Extended mode: the engine must be declared when using any extended feature (variables, loops, conditions, etc.)
- Forgetting `maxIterations` in loops: `while` loops must set a safety upper limit, and the count of `for` and `repeat` loops should not exceed 10000
- Using the nested object format for `aiWaitFor`: use `aiWaitFor: "Condition"` + `timeout: 10000` instead of `aiWaitFor: { condition: "Condition" }`
- Missing `features` declaration: Extended mode should list the features used to facilitate detection and optimization
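For the two format-related anti-patterns, a side-by-side sketch may help. The discouraged forms are shown as comments so the fragment stays usable as is; the element descriptions are placeholders.

```yaml
# Avoid: nested object form for aiWaitFor
# - aiWaitFor: { condition: "Order confirmation is visible" }

# Prefer: flat form, with timeout as a sibling key
- aiWaitFor: "Order confirmation is visible"
  timeout: 10000

# Valid but usually unnecessary: nested aiInput
# - aiInput: { locator: "Search box", value: "Keyword" }

# Prefer: flat aiInput
- aiInput: "Search box"
  value: "Keyword"
```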
Pre-output Self-check List
After generating the YAML, verify the following items before output:
- Does each `aiInput` have a corresponding `value` parameter?
- Is there an `aiWaitFor` after key operations to ensure the page state is ready?
- Does Extended mode declare `engine: extended` and a `features` list?
- Do loops have a safety upper limit (`maxIterations` or a reasonable `count`)?
- Is sensitive information (passwords, tokens) referenced through environment variables using `${ENV:XXX}`?
- Are the AI instruction descriptions precise enough (including features such as position, text, and color)?
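A short login flow that would pass most of these checks might look like the sketch below. It assumes that environment references are an Extended-mode feature covered by the `variables` entry in `features`; the URL, the variable names `LOGIN_USER` and `LOGIN_PASSWORD`, and the element descriptions are hypothetical.

```yaml
# Sketch only. Assumed: ${ENV:...} needs engine: extended with the variables feature.
engine: extended
features: [variables]

web:
  url: "https://example.com/login"

tasks:
  - name: "Log in with credentials from the environment"
    flow:
      - aiInput: "Username field in the login form"
        value: "${ENV:LOGIN_USER}"        # no hardcoded credentials
      - aiInput: "Password field in the login form"
        value: "${ENV:LOGIN_PASSWORD}"
      - aiTap: "Blue button with text 'Log in' below the password field"
      - aiWaitFor: "The account dashboard is displayed"
        timeout: 10000
```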
Notes
- Parameters of AI instructions (aiTap, aiAssert, etc.) are written in natural language; no CSS selectors are needed
- Both Chinese and English descriptions are acceptable; Midscene's AI engine supports multiple languages
- Results of `aiQuery` are stored via the `name` field and can be referenced in subsequent steps using `${name}` (Extended mode only)
- Set a reasonable `timeout` (in milliseconds) for `aiWaitFor`; the default is usually 15 seconds
- Always set `maxIterations` as a safety upper limit in loops to prevent infinite loops
- `${ENV:XXX}` or `${ENV.XXX}` can be used to reference environment variables, which avoids hardcoding sensitive information in the YAML
- Always declare the `engine` field explicitly to avoid unexpected behavior from automatic detection
- Variable references are case-sensitive: `${userName}` and `${username}` are different variables
- Avoid circular imports: if A.yaml imports B.yaml and B.yaml imports A.yaml, a runtime error occurs
- Always verify syntax and structure with `--dry-run` after generation (note: `--dry-run` does not check the model configuration; AI operations require `MIDSCENE_MODEL_API_KEY` to be configured for actual execution)
- Tell users they can use the Midscene Runner skill to execute the generated file
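To make the `name` / `${name}` note concrete, here is a hedged sketch in which an `aiQuery` result is reused by a later step. It assumes that the `variables` feature is what enables `${...}` substitution inside string values; the URL, the variable name `orderNumber`, and the element descriptions are invented for the example.

```yaml
# Sketch only. Assumed: ${orderNumber} is substituted inside string values in Extended mode.
engine: extended
features: [variables]

web:
  url: "https://example.com/orders/confirmation"

tasks:
  - name: "Look up the order that was just placed"
    flow:
      - aiQuery:
          query: "Return the order number shown on the confirmation page as a string"
          name: "orderNumber"            # case-sensitive: not the same as ordernumber
      - aiTap: "Link with text 'My orders' in the page header"
      - aiInput: "Order search box"
        value: "${orderNumber}"
      - aiWaitFor: "The search results show exactly one order"
        timeout: 10000
```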
Iterative Fix Process
When the generated YAML fails to execute:
- Runner can fix it automatically: If the error can be resolved by modifying YAML (e.g., imprecise location description, insufficient wait time), Runner Skill will directly modify and retry
- When regeneration is needed: If the error involves fundamental design issues (e.g., wrong mode selected, missing key steps), users can describe the failure to Generator, which will regenerate an improved YAML based on the error information
- Recommended Flow: Generate → dry-run validation → Execute → If failed, describe error for Generator to fix → Re-execute
Collaboration Agreement
After generation is complete, return the following structured information to the user:
- Generated file path: `./midscene-output/<filename>.yaml`
- Execution mode: native or extended
- Recommended next command: `node scripts/midscene-run.js <path> --dry-run`
- If dry-run validation fails, automatically analyze the error, fix the YAML, and re-validate