web-browser

browser-automation

Overview

Browser automation skill with two approaches:
agent-browser - Snapshot-based interaction model optimized for AI agents
  • Compact element refs (@e1, @e2) reduce token usage dramatically
  • Workflow: open → snapshot -i → interact with refs → re-snapshot
  • Best for: dynamic exploration, form filling, scraping with unknown structure
playwright - Direct Playwright CLI and Node.js scripts
  • Full Playwright API access via scripts
  • Codegen for recording interactions
  • Best for: scripted automation, testing, batch operations, complex workflows

Sub-skills

CRITICAL: You MUST load the appropriate sub-skill from the sub-skills/ directory based on user intent.

When to use each

| Sub-skill | When to use | Triggers |
|---|---|---|
| agent-browser.md | Interactive exploration, AI-driven navigation, unknown page structure | "navigate to", "fill this form", "click the button", "scrape this page", "explore the site" |
| playwright.md | Scripted automation, testing, batch screenshots, codegen | "write a script", "generate test", "batch screenshot", "record my actions", "create automation script" |

Default behavior

  • If user intent is unclear, prefer agent-browser for interactive tasks
  • If user asks for "a script" or "automation code", use playwright
  • If user mentions "codegen" or "record", use playwright
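The routing rules above can be sketched as a small shell function. This is only an illustration, not part of the skill: `pick_subskill` is a hypothetical name, and the naive phrase matching is an assumption about how the trigger phrases might be checked.

```bash
# Hypothetical helper (not part of the skill): map a request to a sub-skill file.
pick_subskill() {
  # Lowercase and pad with spaces so " script" matches whole-word starts only
  # (this avoids false hits such as "description").
  ps_request=" $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') "
  case "$ps_request" in
    *" script"*|*"automation code"*|*codegen*|*" record"*|*"generate test"*|*"batch screenshot"*)
      echo "playwright.md" ;;
    *)
      # Unclear or interactive intent falls back to agent-browser.
      echo "agent-browser.md" ;;
  esac
}
```

For example, `pick_subskill "write a script to log in"` selects playwright.md, while `pick_subskill "fill this form"` falls through to agent-browser.md. Real intent detection would likely need more than phrase matching.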

Process

  1. Determine user intent from their request
  2. Load the appropriate sub-skill from sub-skills/
  3. Execute the sub-skill process
  4. Verify expected outcome was achieved

Resources

  • sub-skills/: Approach-specific instructions
    • agent-browser.md: Snapshot/refs workflow with npx agent-browser
    • playwright.md: Playwright CLI and Node.js scripts
  • references/agent-browser/: Deep-dive documentation for agent-browser
  • templates/agent-browser/: Ready-to-use shell scripts for agent-browser

Quick reference

agent-browser (default for interactive tasks)

    # Session isolation (generate random slug like bright-falcon)
    npx agent-browser --session <slug> open https://example.com
    npx agent-browser --session <slug> snapshot -i
    npx agent-browser --session <slug> click @e1
    npx agent-browser --session <slug> fill @e2 "text"
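One way to produce a session slug in the bright-falcon style is sketched below. The helper names `pick_word` and `random_slug` and the word lists are assumptions made for illustration; the skill itself does not say how the slug should be generated.

```bash
# Hypothetical helpers (not part of agent-browser): build an adjective-noun slug.
pick_word() {
  # $1 is a space-separated word list; print one word chosen at random.
  set -- $1
  # Read two random bytes as an unsigned integer.
  n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
  shift $(( n % $# ))
  printf '%s' "$1"
}

random_slug() {
  printf '%s-%s\n' \
    "$(pick_word "bright calm swift quiet bold")" \
    "$(pick_word "falcon river maple comet otter")"
}

# Usage (commented out so that sourcing this file starts no browser):
# slug=$(random_slug)
# npx agent-browser --session "$slug" open https://example.com
# npx agent-browser --session "$slug" snapshot -i
```

Reusing the same slug across the open, snapshot, and interaction commands keeps them in one session.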

playwright (for scripts and codegen)

    # Quick screenshot
    npx playwright screenshot https://example.com output.png

    # Record interactions as code
    npx playwright codegen https://example.com

    # PDF generation
    npx playwright pdf https://example.com output.pdf
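For batch screenshots, the screenshot command can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `url_to_filename` and `batch_screenshot` are hypothetical helpers, and the file-naming scheme is illustrative rather than anything the skill prescribes.

```bash
# Hypothetical helper: derive a safe file name from a URL.
url_to_filename() {
  # Drop the scheme, then replace anything outside [A-Za-z0-9._-] with "_".
  printf '%s' "$1" | sed -e 's|^[a-z]*://||' -e 's|[^A-Za-z0-9._-]|_|g'
}

# Hypothetical helper: capture one PNG per URL given as an argument.
batch_screenshot() {
  for url in "$@"; do
    npx playwright screenshot "$url" "$(url_to_filename "$url").png"
  done
}

# Usage (commented out so that sourcing this file runs nothing):
# batch_screenshot https://example.com https://example.com/docs
```

For larger batches, or when pages need interaction before capture, a Node.js script using the Playwright API is probably a better fit than the CLI.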