browser-agent

Browser Agent

A browser automation toolset for AI agents, providing three complementary tools for web data retrieval and automated browser operations.

Tool Selection Guide

User Request
    ├── Simple static content scraping?
    │   └── Use curl / WebFetch (faster)
    ├── Need JS rendering / bypass anti-scraping?
    │   ├── agent-browser ── Extract accessibility tree
    │   │
    │   ├── Screenshot? ── agent-browser -s
    │   │
    │   └── Target site in actionbook list?
    │       └── actionbook get <site> ── Get dedicated recipe
    └── Complex multi-step automation?
        └── browser-use (Python) ── AI-powered autonomous operation
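
The decision tree above can also be read as a small routing function. The following is a minimal Python sketch of that reading; the function name `choose_tool`, its boolean parameters, and the returned strings are illustrative assumptions and are not part of any of the three tools.

```python
def choose_tool(needs_js: bool = False,
                needs_screenshot: bool = False,
                site_in_actionbook: bool = False,
                multi_step: bool = False) -> str:
    """Suggest a tool for a request, following the selection tree."""
    # Complex multi-step automation goes to the AI-driven library.
    if multi_step:
        return "browser-use"
    # A screenshot needs a real browser, so it is treated like the JS branch.
    if needs_screenshot:
        return "agent-browser -s"
    if needs_js:
        # Prefer a dedicated recipe when the site is in the actionbook list.
        if site_in_actionbook:
            return "actionbook get <site>"
        return "agent-browser"
    # Simple static content: a plain HTTP fetch is faster.
    return "curl / WebFetch"
```

The order of the checks is a design choice in this sketch: multi-step work is tested first because it overrides the single-page branches of the tree.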

agent-browser

A CLI tool that uses Playwright to launch a headless browser and extract a page's accessibility tree.
Core advantages:
  • Accesses most content without login
  • Returns structured, readable text
  • Supports screenshots
  • Handles JS rendering automatically

Basic Usage

bash
# Extract web content (accessibility tree)
agent-browser <URL>

# Take a screenshot
agent-browser -s <URL>

# Specify the output format
agent-browser -f markdown <URL>
agent-browser -f html <URL>
agent-browser -f text <URL>

# Interactive mode (clicking and scrolling available)
agent-browser -i <URL>

# Specify the browser
agent-browser --browser chromium <URL>
agent-browser --browser firefox <URL>
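
When agent-browser is called from a script rather than typed by hand, it can help to assemble the argument list in one place. The sketch below uses only the flags listed in this section (`-s`, `-i`, `-f`, `--browser`). The helper names `build_command` and `run_agent_browser` are hypothetical, and the sketch assumes that the flags can be combined, that the CLI is on `PATH`, and that it writes its result to standard output.

```python
import subprocess
from typing import List, Optional


def build_command(url: str,
                  output_format: Optional[str] = None,
                  screenshot: bool = False,
                  interactive: bool = False,
                  browser: Optional[str] = None) -> List[str]:
    """Build an agent-browser argument list from the documented flags."""
    cmd = ["agent-browser"]
    if screenshot:
        cmd.append("-s")
    if interactive:
        cmd.append("-i")
    if output_format is not None:
        cmd += ["-f", output_format]      # markdown, html, or text
    if browser is not None:
        cmd += ["--browser", browser]     # chromium or firefox
    cmd.append(url)
    return cmd


def run_agent_browser(url: str, **options) -> str:
    """Run the CLI and return its stdout (ASSUMPTION: output goes to stdout)."""
    result = subprocess.run(build_command(url, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.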

Common Scenarios

  • Get X/Twitter post content
  • Get GitHub repository information
  • Get Reddit post content
  • Get news articles (JS-rendered)

**Detailed Command Reference**: [references/agent-browser-reference.md](references/agent-browser-reference.md)

actionbook

Precomputed automation "recipes" for 50+ websites, providing validated automation templates.

Basic Usage

bash
# List all supported websites
actionbook list

# Get the recipe for a specific site
actionbook get <site>

# Examples
actionbook get github
actionbook get reddit
actionbook get amazon

Workflow

  1. Run `actionbook list` to view the supported websites.
  2. Run `actionbook get <site>` to obtain the automation template for that site.
  3. Write an automation script based on the template (using browser-use, or agent-browser directly).

**Detailed Command Reference**: [references/actionbook-reference.md](references/actionbook-reference.md)
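
The first two workflow steps can be scripted. The sketch below is a hedged illustration and not part of actionbook: `run_cli` and `fetch_recipe` are hypothetical helpers, and the parsing assumes that `actionbook list` prints one site name per line, which the source does not confirm.

```python
import subprocess
from typing import Callable, List


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout as text."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def fetch_recipe(site: str, run: Callable[[List[str]], str] = run_cli) -> str:
    """Check that a site is supported, then fetch its recipe."""
    # Step 1: list the supported sites.
    listing = run(["actionbook", "list"])
    # ASSUMPTION: the listing contains one site name per line.
    supported = {line.strip() for line in listing.splitlines() if line.strip()}
    if site not in supported:
        raise ValueError(f"{site!r} is not in the actionbook list")
    # Step 2: fetch the recipe for that site.
    return run(["actionbook", "get", site])
```

The `run` parameter is injectable so the logic can be exercised without the CLI installed. Step 3, writing the automation script, depends on the recipe's contents and is left out.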

browser-use

A Python library that uses AI to control a browser autonomously and complete complex tasks.

Installation

bash
pip install browser-use
playwright install chromium

Basic Usage

python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # The agent plans and executes browser actions to complete the task
    agent = Agent(
        task="Go to GitHub and find the trending Python repositories",
        llm=ChatOpenAI(model="gpt-4"),
    )
    result = await agent.run()
    print(result)

# Start the event loop; without this call, main() is never executed
asyncio.run(main())

Common Scenarios

python
# Form filling
agent = Agent(
    task="Go to example.com and fill out the contact form with test data",
    llm=llm,
)

# Data scraping
agent = Agent(
    task="Go to Amazon, search for 'wireless headphones', and extract the top 5 products with prices",
    llm=llm,
)

# Multi-step operations
agent = Agent(
    task="Log into Twitter, navigate to settings, and enable two-factor authentication",
    llm=llm,
)

**Detailed API Reference**: [references/browser-use-reference.md](references/browser-use-reference.md)
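
Several scenario tasks are often run one after another. The sketch below shows one way to loop over task strings and collect the results. It is an illustration under stated assumptions: `run_tasks` and `make_agent` are hypothetical names, and the only thing assumed about an agent is that it exposes an awaitable `run()` method, as in the Basic Usage example.

```python
import asyncio
from typing import Any, Callable, Dict, List


async def run_tasks(tasks: List[str],
                    make_agent: Callable[[str], Any]) -> Dict[str, Any]:
    """Run tasks sequentially and collect each result, keyed by task text."""
    results: Dict[str, Any] = {}
    for task in tasks:
        agent = make_agent(task)
        try:
            results[task] = await agent.run()
        except Exception as exc:
            # Record the failure and continue with the remaining tasks.
            results[task] = exc
    return results
```

With browser-use, `make_agent` could be `lambda task: Agent(task=task, llm=llm)`, and the batch would be started with `asyncio.run(run_tasks(tasks, make_agent))`. Running sequentially rather than concurrently keeps request volume against the target site low.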

Decision Flow

Task Type → Tool Selection

| Task Type | Recommended Tool | Reason |
| --- | --- | --- |
| Quick single-page scraping | agent-browser | Simple and direct; outputs the accessibility tree |
| Page screenshots needed | agent-browser -s | Built-in screenshot support |
| Target site is in actionbook | actionbook + browser-use | Ready-made best practices available |
| Complex multi-step operations | browser-use | AI makes and executes decisions autonomously |
| Sites requiring login | browser-use | Can handle login flows |
| Batch data collection | browser-use | Supports loops and conditional logic |

Example Workflows

**Scenario: Get X/Twitter Post Content**

  • Method 1: Use agent-browser directly (recommended)
  • Method 2: Use browser-use for more complex operations by writing a Python script

**Scenario: GitHub Trending Analysis**

  • Method 1: Use agent-browser
  • Method 2: Run `actionbook get github` to get the GitHub recipe, then write a script based on the recipe

Notes

  1. Rate limiting: Frequent requests may get you blocked by the target website; add delays between requests.
  2. Login requirements: Some content is only accessible after login; use browser-use to handle the login flow.
  3. Dynamic content: agent-browser waits for the page to finish loading, but infinite-scroll pages require interactive mode.
  4. Legal compliance: Make sure your scraping complies with the target website's terms of service.
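
The delay recommended in note 1 can be added with a small generator that pauses between items. This is a minimal sketch: the name `paced` and the default delay range of 2 to 5 seconds are illustrative assumptions, not values taken from any of the tools.

```python
import random
import time
from typing import Callable, Iterable, Iterator


def paced(urls: Iterable[str],
          min_delay: float = 2.0,
          max_delay: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield URLs one at a time, pausing a random interval between them."""
    first = True
    for url in urls:
        if not first:
            # Randomized spacing avoids a fixed, easily detected request rhythm.
            sleep(random.uniform(min_delay, max_delay))
        first = False
        yield url
```

A loop such as `for url in paced(urls): ...` then waits before every request except the first.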

Troubleshooting

| Issue | Solution |
| --- | --- |
| Page load timeout | Use `-t` to increase the timeout |
| Content not rendered | Use interactive mode (`-i`) and wait manually |
| Blocked by anti-scraping measures | Try a different user agent or use browser-use |
| Blank screenshot | Make sure the page has fully loaded before taking the screenshot |
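
The timeout advice can be combined with retries that raise the limit on each attempt. The sketch below is hypothetical: it assumes that `-t` takes a numeric value placed before the URL, and the source does not state its unit, so the numbers are placeholders. `fetch_with_retries` and the injected `run` callable are illustrative names.

```python
from typing import Callable, List, Optional, Sequence


def fetch_with_retries(url: str,
                       run: Callable[[List[str]], str],
                       timeouts: Sequence[int] = (30, 60, 120)) -> str:
    """Retry agent-browser with a larger -t value after each failure."""
    last_error: Optional[Exception] = None
    for timeout in timeouts:
        # ASSUMPTION: -t accepts a number; its unit is not given in the source.
        cmd = ["agent-browser", "-t", str(timeout), url]
        try:
            return run(cmd)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"all attempts failed for {url}") from last_error
```

Here `run` is any callable that executes the command and returns its output, for example a thin wrapper around `subprocess.run` with `check=True`.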