browser-agent

Browser Agent

An AI agent browser automation toolset that provides three complementary tools for web data retrieval and automation.

Tool Selection Guide

User Request
    │
    ├── Simple scraping of static content?
    │   └── Use curl / WebFetch (faster)
    │
    ├── Need JS rendering / need to bypass anti-scraping?
    │   ├── agent-browser ── Extract the accessibility tree
    │   │
    │   ├── Screenshot? ── agent-browser -s
    │   │
    │   └── Target site in the actionbook list?
    │       └── actionbook get <site> ── Get the dedicated recipe
    │
    └── Complex multi-step automation?
        └── browser-use (Python) ── AI-driven autonomous operation
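
The decision tree above can also be read as a small function. The following is a minimal sketch of one possible reading, not part of any of the three tools: the `Request` fields and the `choose_tool` name are hypothetical, and the order in which the branches are checked is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical description of a scraping request (illustration only)."""
    needs_js: bool = False            # page needs JS rendering or blocks plain HTTP clients
    wants_screenshot: bool = False    # caller wants an image instead of text
    site_in_actionbook: bool = False  # site appears in `actionbook list`
    multi_step: bool = False          # complex multi-step automation


def choose_tool(req: Request) -> str:
    """Return the tool or command the tree suggests for this request."""
    if req.multi_step:
        return "browser-use"
    if not req.needs_js:
        return "curl / WebFetch"
    # Assumption: a dedicated recipe is preferred whenever one exists.
    if req.site_in_actionbook:
        return "actionbook get <site>"
    if req.wants_screenshot:
        return "agent-browser -s"
    return "agent-browser"


if __name__ == "__main__":
    print(choose_tool(Request(needs_js=True, wants_screenshot=True)))
```

Checking `multi_step` first reflects the idea that a complex flow needs browser-use regardless of the other answers; a different priority order would be equally defensible.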

agent-browser

A CLI tool that uses Playwright to launch a headless browser and extract the page's accessibility tree.

Core advantages:

Accesses most content without login
Returns structured, readable text
Supports screenshots
Handles JS rendering automatically

Basic Usage

```bash
# Extract web content (accessibility tree)
agent-browser <URL>

# Take a screenshot
agent-browser -s <URL>

# Specify the output format
agent-browser -f markdown <URL>
agent-browser -f html <URL>
agent-browser -f text <URL>

# Interactive mode (supports clicking and scrolling)
agent-browser -i <URL>

# Specify the browser
agent-browser --browser chromium <URL>
agent-browser --browser firefox <URL>
```

Common Scenarios

```bash
# Get X/Twitter post content
agent-browser "https://x.com/username/status/123456"

# Get GitHub repository information
agent-browser "https://github.com/owner/repo"

# Get a Reddit post
agent-browser "https://reddit.com/r/subreddit/comments/abc123"

# Get a news article (JS-rendered)
agent-browser "https://example.com/article"
```

**Detailed Command Reference**: [references/agent-browser-reference.md](references/agent-browser-reference.md)
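
When agent-browser is called from a script rather than a shell, the flags listed under Basic Usage can be assembled programmatically. The sketch below is only an illustration: `build_command` and `fetch` are hypothetical helper names, it assumes the flags can be combined freely, and it assumes the extracted content is written to standard output. The reference file should be checked before relying on either assumption.

```python
import subprocess
from typing import List, Optional

# Output formats that appear under Basic Usage.
KNOWN_FORMATS = {"markdown", "html", "text"}


def build_command(url: str, screenshot: bool = False,
                  fmt: Optional[str] = None,
                  browser: Optional[str] = None) -> List[str]:
    """Assemble an agent-browser argument list (hypothetical helper)."""
    cmd = ["agent-browser"]
    if screenshot:
        cmd.append("-s")
    if fmt is not None:
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unknown format: {fmt}")
        cmd += ["-f", fmt]
    if browser is not None:
        cmd += ["--browser", browser]
    cmd.append(url)
    return cmd


def fetch(url: str, **options) -> str:
    """Run agent-browser and return its stdout (assumed to hold the page content)."""
    result = subprocess.run(build_command(url, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(build_command("https://example.com/article", fmt="markdown"))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.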

actionbook

Pre-computed automation "recipes" for 50+ websites, each a validated automation template.

Basic Usage

```bash
# List all supported websites
actionbook list

# Get the recipe for a specific site
actionbook get <site>

# Examples
actionbook get github
actionbook get reddit
actionbook get amazon
```

Workflow

1. Run `actionbook list` to view the supported websites.
2. Run `actionbook get <site>` to obtain the automation template for that site.
3. Write an automation script based on the template (using browser-use, or agent-browser directly).

**Detailed Command Reference**: [references/actionbook-reference.md](references/actionbook-reference.md)
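
The first two workflow steps can be chained from a script. This is a rough sketch under a loud assumption: the output format of `actionbook list` is not documented here, so the code simply treats every whitespace-separated word in that output as a possible site name. `run_cli` and `get_recipe` are hypothetical helper names.

```python
import subprocess
from typing import Callable, List, Optional


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout as text."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def get_recipe(site: str,
               run: Callable[[List[str]], str] = run_cli) -> Optional[str]:
    """Return the recipe text for `site`, or None if the site is not listed.

    Assumption: the site name appears as a separate word in the output
    of `actionbook list`. The real output format may differ.
    """
    listing = run(["actionbook", "list"])
    supported = set(listing.lower().split())
    if site.lower() not in supported:
        return None
    return run(["actionbook", "get", site])


if __name__ == "__main__":
    recipe = get_recipe("github")
    print(recipe if recipe is not None else "no recipe for this site")
```

The `run` parameter exists so the lookup logic can be exercised with a stand-in function when the CLI is not installed.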

browser-use

A Python library that uses AI to control a browser autonomously and complete complex tasks.

Installation

```bash
pip install browser-use
playwright install chromium
```

Basic Usage

```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # The agent plans and performs browser actions to complete the task.
    agent = Agent(
        task="Go to GitHub and find the trending Python repositories",
        llm=ChatOpenAI(model="gpt-4"),
    )
    result = await agent.run()
    print(result)


# Start the event loop; without this call, main() never runs.
asyncio.run(main())
```

Common Scenarios

```python
# `llm` is a chat model instance, e.g. ChatOpenAI(model="gpt-4") as in Basic Usage.

# Form filling
agent = Agent(
    task="Go to example.com and fill out the contact form with test data",
    llm=llm,
)

# Data scraping
agent = Agent(
    task="Go to Amazon, search for 'wireless headphones', and extract the top 5 products with prices",
    llm=llm,
)

# Multi-step operations
agent = Agent(
    task="Log into Twitter, navigate to settings, and enable two-factor authentication",
    llm=llm,
)
```

**Detailed API Reference**: [references/browser-use-reference.md](references/browser-use-reference.md)
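
Each scenario above creates an agent and, as in Basic Usage, awaits its `run()` method. To run several such tasks one after another, a small helper is enough. The sketch below is an illustration only: `run_tasks` and `make_agent` are hypothetical names, and the helper deliberately does not import browser-use, so it works with any object that has an async `run()` method.

```python
import asyncio
from typing import Any, Callable, List


async def run_tasks(tasks: List[str],
                    make_agent: Callable[[str], Any]) -> List[Any]:
    """Create one agent per task and run them sequentially, in order."""
    results = []
    for task in tasks:
        agent = make_agent(task)           # e.g. builds Agent(task=task, llm=llm)
        results.append(await agent.run())  # wait for this task before starting the next
    return results


class EchoAgent:
    """Stand-in agent used only to demonstrate the helper without a browser."""

    def __init__(self, task: str) -> None:
        self.task = task

    async def run(self) -> str:
        return "done: " + self.task


if __name__ == "__main__":
    print(asyncio.run(run_tasks(["fill form", "scrape prices"], EchoAgent)))
```

With browser-use, `make_agent` could be `lambda task: Agent(task=task, llm=llm)`. Running the tasks sequentially rather than concurrently keeps a single browser session busy at a time and is gentler on the target site.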

Decision Flow

Task Type → Tool Selection

Task Type	Recommended Tool	Reason
Quick single-page scraping	agent-browser	Simple and direct; outputs the accessibility tree
Page screenshots	agent-browser -s	Built-in screenshot support
Target site is in actionbook	actionbook + browser-use	Ready-made best practices
Complex multi-step operations	browser-use	AI decides and executes autonomously
Sites requiring login	browser-use	Can handle login flows
Batch data collection	browser-use	Supports loops and conditional logic

Example Workflows

**Scenario: Get X/Twitter Post Content**

```bash
# Method 1: Use agent-browser directly (recommended)
agent-browser "https://x.com/username/status/123456"

# Method 2: Use browser-use for more complex operations
# Write a Python script
```

**Scenario: GitHub Trending Analysis**

```bash
# Method 1: agent-browser
agent-browser "https://github.com/trending"

# Method 2: Use actionbook to get the GitHub recipe
actionbook get github
# Then write a script based on the recipe
```

Notes

Rate limiting: Frequent requests may get you blocked by the target website; add delays between requests.
Login requirements: Some content is only accessible after login; use browser-use to handle it.
Dynamic content: agent-browser waits for the page to finish loading, but infinite-scroll pages require interactive mode.
Legal compliance: Make sure your scraping complies with the target website's terms of service.
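
The rate-limiting note can be made concrete with a randomized pause between requests. This is a minimal sketch, not a recommendation of specific values: `fetch_politely` is a hypothetical helper, the 2 to 5 second range is an arbitrary assumption, and `fetch` stands for whatever function actually retrieves a page.

```python
import random
import time
from typing import Any, Callable, Iterable, List


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], Any],
                   min_delay: float = 2.0,
                   max_delay: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized delay so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    pages = fetch_politely(
        ["https://example.com/a", "https://example.com/b"],
        fetch=lambda url: "fetched " + url,  # placeholder for a real fetcher
        min_delay=0.1,
        max_delay=0.2,
    )
    print(pages)
```

No pause is inserted before the first request, so a single-URL call returns immediately.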

Troubleshooting

Issue	Solution
Page load timeout	Use `-t` to increase the timeout
Content not rendered	Use interactive mode `-i` and wait manually
Blocked by anti-scraping measures	Try a different user-agent or use browser-use
Blank screenshot	Make sure the page has fully loaded before taking the screenshot