Browser Agent

A browser automation toolset for AI agents. It provides three complementary tools for retrieving web data and automating browser operations.

Tool Selection Guide

User Request
    │
    ├── Simple static content scraping?
    │   └── Use curl / WebFetch (faster)
    │
    ├── Need JS rendering / bypass anti-scraping?
    │   ├── agent-browser ── Extract accessibility tree
    │   │
    │   ├── Screenshot? ── agent-browser -s
    │   │
    │   └── Target site in actionbook list?
    │       └── actionbook get <site> ── Get dedicated recipe
    │
    └── Complex multi-step automation?
        └── browser-use (Python) ── AI-powered autonomous operation
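
The tree above can be read as a small routing function. The following is a minimal sketch, not part of any of the three tools: the `Request` class, its field names, and `choose_tool` are hypothetical, and the order in which the checks are applied is one reasonable reading of the tree.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical description of a task; the field names are illustrative."""
    needs_js: bool = False            # page needs JS rendering or blocks plain HTTP clients
    needs_screenshot: bool = False    # caller wants an image of the page
    multi_step: bool = False          # login flows, form filling, loops
    site_in_actionbook: bool = False  # site appears in `actionbook list`


def choose_tool(req: Request) -> str:
    """Return the suggested tool for a request, following the selection tree."""
    if req.multi_step:
        return "browser-use"
    if not req.needs_js and not req.needs_screenshot:
        return "curl / WebFetch"
    if req.needs_screenshot:
        return "agent-browser -s"
    if req.site_in_actionbook:
        return "actionbook get <site>"
    return "agent-browser"


if __name__ == "__main__":
    print(choose_tool(Request(needs_js=True)))
```

Checking `multi_step` first reflects the assumption that a complex task should go to browser-use even when the site also has a recipe.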

agent-browser

CLI tool that uses Playwright to launch a headless browser and extract the page's accessibility tree.

Core Advantages:

Access most content without login
Get structured, readable text
Supports screenshots
Automatically handles JS rendering

Basic Usage

bash

# Extract web content (accessibility tree)
agent-browser <URL>

# Take screenshot
agent-browser -s <URL>

# Specify output format
agent-browser -f markdown <URL>
agent-browser -f html <URL>
agent-browser -f text <URL>

# Interactive mode (click, scroll available)
agent-browser -i <URL>

# Specify browser
agent-browser --browser chromium <URL>
agent-browser --browser firefox <URL>
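
When agent-browser is called from a script rather than a shell, the flags above can be assembled programmatically. This is a minimal sketch using only the flags listed in this section; `build_command` and `fetch` are hypothetical helper names, and it is an assumption that the flags can be combined in one call and that the result is printed to stdout.

```python
import subprocess
from typing import List, Optional

FORMATS = {"markdown", "html", "text"}  # values shown for -f in this section


def build_command(url: str, fmt: Optional[str] = None, screenshot: bool = False,
                  browser: Optional[str] = None) -> List[str]:
    """Assemble an agent-browser argument list from the documented flags."""
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    cmd = ["agent-browser"]
    if screenshot:
        cmd.append("-s")
    if fmt is not None:
        cmd += ["-f", fmt]
    if browser is not None:
        cmd += ["--browser", browser]
    cmd.append(url)
    return cmd


def fetch(url: str, **options) -> str:
    """Run agent-browser and return its stdout (assumed to hold the page content)."""
    result = subprocess.run(build_command(url, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.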

Common Scenarios

bash

# Get X/Twitter post content
agent-browser "https://x.com/username/status/123456"

# Get GitHub repository information
agent-browser "https://github.com/owner/repo"

# Get Reddit post
agent-browser "https://reddit.com/r/subreddit/comments/abc123"

# Get news article (JS-rendered)
agent-browser "https://example.com/article"

Detailed Command Reference: references/agent-browser-reference.md

actionbook

Pre-computed automation "recipes" for 50+ websites, providing validated automation templates.

Basic Usage

bash

# List all supported websites
actionbook list

# Get recipe for a specific site
actionbook get <site>

# Examples
actionbook get github
actionbook get reddit
actionbook get amazon

Workflow

Run `actionbook list` to view supported websites.
Run `actionbook get <site>` to obtain the automation template for that site.
Write automation scripts based on the template (using browser-use, or agent-browser directly).
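
The first two workflow steps can be scripted. The sketch below is illustrative only: `run_cli`, `parse_site_list`, and `get_recipe` are hypothetical helpers, and it assumes that `actionbook list` prints one site name per line, which this document does not confirm.

```python
import subprocess
from typing import Callable, List, Optional


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout as text."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def parse_site_list(output: str) -> List[str]:
    """ASSUMPTION: `actionbook list` prints one site name per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def get_recipe(site: str, run: Callable[[List[str]], str] = run_cli) -> Optional[str]:
    """Return the recipe text for `site`, or None if the site is not listed."""
    supported = parse_site_list(run(["actionbook", "list"]))
    if site not in supported:
        return None
    return run(["actionbook", "get", site])
```

Returning `None` for an unlisted site lets the caller fall back to agent-browser or browser-use instead of failing.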

Detailed Command Reference: references/actionbook-reference.md

browser-use

Python library that uses AI to autonomously control browsers for complex tasks.

Installation

bash

pip install browser-use
playwright install chromium

Basic Usage

python

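import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # The agent plans and executes browser actions to complete the task
    agent = Agent(
        task="Go to GitHub and find the trending Python repositories",
        llm=ChatOpenAI(model="gpt-4"),
    )
    result = await agent.run()
    print(result)

# agent.run() is a coroutine, so main() must be driven by an event loop
asyncio.run(main())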

Common Scenarios

python

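# `llm` is a chat model instance, e.g. ChatOpenAI(model="gpt-4") as in Basic Usage.
# Run each agent with `await agent.run()` inside an async function.

# Form filling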
agent = Agent(
    task="Go to example.com and fill out the contact form with test data",
    llm=llm,
)

# Data scraping
agent = Agent(
    task="Go to Amazon, search for 'wireless headphones', and extract the top 5 products with prices",
    llm=llm,
)

# Multi-step operations
agent = Agent(
    task="Log into Twitter, navigate to settings, and enable two-factor authentication",
    llm=llm,
)

Detailed API Reference: references/browser-use-reference.md

Decision Flow

Task Type → Tool Selection

Task Type	Recommended Tool	Reason
Quick single-page scraping	agent-browser	Simple and straightforward, accessibility tree output
Need page screenshots	agent-browser -s	Built-in screenshot functionality
Target site is in actionbook	actionbook + browser-use	Ready-made best practices available
Complex multi-step operations	browser-use	AI autonomous decision-making and execution
Sites requiring login	browser-use	Can handle login flows
Batch data collection	browser-use	Supports loops and conditional logic

Example Workflows

Scenario: Get X/Twitter Post Content

bash

# Method 1: Directly use agent-browser (recommended)
agent-browser "https://x.com/username/status/123456"

# Method 2: Use browser-use for more complex operations
# Write a Python script

Scenario: GitHub Trending Analysis

bash

# Method 1: agent-browser
agent-browser "https://github.com/trending"

# Method 2: Use actionbook to get GitHub recipe
actionbook get github
# Then write a script based on the recipe

Notes

Rate Limiting: Frequent requests may get you blocked by the target website; add delays between requests.
Login Requirements: Some content is only accessible after login; use browser-use to handle the login flow.
Dynamic Content: agent-browser waits for the page to finish loading, but infinite-scroll pages require interactive mode (`-i`).
Legal Compliance: Ensure your scraping complies with the target website's terms of service.
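
For the rate-limiting note, one simple approach is to pause between consecutive requests. This is a minimal sketch: `fetch_all` is a hypothetical helper, the 2-second default is an arbitrary illustration rather than a recommended value, and `fetch` stands for whatever function retrieves a single page (for example, a wrapper around agent-browser).

```python
import time
from typing import Callable, Dict, Iterable


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str],
              delay_seconds: float = 2.0,
              sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Fetch URLs one at a time, pausing between consecutive requests."""
    results: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results[url] = fetch(url)
    return results
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.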

Troubleshooting

Issue	Solution
Page loading timeout	Use `-t` to increase timeout duration
Content not rendered	Use interactive mode `-i` to wait manually
Anti-scraping block	Try different user-agents or use browser-use
Blank screenshot	Ensure the page is fully loaded before taking the screenshot
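
The timeout advice can be combined with retries. The sketch below is built on several assumptions that this document does not confirm: that `-t` takes its value as the next argument, that agent-browser exits with a non-zero status when a page load times out, and that the example values are in a unit the tool accepts. `fetch_with_retries` and `run_cli` are hypothetical helpers.

```python
import subprocess
from typing import Callable, List, Sequence


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def fetch_with_retries(url: str,
                       timeouts: Sequence[str] = ("30", "60", "120"),
                       run: Callable[[List[str]], str] = run_cli) -> str:
    """Retry agent-browser with a larger -t value after each failed attempt."""
    last_error = None
    for timeout in timeouts:
        try:
            # ASSUMPTION: `-t <value>` sets the timeout; the unit is not documented here
            return run(["agent-browser", "-t", timeout, url])
        except subprocess.CalledProcessError as error:
            last_error = error
    raise RuntimeError(f"all attempts failed for {url}") from last_error
```

If the failure is caused by blocking rather than slow loading, longer timeouts will not help, and switching to browser-use as the table suggests is the better next step.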
