browser-agent

An AI-powered browser automation toolset comprising agent-browser (accessibility tree extraction), actionbook (automation recipes for 50+ websites), and browser-use (a Python automation library). Use cases: (1) scrape web content that requires JS rendering; (2) fetch data from platforms such as X/Twitter, GitHub, and Reddit; (3) take web page screenshots; (4) automate browser operations; (5) retrieve the accessibility tree of a web page. Use this skill when you need to access dynamic web pages, bypass anti-scraping measures, or perform browser automation.

NPX Install

npx skill4agent add azure12355/weilan-skills browser-agent

Browser Agent

A browser automation toolset for AI agents. It provides three complementary tools for retrieving web data and automating browser operations.

Tool Selection Guide

User Request
    ├── Simple static content scraping?
    │   └── Use curl / WebFetch (faster)
    ├── Need JS rendering / bypass anti-scraping?
    │   ├── agent-browser ── Extract accessibility tree
    │   │
    │   ├── Screenshot? ── agent-browser -s
    │   │
    │   └── Target site in actionbook list?
    │       └── actionbook get <site> ── Get dedicated recipe
    └── Complex multi-step automation?
        └── browser-use (Python) ── AI-powered autonomous operation
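
The selection tree above can also be read as a small function. This is only a sketch of one possible reading: the function name, the parameter names, and the choice to check for multi-step automation first are assumptions, not part of the toolset.

```python
def choose_tool(needs_js: bool,
                wants_screenshot: bool = False,
                site_in_actionbook: bool = False,
                multi_step: bool = False) -> str:
    """Return the tool or command suggested by the selection tree."""
    # Complex multi-step automation goes to browser-use regardless of the rest.
    if multi_step:
        return "browser-use"
    # Simple static content does not need a browser at all.
    if not needs_js:
        return "curl / WebFetch"
    # JS rendering is needed: pick the matching agent-browser / actionbook branch.
    if wants_screenshot:
        return "agent-browser -s"
    if site_in_actionbook:
        return "actionbook get <site>"
    return "agent-browser"


print(choose_tool(needs_js=True, wants_screenshot=True))
```

Checking `multi_step` first reflects the idea that a multi-step task needs browser-use even when the target site also has a recipe; reorder the checks if your priorities differ.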

agent-browser

CLI tool that uses Playwright to launch a headless browser and extract the page's accessibility tree.
Core Advantages:
  • Access most content without login
  • Get structured, readable text
  • Supports screenshots
  • Automatically handles JS rendering

Basic Usage

bash
# Extract web content (accessibility tree)
agent-browser <URL>

# Take screenshot
agent-browser -s <URL>

# Specify output format
agent-browser -f markdown <URL>
agent-browser -f html <URL>
agent-browser -f text <URL>

# Interactive mode (click, scroll available)
agent-browser -i <URL>

# Specify browser
agent-browser --browser chromium <URL>
agent-browser --browser firefox <URL>

Common Scenarios

bash
# Get X/Twitter post content
agent-browser "https://x.com/username/status/123456"

# Get GitHub repository information
agent-browser "https://github.com/owner/repo"

# Get Reddit post
agent-browser "https://reddit.com/r/subreddit/comments/abc123"

# Get news article (JS-rendered)
agent-browser "https://example.com/article"
Detailed Command Reference: references/agent-browser-reference.md
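
When agent-browser is called from a script rather than a shell, the command line can be assembled as an argument list. The sketch below uses only the flags listed in this section (-s, -f, -i, --browser). The helper names are hypothetical, and it assumes agent-browser is on PATH and writes its result to standard output, which this document does not confirm.

```python
import subprocess


def build_agent_browser_command(url, output_format=None, screenshot=False,
                                interactive=False, browser=None):
    """Assemble an argv list using only the flags shown in this section."""
    cmd = ["agent-browser"]
    if screenshot:
        cmd.append("-s")
    if output_format is not None:
        # The section lists exactly these three formats.
        if output_format not in {"markdown", "html", "text"}:
            raise ValueError(f"unsupported format: {output_format}")
        cmd += ["-f", output_format]
    if interactive:
        cmd.append("-i")
    if browser is not None:
        cmd += ["--browser", browser]
    cmd.append(url)
    return cmd


def run_agent_browser(url, **options):
    """Run the CLI and return its stdout (assumption: output goes to stdout)."""
    result = subprocess.run(build_agent_browser_command(url, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(build_agent_browser_command("https://example.com/article",
                                  output_format="markdown"))
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs that contain `?` or `&`.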

actionbook

Pre-computed automation "recipes" for 50+ websites, providing validated automation templates.

Basic Usage

bash
# List all supported websites
actionbook list

# Get recipe for a specific site
actionbook get <site>

# Examples
actionbook get github
actionbook get reddit
actionbook get amazon

Workflow

  1. Run actionbook list to view the supported websites
  2. Run actionbook get <site> to obtain the automation template for that site
  3. Write automation scripts based on the template (using browser-use or directly with agent-browser)
Detailed Command Reference: references/actionbook-reference.md
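
The first two workflow steps can be chained in a script. This is a minimal sketch with hypothetical helper names. It assumes that actionbook list prints site names as whitespace-separated tokens, which this document does not specify, so adjust the membership check to the real output.

```python
import subprocess


def run_cli(argv):
    """Run a command and return its stdout as text; raises if it exits non-zero."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def fetch_recipe(site, run=run_cli):
    """Step 1: list supported sites. Step 2: get the recipe for `site`."""
    listing = run(["actionbook", "list"])
    # Assumption: each supported site appears as its own token in the listing.
    if site not in listing.split():
        raise LookupError(f"{site!r} is not in the actionbook list")
    return run(["actionbook", "get", site])
```

The `run` parameter exists so the function can be exercised without the CLI installed; in normal use, call `fetch_recipe("github")` and pass the returned template to step 3.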

browser-use

Python library that uses AI to autonomously control browsers for complex tasks.

Installation

bash
pip install browser-use
playwright install chromium

Basic Usage

python
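import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    agent = Agent(
        task="Go to GitHub and find the trending Python repositories",
        llm=ChatOpenAI(model="gpt-4"),  # reads OPENAI_API_KEY from the environment
    )
    result = await agent.run()
    print(result)

# Agent.run() is a coroutine, so it needs an event loop to execute
asyncio.run(main())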

Common Scenarios

python
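# These snippets assume llm is already defined, as in Basic Usage
# (for example llm = ChatOpenAI(model="gpt-4")).
# Start each agent with "await agent.run()" inside an async function.

# Form filling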
agent = Agent(
    task="Go to example.com and fill out the contact form with test data",
    llm=llm,
)

# Data scraping
agent = Agent(
    task="Go to Amazon, search for 'wireless headphones', and extract the top 5 products with prices",
    llm=llm,
)

# Multi-step operations
agent = Agent(
    task="Log into Twitter, navigate to settings, and enable two-factor authentication",
    llm=llm,
)
Detailed API Reference: references/browser-use-reference.md
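
For batch work, several tasks like the ones above can be run one after another. The following is a minimal sketch, not part of browser-use: run_tasks and make_agent are hypothetical names. With browser-use, make_agent could be `lambda task: Agent(task=task, llm=llm)`, using the constructor shown in this section. The only thing the helper relies on is that each agent has an awaitable run() method.

```python
import asyncio


async def run_tasks(tasks, make_agent, delay_seconds=0.0):
    """Run each task with a fresh agent, sequentially, and collect the results."""
    results = {}
    for task in tasks:
        agent = make_agent(task)
        try:
            results[task] = await agent.run()
        except Exception as exc:
            # Record the failure and continue, so one bad task
            # does not abort the whole batch.
            results[task] = exc
        # Optional pause between tasks to avoid hammering the target site.
        await asyncio.sleep(delay_seconds)
    return results
```

Usage would look like `asyncio.run(run_tasks(task_list, make_agent))`; entries whose value is an exception mark the tasks that failed.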

Decision Flow

Task Type → Tool Selection

| Task Type | Recommended Tool | Reason |
|---|---|---|
| Quick single-page scraping | agent-browser | Simple and straightforward, accessibility tree output |
| Need page screenshots | agent-browser -s | Built-in screenshot functionality |
| Target site is in actionbook | actionbook + browser-use | Ready-made best practices available |
| Complex multi-step operations | browser-use | AI autonomous decision-making and execution |
| Sites requiring login | browser-use | Can handle login flows |
| Batch data collection | browser-use | Supports loops and conditional judgments |

Example Workflows

Scenario: Get X/Twitter Post Content
bash
# Method 1: Directly use agent-browser (recommended)
agent-browser "https://x.com/username/status/123456"

# Method 2: Use browser-use for more complex operations
# Write a Python script
Scenario: GitHub Trending Analysis
bash
# Method 1: agent-browser
agent-browser "https://github.com/trending"

# Method 2: Use actionbook to get GitHub recipe
actionbook get github
# Then write a script based on the recipe

Notes

  1. Rate Limiting: Frequent requests may get you blocked by the target website, so add delays between requests.
  2. Login Requirements: Some content is only accessible after login; use browser-use to handle the login flow.
  3. Dynamic Content: agent-browser waits for the page to finish loading, but infinite-scroll pages require interactive mode (-i).
  4. Legal Compliance: Ensure your scraping complies with the target website's terms of service.
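
The delay recommended in note 1 could be sketched as follows. The function name and the 2 to 5 second range are placeholders chosen for illustration, not values from this toolset. `fetch` stands for whatever callable retrieves one URL, for example a wrapper around agent-browser.

```python
import random
import time


def fetch_with_delays(urls, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Fetch URLs one by one, pausing a random interval between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # A randomized pause looks less like a fixed-interval bot
            # and spreads load on the target site.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

The `sleep` parameter is injectable so the pacing logic can be checked without actually waiting.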

Troubleshooting

| Issue | Solution |
|---|---|
| Page loading timeout | Use -t to increase the timeout duration |
| Content not rendered | Use interactive mode (-i) to wait manually |
| Anti-scraping block | Try different user-agents or use browser-use |
| Blank screenshot | Ensure the page is fully loaded before taking the screenshot |
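
For the timeout row, one option is to retry with progressively longer timeouts instead of picking a single large value. This is a generic sketch: the helper name and the timeout values are assumptions, and `run` is any callable that takes a timeout and raises TimeoutError when it is exceeded. A wrapper around `agent-browser -t` would need to translate the CLI's failure into that exception, and the value format that -t expects is not stated in this document.

```python
def retry_with_longer_timeouts(run, timeouts=(30, 60, 120)):
    """Call run(timeout) with each timeout in turn until one succeeds."""
    last_error = None
    for timeout in timeouts:
        try:
            return run(timeout)
        except TimeoutError as exc:
            # Remember the failure and try again with the next, longer timeout.
            last_error = exc
    raise TimeoutError(
        f"all attempts timed out (tried {list(timeouts)})"
    ) from last_error
```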