browser-agent
AI-powered browser automation toolset comprising agent-browser (accessibility tree extraction), actionbook (50+ website automation recipes), and browser-use (Python automation library). Use cases: (1) scrape web content that requires JS rendering; (2) fetch data from platforms such as X/Twitter, GitHub, and Reddit; (3) take web page screenshots; (4) automate browser operations; (5) retrieve the accessibility tree of web pages. Use this skill when you need to access dynamic web pages, bypass anti-scraping measures, or perform browser automation.
Source: azure12355/weilan-skills
NPX Install
npx skill4agent add azure12355/weilan-skills browser-agent
Browser Agent
AI Agent browser automation toolset that provides three complementary tools for web data retrieval and automation operations.
Tool Selection Guide
User Request
│
├── Simple static content scraping?
│   └── Use curl / WebFetch (faster)
│
├── Need JS rendering / bypass anti-scraping?
│   ├── agent-browser ── Extract accessibility tree
│   │
│   ├── Screenshot? ── agent-browser -s
│   │
│   └── Target site in actionbook list?
│       └── actionbook get <site> ── Get dedicated recipe
│
└── Complex multi-step automation?
    └── browser-use (Python) ── AI-powered autonomous operation

agent-browser
CLI tool that uses Playwright to launch a headless browser and extract the page's accessibility tree.
Core Advantages:
- Access most content without login
- Get structured, readable text
- Supports screenshots
- Automatically handles JS rendering
Basic Usage
```bash
# Extract web content (accessibility tree)
agent-browser <URL>

# Take screenshot
agent-browser -s <URL>

# Specify output format
agent-browser -f markdown <URL>
agent-browser -f html <URL>
agent-browser -f text <URL>

# Interactive mode (click, scroll available)
agent-browser -i <URL>

# Specify browser
agent-browser --browser chromium <URL>
agent-browser --browser firefox <URL>
```
Common Scenarios
```bash
# Get X/Twitter post content
agent-browser "https://x.com/username/status/123456"

# Get GitHub repository information
agent-browser "https://github.com/owner/repo"

# Get Reddit post
agent-browser "https://reddit.com/r/subreddit/comments/abc123"

# Get news article (JS-rendered)
agent-browser "https://example.com/article"
```
Detailed Command Reference: references/agent-browser-reference.md
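When agent-browser is called from a script rather than a shell, the flags listed under Basic Usage can be assembled programmatically. The following is a minimal sketch, not part of the toolset: `build_command` and `fetch_page` are hypothetical helper names, and the sketch assumes that `agent-browser` is on the PATH, that the flags can be combined in one call, that the result is printed to stdout, and that failures produce a non-zero exit code. The `timeout_s` parameter is a limit applied by Python's `subprocess`, not an agent-browser option.

```python
import subprocess
from typing import List, Optional


def build_command(
    url: str,
    screenshot: bool = False,
    fmt: Optional[str] = None,
    interactive: bool = False,
    browser: Optional[str] = None,
) -> List[str]:
    """Assemble an agent-browser argument list from the flags documented above."""
    cmd = ["agent-browser"]
    if screenshot:
        cmd.append("-s")
    if fmt is not None:
        # Only the three formats shown in Basic Usage are accepted here.
        if fmt not in ("markdown", "html", "text"):
            raise ValueError(f"unsupported format: {fmt}")
        cmd += ["-f", fmt]
    if interactive:
        cmd.append("-i")
    if browser is not None:
        cmd += ["--browser", browser]
    cmd.append(url)
    return cmd


def fetch_page(url: str, timeout_s: float = 60.0, **options) -> str:
    """Run agent-browser and return its stdout (assumption: output goes to stdout)."""
    completed = subprocess.run(
        build_command(url, **options),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # enforced by subprocess, not by the CLI
        check=True,         # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

A call such as `fetch_page("https://github.com/trending", fmt="markdown")` would then correspond to `agent-browser -f markdown "https://github.com/trending"`. Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.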
actionbook
Pre-computed automation "recipes" for 50+ websites, providing validated automation templates.
Basic Usage
```bash
# List all supported websites
actionbook list

# Get recipe for a specific site
actionbook get <site>

# Examples
actionbook get github
actionbook get reddit
actionbook get amazon
```
Workflow
- Run actionbook list to view supported websites
- Run actionbook get <site> to obtain the automation template for that site
- Write automation scripts based on the template (using browser-use or directly with agent-browser)
Detailed Command Reference: references/actionbook-reference.md
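The first two workflow steps (check the list, then fetch the recipe) can be sketched in Python as follows. This is an illustration under assumptions, not part of actionbook: the helper names `run_cli`, `parse_site_list`, and `get_recipe` are hypothetical, and the sketch assumes that `actionbook list` prints one site name per line and that `actionbook get <site>` prints the recipe to stdout. If the real output format differs, `parse_site_list` is the only piece that needs to change.

```python
import subprocess
from typing import Callable, List


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout as text."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def parse_site_list(output: str) -> List[str]:
    """Assumed format: one site name per non-empty line of `actionbook list` output."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def get_recipe(site: str, run: Callable[[List[str]], str] = run_cli) -> str:
    """Return the recipe for `site`, or raise LookupError if the site is not listed."""
    supported = parse_site_list(run(["actionbook", "list"]))
    if site not in supported:
        raise LookupError(f"{site!r} is not in the actionbook list")
    return run(["actionbook", "get", site])
```

The `run` parameter is injectable so the lookup logic can be exercised without the CLI installed; in normal use `get_recipe("github")` would call the real commands.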
browser-use
Python library that uses AI to autonomously control browsers for complex tasks.
Installation
```bash
pip install browser-use
playwright install chromium
```
Basic Usage
```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # Describe the task in natural language; the agent decides the browser steps.
    agent = Agent(
        task="Go to GitHub and find the trending Python repositories",
        llm=ChatOpenAI(model="gpt-4"),
    )
    result = await agent.run()
    print(result)


asyncio.run(main())
```
Common Scenarios
```python
# `llm` is a chat model instance, such as the ChatOpenAI object from Basic Usage.
# Each agent is run with `await agent.run()` inside an async function.

# Form filling
agent = Agent(
    task="Go to example.com and fill out the contact form with test data",
    llm=llm,
)

# Data scraping
agent = Agent(
    task="Go to Amazon, search for 'wireless headphones', and extract the top 5 products with prices",
    llm=llm,
)

# Multi-step operations
agent = Agent(
    task="Log into Twitter, navigate to settings, and enable two-factor authentication",
    llm=llm,
)
```
Detailed API Reference: references/browser-use-reference.md
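For several tasks in a row, the single-agent pattern above can be wrapped in a loop. The sketch below is one possible arrangement, not a browser-use feature: `run_tasks` and `run_with_browser_use` are hypothetical names, and the only browser-use calls used are the `Agent(task=..., llm=...)` constructor and `agent.run()` already shown in this section. Tasks run sequentially, and a failed task is recorded instead of stopping the batch.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_tasks(
    tasks: List[str],
    run_one: Callable[[str], Awaitable[Any]],
) -> Dict[str, Any]:
    """Run tasks one at a time; store each result, or the exception if a task fails."""
    results: Dict[str, Any] = {}
    for task in tasks:
        try:
            results[task] = await run_one(task)
        except Exception as exc:  # keep going so one failure does not stop the batch
            results[task] = exc
    return results


async def run_with_browser_use(task: str) -> Any:
    """Run one task with browser-use, following the Agent pattern from Basic Usage."""
    # Imported lazily so run_tasks can be exercised without browser-use installed.
    from browser_use import Agent
    from langchain_openai import ChatOpenAI

    agent = Agent(task=task, llm=ChatOpenAI(model="gpt-4"))
    return await agent.run()
```

A batch would then be started with `asyncio.run(run_tasks(task_list, run_with_browser_use))`, where `task_list` is a list of task descriptions like the ones above. Checking each value with `isinstance(value, Exception)` afterwards separates failures from results.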
Decision Flow
Task Type → Tool Selection
| Task Type | Recommended Tool | Reason |
|---|---|---|
| Quick single-page scraping | agent-browser | Simple and straightforward, accessibility tree output |
| Need page screenshots | agent-browser -s | Built-in screenshot functionality |
| Target site is in actionbook | actionbook + browser-use | Ready-made best practices available |
| Complex multi-step operations | browser-use | AI autonomous decision-making and execution |
| Sites requiring login | browser-use | Can handle login flows |
| Batch data collection | browser-use | Supports loops and conditional logic |
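The table and the Tool Selection Guide can be read as a small decision function. The sketch below is only an illustration: the `Task` fields and `choose_tool` are hypothetical names, and the order in which the checks are applied (login, multi-step, and batch needs first, then actionbook coverage, then screenshots, then JS rendering) is an assumption, since the table does not say which row wins when several apply.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a request; every field defaults to 'not needed'."""
    needs_js: bool = False
    needs_screenshot: bool = False
    needs_login: bool = False
    multi_step: bool = False
    batch: bool = False
    site_in_actionbook: bool = False


def choose_tool(task: Task) -> str:
    """Map a task to the recommended tool, using an assumed precedence of the table rows."""
    if task.needs_login or task.multi_step or task.batch:
        return "browser-use"
    if task.site_in_actionbook:
        return "actionbook + browser-use"
    if task.needs_screenshot:
        return "agent-browser -s"
    if task.needs_js:
        return "agent-browser"
    # Simple static content: a plain HTTP fetch is faster than launching a browser.
    return "curl / WebFetch"
```

For example, `choose_tool(Task(needs_js=True))` returns `"agent-browser"`, matching the "Quick single-page scraping" row.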
Example Workflows
Scenario: Get X/Twitter Post Content
```bash
# Method 1: Directly use agent-browser (recommended)
agent-browser "https://x.com/username/status/123456"

# Method 2: Use browser-use for more complex operations
# Write a Python script
```
Scenario: GitHub Trending Analysis
```bash
# Method 1: agent-browser
agent-browser "https://github.com/trending"

# Method 2: Use actionbook to get GitHub recipe
actionbook get github
# Then write a script based on the recipe
```
Notes
- Rate Limiting: Frequent requests may get you blocked by the target website; add delays between requests
- Login Requirements: Some content is only accessible after login; use browser-use to handle the login flow
- Dynamic Content: agent-browser waits for page loading to complete, but interactive mode is required for infinite-scroll pages
- Legal Compliance: Ensure scraping behavior complies with the target website's terms of service
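The rate-limiting advice can be sketched as a loop that pauses for a random interval between requests. This is a minimal illustration: `fetch_politely` is a hypothetical helper, `fetch` stands for any function that retrieves one URL (for instance a wrapper around the agent-browser CLI), and the 2 to 5 second default range is an arbitrary assumption rather than a recommendation from the toolset. Suitable delays depend on the target site.

```python
import random
import time
from typing import Callable, Dict, Iterable


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay_s: float = 2.0,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch URLs one by one, pausing a random interval between consecutive requests."""
    results: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized pause so requests are not evenly spaced; no pause before the first.
            sleep(random.uniform(min_delay_s, max_delay_s))
        results[url] = fetch(url)
    return results
```

The `sleep` parameter exists so the pacing can be checked without actually waiting; leaving it at the default uses `time.sleep`.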
Troubleshooting
| Issue | Solution |
|---|---|
| Page loading timeout | Use |
| Content not rendered | Use interactive mode |
| Anti-scraping block | Try different user-agents or use browser-use |
| Blank screenshot | Ensure the page is fully loaded before taking the screenshot |