web-browser-automation

macOS Web Browser Automation Guide

Overview

This guide covers comprehensive web browser automation on macOS desktop, focusing on automation (not testing). We cover four major automation frameworks with practical examples for real-world scenarios.

PyXA Installation: To use the PyXA examples in this skill, see the PyXA Installation section of the automating-mac-apps skill.

Primary Automation Tools

PyXA: macOS-native Python wrapper with direct browser integration
Playwright: Cross-platform framework with Python bindings for modern web automation
Selenium: Industry-standard automation with ChromeDriver integration
Puppeteer: Node.js framework for Chrome/Chromium automation

Tool Selection Guide

| Tool | Primary Use | Key Advantages |
| --- | --- | --- |
| PyXA | macOS-native control | Direct OS integration, Arc spaces |
| Playwright | Cross-browser testing | Auto-waiting, mobile emulation |
| Selenium | Legacy enterprise | Mature ecosystem, wide language support |
| Puppeteer | Headless Chrome | Fast execution, PDF generation |

See references/browser-compatibility-matrix.md for detailed browser support.

Getting Started

Choose your framework based on your needs (see Tool Selection Guide above)
Install dependencies for your chosen framework
Follow framework-specific guides linked below
Review workflows for common automation patterns
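
Before installing anything, it can help to see which frameworks are already present. The following is a minimal sketch using only the standard library; the import names in `PYTHON_FRAMEWORKS` and the `detect_frameworks` helper are assumptions made for illustration, not part of any framework's API.

```python
import importlib.util
import shutil

# Assumed top-level import names for the Python frameworks in this guide.
PYTHON_FRAMEWORKS = {
    "PyXA": "PyXA",
    "Playwright": "playwright",
    "Selenium": "selenium",
}


def detect_frameworks() -> dict[str, bool]:
    """Report which automation frameworks appear to be installed."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in PYTHON_FRAMEWORKS.items()
    }
    # Puppeteer is a Node.js package, so this only checks that node is on PATH.
    status["Node.js (for Puppeteer)"] = shutil.which("node") is not None
    return status


if __name__ == "__main__":
    for name, available in detect_frameworks().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

A "missing" entry only means the module could not be located in the current Python environment; a framework installed in a different virtual environment will not be detected.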

Framework Guides

PyXA Browser Integration

Best for: macOS-native browser control with Arc spaces support
Installation:
```
pip install mac-pyxa
```
Guide:
```
references/pyxa-integration.md
```

Playwright Automation

Best for: Cross-browser testing with auto-waiting
Installation:
```
pip install playwright && playwright install
```
Guide:
```
references/playwright-automation.md
```

Selenium WebDriver

Best for: Legacy enterprise automation
Installation:
```
pip install selenium
```
Guide:
```
references/selenium-webdriver.md
```

Puppeteer Node.js

Best for: Headless Chrome with PDF generation
Installation:
```
npm install puppeteer
```
Guide:
```
references/puppeteer-automation.md
```

Automation Workflows

Complete workflow examples for common automation scenarios:

Multi-Browser Tab Management

Guide: references/workflows.md#workflow-1-multi-browser-tab-management

Automated Research and Data Collection

Guide: references/workflows.md#workflow-2-automated-research-and-data-collection

Cross-Browser Testing Suite

Guide: references/workflows.md#workflow-3-cross-browser-testing-suite

Web Scraping and Data Extraction

Guide: references/workflows.md#workflow-4-web-scraping-and-data-extraction

Brief Automation Patterns

Browser Launch and Profile Management

```python
# PyXA approach for Chrome
import PyXA

chrome = PyXA.Application("Google Chrome")
chrome.new_window("https://example.com")
```

Tab Organization and Grouping

```python
# PyXA tab filtering
tabs = chrome.windows()[0].tabs()
# title is read as a property on individual tab objects
work_tabs = [tab for tab in tabs if "meeting" in tab.title.lower()]
for tab in work_tabs:
    tab.close()
```
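
Grouping, as opposed to filtering, can be done on plain data once titles and URLs have been read from the browser. The sketch below groups tabs by hostname; `group_by_domain` and the sample `(title, url)` pairs are hypothetical stand-ins for values read from real tabs, and no browser is needed to run it.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_domain(tabs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group tab titles by the hostname of their URL."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title, url in tabs:
        # URLs without a hostname (e.g. about:blank) share one bucket.
        host = urlparse(url).hostname or "(no host)"
        groups[host].append(title)
    return dict(groups)


# Hypothetical sample data standing in for values read from real tabs.
sample = [
    ("Weekly meeting notes", "https://docs.example.com/notes"),
    ("Inbox", "https://mail.example.com/inbox"),
    ("Meeting agenda", "https://docs.example.com/agenda"),
]
print(group_by_domain(sample))
# {'docs.example.com': ['Weekly meeting notes', 'Meeting agenda'], 'mail.example.com': ['Inbox']}
```

Keeping the grouping logic independent of the browser objects makes it reusable across all four frameworks.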

JavaScript Injection for Content Extraction

```python
# PyXA JavaScript execution
content = tab.execute_javascript("document.body.innerText")
links = tab.execute_javascript("Array.from(document.querySelectorAll('a')).map(a => a.href)")
```
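
Text extracted with `document.body.innerText` usually needs cleaning before it is stored or compared. The following is one possible sketch of such a cleanup step using only the standard library; `clean_text` is a hypothetical helper, and the specific rules (NFKC normalization, dropping control characters, collapsing whitespace) are assumptions about what a typical pipeline wants.

```python
import html
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize extracted page text into trimmed, non-empty lines."""
    text = html.unescape(raw)                   # decode entities such as &amp;
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space
    # Drop control characters, keeping newlines and tabs for the steps below.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


print(clean_text("  Hello&nbsp;&amp;  world \x00\n\n\n Next\tline "))
# Hello & world
# Next line
```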

Cross-Browser Synchronization

```python
# PyXA multi-browser control
browsers = [PyXA.Application("Google Chrome"), PyXA.Application("Microsoft Edge")]
for browser in browsers:
    browser.new_tab("https://shared-resource.com")
```

Form Filling and Interaction

```python
# Playwright auto-waiting
page.fill("#username", "user@example.com")
page.click("text=Submit")  # Auto-waits for element
```

Screenshot and Content Capture

```javascript
// Puppeteer screenshot
await page.screenshot({ path: 'capture.png', fullPage: true });
await page.pdf({ path: 'page.pdf', format: 'A4' });
```

Playwright Quickstart

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    page.click("text=Get Started")  # Auto-waits for element
    page.fill("#search-input", "automation")
    browser.close()
```

For advanced Playwright features (contexts, viewports, dynamic content), see references/playwright-automation.md

Additional Resources

Advanced Techniques

Network Interception: references/playwright-automation.md#network-interception
Parallel Browser Automation: references/selenium-webdriver.md#parallel-testing
Performance Monitoring: references/puppeteer-automation.md#performance-monitoring
Browser Context Management: references/selenium-webdriver.md#browser-contexts-and-pages

Validation Checklist

Troubleshooting

Element Not Found Errors: Common solutions across frameworks
Stale Element References: Handling dynamic content
Browser Detection: Avoiding automation detection
Network Timeout Issues: Timeout configuration
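
Element-not-found, stale-element, and timeout failures are often transient, so one framework-agnostic mitigation is to retry the failing step with a growing delay. The sketch below is an assumption about how that could look; `retry` and its parameters are hypothetical names, not part of Playwright, Selenium, Puppeteer, or PyXA.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.5,
    backoff: float = 2.0,
    exceptions: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Call action until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            time.sleep(delay)
            delay *= backoff  # wait longer before each further attempt
    raise AssertionError("unreachable")
```

With Playwright, for example, a step could be wrapped as `retry(lambda: page.click("text=Submit"))`, where `page` is an existing page object. Narrowing `exceptions` to the framework's own timeout or lookup errors avoids hiding unrelated bugs, and the frameworks' built-in waits (such as the `timeout` argument on Playwright actions or Selenium's `WebDriverWait`) are usually the first thing to adjust.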

Security Considerations

Credential Management: Secure storage of login credentials
Certificate Handling: SSL/TLS certificate validation
Sandbox and Isolation: Running automation in isolated environments
Data Sanitization: Cleaning extracted data
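
For credential management, a common baseline is to keep logins out of the script and read them from the environment at run time. This is a minimal sketch under that assumption; the `Credentials` class, `load_credentials`, and the `AUTOMATION_USERNAME` / `AUTOMATION_PASSWORD` variable names are hypothetical.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credentials:
    username: str
    # repr=False keeps the password out of logs and tracebacks.
    password: str = field(repr=False)


def load_credentials(prefix: str = "AUTOMATION") -> Credentials:
    """Read <prefix>_USERNAME and <prefix>_PASSWORD from the environment."""
    try:
        return Credentials(
            username=os.environ[f"{prefix}_USERNAME"],
            password=os.environ[f"{prefix}_PASSWORD"],
        )
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable {missing}") from None
```

The loaded values can then be passed to a form-filling call such as `page.fill("#username", creds.username)`, so no secret appears in source control.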

Performance Optimization

Browser Configuration: Disabling unnecessary features
Network Optimization: Blocking unwanted resources
Parallel Execution: Running tests concurrently
Resource Pooling: Managing browser instances efficiently
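
As an illustration of network optimization, the sketch below uses Playwright's request routing to abort requests for heavy resource types. The choice of blocked types and the `should_block` / `fetch_title` helpers are assumptions for this example; which resources are safe to block depends on the target page.

```python
# Resource types (as reported by Playwright) assumed unnecessary for text extraction.
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font"})


def should_block(resource_type: str) -> bool:
    """Return True for resource types the automation does not need."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def fetch_title(url: str) -> str:
    """Load a page with unwanted resources blocked and return its title."""
    # Imported here so should_block stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle)  # intercept every request on this page
        page.goto(url)
        title = page.title()
        browser.close()
    return title
```

Calling `fetch_title("https://example.com")` requires Playwright and its browsers to be installed. Blocking stylesheets or scripts as well can speed things up further but may break pages that render content dynamically.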

When Not to Use

For mobile browser automation (use Appium or native testing frameworks)
For Windows/Linux-only environments (some PyXA features are macOS-only)
When CAPTCHA solving is required (use specialized services)
For production scraping without respecting robots.txt
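
Respecting robots.txt can be checked programmatically with Python's standard `urllib.robotparser`. The sketch below parses an inline sample so it runs offline; the `build_parser` helper, the sample rules, and the `my-automation-bot` user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rules standing in for a site's real robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE)
print(parser.can_fetch("my-automation-bot", "https://example.com/public/page"))   # True
print(parser.can_fetch("my-automation-bot", "https://example.com/private/data"))  # False
```

For a live site, `set_url()` followed by `read()` on a `RobotFileParser` fetches the real file instead of the inline sample.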

What to Load

PyXA: references/pyxa-integration.md - macOS-native browser control
Playwright: references/playwright-automation.md - Cross-browser testing
Selenium: references/selenium-webdriver.md - Enterprise automation
Puppeteer: references/puppeteer-automation.md - Node.js Chrome automation
Workflows: references/workflows.md - Complete automation scenarios
