web-browser-automation

macOS Web Browser Automation Guide

Overview

This guide covers web browser automation on the macOS desktop, with a focus on automation rather than testing. It walks through four major automation frameworks, with practical examples for real-world scenarios.
PyXA Installation: To use the PyXA examples in this skill, see the installation instructions in the automating-mac-apps skill (PyXA Installation section).

Primary Automation Tools

  • PyXA: macOS-native Python wrapper with direct browser integration
  • Playwright: Cross-platform framework with Python bindings for modern web automation
  • Selenium: Industry-standard automation with ChromeDriver integration
  • Puppeteer: Node.js framework for Chrome/Chromium automation

Tool Selection Guide

| Tool | Primary Use | Key Advantages |
| --- | --- | --- |
| PyXA | macOS-native control | Direct OS integration, Arc spaces |
| Playwright | Cross-browser testing | Auto-waiting, mobile emulation |
| Selenium | Legacy enterprise | Mature ecosystem, wide language support |
| Puppeteer | Headless Chrome | Fast execution, PDF generation |

See references/browser-compatibility-matrix.md for detailed browser support.

Getting Started

  1. Choose your framework based on your needs (see Tool Selection Guide above)
  2. Install dependencies for your chosen framework
  3. Follow framework-specific guides linked below
  4. Review workflows for common automation patterns
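
Before following a framework guide, it can help to confirm what is already installed. The sketch below is one possible check, not part of the original skill: the helper name `available_frameworks` is hypothetical, and the import names `PyXA`, `playwright`, and `selenium` are assumed to match what the pip packages in this guide expose. Puppeteer is a Node.js package, so the sketch only checks that `node` is on the PATH.

```python
import importlib.util
import shutil

# Assumed import names for the pip packages mentioned in this guide.
PYTHON_FRAMEWORKS = {"PyXA": "PyXA", "Playwright": "playwright", "Selenium": "selenium"}


def available_frameworks() -> dict[str, bool]:
    """Report which frameworks look usable in the current environment."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in PYTHON_FRAMEWORKS.items()
    }
    # Puppeteer runs under Node.js, so only check that Node itself is on PATH.
    status["Puppeteer (node on PATH)"] = shutil.which("node") is not None
    return status


if __name__ == "__main__":
    for name, ok in available_frameworks().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A "missing" entry only means the module could not be located by the current Python interpreter; a package installed in another virtual environment will not show up.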

Framework Guides

PyXA Browser Integration

  • Best for: macOS-native browser control with Arc spaces support
  • Installation:
    pip install PyXA
  • Guide:
    references/pyxa-integration.md

Playwright Automation

  • Best for: Cross-browser testing with auto-waiting
  • Installation:
    pip install playwright && playwright install
  • Guide:
    references/playwright-automation.md

Selenium WebDriver

  • Best for: Legacy enterprise automation
  • Installation:
    pip install selenium
  • Guide:
    references/selenium-webdriver.md

Puppeteer Node.js

  • Best for: Headless Chrome with PDF generation
  • Installation:
    npm install puppeteer
  • Guide:
    references/puppeteer-automation.md

Automation Workflows

Complete workflow examples for common automation scenarios:

Multi-Browser Tab Management

Guide:
references/workflows.md#workflow-1-multi-browser-tab-management

Automated Research and Data Collection

Guide:
references/workflows.md#workflow-2-automated-research-and-data-collection

Cross-Browser Testing Suite

Guide:
references/workflows.md#workflow-3-cross-browser-testing-suite

Web Scraping and Data Extraction

Guide:
references/workflows.md#workflow-4-web-scraping-and-data-extraction

Brief Automation Patterns

Browser Launch and Profile Management

python
import PyXA

# PyXA approach for Chrome
chrome = PyXA.Application("Google Chrome")
chrome.new_window("https://example.com")

Tab Organization and Grouping

python
# PyXA tab filtering (reuses the `chrome` application object from the previous pattern)
tabs = chrome.windows()[0].tabs()
work_tabs = [tab for tab in tabs if "meeting" in tab.title().lower()]
for tab in work_tabs:
    tab.close()

JavaScript Injection for Content Extraction

python
# PyXA JavaScript execution (`tab` is one of the tabs from the previous pattern)
content = tab.execute_javascript("document.body.innerText")
links = tab.execute_javascript("Array.from(document.querySelectorAll('a')).map(a => a.href)")
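
Links pulled out of a page this way usually need some cleanup before they are useful. The following is a minimal sketch of one way to do that with the standard library; it assumes the JavaScript call hands back a plain Python list of URL strings, which the source does not confirm, and `clean_links` is a hypothetical helper name.

```python
from urllib.parse import urldefrag, urlparse


def clean_links(links: list[str]) -> list[str]:
    """Keep unique http(s) links, without fragments, in first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for link in links:
        url, _fragment = urldefrag(link.strip())
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and empty hrefs
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned
```

Dropping the fragment treats `page#top` and `page` as the same document, which is usually what a research or scraping workflow wants; keep the fragment if in-page anchors matter for your use case.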

Cross-Browser Synchronization

python
import PyXA

# PyXA multi-browser control
browsers = [PyXA.Application("Google Chrome"), PyXA.Application("Microsoft Edge")]
for browser in browsers:
    browser.new_tab("https://shared-resource.com")

Form Filling and Interaction

python
# Playwright auto-waiting (`page` is an open Playwright page, as in the Playwright Quickstart)
page.fill("#username", "user@example.com")
page.click("text=Submit")  # Auto-waits for element

Screenshot and Content Capture

javascript
// Puppeteer screenshot
await page.screenshot({ path: 'capture.png', fullPage: true });
await page.pdf({ path: 'page.pdf', format: 'A4' });

Playwright Quickstart

python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    page.click("text=Get Started")  # Auto-waits for element
    page.fill("#search-input", "automation")
    browser.close()
For advanced Playwright features (contexts, viewports, dynamic content), see references/playwright-automation.md.

Additional Resources

Advanced Techniques

  • Network Interception:
    references/playwright-automation.md#network-interception
  • Parallel Browser Automation:
    references/selenium-webdriver.md#parallel-testing
  • Performance Monitoring:
    references/puppeteer-automation.md#performance-monitoring
  • Browser Context Management:
    references/selenium-webdriver.md#browser-contexts-and-pages

Validation Checklist

After implementing browser automation:
  • Verify browser launches without errors
  • Confirm page navigation completes successfully
  • Test element selectors locate expected elements
  • Validate extracted data matches page content
  • Check screenshots/PDFs are generated correctly
  • For PyXA: verify macOS permissions are granted
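
The screenshot/PDF item in this checklist can be partly automated. As a rough sketch, the hypothetical helper below checks that a capture file exists and begins with the byte signature expected for its extension. It does not prove the image or PDF renders correctly, only that the file is not empty or obviously the wrong type.

```python
from pathlib import Path

# Leading "magic" bytes that identify each file format.
SIGNATURES = {".png": b"\x89PNG\r\n\x1a\n", ".pdf": b"%PDF-"}


def capture_looks_valid(path: str) -> bool:
    """Return True if the file exists and starts with the signature
    expected for its extension (.png or .pdf)."""
    file = Path(path)
    signature = SIGNATURES.get(file.suffix.lower())
    if signature is None or not file.is_file():
        return False
    with file.open("rb") as handle:
        return handle.read(len(signature)) == signature
```

For example, after the Puppeteer capture pattern earlier in this guide, `capture_looks_valid("capture.png")` and `capture_looks_valid("page.pdf")` would be the two calls to make.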

Troubleshooting

  • Element Not Found Errors: Common solutions across frameworks
  • Stale Element References: Handling dynamic content
  • Browser Detection: Avoiding automation detection
  • Network Timeout Issues: Timeout configuration
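
Several of these problems (elements that are not there yet, stale references after a re-render, slow responses) are transient, so a retry with a growing delay is a common first remedy. The sketch below is a generic, framework-independent helper and not an API of any of the four frameworks; the name `retry` and its parameters are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay: float = 0.5,
          backoff: float = 2.0,
          retry_on: tuple[type[BaseException], ...] = (Exception,)) -> T:
    """Call action() until it succeeds or the attempts run out.

    Waits `delay` seconds after the first failure and multiplies the wait
    by `backoff` after each further failure. Re-raises the last error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff
    raise ValueError("attempts must be at least 1")
```

A typical call wraps a single step, for example `retry(lambda: page.click("text=Submit"))`. In practice it is worth narrowing `retry_on` to the framework's own exception types (such as Selenium's `StaleElementReferenceException` or Playwright's `TimeoutError`) so that genuine bugs are not silently retried.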

Security Considerations

  • Credential Management: Secure storage of login credentials
  • Certificate Handling: SSL/TLS certificate validation
  • Sandbox and Isolation: Running automation in isolated environments
  • Data Sanitization: Cleaning extracted data
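
For credential management, one simple baseline is to keep logins out of the script and read them from the environment at run time. The sketch below assumes variables named `<PREFIX>_USERNAME` and `<PREFIX>_PASSWORD`; those names, the `Credentials` class, and `load_credentials` are illustrative choices, not something the original skill prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    username: str
    password: str

    def __repr__(self) -> str:
        # Keep the password out of logs and tracebacks.
        return f"Credentials(username={self.username!r}, password='***')"


def load_credentials(prefix: str = "AUTOMATION") -> Credentials:
    """Read <PREFIX>_USERNAME and <PREFIX>_PASSWORD from the environment."""
    try:
        return Credentials(os.environ[f"{prefix}_USERNAME"],
                           os.environ[f"{prefix}_PASSWORD"])
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable {missing}") from None
```

Environment variables are a minimum rather than a complete answer; on macOS the system Keychain or a secrets manager is a stronger option where available.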

Performance Optimization

  • Browser Configuration: Disabling unnecessary features
  • Network Optimization: Blocking unwanted resources
  • Parallel Execution: Running tests concurrently
  • Resource Pooling: Managing browser instances efficiently
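
As an illustration of the network optimization item, the sketch below uses Playwright's request routing to abort requests for heavy resource types. The set of blocked types and both function names are assumptions chosen for the example; which resources are safe to skip depends on the site being automated.

```python
# Resource types to skip; this particular set is an assumption, tune it per site.
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font"})


def should_block(resource_type: str) -> bool:
    """Decide whether a request of this resource type should be aborted."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def fetch_title_without_heavy_resources(url: str) -> str:
    """Load a page with Playwright while aborting blocked resource types."""
    # Imported here so the helper above stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle)  # intercept every request
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Keeping the decision in a small pure function makes the blocking policy easy to unit test separately from the browser session.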

When Not to Use

  • For mobile browser automation (use Appium or native testing frameworks)
  • For Windows/Linux-only environments (some PyXA features are macOS-only)
  • When CAPTCHA solving is required (use specialized services)
  • For production scraping without respecting robots.txt
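
On the robots.txt point, Python's standard library can evaluate the rules before a scraper requests a URL. The sketch below parses robots.txt content that has already been fetched; the sample rules, the user-agent string `my-automation-bot`, and the helper name `build_robot_rules` are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

rules = build_robot_rules(EXAMPLE_ROBOTS)
print(rules.can_fetch("my-automation-bot", "https://example.com/private/report"))
print(rules.can_fetch("my-automation-bot", "https://example.com/public/page"))
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches the file directly, and `can_fetch` can then be checked before each navigation.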

What to Load

  • PyXA:
    references/pyxa-integration.md
    - macOS-native browser control
  • Playwright:
    references/playwright-automation.md
    - Cross-browser testing
  • Selenium:
    references/selenium-webdriver.md
    - Enterprise automation
  • Puppeteer:
    references/puppeteer-automation.md
    - Node.js Chrome automation
  • Workflows:
    references/workflows.md
    - Complete automation scenarios