web-browser-automation
macOS Web Browser Automation Guide
Overview
This guide covers web browser automation on the macOS desktop, with a focus on automating tasks rather than testing. It covers four frameworks (PyXA, Playwright, Selenium, and Puppeteer) with practical examples for real-world scenarios.
PyXA Installation: To use the PyXA examples in this skill, see the installation instructions in the automating-mac-apps skill (PyXA Installation section).
Primary Automation Tools
- PyXA: macOS-native Python wrapper with direct browser integration
- Playwright: Cross-platform framework with Python bindings for modern web automation
- Selenium: Industry-standard automation with ChromeDriver integration
- Puppeteer: Node.js framework for Chrome/Chromium automation
Tool Selection Guide
| Tool | Primary Use | Key Advantages |
|---|---|---|
| PyXA | macOS-native control | Direct OS integration, Arc spaces |
| Playwright | Cross-browser testing | Auto-waiting, mobile emulation |
| Selenium | Legacy enterprise | Mature ecosystem, wide language support |
| Puppeteer | Headless Chrome | Fast execution, PDF generation |
See references/browser-compatibility-matrix.md for detailed browser support.
Getting Started
- Choose your framework based on your needs (see Tool Selection Guide above)
- Install dependencies for your chosen framework
- Follow framework-specific guides linked below
- Review workflows for common automation patterns
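Before following a framework guide, it can help to confirm which Python packages are already importable. The following is a minimal sketch using only the standard library. The helper name `installed_frameworks` and the module-name mapping are illustrative assumptions, and Puppeteer is left out because it is a Node.js package.

```python
from importlib.util import find_spec

# Assumption: these are the module names each Python framework
# is normally imported under.
FRAMEWORK_MODULES = {
    "PyXA": "PyXA",
    "Playwright": "playwright",
    "Selenium": "selenium",
}


def installed_frameworks(modules=FRAMEWORK_MODULES):
    """Return {framework: bool} telling whether each module can be imported."""
    status = {}
    for name, module in modules.items():
        try:
            status[name] = find_spec(module) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially installed packages
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in installed_frameworks().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

Running the script prints one line per framework, which gives a quick view of what still needs to be installed.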
Framework Guides
PyXA Browser Integration
- Best for: macOS-native browser control with Arc spaces support
- Installation: `pip install mac-pyxa` (the PyPI package name; the module is imported as `PyXA`)
- Guide: references/pyxa-integration.md
Playwright Automation
- Best for: Cross-browser testing with auto-waiting
- Installation: `pip install playwright && playwright install`
- Guide: references/playwright-automation.md
Selenium WebDriver
- Best for: Legacy enterprise automation
- Installation: `pip install selenium`
- Guide: references/selenium-webdriver.md
Puppeteer Node.js
- Best for: Headless Chrome with PDF generation
- Installation: `npm install puppeteer`
- Guide: references/puppeteer-automation.md
Automation Workflows
Complete workflow examples for common automation scenarios:
Multi-Browser Tab Management
Guide: references/workflows.md#workflow-1-multi-browser-tab-management
Automated Research and Data Collection
Guide: references/workflows.md#workflow-2-automated-research-and-data-collection
Cross-Browser Testing Suite
Guide: references/workflows.md#workflow-3-cross-browser-testing-suite
Web Scraping and Data Extraction
Guide: references/workflows.md#workflow-4-web-scraping-and-data-extraction
Brief Automation Patterns
Browser Launch and Profile Management
```python
import PyXA

# PyXA approach for Chrome
chrome = PyXA.Application("Google Chrome")
chrome.new_window("https://example.com")
```
Tab Organization and Grouping
```python
# PyXA tab filtering: close tabs in the first window whose title mentions "meeting"
# (`title` is a property, not a method, in current PyXA releases)
tabs = chrome.windows()[0].tabs()
work_tabs = [tab for tab in tabs if "meeting" in tab.title.lower()]
for tab in work_tabs:
    tab.close()
```
JavaScript Injection for Content Extraction
```python
# PyXA JavaScript execution
# Assumes `chrome` from the launch example; Chromium tabs expose `execute` in current PyXA releases
tab = chrome.windows()[0].active_tab
content = tab.execute("document.body.innerText")
links = tab.execute("Array.from(document.querySelectorAll('a')).map(a => a.href)")
```
Cross-Browser Synchronization
```python
# PyXA multi-browser control
browsers = [PyXA.Application("Google Chrome"), PyXA.Application("Microsoft Edge")]
for browser in browsers:
    browser.new_tab("https://shared-resource.com")
```
Form Filling and Interaction
```python
# Playwright auto-waiting
page.fill("#username", "user@example.com")
page.click("text=Submit")  # Auto-waits for element
```
Screenshot and Content Capture
```javascript
// Puppeteer screenshot and PDF capture
await page.screenshot({ path: 'capture.png', fullPage: true });
await page.pdf({ path: 'page.pdf', format: 'A4' });
```
Playwright Quickstart
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    page.click("text=Get Started")  # Auto-waits for element
    page.fill("#search-input", "automation")
    browser.close()
```
For advanced Playwright features (contexts, viewports, dynamic content), see references/playwright-automation.md.
Additional Resources
Advanced Techniques
- Network Interception: references/playwright-automation.md#network-interception
- Parallel Browser Automation: references/selenium-webdriver.md#parallel-testing
- Performance Monitoring: references/puppeteer-automation.md#performance-monitoring
- Browser Context Management: references/selenium-webdriver.md#browser-contexts-and-pages
Validation Checklist
After implementing browser automation:
- Verify browser launches without errors
- Confirm page navigation completes successfully
- Test element selectors locate expected elements
- Validate extracted data matches page content
- Check screenshots/PDFs are generated correctly
- For PyXA: verify macOS permissions are granted
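Parts of this checklist can be automated after a run. The sketch below assumes the automation wrote its screenshots or PDFs to known paths and returned the extracted text as a string. The function name `validate_run` and its parameters are hypothetical, not part of any of the frameworks above.

```python
from pathlib import Path


def validate_run(artifacts, extracted_text, expected_snippets):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Screenshots and PDFs should exist and contain data
    for path in map(Path, artifacts):
        if not path.is_file():
            problems.append(f"missing artifact: {path}")
        elif path.stat().st_size == 0:
            problems.append(f"empty artifact: {path}")
    # Extracted data should contain text known to be on the page
    for snippet in expected_snippets:
        if snippet not in extracted_text:
            problems.append(f"expected text not found: {snippet!r}")
    return problems
```

A call such as `validate_run(["capture.png", "page.pdf"], content, ["Example Domain"])` would cover the data and screenshot/PDF items. Browser launch, navigation, and macOS permission checks still need to be verified in the framework itself.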
Troubleshooting
- Element Not Found Errors: Common solutions across frameworks
- Stale Element References: Handling dynamic content
- Browser Detection: Avoiding automation detection
- Network Timeout Issues: Timeout configuration
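Several of these problems (elements that are not there yet, stale references, slow networks) are often handled by retrying the failing step with a growing delay. The following is a framework-agnostic sketch of that idea. The `retry` helper and its parameters are illustrative assumptions, not an API of PyXA, Playwright, Selenium, or Puppeteer.

```python
import time


def retry(action, attempts=3, delay=0.5, backoff=2.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds or attempts run out, then re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay)      # wait before the next try
            delay *= backoff  # exponential backoff
```

With Selenium, for example, `exceptions` could be narrowed to `NoSuchElementException` and `StaleElementReferenceException` from `selenium.common.exceptions`, so that unrelated errors still fail fast. Playwright's built-in auto-waiting usually makes such a wrapper unnecessary for element lookups.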
Security Considerations
- Credential Management: Secure storage of login credentials
- Certificate Handling: SSL/TLS certificate validation
- Sandbox and Isolation: Running automation in isolated environments
- Data Sanitization: Cleaning extracted data
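Two of these points lend themselves to a short illustration: keeping credentials out of the script and cleaning extracted text. This is a minimal sketch under stated assumptions. The environment variable names (`AUTOMATION_USERNAME`, `AUTOMATION_PASSWORD`) and both helper names are hypothetical, and a keychain or secrets manager may be a better fit than environment variables in practice.

```python
import os
import re
import unicodedata


def load_credentials(prefix="AUTOMATION", env=os.environ):
    """Read username/password from environment variables instead of hard-coding them."""
    user = env.get(f"{prefix}_USERNAME")
    password = env.get(f"{prefix}_PASSWORD")
    if not user or not password:
        raise RuntimeError(f"set {prefix}_USERNAME and {prefix}_PASSWORD before running")
    return user, password


# ASCII control characters other than tab, newline, and carriage return
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_text(raw):
    """Normalize Unicode, drop control characters, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = _CONTROL_CHARS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

The credentials returned by `load_credentials()` can then be passed to a form-filling call, and `sanitize_text()` applied to page text before it is stored.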
Performance Optimization
- Browser Configuration: Disabling unnecessary features
- Network Optimization: Blocking unwanted resources
- Parallel Execution: Running tests concurrently
- Resource Pooling: Managing browser instances efficiently
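Resource blocking and parallel execution can be sketched without committing to one framework. In the example below, the set of blocked resource types and the helper names `should_block` and `run_parallel` are illustrative assumptions. The `task` callable stands in for whatever per-URL automation is being run.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: resource types that are safe to skip for text extraction
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_block(resource_type, blocked=BLOCKED_RESOURCE_TYPES):
    """Decide whether a request of this resource type should be aborted."""
    return resource_type in blocked


def run_parallel(task, urls, max_workers=4):
    """Run task(url) for every URL on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, urls))
```

In Playwright, `should_block` could be called from a handler registered with `page.route("**/*", handler)` that inspects `route.request.resource_type` and calls `route.abort()` or `route.continue_()`. Note that Playwright's sync API objects should not be shared across threads, so each `task` would need to start its own browser session.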
When Not to Use
- For mobile browser automation (use Appium or native testing frameworks)
- For Windows/Linux-only environments (PyXA runs only on macOS; Playwright, Selenium, and Puppeteer are cross-platform)
- When CAPTCHA solving is required (use specialized services)
- For production scraping without respecting robots.txt
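On the robots.txt point, Python's standard library can evaluate the rules before a URL is fetched. The sketch below parses robots.txt content that has already been downloaded. The helper name `build_robots_checker` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="*"):
    """Return a function url -> bool that applies the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    allowed = build_robots_checker(rules)
    print(allowed("https://example.com/index.html"))
    print(allowed("https://example.com/private/data"))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file directly instead of parsing a string.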
What to Load
- PyXA: references/pyxa-integration.md - macOS-native browser control
- Playwright: references/playwright-automation.md - Cross-browser testing
- Selenium: references/selenium-webdriver.md - Enterprise automation
- Puppeteer: references/puppeteer-automation.md - Node.js Chrome automation
- Workflows: references/workflows.md - Complete automation scenarios