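Playwright Browser Automation

Overview

Playwright is a browser automation tool that simulates real user actions. It supports:

- Headless browser mode (runs in the background)
- Data collection and scraping
- Automatic form filling
- UI automation testing
- Screenshots and PDF generation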

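Why Playwright

Differences from the browser tool

Feature	browser tool	Playwright
Requires user involvement	✅ The user must open the browser manually	❌ Fully automatic
Suitable for scheduled tasks	❌	✅
Runs in the background	❌	✅
Easy to debug	✅ Visual operation	⚠️ Relies on logs
No installation needed	✅ Built in	❌ Must be installed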

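Use cases

Use Playwright for:

✅ Scheduled monitoring (cron jobs)
✅ Large-scale data collection
✅ Unattended runs
✅ Production deployments

Use the browser tool for:

✅ Interactive debugging
✅ Operations that need a human decision
✅ One-off tasks
✅ Getting past complex CAPTCHAs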

Quick start

1. Install Playwright

bash

# Install the Python package
pip install playwright

# Install the browser (Chromium)
playwright install chromium

# Verify the installation
python3 -c "from playwright.sync_api import sync_playwright; print('✅ Installed successfully')"
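
The same check can run at the top of a script, before any browser work starts. This is a minimal sketch that uses only the standard library; `playwright_installed` is a hypothetical helper name and not part of Playwright. It only confirms that the Python package is importable, not that the browser binaries have been downloaded.

```python
import importlib.util


def playwright_installed() -> bool:
    """Return True if the `playwright` package can be found on the import path."""
    # find_spec locates the package without importing it,
    # so the check is cheap and has no side effects.
    return importlib.util.find_spec("playwright") is not None


if not playwright_installed():
    print("Playwright is missing; run: pip install playwright && playwright install chromium")
```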

2. Basic usage

Sync API (simple tasks)

python

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # headless=False shows the browser window
    page = browser.new_page()
    page.goto('https://example.com')
    print(page.title())
    browser.close()

Async API (recommended; better suited to running pages concurrently)

python

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://example.com')
        title = await page.title()
        print(title)
        await browser.close()

asyncio.run(main())

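Common features

1. Page navigation

python

# Wait for the page to finish loading
await page.goto('https://example.com', wait_until='domcontentloaded')

# wait_until options:
# - 'load' - the load event has fired (the default)
# - 'domcontentloaded' - the DOM is ready (faster)
# - 'networkidle' - no network connections for at least 500 ms (slowest, most conservative)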
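
Navigation is usually the first step to fail in an unattended run, so it can help to wrap `page.goto` in a retry. The sketch below is one possible approach, not something Playwright provides: `goto_with_retry` and its `attempts` and `backoff` parameters are hypothetical names, and `page` is assumed to be an async Playwright page.

```python
import asyncio


async def goto_with_retry(page, url, attempts=3, wait_until="domcontentloaded", backoff=1.0):
    """Call page.goto, pausing a little longer after each failed attempt."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await page.goto(url, wait_until=wait_until)
        except Exception as error:  # covers Playwright's own error and timeout types
            last_error = error
            if attempt < attempts:
                # Linear backoff: 1x, 2x, 3x ... the base pause
                await asyncio.sleep(backoff * attempt)
    raise last_error
```

A call such as `await goto_with_retry(page, 'https://example.com')` returns the response from the first attempt that succeeds and re-raises the last error once every attempt has failed.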

2. Locating elements

python

# CSS selector
await page.click('button.submit')
await page.fill('input[name="username"]', 'myuser')

# XPath
await page.click('xpath=//button[@type="submit"]')

# Text selector (matches the visible text 登录, "Log in")
await page.click('text=登录')

# Chained selector
await page.click('div.login-form >> text=登录')

3. Extracting data

python

# Get text
text = await page.text_content('h1.title')

# Get an attribute
href = await page.get_attribute('a.link', 'href')

# Get multiple elements
items = await page.query_selector_all('div.item')
for item in items:
    text = await item.text_content()
    print(text)

# Run JavaScript
result = await page.evaluate('() => document.title')

# Get the full HTML
html = await page.content()
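
Scraped text usually carries stray whitespace and empty entries, so a small cleanup step after extraction tends to pay off. The following is a sketch under stated assumptions: `clean_texts` and `scrape_items` are hypothetical helpers, `div.item` is the placeholder selector used above, and `page` is an async Playwright page.

```python
def clean_texts(texts):
    """Collapse whitespace in each string and drop the empty ones."""
    cleaned = []
    for text in texts:
        # Treat None as an empty string, then squeeze runs of whitespace
        collapsed = " ".join((text or "").split())
        if collapsed:
            cleaned.append(collapsed)
    return cleaned


async def scrape_items(page, selector="div.item"):
    """Return the cleaned text of every element matching the selector."""
    texts = await page.locator(selector).all_text_contents()
    return clean_texts(texts)
```

`clean_texts` is kept separate from the page access so it can be checked without a browser.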

4. Form operations

python

# Fill in a form
await page.fill('input[name="username"]', 'myuser')
await page.fill('input[name="password"]', 'mypass')
await page.click('button[type="submit"]')

# Select from a dropdown
await page.select_option('select#country', 'China')

# Tick a checkbox
await page.check('input#agree')

# Upload a file
await page.set_input_files('input[type="file"]', 'path/to/file.pdf')

5. Waiting strategies

python

import asyncio

# Wait for an element to appear
await page.wait_for_selector('div.result', timeout=5000)  # timeout in milliseconds

# Wait for navigation
async with page.expect_navigation():
    await page.click('a.link')

# Wait for a specific condition (加载完成 means "loading complete")
await page.wait_for_function('() => document.title.includes("加载完成")')

# Fixed delay
await asyncio.sleep(2)  # wait 2 seconds
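
A fixed delay either wastes time or is too short. When no built-in wait fits, polling a condition with a deadline is a common alternative. This sketch is plain asyncio rather than a Playwright API; `poll_until` and its parameters are hypothetical names.

```python
import asyncio
import time


async def poll_until(check, timeout=5.0, interval=0.25):
    """Await check() repeatedly until it is truthy or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        result = await check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        await asyncio.sleep(interval)
```

For example, `await poll_until(lambda: page.is_visible('div.result'))` returns as soon as the element is visible and raises `TimeoutError` otherwise.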

6. Scrolling and interaction

python

# Scroll to the bottom of the page
await page.evaluate('window.scrollTo(0, document.body.scrollHeight)')

# Scroll to an element
await page.locator('div.footer').scroll_into_view_if_needed()

# Hover the mouse
await page.hover('div.menu')

# Drag and drop
await page.drag_and_drop('div.draggable', 'div.dropzone')
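
On pages that load more content as you scroll, a single jump to the bottom is rarely enough. One possible approach is to keep scrolling until the page height stops growing. In this sketch `scroll_to_end`, `max_rounds`, and `pause` are hypothetical names, and `page` is assumed to be an async Playwright page.

```python
import asyncio


async def scroll_to_end(page, max_rounds=20, pause=1.0):
    """Scroll down until the page height stops growing; return the number of scrolls."""
    last_height = await page.evaluate("document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await asyncio.sleep(pause)  # give lazy-loaded content time to arrive
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:
            return round_number  # nothing new was appended
        last_height = height
    return max_rounds
```

The `max_rounds` cap keeps an endless feed from trapping the script in a loop.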

7. Screenshots and PDF

python

# Screenshot of the viewport
await page.screenshot(path='screenshot.png')

# Full-page screenshot
await page.screenshot(path='full.png', full_page=True)

# Element screenshot
await page.locator('div.content').screenshot(path='element.png')

# Generate a PDF (Chromium only)
await page.pdf(path='page.pdf', format='A4')

8. Handling dialogs

python

# Accept an alert. Register the handler before the action that opens the dialog:
# an open dialog blocks the page, so the click cannot finish until it is handled.
page.once('dialog', lambda dialog: dialog.accept())
await page.click('button')

# Answer a prompt
page.once('dialog', lambda dialog: dialog.accept('my input'))
await page.click('button')
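
For scripts that meet dialogs in several places, a reusable handler registered with `page.on('dialog', ...)` can replace per-click handling. This is a minimal sketch; `make_dialog_handler` and its `prompt_text` and `log` parameters are hypothetical names, while `dialog.type`, `dialog.message`, and `dialog.accept()` come from Playwright's Dialog class.

```python
def make_dialog_handler(prompt_text=None, log=print):
    """Build an async handler that logs a dialog's message and then accepts it."""
    async def handle(dialog):
        log(f"{dialog.type}: {dialog.message}")
        if prompt_text is None:
            await dialog.accept()
        else:
            await dialog.accept(prompt_text)  # text entered into a prompt dialog
    return handle
```

Registering it once with `page.on('dialog', make_dialog_handler())` covers every later dialog on that page. Logging the message leaves a trace of what was accepted during an unattended run.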

Handling anti-scraping measures

1. User-Agent rotation

python

import random

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
]

# user_agent is an option of new_context(), not of launch()
browser = await p.chromium.launch()
context = await browser.new_context(
    user_agent=random.choice(user_agents)
)
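
Because the User-Agent belongs to the browser context, rotation can happen per context without relaunching the browser. The sketch below reuses the three strings from the list above; `pick_user_agent` and `new_rotated_context` are hypothetical helper names, and `browser` is assumed to be an already launched async Playwright browser.

```python
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]


def pick_user_agent(rng=random):
    """Choose one User-Agent; pass a seeded random.Random for repeatable runs."""
    return rng.choice(USER_AGENTS)


async def new_rotated_context(browser, rng=random):
    """Open a fresh browser context with a randomly chosen User-Agent."""
    return await browser.new_context(user_agent=pick_user_agent(rng))
```

Each call to `new_rotated_context(browser)` yields an isolated context, so cookies from one identity do not leak into the next.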

2. Random delays

python

import random
import asyncio

# Add a random delay between actions
await asyncio.sleep(random.uniform(2, 5))

3. Saving and loading cookies

python

# Save the cookies after the first login
context = await browser.new_context()
page = await context.new_page()
await page.goto('https://example.com/login')
# ... login steps ...
await context.storage_state(path='cookies.json')

# Reuse the saved cookies on later runs
context = await browser.new_context(storage_state='cookies.json')
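
On the very first run the state file does not exist yet, and passing a missing path to `new_context` fails. One way to handle both cases is to build the keyword arguments conditionally. This is a sketch: `context_options` is a hypothetical helper, and `cookies.json` is the file name used above.

```python
import json
from pathlib import Path


def context_options(state_path="cookies.json"):
    """Return new_context() keyword arguments, reusing saved state when it is usable."""
    path = Path(state_path)
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt file: start with a clean context
        return {}
    return {"storage_state": str(path)}
```

Usage would look like `context = await browser.new_context(**context_options())`, followed by the login steps only when the returned dict is empty.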

4. Proxy settings

python

browser = await p.chromium.launch(
    proxy={
        'server': 'http://proxy.example.com:8080',
        'username': 'user',
        'password': 'pass'
    }
)
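
Proxy providers often hand out a single URL with the credentials embedded, while the `proxy` option expects them as separate keys. The sketch below converts one form into the other with the standard library; `proxy_from_url` is a hypothetical helper name, and IPv6 hosts are not handled.

```python
from urllib.parse import unquote, urlsplit


def proxy_from_url(proxy_url):
    """Convert scheme://user:pass@host:port into the dict shape used by the proxy option."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"not a proxy URL: {proxy_url!r}")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    # Credentials are optional and may be percent-encoded in the URL
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy
```

The result can be passed straight through, for example `await p.chromium.launch(proxy=proxy_from_url(url))`, which keeps the credentials in one configuration value instead of three.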

5. Browser context isolation

python

# Create an isolated context (similar to an incognito window)
context = await browser.new_context(
    viewport={'width': 1920, 'height': 1080},
    user_agent='Custom UA',
    locale='zh-CN'
)
page = await context.new_page()

Debugging tips

1. Show the browser

python

# Run in headed mode to watch the actions as they happen
browser = await p.chromium.launch(headless=False, slow_mo=1000)
# slow_mo=1000 delays each operation by 1 second

2. Screenshot debugging

python

# Take a screenshot at each key step
await page.goto('https://example.com')
await page.screenshot(path='step1.png')

await page.click('button')
await page.screenshot(path='step2.png')
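
Fixed names such as `step1.png` are overwritten on every run, which makes it hard to compare a failing run with an earlier one. A timestamped name per step is one way around that. In this sketch `step_screenshot_path` and the `debug_shots` directory are hypothetical choices.

```python
import re
from datetime import datetime
from pathlib import Path


def step_screenshot_path(step, directory="debug_shots", now=None):
    """Build a sortable file name from the current time and a step label."""
    now = now or datetime.now()
    # Keep letters, digits, dash and underscore; turn everything else into a dash
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", step).strip("-") or "step"
    return Path(directory) / f"{now:%Y%m%d-%H%M%S}_{slug}.png"
```

A call would look like `await page.screenshot(path=step_screenshot_path('after login'))`. Creating the directory up front is the safe assumption.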

3. Viewing logs

python

# Listen for console messages
page.on('console', lambda msg: print(f'Console: {msg.text}'))

# Listen for network requests
page.on('request', lambda request: print(f'Request: {request.url}'))
page.on('response', lambda response: print(f'Response: {response.status}'))

4. Playwright Inspector

bash

# Start in Inspector mode
PWDEBUG=1 python3 your_script.py

FAQ

Q: How do I handle CAPTCHAs?

A: There are a few options:

1. Use a CAPTCHA-solving service (超级鹰 (Chaojiying), 若快打码 (Ruokuai))
2. Handle it manually: pause and wait for user input
3. Fall back to the browser tool and let the user complete it by hand

python

# Option 2: manual handling (the browser must be visible, i.e. headless=False)
input("CAPTCHA encountered. Complete it in the browser, then press Enter to continue...")
await asyncio.sleep(2)  # give the verification time to pass

Q: What if an element can't be found?

A: Check the following:

1. Is it inside an iframe? (You need to switch to the frame.)
2. Is it loaded dynamically? (You need to wait for it.)
3. Is the selector correct?

python

# Switch to the iframe (looked up by the frame's name attribute)
frame = page.frame(name='iframe-id')
await frame.click('button')

# Wait for dynamically loaded content
await page.wait_for_selector('div.dynamic-content')

Q: How can I improve performance?

A:

1. Use the async API
2. Run multiple pages concurrently
3. Cut unnecessary waits

python

# Run multiple pages concurrently
async def fetch(url):
    page = await browser.new_page()
    await page.goto(url)
    # ...
    await page.close()

tasks = [fetch(url) for url in urls]
await asyncio.gather(*tasks)
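
Launching every page at once can exhaust memory or trip rate limits when the URL list is long. A semaphore is one common way to cap how many pages are open at a time. This sketch is plain asyncio; `gather_limited` and `limit` are hypothetical names, and `fetch` stands for any coroutine function that takes a URL, such as the one above.

```python
import asyncio


async def gather_limited(urls, fetch, limit=3):
    """Run fetch(url) for every URL with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True reports failures in place instead of raising the first one
    return await asyncio.gather(*(guarded(url) for url in urls), return_exceptions=True)
```

Results come back in the same order as `urls`, with an exception object in the slot of any URL that failed, so one bad page does not discard the others.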

Integration with OpenClaw

Usage in healthcare-monitor

python

# scraper_free.py
from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.launch(headless=True)
    context = await browser.new_context(
        user_agent="Mozilla/5.0 ..."
    )
    page = await context.new_page()
    await page.goto(url)
    # ... collect data ...
    await browser.close()

Working alongside the browser tool

Playwright → routine automated monitoring
browser tool → debugging and exception handling

Resources

Official documentation: https://playwright.dev/python/
API reference: https://playwright.dev/python/docs/api/class-playwright

Example code:

~/clawd/skills/playwright-automation/examples/

Quick commands

bash

# Install
pip install playwright && playwright install chromium

# Run a script
python3 script.py

# Debug mode
PWDEBUG=1 python3 script.py

# Show the version
playwright --version


---

**Remember**: Playwright lets your automation tasks run fully unattended! 🚀
