browser-automation
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseBrowser Automation
浏览器自动化
When to Use This Skill
何时使用本技能
- Setting up browser automation (Playwright, Puppeteer, Selenium)
- Testing Chrome extensions (Manifest V3)
- Cloud browser testing (LambdaTest, BrowserStack)
- Using Chrome DevTools Protocol (CDP)
- Handling dynamic/lazy-loaded content
- Debugging automation issues
- 搭建浏览器自动化环境(Playwright、Puppeteer、Selenium)
- 测试Chrome扩展(Manifest V3)
- 云端浏览器测试(LambdaTest、BrowserStack)
- 使用Chrome DevTools Protocol(CDP)
- 处理动态/懒加载内容
- 调试自动化问题
Tool Selection
工具选择
Playwright (Recommended)
Playwright(推荐)
bash
npm install -D @playwright/testPros:
- Multi-browser support (Chrome, Firefox, Safari, Edge)
- Built-in test runner with great DX
- Auto-wait mechanisms reduce flakiness
- Excellent debugging tools (trace viewer, inspector)
- Strong TypeScript support
Use when:
- Cross-browser testing needed
- Writing end-to-end tests
- TypeScript project
bash
npm install -D @playwright/test优势:
- 多浏览器支持(Chrome、Firefox、Safari、Edge)
- 内置测试运行器,开发者体验(DX)出色
- 自动等待机制减少测试不稳定问题
- 优秀的调试工具(追踪查看器、检查器)
- 完善的TypeScript支持
适用场景:
- 需要跨浏览器测试
- 编写端到端测试
- TypeScript项目
Puppeteer
Puppeteer
bash
npm install puppeteerPros:
- Simpler API, easier to learn
- Smaller footprint
- Direct Chrome/Chromium control
- Official Chrome team project
Use when:
- Only need Chrome
- Simple automation tasks
- Quick scripts/prototypes
bash
npm install puppeteer优势:
- API更简洁,易于学习
- 资源占用更小
- 可直接控制Chrome/Chromium
- Chrome官方团队维护项目
适用场景:
- 仅需支持Chrome浏览器
- 简单自动化任务
- 快速编写脚本/原型
Selenium
Selenium
bash
npm install selenium-webdriverUse when:
- Legacy projects already using it
- Multi-language team
- Need specific Selenium features
bash
npm install selenium-webdriver适用场景:
- 已有项目正在使用Selenium
- 团队使用多语言开发
- 需要Selenium特定功能
AI-Powered Automation
AI驱动的自动化工具
Stagehand
bash
npm install @anthropic-ai/stagehandAI agent that automates web tasks using Claude + CDP.
Use when:
- Complex multi-step web workflows
- Dynamic/changing UIs
- Natural language task descriptions
- Have budget for LLM API calls
Not suitable for:
- Chrome extension testing
- Simple, predictable automation
- Cost-sensitive projects
Browser-Use
bash
pip install browser-usePython library for LLM-controlled browser automation.
Use when:
- Python-based projects
- Need AI to navigate/interact with sites
- Exploratory automation
Skyvern
Vision-based web automation using computer vision + LLMs.
Use when:
- Sites with no accessible DOM selectors
- Need to handle CAPTCHAs/complex visuals
- Budget for vision API calls
Stagehand
bash
npm install @anthropic-ai/stagehand基于Claude + CDP实现网页任务自动化的AI Agent。
适用场景:
- 复杂多步骤网页工作流
- 动态/频繁变化的UI
- 支持自然语言任务描述
- 有预算调用大语言模型API
不适用场景:
- Chrome扩展测试
- 简单、可预测的自动化任务
- 对成本敏感的项目
Browser-Use
bash
pip install browser-use由大语言模型控制的Python浏览器自动化库。
适用场景:
- Python技术栈项目
- 需要AI导航/交互网页
- 探索性自动化任务
Skyvern
结合计算机视觉与大语言模型的视觉型网页自动化工具。
适用场景:
- 网站无可用DOM选择器
- 需要处理验证码/复杂视觉元素
- 有预算调用视觉API
Chrome Extension Testing
Chrome扩展测试
Local Testing (Recommended)
本地测试(推荐)
For Manifest V3 Extensions:
javascript
// playwright.config.ts
export default defineConfig({
use: {
headless: false,
args: [
`--disable-extensions-except=${extensionPath}`,
`--load-extension=${extensionPath}`,
],
},
})Find extension ID via CDP:
typescript
const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')
const extensionTarget = targetInfos.find((target: any) =>
target.type === 'service_worker' &&
target.url.startsWith('chrome-extension://')
)
const extensionId = extensionTarget.url.match(/chrome-extension:\/\/([^\/]+)/)?.[1]Navigate to extension pages:
typescript
await page.goto(`chrome-extension://${extensionId}/popup.html`)
await page.goto(`chrome-extension://${extensionId}/options.html`)
await page.goto(`chrome-extension://${extensionId}/sidepanel.html`)针对Manifest V3扩展:
javascript
// playwright.config.ts
export default defineConfig({
use: {
headless: false,
args: [
`--disable-extensions-except=${extensionPath}`,
`--load-extension=${extensionPath}`,
],
},
})通过CDP获取扩展ID:
typescript
const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')
const extensionTarget = targetInfos.find((target: any) =>
target.type === 'service_worker' &&
target.url.startsWith('chrome-extension://')
)
const extensionId = extensionTarget.url.match(/chrome-extension:\/\/([^\/]+)/)?.[1]导航至扩展页面:
typescript
await page.goto(`chrome-extension://${extensionId}/popup.html`)
await page.goto(`chrome-extension://${extensionId}/options.html`)
await page.goto(`chrome-extension://${extensionId}/sidepanel.html`)Cloud Testing Limitations
云端测试限制
What works:
- Extension uploads to LambdaTest/BrowserStack
- Extensions load in cloud browsers
- Service workers run
- Can test content scripts on regular sites
What doesn't work:
- Cannot navigate to URLs
chrome-extension:// - All attempts blocked with
net::ERR_BLOCKED_BY_CLIENT
Why: Cloud platforms block extension URLs for security in shared environments.
Verdict: Use local testing for extension UI testing. Cloud for content script testing only.
支持的功能:
- 扩展可上传至LambdaTest/BrowserStack
- 扩展可在云端浏览器中加载
- 服务工作者可正常运行
- 可在常规网站上测试内容脚本
不支持的功能:
- 无法导航至协议的URL
chrome-extension:// - 所有尝试都会被拦截并返回
net::ERR_BLOCKED_BY_CLIENT
原因: 云端平台为保障共享环境的安全性,拦截了扩展协议的URL。
结论: 扩展UI测试请使用本地环境,仅内容脚本测试可使用云端环境。
Chrome DevTools Protocol (CDP)
Chrome DevTools Protocol(CDP)
Get All Browser Targets
获取所有浏览器目标
typescript
const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')
const extensions = targetInfos.filter(t => t.type === 'service_worker')
const pages = targetInfos.filter(t => t.type === 'page')
const workers = targetInfos.filter(t => t.type === 'worker')typescript
const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')
const extensions = targetInfos.filter(t => t.type === 'service_worker')
const pages = targetInfos.filter(t => t.type === 'page')
const workers = targetInfos.filter(t => t.type === 'worker')Execute Code in Extension Context
在扩展上下文中执行代码
typescript
// Attach to extension service worker
const swTarget = await client.send('Target.attachToTarget', {
targetId: extensionTarget.targetId,
flatten: true,
})
// Execute in service worker context
await client.send('Runtime.evaluate', {
expression: `
chrome.storage.local.get(['key']).then(console.log)
`,
awaitPromise: true,
})typescript
// 连接至扩展服务工作者
const swTarget = await client.send('Target.attachToTarget', {
targetId: extensionTarget.targetId,
flatten: true,
})
// 在服务工作者上下文中执行代码
await client.send('Runtime.evaluate', {
expression: `
chrome.storage.local.get(['key']).then(console.log)
`,
awaitPromise: true,
})Intercept Network Requests
拦截网络请求
typescript
await client.send('Network.enable')
await client.send('Network.setRequestInterception', {
patterns: [{ urlPattern: '*' }],
})
client.on('Network.requestIntercepted', async (event) => {
await client.send('Network.continueInterceptedRequest', {
interceptionId: event.interceptionId,
headers: { ...event.request.headers, 'X-Custom': 'value' },
})
})typescript
await client.send('Network.enable')
await client.send('Network.setRequestInterception', {
patterns: [{ urlPattern: '*' }],
})
client.on('Network.requestIntercepted', async (event) => {
await client.send('Network.continueInterceptedRequest', {
interceptionId: event.interceptionId,
headers: { ...event.request.headers, 'X-Custom': 'value' },
})
})Get Console Messages
获取控制台消息
typescript
await client.send('Runtime.enable')
await client.send('Log.enable')
client.on('Runtime.consoleAPICalled', (event) => {
console.log('Console:', event.args.map(a => a.value))
})
client.on('Runtime.exceptionThrown', (event) => {
console.error('Exception:', event.exceptionDetails)
})typescript
await client.send('Runtime.enable')
await client.send('Log.enable')
client.on('Runtime.consoleAPICalled', (event) => {
console.log('控制台输出:', event.args.map(a => a.value))
})
client.on('Runtime.exceptionThrown', (event) => {
console.error('异常信息:', event.exceptionDetails)
})Handling Dynamic Content
处理动态内容
Wait Strategies
等待策略
typescript
// Wait for specific content
await page.waitForSelector('.product-price', { timeout: 10000 })
// Wait for network to be idle
await page.goto(url, { waitUntil: 'networkidle' })
// Wait for custom condition
await page.waitForFunction(() => {
return document.querySelectorAll('.item').length > 10
})typescript
// 等待指定内容加载完成
await page.waitForSelector('.product-price', { timeout: 10000 })
// 等待网络空闲
await page.goto(url, { waitUntil: 'networkidle' })
// 等待自定义条件满足
await page.waitForFunction(() => {
return document.querySelectorAll('.item').length > 10
})Time-Based vs Scroll-Based Lazy Loading
基于时间 vs 基于滚动的懒加载
Key insight: Some sites load content based on time elapsed, not scroll position.
Testing approach:
javascript
// Test 1: Wait with no scroll
await page.goto(url)
await page.waitForTimeout(3000)
const sectionsNoScroll = await page.$$('.section').length
// Test 2: Scroll immediately
await page.goto(url)
await page.evaluate(() => window.scrollTo(0, 5000))
await page.waitForTimeout(500)
const sectionsWithScroll = await page.$$('.section').length
// If same result: site uses time-based loading
// No scroll automation needed - just waitBenefits of detecting time-based loading:
- Simpler automation code
- No visual disruption
- More reliable extraction
关键洞察: 部分网站的内容加载基于时间流逝,而非滚动位置。
测试方法:
javascript
// 测试1:不滚动,仅等待
await page.goto(url)
await page.waitForTimeout(3000)
const sectionsNoScroll = await page.$$('.section').length
// 测试2:立即滚动
await page.goto(url)
await page.evaluate(() => window.scrollTo(0, 5000))
await page.waitForTimeout(500)
const sectionsWithScroll = await page.$$('.section').length
// 若结果相同:网站使用基于时间的加载策略
// 无需实现滚动自动化,仅需等待即可检测基于时间的加载策略的优势:
- 自动化代码更简洁
- 无视觉干扰
- 内容提取更可靠
Handling Lazy-Loaded Images
处理懒加载图片
javascript
// Force lazy images to load
await page.evaluate(() => {
// Handle data-src → src pattern
document.querySelectorAll('[data-src]').forEach(el => {
if (!el.src) el.src = el.dataset.src
})
// Handle loading="lazy" attribute
document.querySelectorAll('[loading="lazy"]').forEach(el => {
el.loading = 'eager'
})
})javascript
// 强制加载懒加载图片
await page.evaluate(() => {
// 处理data-src → src的懒加载模式
document.querySelectorAll('[data-src]').forEach(el => {
if (!el.src) el.src = el.dataset.src
})
// 处理带有loading="lazy"属性的图片
document.querySelectorAll('[loading="lazy"]').forEach(el => {
el.loading = 'eager'
})
})Advanced Lazy Loading Techniques
高级懒加载处理技巧
Googlebot-Style Tall Viewport
模拟Googlebot的高视口策略
Key insight: Googlebot doesn't scroll - it uses a 12,140px viewport and manipulates IntersectionObserver.
javascript
// Temporarily expand document for IntersectionObserver
async function triggerLazyLoadViaViewport() {
const originalHeight = document.documentElement.style.height;
const originalOverflow = document.documentElement.style.overflow;
// Googlebot uses 12,140px mobile / 9,307px desktop
document.documentElement.style.height = '20000px';
document.documentElement.style.overflow = 'visible';
// Wait for observers to trigger
await new Promise(r => setTimeout(r, 500));
// Restore
document.documentElement.style.height = originalHeight;
document.documentElement.style.overflow = originalOverflow;
}Pros: No visible scrolling, works with standard IntersectionObserver
Cons: Won't work with scroll-event listeners or virtualized lists
关键洞察: Googlebot不会滚动页面,它会使用12140px的视口并操纵IntersectionObserver。
javascript
// 临时扩展文档高度以触发IntersectionObserver
async function triggerLazyLoadViaViewport() {
const originalHeight = document.documentElement.style.height;
const originalOverflow = document.documentElement.style.overflow;
// Googlebot移动端使用12140px,桌面端使用9307px
document.documentElement.style.height = '20000px';
document.documentElement.style.overflow = 'visible';
// 等待观察者触发
await new Promise(r => setTimeout(r, 500));
// 恢复原始设置
document.documentElement.style.height = originalHeight;
document.documentElement.style.overflow = originalOverflow;
}优势: 无视觉滚动,兼容标准IntersectionObserver
劣势: 对滚动事件监听或虚拟列表无效
IntersectionObserver Override
重写IntersectionObserver
Patch IntersectionObserver before page loads to force everything to "intersect":
javascript
// Must inject at document_start (before page JS runs)
const script = document.createElement('script');
script.textContent = `
const OriginalIO = window.IntersectionObserver;
window.IntersectionObserver = function(callback, options) {
// Override rootMargin to include everything off-screen
const modifiedOptions = {
...options,
rootMargin: '10000px 10000px 10000px 10000px'
};
return new OriginalIO(callback, modifiedOptions);
};
window.IntersectionObserver.prototype = OriginalIO.prototype;
`;
document.documentElement.prepend(script);Pros: Elegant, works at the source, no DOM manipulation
Cons: Must inject before page JS runs, may break other functionality
在页面加载前注入代码,重写IntersectionObserver以强制所有元素触发“交叉”状态:
javascript
// 必须在document_start阶段注入(页面JS加载前)
const script = document.createElement('script');
script.textContent = `
const OriginalIO = window.IntersectionObserver;
window.IntersectionObserver = function(callback, options) {
// 修改rootMargin以包含所有屏幕外元素
const modifiedOptions = {
...options,
rootMargin: '10000px 10000px 10000px 10000px'
};
return new OriginalIO(callback, modifiedOptions);
};
window.IntersectionObserver.prototype = OriginalIO.prototype;
`;
document.documentElement.prepend(script);优势: 实现优雅,从根源解决问题,无需DOM操作
劣势: 必须在页面JS加载前注入,可能影响其他功能
Direct Attribute Manipulation
直接修改属性
Force lazy elements to load by modifying their attributes:
javascript
function forceLoadLazyContent() {
// Handle data-src → src pattern
document.querySelectorAll('[data-src]').forEach(el => {
if (!el.src) el.src = el.dataset.src;
});
document.querySelectorAll('[data-srcset]').forEach(el => {
if (!el.srcset) el.srcset = el.dataset.srcset;
});
// Handle background images
document.querySelectorAll('[data-background]').forEach(el => {
el.style.backgroundImage = `url(${el.dataset.background})`;
});
// Trigger lazysizes library if present
if (window.lazySizes) {
document.querySelectorAll('.lazyload').forEach(el => {
window.lazySizes.loader.unveil(el);
});
}
}通过修改元素属性强制加载懒加载内容:
javascript
function forceLoadLazyContent() {
// 处理data-src → src模式
document.querySelectorAll('[data-src]').forEach(el => {
if (!el.src) el.src = el.dataset.src;
});
document.querySelectorAll('[data-srcset]').forEach(el => {
if (!el.srcset) el.srcset = el.dataset.srcset;
});
// 处理背景图片
document.querySelectorAll('[data-background]').forEach(el => {
el.style.backgroundImage = `url(${el.dataset.background})`;
});
// 若页面存在lazysizes库,触发加载
if (window.lazySizes) {
document.querySelectorAll('.lazyload').forEach(el => {
window.lazySizes.loader.unveil(el);
});
}
}MutationObserver for Progressive Extraction
使用MutationObserver实现渐进式提取
Watch for DOM changes and extract content as it loads:
javascript
function setupProgressiveExtraction(onNewContent) {
let debounceTimer = null;
const observer = new MutationObserver((mutations) => {
clearTimeout(debounceTimer);
debounceTimer = setTimeout(() => {
const addedNodes = mutations
.flatMap(m => Array.from(m.addedNodes))
.filter(n => n.nodeType === Node.ELEMENT_NODE);
if (addedNodes.length > 0) {
onNewContent(addedNodes);
}
}, 300);
});
observer.observe(document.body, {
childList: true,
subtree: true
});
return () => observer.disconnect();
}监听DOM变化,在内容加载时实时提取:
javascript
function setupProgressiveExtraction(onNewContent) {
let debounceTimer = null;
const observer = new MutationObserver((mutations) => {
clearTimeout(debounceTimer);
debounceTimer = setTimeout(() => {
const addedNodes = mutations
.flatMap(m => Array.from(m.addedNodes))
.filter(n => n.nodeType === Node.ELEMENT_NODE);
if (addedNodes.length > 0) {
onNewContent(addedNodes);
}
}, 300);
});
observer.observe(document.body, {
childList: true,
subtree: true
});
return () => observer.disconnect();
}Lazy Loading Decision Matrix
懒加载策略决策矩阵
| Approach | Scrolling? | Reliability | Complexity |
|---|---|---|---|
| Tall Viewport | No | Medium | Low |
| IO Override | No | Medium | Medium |
| Attribute Manipulation | No | Low | Low |
| MutationObserver | User-initiated | High | Low |
Recommendation: Start with IO Override + Tall Viewport for most cases. Use MutationObserver when user scrolling is acceptable.
| 方法 | 是否需要滚动 | 可靠性 | 复杂度 |
|---|---|---|---|
| 高视口策略 | 否 | 中等 | 低 |
| 重写IntersectionObserver | 否 | 中等 | 中等 |
| 直接修改属性 | 否 | 低 | 低 |
| MutationObserver | 需要用户触发滚动 | 高 | 低 |
推荐方案: 大多数场景优先使用重写IntersectionObserver + 高视口策略。若允许用户触发滚动,可使用MutationObserver。
Vanity URLs vs Internal IDs
vanity URL vs 内部ID
Problem: Some sites use vanity URLs that differ from internal identifiers.
URL: /user/john-smith
Internal ID: john-smith-a2b3c4d5Solution: Match by displayed content, not URL:
javascript
// Strategy 1: Try URL-based ID
const urlId = location.pathname.split('/').pop()
let profile = findById(urlId)
// Strategy 2: Fall back to displayed name
if (!profile) {
const displayedName = document.querySelector('h1')?.textContent?.trim()
profile = findByName(displayedName)
}问题: 部分网站使用 vanity URL(友好URL),与内部标识符不一致。
URL: /user/john-smith
内部ID: john-smith-a2b3c4d5解决方案: 基于显示内容匹配,而非URL:
javascript
// 策略1:尝试从URL中提取ID
const urlId = location.pathname.split('/').pop()
let profile = findById(urlId)
// 策略2: fallback到显示的名称
if (!profile) {
const displayedName = document.querySelector('h1')?.textContent?.trim()
profile = findByName(displayedName)
}Cloud Browser Integration
云端浏览器集成
LambdaTest Setup
LambdaTest配置
typescript
// playwright.lambdatest.config.ts
const capabilities = {
'LT:Options': {
'username': process.env.LT_USERNAME,
'accessKey': process.env.LT_ACCESS_KEY,
'platformName': 'Windows 10',
'browserName': 'Chrome',
'browserVersion': 'latest',
}
}
export default defineConfig({
projects: [{
name: 'lambdatest',
use: {
connectOptions: {
wsEndpoint: `wss://cdp.lambdatest.com/playwright?capabilities=${encodeURIComponent(JSON.stringify(capabilities))}`,
},
},
}],
})typescript
// playwright.lambdatest.config.ts
const capabilities = {
'LT:Options': {
'username': process.env.LT_USERNAME,
'accessKey': process.env.LT_ACCESS_KEY,
'platformName': 'Windows 10',
'browserName': 'Chrome',
'browserVersion': 'latest',
}
}
export default defineConfig({
projects: [{
name: 'lambdatest',
use: {
connectOptions: {
wsEndpoint: `wss://cdp.lambdatest.com/playwright?capabilities=${encodeURIComponent(JSON.stringify(capabilities))}`,
},
},
}],
})Performance Optimization
性能优化
Block Unnecessary Resources
拦截不必要的资源
typescript
await page.route('**/*', route => {
const type = route.request().resourceType()
if (['image', 'font', 'media'].includes(type)) {
route.abort()
} else {
route.continue()
}
})typescript
await page.route('**/*', route => {
const type = route.request().resourceType()
if (['image', 'font', 'media'].includes(type)) {
route.abort()
} else {
route.continue()
}
})Reuse Browser Context
复用浏览器上下文
typescript
// Good: Reuse browser, create new contexts
const browser = await chromium.launch()
for (const url of urls) {
const context = await browser.newContext()
const page = await context.newPage()
// ...
await context.close()
}
await browser.close()typescript
// 推荐:复用浏览器实例,创建新的上下文
const browser = await chromium.launch()
for (const url of urls) {
const context = await browser.newContext()
const page = await context.newPage()
// ...执行任务
await context.close()
}
await browser.close()Parallel Execution
并行执行
typescript
import pLimit from 'p-limit'
const limit = pLimit(5) // Max 5 concurrent
await Promise.all(
urls.map(url => limit(() => processUrl(url)))
)typescript
import pLimit from 'p-limit'
const limit = pLimit(5) // 最大并发数为5
await Promise.all(
urls.map(url => limit(() => processUrl(url)))
)Debugging
调试技巧
Visual Debugging
可视化调试
typescript
// Screenshots
await page.screenshot({ path: 'debug.png' })
// Video recording
const context = await browser.newContext({
recordVideo: { dir: 'videos/' }
})typescript
// 截图
await page.screenshot({ path: 'debug.png' })
// 录制视频
const context = await browser.newContext({
recordVideo: { dir: 'videos/' }
})Trace Viewer
追踪查看器
typescript
await context.tracing.start({ screenshots: true, snapshots: true })
// ... run test
await context.tracing.stop({ path: 'trace.zip' })
// View: npx playwright show-trace trace.ziptypescript
await context.tracing.start({ screenshots: true, snapshots: true })
// ...运行测试
await context.tracing.stop({ path: 'trace.zip' })
// 查看追踪:npx playwright show-trace trace.zipSlow Motion & Pause
慢动作执行与暂停
typescript
const browser = await chromium.launch({
headless: false,
slowMo: 1000,
})
await page.pause() // Opens Playwright Inspectortypescript
const browser = await chromium.launch({
headless: false,
slowMo: 1000,
})
await page.pause() // 打开Playwright InspectorQuick Reference
快速参考
Common Selectors
常用选择器
typescript
// CSS
await page.locator('.class')
await page.locator('#id')
await page.locator('[data-testid="value"]')
// Text
await page.locator('text="Exact text"')
// Playwright-specific
await page.getByRole('button', { name: 'Submit' })
await page.getByText('Welcome')
await page.getByLabel('Email')typescript
// CSS选择器
await page.locator('.class')
await page.locator('#id')
await page.locator('[data-testid="value"]')
// 文本选择器
await page.locator('text="Exact text"')
// Playwright专属选择器
await page.getByRole('button', { name: 'Submit' })
await page.getByText('Welcome')
await page.getByLabel('Email')Data Extraction
数据提取
typescript
// Single element
const text = await page.textContent('.element')
const attr = await page.getAttribute('.element', 'href')
// Multiple elements
const texts = await page.$$eval('.item', els => els.map(e => e.textContent))
// Complex extraction
const data = await page.evaluate(() => {
return Array.from(document.querySelectorAll('.product')).map(el => ({
title: el.querySelector('.title')?.textContent,
price: el.querySelector('.price')?.textContent,
}))
})typescript
// 提取单个元素内容
const text = await page.textContent('.element')
const attr = await page.getAttribute('.element', 'href')
// 提取多个元素内容
const texts = await page.$$eval('.item', els => els.map(e => e.textContent))
// 复杂数据提取
const data = await page.evaluate(() => {
return Array.from(document.querySelectorAll('.product')).map(el => ({
title: el.querySelector('.title')?.textContent,
price: el.querySelector('.price')?.textContent,
}))
})Common Issues
常见问题
Element Not Found
元素未找到
typescript
// Wait for element
await page.waitForSelector('.element', { state: 'visible' })
// Check if in iframe
const frame = page.frame({ url: /example\.com/ })
if (frame) {
await frame.waitForSelector('.element')
}typescript
// 等待元素出现
await page.waitForSelector('.element', { state: 'visible' })
// 检查元素是否在iframe中
const frame = page.frame({ url: /example\.com/ })
if (frame) {
await frame.waitForSelector('.element')
}Browser Connection Lost
浏览器连接丢失
typescript
try {
await page.goto(url)
} catch (error) {
if (error.message.includes('Browser closed')) {
browser = await chromium.launch()
// retry
}
}typescript
try {
await page.goto(url)
} catch (error) {
if (error.message.includes('Browser closed')) {
browser = await chromium.launch()
// 重试任务
}
}