Browser Automation

When to Use This Skill

Setting up browser automation (Playwright, Puppeteer, Selenium)
Testing Chrome extensions (Manifest V3)
Cloud browser testing (LambdaTest, BrowserStack)
Using Chrome DevTools Protocol (CDP)
Handling dynamic/lazy-loaded content
Debugging automation issues

Tool Selection

Playwright (Recommended)

bash

npm install -D @playwright/test

Pros:

Multi-browser support (Chromium, including Chrome and Edge; Firefox; WebKit, the engine behind Safari)
Built-in test runner with great DX
Auto-wait mechanisms reduce flakiness
Excellent debugging tools (trace viewer, inspector)
Strong TypeScript support

Use when:

Cross-browser testing needed
Writing end-to-end tests
TypeScript project

Puppeteer

bash

npm install puppeteer

Pros:

Simpler API, easier to learn
Smaller footprint
Direct Chrome/Chromium control
Official Chrome team project

Use when:

Only need Chrome
Simple automation tasks
Quick scripts/prototypes

Selenium

bash

npm install selenium-webdriver

Use when:

Legacy projects already using it
Multi-language team
Need specific Selenium features

AI-Powered Automation

Stagehand

bash

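npm install @browserbasehq/stagehand

AI browser automation framework from Browserbase that drives the browser from natural-language instructions using an LLM (Claude is one supported provider).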

Use when:

Complex multi-step web workflows
Dynamic/changing UIs
Natural language task descriptions
Have budget for LLM API calls

Not suitable for:

Chrome extension testing
Simple, predictable automation
Cost-sensitive projects

Browser-Use

bash

pip install browser-use

Python library for LLM-controlled browser automation.

Use when:

Python-based projects
Need AI to navigate/interact with sites
Exploratory automation

Skyvern

Vision-based web automation using computer vision + LLMs.

Use when:

Sites with no accessible DOM selectors
Need to handle CAPTCHAs/complex visuals
Budget for vision API calls

Chrome Extension Testing

Local Testing (Recommended)

For Manifest V3 Extensions:

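typescript

// playwright.config.ts
import path from 'node:path'
import { defineConfig } from '@playwright/test'

// Assumed location of the unpacked extension build; adjust for your project
const extensionPath = path.resolve('dist')

export default defineConfig({
  use: {
    headless: false,
    // Browser flags go under launchOptions, not directly under `use`
    launchOptions: {
      args: [
        `--disable-extensions-except=${extensionPath}`,
        `--load-extension=${extensionPath}`,
      ],
    },
  },
})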

Find extension ID via CDP:

typescript

const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')

const extensionTarget = targetInfos.find((target: any) =>
  target.type === 'service_worker' &&
  target.url.startsWith('chrome-extension://')
)

// extensionTarget is undefined if the service worker has not started yet
const extensionId = extensionTarget?.url.match(/^chrome-extension:\/\/([^/]+)/)?.[1]
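
The lookup above can be isolated into a small pure function, which makes the "no extension target found" case explicit and easy to unit test. This is only a sketch: `TargetInfoLike` and `findExtensionId` are illustrative names, and only the `type` and `url` fields of the `Target.getTargets` result are assumed.

```typescript
// Minimal shape of an entry returned by CDP's Target.getTargets (only the fields used here).
interface TargetInfoLike {
  type: string
  url: string
}

// Returns the ID of the first extension service worker, or undefined if none is present.
function findExtensionId(targets: TargetInfoLike[]): string | undefined {
  const worker = targets.find(
    (t) => t.type === 'service_worker' && t.url.startsWith('chrome-extension://')
  )
  // The ID is the host part of the chrome-extension:// URL
  return worker?.url.match(/^chrome-extension:\/\/([^/]+)/)?.[1]
}
```

Callers can then decide what to do when the result is `undefined`, for example waiting briefly and querying the targets again.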

Navigate to extension pages:

typescript

await page.goto(`chrome-extension://${extensionId}/popup.html`)
await page.goto(`chrome-extension://${extensionId}/options.html`)
await page.goto(`chrome-extension://${extensionId}/sidepanel.html`)

Cloud Testing Limitations

What works:

Extension uploads to LambdaTest/BrowserStack
Extensions load in cloud browsers
Service workers run
Can test content scripts on regular sites

What doesn't work:

Cannot navigate to `chrome-extension://` URLs
All attempts blocked with `net::ERR_BLOCKED_BY_CLIENT`

Why: Cloud platforms block extension URLs for security in shared environments.

Verdict: Use local testing for extension UI testing. Cloud for content script testing only.

Chrome DevTools Protocol (CDP)

Get All Browser Targets

typescript

const client = await context.newCDPSession(page)
const { targetInfos } = await client.send('Target.getTargets')

const extensions = targetInfos.filter(t => t.type === 'service_worker')
const pages = targetInfos.filter(t => t.type === 'page')
const workers = targetInfos.filter(t => t.type === 'worker')

Execute Code in Extension Context

typescript

// Runtime.evaluate sent on the page's CDP session runs in the page, not the extension.
// Playwright exposes the extension's service worker as a Worker handle instead.
let [serviceWorker] = context.serviceWorkers()
if (!serviceWorker) {
  serviceWorker = await context.waitForEvent('serviceworker')
}

// Execute in service worker context
const stored = await serviceWorker.evaluate(() =>
  chrome.storage.local.get(['key'])
)
console.log(stored)

Intercept Network Requests

typescript

// Network.setRequestInterception is deprecated; the Fetch domain replaces it
await client.send('Fetch.enable', {
  patterns: [{ urlPattern: '*' }],
})

client.on('Fetch.requestPaused', async (event) => {
  // Fetch.continueRequest takes headers as an array of { name, value } entries
  const headers = Object.entries({ ...event.request.headers, 'X-Custom': 'value' })
    .map(([name, value]) => ({ name, value: String(value) }))
  await client.send('Fetch.continueRequest', {
    requestId: event.requestId,
    headers,
  })
})
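
CDP's Fetch domain (`Fetch.continueRequest`) expects request headers as a list of `{ name, value }` entries, while the intercepted request reports them as a plain object. A hedged sketch of that conversion follows; `toHeaderEntries` and `HeaderEntry` are illustrative names, not part of CDP or Playwright.

```typescript
// Shape of one header entry as accepted by Fetch.continueRequest.
interface HeaderEntry {
  name: string
  value: string
}

// Merges extra headers over the original ones and converts the result to entries.
function toHeaderEntries(
  original: Record<string, string>,
  extra: Record<string, string>
): HeaderEntry[] {
  const merged: Record<string, string> = { ...original, ...extra }
  return Object.entries(merged).map(([name, value]) => ({ name, value }))
}
```

Keeping the merge in one place means a header listed in `extra` always overrides the value the page originally sent.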

Get Console Messages

typescript

await client.send('Runtime.enable')
await client.send('Log.enable')

client.on('Runtime.consoleAPICalled', (event) => {
  console.log('Console:', event.args.map(a => a.value))
})

client.on('Runtime.exceptionThrown', (event) => {
  console.error('Exception:', event.exceptionDetails)
})

Handling Dynamic Content

Wait Strategies

typescript

// Wait for specific content
await page.waitForSelector('.product-price', { timeout: 10000 })

// Wait for network to be idle
await page.goto(url, { waitUntil: 'networkidle' })

// Wait for custom condition
await page.waitForFunction(() => {
  return document.querySelectorAll('.item').length > 10
})

Time-Based vs Scroll-Based Lazy Loading

Key insight: Some sites load content based on time elapsed, not scroll position.

Testing approach:

javascript

// Test 1: Wait with no scroll
await page.goto(url)
await page.waitForTimeout(3000)
// Await the query first; `.length` on the pending promise would be undefined
const sectionsNoScroll = (await page.$$('.section')).length

// Test 2: Scroll immediately
await page.goto(url)
await page.evaluate(() => window.scrollTo(0, 5000))
await page.waitForTimeout(500)
const sectionsWithScroll = (await page.$$('.section')).length

// If same result: site uses time-based loading
// No scroll automation needed - just wait
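
The comparison at the end of that test can be written as a small pure function. This is a sketch under the assumption that the two section counts were collected as described; `classifyLazyLoading` and the `'inconclusive'` outcome are illustrative additions, not part of any library.

```typescript
type LoadingStrategy = 'time-based' | 'scroll-based' | 'inconclusive'

// Interprets the section counts from the "wait only" run and the "scroll immediately" run.
function classifyLazyLoading(countNoScroll: number, countWithScroll: number): LoadingStrategy {
  // Nothing matched in either run: the selector is probably wrong, so draw no conclusion
  if (countNoScroll === 0 && countWithScroll === 0) return 'inconclusive'
  // Scrolling surfaced more sections than waiting alone
  if (countWithScroll > countNoScroll) return 'scroll-based'
  // Waiting alone loaded at least as much as scrolling did
  return 'time-based'
}
```

For example, `classifyLazyLoading(sectionsNoScroll, sectionsWithScroll)` returning `'time-based'` suggests a plain wait is enough for that site.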

Benefits of detecting time-based loading:

Simpler automation code
No visual disruption
More reliable extraction

Handling Lazy-Loaded Images

javascript

// Force lazy images to load
await page.evaluate(() => {
  // Handle data-src → src pattern
  document.querySelectorAll('[data-src]').forEach(el => {
    if (!el.src) el.src = el.dataset.src
  })

  // Handle loading="lazy" attribute
  document.querySelectorAll('[loading="lazy"]').forEach(el => {
    el.loading = 'eager'
  })
})

Advanced Lazy Loading Techniques

Googlebot-Style Tall Viewport

Key insight: Googlebot doesn't scroll. It reportedly renders pages in a very tall viewport (about 12,140px on mobile), so IntersectionObserver callbacks fire for content far below the fold.

javascript

// Temporarily expand document for IntersectionObserver
async function triggerLazyLoadViaViewport() {
  const originalHeight = document.documentElement.style.height;
  const originalOverflow = document.documentElement.style.overflow;

  // Googlebot uses 12,140px mobile / 9,307px desktop
  document.documentElement.style.height = '20000px';
  document.documentElement.style.overflow = 'visible';

  // Wait for observers to trigger
  await new Promise(r => setTimeout(r, 500));

  // Restore
  document.documentElement.style.height = originalHeight;
  document.documentElement.style.overflow = originalOverflow;
}

Pros: No visible scrolling, works with standard IntersectionObserver
Cons: Won't work with scroll-event listeners or virtualized lists

IntersectionObserver Override

Patch IntersectionObserver before page loads to force everything to "intersect":

javascript

// Must inject at document_start (before page JS runs)
const script = document.createElement('script');
script.textContent = `
  const OriginalIO = window.IntersectionObserver;
  window.IntersectionObserver = function(callback, options) {
    // Override rootMargin to include everything off-screen
    const modifiedOptions = {
      ...options,
      rootMargin: '10000px 10000px 10000px 10000px'
    };
    return new OriginalIO(callback, modifiedOptions);
  };
  window.IntersectionObserver.prototype = OriginalIO.prototype;
`;
document.documentElement.prepend(script);

Pros: Elegant, works at the source, no DOM manipulation
Cons: Must inject before page JS runs, may break other functionality

Direct Attribute Manipulation

Force lazy elements to load by modifying their attributes:

javascript

function forceLoadLazyContent() {
  // Handle data-src → src pattern
  document.querySelectorAll('[data-src]').forEach(el => {
    if (!el.src) el.src = el.dataset.src;
  });

  document.querySelectorAll('[data-srcset]').forEach(el => {
    if (!el.srcset) el.srcset = el.dataset.srcset;
  });

  // Handle background images
  document.querySelectorAll('[data-background]').forEach(el => {
    el.style.backgroundImage = `url(${el.dataset.background})`;
  });

  // Trigger lazysizes library if present
  if (window.lazySizes) {
    document.querySelectorAll('.lazyload').forEach(el => {
      window.lazySizes.loader.unveil(el);
    });
  }
}

MutationObserver for Progressive Extraction

Watch for DOM changes and extract content as it loads:

javascript

function setupProgressiveExtraction(onNewContent) {
  let debounceTimer = null;

  const observer = new MutationObserver((mutations) => {
    clearTimeout(debounceTimer);
    debounceTimer = setTimeout(() => {
      const addedNodes = mutations
        .flatMap(m => Array.from(m.addedNodes))
        .filter(n => n.nodeType === Node.ELEMENT_NODE);

      if (addedNodes.length > 0) {
        onNewContent(addedNodes);
      }
    }, 300);
  });

  observer.observe(document.body, {
    childList: true,
    subtree: true
  });

  return () => observer.disconnect();
}

Lazy Loading Decision Matrix

Approach	Scrolling?	Reliability	Complexity
Tall Viewport	No	Medium	Low
IO Override	No	Medium	Medium
Attribute Manipulation	No	Low	Low
MutationObserver	User-initiated	High	Low

Recommendation: Start with IO Override + Tall Viewport for most cases. Use MutationObserver when user scrolling is acceptable.
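
The matrix and the recommendation can also be encoded as data, which is handy when a script has to pick a strategy per site. The sketch below is one possible reading of the table; the `Approach` type and the `recommendApproaches` function are illustrative names, and the rule "skip Low reliability" is an assumption drawn from the recommendation.

```typescript
type Level = 'Low' | 'Medium' | 'High'

interface Approach {
  name: string
  needsUserScroll: boolean
  reliability: Level
  complexity: Level
}

// The four rows of the decision matrix, encoded as data.
const approaches: Approach[] = [
  { name: 'Tall Viewport', needsUserScroll: false, reliability: 'Medium', complexity: 'Low' },
  { name: 'IO Override', needsUserScroll: false, reliability: 'Medium', complexity: 'Medium' },
  { name: 'Attribute Manipulation', needsUserScroll: false, reliability: 'Low', complexity: 'Low' },
  { name: 'MutationObserver', needsUserScroll: true, reliability: 'High', complexity: 'Low' },
]

// Picks the approaches that match the scrolling constraint and are not rated Low reliability.
function recommendApproaches(userScrollAcceptable: boolean): string[] {
  return approaches
    .filter((a) => a.needsUserScroll === userScrollAcceptable && a.reliability !== 'Low')
    .map((a) => a.name)
}
```

With these rows, `recommendApproaches(false)` yields the Tall Viewport and IO Override pair, and `recommendApproaches(true)` yields MutationObserver.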

Vanity URLs vs Internal IDs

Problem: Some sites use vanity URLs that differ from internal identifiers.

URL: /user/john-smith
Internal ID: john-smith-a2b3c4d5

Solution: Match by displayed content, not URL:

javascript

// Strategy 1: Try URL-based ID
const urlId = location.pathname.split('/').pop()
let profile = findById(urlId)

// Strategy 2: Fall back to displayed name
if (!profile) {
  const displayedName = document.querySelector('h1')?.textContent?.trim()
  profile = findByName(displayedName)
}
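
The snippet above leaves `findById` and `findByName` undefined. A minimal self-contained sketch of the same two-step fallback, assuming the known profiles are available as an in-memory array, could look like this; the `Profile` shape and `resolveProfile` are hypothetical names for illustration.

```typescript
interface Profile {
  id: string
  name: string
}

// Tries the last URL path segment as an ID first, then falls back to the displayed name.
function resolveProfile(
  pathname: string,
  displayedName: string | undefined,
  profiles: Profile[]
): Profile | undefined {
  // Strategy 1: URL-based ID
  const urlId = pathname.split('/').pop()
  const byId = profiles.find((p) => p.id === urlId)
  if (byId) return byId

  // Strategy 2: displayed name, compared case-insensitively after trimming
  const wanted = displayedName?.trim().toLowerCase()
  if (!wanted) return undefined
  return profiles.find((p) => p.name.trim().toLowerCase() === wanted)
}
```

In a page context this would be called as `resolveProfile(location.pathname, document.querySelector('h1')?.textContent ?? undefined, profiles)`.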

Cloud Browser Integration

LambdaTest Setup

typescript

// playwright.lambdatest.config.ts
import { defineConfig } from '@playwright/test'

const capabilities = {
  'LT:Options': {
    'username': process.env.LT_USERNAME,
    'accessKey': process.env.LT_ACCESS_KEY,
    'platformName': 'Windows 10',
    'browserName': 'Chrome',
    'browserVersion': 'latest',
  }
}

export default defineConfig({
  projects: [{
    name: 'lambdatest',
    use: {
      connectOptions: {
        wsEndpoint: `wss://cdp.lambdatest.com/playwright?capabilities=${encodeURIComponent(JSON.stringify(capabilities))}`,
      },
    },
  }],
})
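
One pitfall with this setup is that `JSON.stringify` silently drops properties whose value is `undefined`, so an unset environment variable produces an endpoint without credentials rather than an error. The sketch below builds the same endpoint string but fails fast on a missing value; `requireCredential` and `buildLambdaTestEndpoint` are illustrative helper names, not LambdaTest or Playwright APIs.

```typescript
// Throws a descriptive error instead of letting an undefined credential vanish from the JSON.
function requireCredential(name: string, value: string | undefined): string {
  if (!value) throw new Error(`Missing environment variable: ${name}`)
  return value
}

// Builds the wsEndpoint string for connectOptions from a capabilities object.
function buildLambdaTestEndpoint(capabilities: Record<string, unknown>): string {
  const encoded = encodeURIComponent(JSON.stringify(capabilities))
  return `wss://cdp.lambdatest.com/playwright?capabilities=${encoded}`
}
```

In the config, the credentials would then be read as `requireCredential('LT_USERNAME', process.env.LT_USERNAME)` before being placed into the capabilities object.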

Performance Optimization

Block Unnecessary Resources

typescript

await page.route('**/*', route => {
  const type = route.request().resourceType()
  if (['image', 'font', 'media'].includes(type)) {
    route.abort()
  } else {
    route.continue()
  }
})
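
Pulling the decision out of the route handler keeps the blocked types in one place and lets it be tested without a browser. A small sketch, where `BLOCKED_RESOURCE_TYPES` and `shouldBlockResource` are illustrative names:

```typescript
// Resource types that are usually unnecessary for text extraction.
const BLOCKED_RESOURCE_TYPES: ReadonlySet<string> = new Set(['image', 'font', 'media'])

// Returns true when a request of this resource type should be aborted.
function shouldBlockResource(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType)
}
```

The handler then reduces to aborting when `shouldBlockResource(route.request().resourceType())` is true and continuing otherwise.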

Reuse Browser Context

typescript

// Good: Reuse browser, create new contexts
const browser = await chromium.launch()
for (const url of urls) {
  const context = await browser.newContext()
  const page = await context.newPage()
  // ...
  await context.close()
}
await browser.close()

Parallel Execution

typescript

import pLimit from 'p-limit'
const limit = pLimit(5) // Max 5 concurrent

await Promise.all(
  urls.map(url => limit(() => processUrl(url)))
)
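
If adding a dependency is not an option, the same bounded concurrency can be approximated with a few lines of plain TypeScript. This is a sketch of a generic worker-lane pattern, not how `p-limit` is implemented; `mapWithConcurrency` is an illustrative name.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index])
    }
  }

  const laneCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: laneCount }, () => runLane()))
  return results
}
```

Usage mirrors the snippet above: `await mapWithConcurrency(urls, 5, processUrl)`. As with `Promise.all`, the first rejected call rejects the whole batch.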

Debugging

Visual Debugging

typescript

// Screenshots
await page.screenshot({ path: 'debug.png' })

// Video recording
const context = await browser.newContext({
  recordVideo: { dir: 'videos/' }
})

Trace Viewer

typescript

await context.tracing.start({ screenshots: true, snapshots: true })
// ... run test
await context.tracing.stop({ path: 'trace.zip' })

// View: npx playwright show-trace trace.zip

Slow Motion & Pause

typescript

const browser = await chromium.launch({
  headless: false,
  slowMo: 1000,
})

await page.pause() // Opens Playwright Inspector

Quick Reference

Common Selectors

typescript

// Locators are created synchronously; await is only needed when acting on them

// CSS
page.locator('.class')
page.locator('#id')
page.locator('[data-testid="value"]')

// Text
page.locator('text="Exact text"')

// Playwright-specific
page.getByRole('button', { name: 'Submit' })
page.getByText('Welcome')
page.getByLabel('Email')

Data Extraction

typescript

// Single element
const text = await page.textContent('.element')
const attr = await page.getAttribute('.element', 'href')

// Multiple elements
const texts = await page.$$eval('.item', els => els.map(e => e.textContent))

// Complex extraction
const data = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.product')).map(el => ({
    title: el.querySelector('.title')?.textContent,
    price: el.querySelector('.price')?.textContent,
  }))
})

Common Issues

Element Not Found

typescript

// Wait for element
await page.waitForSelector('.element', { state: 'visible' })

// Check if in iframe
const frame = page.frame({ url: /example\.com/ })
if (frame) {
  await frame.waitForSelector('.element')
}

Browser Connection Lost

typescript

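try {
  await page.goto(url)
} catch (error) {
  // Error text varies between Playwright versions, so check the connection itself
  if (!browser.isConnected()) {
    browser = await chromium.launch()
    // retry with a new context and page from the relaunched browser
  } else {
    throw error
  }
}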
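
The "retry" step is left open in the snippet above. One way to structure it is a generic helper that runs a recovery step between attempts; this is a sketch, and `withRetry`, `action`, and `recover` are illustrative names rather than Playwright APIs.

```typescript
// Runs `action`; on failure, runs `recover` and tries again, up to `maxAttempts` attempts in total.
async function withRetry<T>(
  action: () => Promise<T>,
  recover: () => Promise<void>,
  maxAttempts = 2
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action()
    } catch (error) {
      lastError = error
      // Only recover if another attempt will follow
      if (attempt < maxAttempts) await recover()
    }
  }
  throw lastError
}
```

Here `action` would navigate with the current page, and `recover` would relaunch the browser and create a fresh context and page for the next attempt.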
