puppeteer

Puppeteer

Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol.

When to Use

  • Chrome-specific work: When cross-browser testing isn't a priority, or you only need to target Chromium.
  • Web scraping: Well suited to scraping single-page applications (SPAs), because it executes JavaScript and renders the page before you extract data.
  • PDFs and screenshots: A widely used choice for HTML-to-PDF generation.
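
The scraping case can be sketched as a small helper. This is a minimal illustration rather than a definitive implementation: the function name scrapeTexts, the URL, and the selector are assumptions made for the example, and the helper takes an existing Puppeteer page (for instance from browser.newPage()) as a parameter.

```javascript
// Sketch: collect the text of every element matching `selector` on a
// JavaScript-rendered page. `page` is a Puppeteer Page object.
// scrapeTexts is a hypothetical helper name, not part of Puppeteer.
async function scrapeTexts(page, url, selector) {
  // "networkidle2" resolves once no more than two network connections
  // have been open for at least 500 ms.
  await page.goto(url, { waitUntil: "networkidle2" });

  // Wait until the client-side app has actually rendered the elements.
  await page.waitForSelector(selector);

  // $$eval runs the callback inside the browser and returns its
  // serializable result to Node.
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}

// Usage (URL and selector are placeholders):
//   const titles = await scrapeTexts(page, "https://example.com/app", "h2.title");
```

Waiting for the selector after navigation matters for SPAs, because the initial HTML often contains only an empty application shell.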

Quick Start

javascript
import puppeteer from "puppeteer";

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://developer.chrome.com/");
  await page.pdf({ path: "dv.pdf", format: "A4" });

  await browser.close();
})();

Core Concepts

DevTools Protocol (CDP)

Puppeteer talks directly to Chrome via the Chrome DevTools Protocol (CDP). This allows deeper control than WebDriver, such as low-level network interception and CPU profiling.
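
As one illustration of the network control mentioned above, the sketch below uses Puppeteer's request interception to drop heavy resources. The helper name blockResources and the default list of blocked types are assumptions chosen for the example; the helper expects a Puppeteer page and should be called before page.goto().

```javascript
// Sketch: abort requests for selected resource types and let everything
// else through. blockResources is a hypothetical helper name.
async function blockResources(page, blockedTypes = ["image", "font", "media"]) {
  const blocked = new Set(blockedTypes);

  // Interception must be enabled before the "request" handler can
  // abort or continue requests.
  await page.setRequestInterception(true);

  page.on("request", (request) => {
    if (blocked.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Usage:
//   await blockResources(page);
//   await page.goto("https://developer.chrome.com/");
```

Once interception is enabled, every request must be explicitly continued, aborted, or responded to, otherwise it stalls. That is why the handler always takes one of the two branches.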

Headless by Default

Puppeteer launches Chrome in headless mode by default. Pass headless: false to puppeteer.launch() to see the browser window.
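
A small sketch of how the headless switch might be wired up during debugging. The SHOW_BROWSER environment variable and the buildLaunchOptions helper are hypothetical names invented for this example; headless and slowMo are existing puppeteer.launch() options.

```javascript
// Sketch: derive launch options from an environment-like object.
// SHOW_BROWSER is a hypothetical variable used only in this example.
function buildLaunchOptions(env = {}) {
  const showBrowser = env.SHOW_BROWSER === "1";
  return {
    // headless: false opens a visible browser window.
    headless: !showBrowser,
    // slowMo delays each Puppeteer operation by this many milliseconds,
    // which makes a visible run easier to follow.
    slowMo: showBrowser ? 100 : 0,
  };
}

// Usage:
//   const browser = await puppeteer.launch(buildLaunchOptions(process.env));
```

Running the script with SHOW_BROWSER=1 would then show the browser and slow it down, while the default stays headless.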

Best Practices (2025)

Do:
  • Use page.waitForSelector: Call it before clicking or scraping, so the element exists when you act on it.
  • Use stealth plugins: If scraping, use puppeteer-extra-plugin-stealth (with puppeteer-extra) to reduce the chance of bot detection.
  • Consider Playwright: It was created by engineers from the original Puppeteer team after they moved to Microsoft, and it adds cross-browser support (Chromium, Firefox, WebKit) and built-in auto-waiting.
Don't:
  • Don't leak browsers: Always ensure browser.close() is called in a finally block or via a test runner hook.
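
Two of the practices above, closing the browser in a finally block and waiting for a selector before clicking, can be sketched as follows. The helper names withBrowser and clickWhenReady and the 5000 ms default timeout are assumptions for this example; the launch function is passed in so the sketch does not depend on how Puppeteer is imported.

```javascript
// Sketch: run a task with a browser and always close it afterwards.
// `launch` is any function returning a browser, e.g. () => puppeteer.launch().
// withBrowser is a hypothetical helper name.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs whether the task succeeded or threw, so no browser leaks.
    await browser.close();
  }
}

// Sketch: wait until the element is present and visible, then click it.
// clickWhenReady and the default timeout are assumptions for illustration.
async function clickWhenReady(page, selector, timeoutMs = 5000) {
  const handle = await page.waitForSelector(selector, {
    visible: true,
    timeout: timeoutMs,
  });
  await handle.click();
}

// Usage (selector is a placeholder):
//   await withBrowser(() => puppeteer.launch(), async (browser) => {
//     const page = await browser.newPage();
//     await page.goto("https://developer.chrome.com/");
//     await clickWhenReady(page, "a.some-link");
//   });
```

If the task throws, the error still propagates to the caller after the browser has been closed.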
