scrapling
Scrapling is a powerful Python web scraping library with a comprehensive CLI for extracting data from websites directly from the terminal, without writing code. The primary use case is the `extract` command group for quick data extraction.

Installation

Install with the shell extras using uv:

```bash
uv tool install "scrapling[shell]"
```

Then install the fetcher dependencies (browsers, system dependencies, fingerprint manipulation):

```bash
scrapling install
```
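If you don't use uv, the same steps should work with plain pip — this is a sketch assuming the package publishes the same `shell` extra on PyPI:

```shell
# Assumption: the "shell" extra is available via pip as well
pip install "scrapling[shell]"
scrapling install
```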

Extract Commands (Primary Usage)

The `scrapling extract` command group allows you to download and extract content from websites without writing any code. The output format is determined by the file extension:
  • `.md` - Convert HTML to Markdown
  • `.html` - Save raw HTML
  • `.txt` - Extract clean text content

Quick Start

```bash
# Basic website download as text
scrapling extract get "https://example.com" page_content.txt

# Download as markdown
scrapling extract get "https://blog.example.com" article.md

# Save raw HTML
scrapling extract get "https://example.com" page.html
```

Decision Guide: Which Command to Use?

| Use Case | Command |
| --- | --- |
| Simple websites, blogs, news articles | `get` |
| Modern web apps, dynamic content (JavaScript) | `fetch` |
| Protected sites, Cloudflare, anti-bot | `stealthy-fetch` |
| Form submissions, APIs | `post`, `put`, `delete` |
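The table suggests a natural escalation path: try the fast HTTP path first and only pay the cost of a full browser when it fails. As a shell sketch (the URL is illustrative):

```shell
# Try a plain HTTP GET first; fall back to a browser fetch
# (which renders JavaScript) only if the first attempt fails.
scrapling extract get "https://app.example.com" app.md \
  || scrapling extract fetch "https://app.example.com" app.md --network-idle
```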

HTTP Request Commands


GET Request

Most common command for downloading website content:

```bash
# Basic download
scrapling extract get "https://news.site.com" news.md

# Download with custom timeout
scrapling extract get "https://example.com" content.txt --timeout 60

# Extract specific content using CSS selectors
scrapling extract get "https://blog.example.com" articles.md --css-selector "article"

# Send request with cookies
scrapling extract get "https://scrapling.requestcatcher.com" content.md \
  --cookies "session=abc123; user=john"

# Add a user agent
scrapling extract get "https://api.site.com" data.json \
  -H "User-Agent: MyBot 1.0"

# Add multiple headers
scrapling extract get "https://site.com" page.html \
  -H "Accept: text/html" \
  -H "Accept-Language: en-US"

# With query parameters
scrapling extract get "https://api.example.com" data.json \
  -p "page=1" -p "limit=10"
```

**GET options:**

```
-H, --headers TEXT       HTTP headers "Key: Value" (multiple allowed)
--cookies TEXT           Cookies "name1=value1;name2=value2"
--timeout INTEGER        Request timeout in seconds (default: 30)
--proxy TEXT             Proxy URL from $PROXY_URL env variable
-s, --css-selector TEXT  Extract specific content with CSS selector
-p, --params TEXT        Query parameters "key=value" (multiple)
--follow-redirects / --no-follow-redirects  (default: True)
--verify / --no-verify   SSL verification (default: True)
--impersonate TEXT       Browser to impersonate (chrome, firefox)
--stealthy-headers / --no-stealthy-headers  (default: True)
```
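These options compose freely. A sketch combining impersonation, a selector, and a proxy read from the environment — the URL and selector are illustrative:

```shell
# Fetch article bodies while impersonating Chrome, routing through
# a proxy taken from the PROXY_URL environment variable.
scrapling extract get "https://blog.example.com" articles.md \
  -s "article" \
  --impersonate "chrome" \
  --proxy "$PROXY_URL" \
  --timeout 60
```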

POST Request

```bash
# Submit form data
scrapling extract post "https://api.site.com/search" results.html \
  --data "query=python&type=tutorial"

# Send JSON data
scrapling extract post "https://api.site.com" response.json \
  --json '{"username": "test", "action": "search"}'
```

**POST options:** (same as GET plus)

```
-d, --data TEXT  Form data "param1=value1&param2=value2"
-j, --json TEXT  JSON data as string
```
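Since `--json` takes a string, larger payloads can be kept in a file and spliced in with ordinary shell command substitution (the filename here is illustrative):

```shell
# Keep the payload in payload.json and pass its contents inline
scrapling extract post "https://api.site.com" response.json \
  --json "$(cat payload.json)"
```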

PUT Request

```bash
# Send data
scrapling extract put "https://api.example.com" results.html \
  --data "update=info" \
  --impersonate "firefox"

# Send JSON data
scrapling extract put "https://api.example.com" response.json \
  --json '{"username": "test", "action": "search"}'
```

DELETE Request

```bash
scrapling extract delete "https://api.example.com/resource" response.txt

# With impersonation
scrapling extract delete "https://api.example.com/" response.txt \
  --impersonate "chrome"
```

Browser Fetching Commands


Use browser-based fetching for JavaScript-heavy sites or when HTTP requests fail.

fetch - Handle Dynamic Content

For websites that load content dynamically or have slight protection:

```bash
# Wait for JavaScript to load and network activity to finish
scrapling extract fetch "https://example.com" content.md --network-idle

# Wait for a specific element to appear
scrapling extract fetch "https://example.com" data.txt \
  --wait-selector ".content-loaded"

# Visible browser mode for debugging
scrapling extract fetch "https://example.com" page.html \
  --no-headless --disable-resources

# Use the installed Chrome browser
scrapling extract fetch "https://example.com" content.md --real-chrome

# With CSS selector extraction
scrapling extract fetch "https://example.com" articles.md \
  --css-selector "article" \
  --network-idle
```

**fetch options:**

```
--headless / --no-headless  Run browser headless (default: True)
--disable-resources         Drop unnecessary resources for a speed boost
--network-idle              Wait for network idle
--timeout INTEGER           Timeout in milliseconds (default: 30000)
--wait INTEGER              Additional wait time in ms (default: 0)
-s, --css-selector TEXT     Extract specific content
--wait-selector TEXT        Wait for selector before proceeding
--locale TEXT               User locale (default: system)
--real-chrome               Use the installed Chrome browser
--proxy TEXT                Proxy URL
-H, --extra-headers TEXT    Extra headers (multiple)
```

stealthy-fetch - Bypass Protection

For websites with anti-bot protection or Cloudflare:

```bash
# Bypass basic protection
scrapling extract stealthy-fetch "https://example.com" content.md

# Solve Cloudflare challenges
scrapling extract stealthy-fetch "https://nopecha.com/demo/cloudflare" data.txt \
  --solve-cloudflare \
  --css-selector "#padded_content a"

# Use a proxy for anonymity (set the PROXY_URL environment variable)
scrapling extract stealthy-fetch "https://site.com" content.md \
  --proxy "$PROXY_URL"

# Hide the canvas fingerprint
scrapling extract stealthy-fetch "https://example.com" content.md \
  --hide-canvas \
  --block-webrtc
```

**stealthy-fetch options:** (same as fetch plus)

```
--block-webrtc                 Block WebRTC entirely
--solve-cloudflare             Solve Cloudflare challenges
--allow-webgl / --block-webgl  Allow WebGL (default: True)
--hide-canvas                  Add noise to canvas operations
```

CSS Selector Examples

Extract specific content with the `-s` or `--css-selector` flag:

```bash
# Extract all articles
scrapling extract get "https://blog.example.com" articles.md -s "article"

# Extract a specific class
scrapling extract get "https://example.com" titles.txt -s ".title"

# Extract by ID
scrapling extract get "https://example.com" content.md -s "#main-content"

# Extract links (href attributes)
scrapling extract get "https://example.com" links.txt -s "a::attr(href)"

# Extract text only
scrapling extract get "https://example.com" titles.txt -s "h1::text"

# Extract multiple elements with fetch
scrapling extract fetch "https://example.com" products.md \
  -s ".product-card" \
  --network-idle
```
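Because everything runs from the terminal, these commands compose with ordinary shell loops. A sketch (URL and query-parameter name are illustrative) that pulls several pages of a paginated listing into separate files:

```shell
# Download pages 1-3 of a paginated listing, one Markdown file each
for n in 1 2 3; do
  scrapling extract get "https://blog.example.com" "page_$n.md" \
    -p "page=$n" -s "article"
done
```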

Help Commands

```bash
scrapling --help
scrapling extract --help
scrapling extract get --help
scrapling extract post --help
scrapling extract fetch --help
scrapling extract stealthy-fetch --help
```

Resources
