scrapling-official

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Scrapling

Scrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.

Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation — all in a few lines of Python. One library, zero compromises.

Blazing fast crawls with real-time stats and streaming. Built by Web Scrapers for Web Scrapers and regular users, there's something for everyone.

Requires: Python 3.10+

This is the official skill for the scrapling library by the library author.

Scrapling是一款自适应网页抓取框架，可覆盖从单次请求到全量大规模爬取的所有需求。

它的解析器可学习网站变化，当页面更新时自动重定位你需要的元素。它的抓取器开箱即可绕过Cloudflare Turnstile这类反机器人系统。它的爬虫框架仅需几行Python代码即可实现并发、多会话爬取，支持暂停/恢复和自动代理轮换——一个库满足所有需求，无需妥协。

支持极速爬取，附带实时统计和流处理能力。由网页抓取从业者为开发者和普通用户打造，人人都能找到适用的功能。

要求：Python 3.10+

这是Scrapling库作者官方发布的对应skill。

Setup (once)

配置（仅需一次）

Create a virtual Python environment through any way available, like

venv

, then inside the environment do:

pip install "scrapling[all]>=0.4.1"

Then do this to download all the browsers' dependencies:

bash

scrapling install --force

Make note of the

scrapling

binary path and use it instead of

scrapling

from now on with all commands (if

scrapling

is not on

$PATH

通过任意可用方式创建Python虚拟环境，比如

venv

，然后在环境内执行：

pip install "scrapling[all]>=0.4.1"

然后执行以下命令下载所有浏览器依赖：

bash

scrapling install --force

如果

scrapling

不在系统

$PATH

中，请记下

scrapling

二进制文件的路径，后续所有命令都使用该路径代替

scrapling

。

Docker

Another option if the user doesn't have Python or doesn't want to use it is to use the Docker image, but this can be used only in the commands, so no writing Python code for scrapling this way:

bash

docker pull pyd4vinci/scrapling

bash

docker pull ghcr.io/d4vinci/scrapling:latest

如果用户没有安装Python或者不想使用Python，也可以选择使用Docker镜像，但这种方式仅支持命令行使用，无法编写Python代码调用Scrapling：

bash

docker pull pyd4vinci/scrapling

或者

bash

docker pull ghcr.io/d4vinci/scrapling:latest

CLI Usage

CLI 使用说明

The

scrapling extract

command group lets you download and extract content from websites directly without writing any code.

bash

Usage: scrapling extract [OPTIONS] COMMAND [ARGS]...

Commands:
  get             Perform a GET request and save the content to a file.
  post            Perform a POST request and save the content to a file.
  put             Perform a PUT request and save the content to a file.
  delete          Perform a DELETE request and save the content to a file.
  fetch           Use a browser to fetch content with browser automation and flexible options.
  stealthy-fetch  Use a stealthy browser to fetch content with advanced stealth features.

scrapling extract

命令组可让你直接从网站下载并提取内容，无需编写任何代码。

bash

Usage: scrapling extract [OPTIONS] COMMAND [ARGS]...

Commands:
  get             执行GET请求并将内容保存到文件。
  post            执行POST请求并将内容保存到文件。
  put             执行PUT请求并将内容保存到文件。
  delete          执行DELETE请求并将内容保存到文件。
  fetch           通过浏览器自动化能力抓取内容，支持灵活配置。
  stealthy-fetch  使用隐形浏览器抓取内容，搭载高级隐形特性。

Usage pattern

使用模式

Choose your output format by changing the file extension. Here are some examples for the
```
scrapling extract get
```
command:
- Convert the HTML content to Markdown, then save it to the file (great for documentation):
```
scrapling extract get "https://blog.example.com" article.md
```
- Save the HTML content as it is to the file:
```
scrapling extract get "https://example.com" page.html
```
- Save a clean version of the text content of the webpage to the file:
```
scrapling extract get "https://example.com" content.txt
```
Output to a temp file, read it back, then clean up.
All commands can use CSS selectors to extract specific parts of the page through
```
--css-selector
```
or
```
-s
```
.

Which command to use generally:

Use get
with simple websites, blogs, or news articles.
Use fetch
with modern web apps, or sites with dynamic content.
Use stealthy-fetch
with protected sites, Cloudflare, or anti-bot systems.

When unsure, start with
get
. If it fails or returns empty content, escalate to
fetch
, then
stealthy-fetch
. The speed of
fetch
and
stealthy-fetch
is nearly the same, so you are not sacrificing anything.

通过修改文件后缀选择输出格式，以下是
```
scrapling extract get
```
命令的示例：
- 将HTML内容转换为Markdown后保存到文件（非常适合文档场景）：
```
scrapling extract get "https://blog.example.com" article.md
```
- 直接保存原始HTML内容到文件：
```
scrapling extract get "https://example.com" page.html
```
- 保存网页纯净版文本内容到文件：
```
scrapling extract get "https://example.com" content.txt
```
输出到临时文件，读取后立即清理。
所有命令都支持通过
```
--css-selector
```
或
```
-s
```
参数使用CSS选择器提取页面特定部分。

通用命令选择规则：

简单网站、博客、新闻文章使用 get
现代Web应用、动态内容站点使用 fetch
受保护站点、Cloudflare防护站点、带反爬系统的站点使用 stealthy-fetch

不确定的情况下先使用
get
，如果失败或返回空内容，再升级到
fetch
，之后是
stealthy-fetch
。
fetch
和
stealthy-fetch
的速度几乎一致，不会有性能损失。

Key options (requests)

核心选项（HTTP请求类）

Those options are shared between the 4 HTTP request commands:

Option	Input type	Description
-H, --headers	TEXT	HTTP headers in format "Key: Value" (can be used multiple times)
--cookies	TEXT	Cookies string in format "name1=value1; name2=value2"
--timeout	INTEGER	Request timeout in seconds (default: 30)
--proxy	TEXT	Proxy URL in format "http://username:password@host:port"
-s, --css-selector	TEXT	CSS selector to extract specific content from the page. It returns all matches.
-p, --params	TEXT	Query parameters in format "key=value" (can be used multiple times)
--follow-redirects / --no-follow-redirects	None	Whether to follow redirects (default: True)
--verify / --no-verify	None	Whether to verify SSL certificates (default: True)
--impersonate	TEXT	Browser to impersonate. Can be a single browser (e.g., Chrome) or a comma-separated list for random selection (e.g., Chrome, Firefox, Safari).
--stealthy-headers / --no-stealthy-headers	None	Use stealthy browser headers (default: True)

Options shared between

post

and

put

only:

Option	Input type	Description
-d, --data	TEXT	Form data to include in the request body (as string, ex: "param1=value1&param2=value2")
-j, --json	TEXT	JSON data to include in the request body (as string)

Examples:

bash

undefined

以下选项是4个HTTP请求命令共享的配置：

选项	输入类型	说明
-H, --headers	TEXT	HTTP请求头，格式为"Key: Value"（可多次使用）
--cookies	TEXT	Cookie字符串，格式为"name1=value1; name2=value2"
--timeout	INTEGER	请求超时时间，单位为秒（默认：30）
--proxy	TEXT	代理URL，格式为"http://username:password@host:port"
-s, --css-selector	TEXT	用于提取页面特定内容的CSS选择器，返回所有匹配结果
-p, --params	TEXT	查询参数，格式为"key=value"（可多次使用）
--follow-redirects / --no-follow-redirects	None	是否跟随重定向（默认：True）
--verify / --no-verify	None	是否验证SSL证书（默认：True）
--impersonate	TEXT	要模拟的浏览器，可以是单个浏览器（如Chrome），也可以是逗号分隔的列表用于随机选择（如Chrome, Firefox, Safari）
--stealthy-headers / --no-stealthy-headers	None	使用隐形浏览器请求头（默认：True）

仅

post

和

put

命令共享的选项：

选项	输入类型	说明
-d, --data	TEXT	请求体中的表单数据，字符串格式，例如："param1=value1&param2=value2"
-j, --json	TEXT	请求体中的JSON数据，字符串格式

示例：

bash

undefined

Basic download

基础下载

scrapling extract get "https://news.site.com" news.md

Download with custom timeout

自定义超时时间下载

scrapling extract get "https://example.com" content.txt --timeout 60

Extract only specific content using CSS selectors

使用CSS选择器仅提取特定内容

scrapling extract get "https://blog.example.com" articles.md --css-selector "article"

Send a request with cookies

携带Cookie发送请求

scrapling extract get "https://scrapling.requestcatcher.com" content.md --cookies "session=abc123; user=john"

Add user agent

添加用户代理

scrapling extract get "https://api.site.com" data.json -H "User-Agent: MyBot 1.0"

Add multiple headers

添加多个请求头

scrapling extract get "https://site.com" page.html -H "Accept: text/html" -H "Accept-Language: en-US"

undefined

scrapling extract get "https://site.com" page.html -H "Accept: text/html" -H "Accept-Language: en-US"

undefined

Key options (browsers)

核心选项（浏览器类）

Both (

fetch

stealthy-fetch

) share options:

Option	Input type	Description
--headless / --no-headless	None	Run browser in headless mode (default: True)
--disable-resources / --enable-resources	None	Drop unnecessary resources for speed boost (default: False)
--network-idle / --no-network-idle	None	Wait for network idle (default: False)
--real-chrome / --no-real-chrome	None	If you have a Chrome browser installed on your device, enable this, and the Fetcher will launch an instance of your browser and use it. (default: False)
--timeout	INTEGER	Timeout in milliseconds (default: 30000)
--wait	INTEGER	Additional wait time in milliseconds after page load (default: 0)
-s, --css-selector	TEXT	CSS selector to extract specific content from the page. It returns all matches.
--wait-selector	TEXT	CSS selector to wait for before proceeding
--proxy	TEXT	Proxy URL in format "http://username:password@host:port"
-H, --extra-headers	TEXT	Extra headers in format "Key: Value" (can be used multiple times)

This option is specific to

fetch

only:

Option	Input type	Description
--locale	TEXT	Specify user locale. Defaults to the system default locale.

And these options are specific to

stealthy-fetch

only:

Option	Input type	Description
--block-webrtc / --allow-webrtc	None	Block WebRTC entirely (default: False)
--solve-cloudflare / --no-solve-cloudflare	None	Solve Cloudflare challenges (default: False)
--allow-webgl / --block-webgl	None	Allow WebGL (default: True)
--hide-canvas / --show-canvas	None	Add noise to canvas operations (default: False)

Examples:

bash

undefined

fetch

和

stealthy-fetch

共享以下选项：

选项	输入类型	说明
--headless / --no-headless	None	以无头模式运行浏览器（默认：True）
--disable-resources / --enable-resources	None	丢弃不必要的资源以提升速度（默认：False）
--network-idle / --no-network-idle	None	等待网络空闲（默认：False）
--real-chrome / --no-real-chrome	None	如果设备上安装了Chrome浏览器，启用该选项后，抓取器将启动你本地的Chrome实例使用（默认：False）
--timeout	INTEGER	超时时间，单位为毫秒（默认：30000）
--wait	INTEGER	页面加载后的额外等待时间，单位为毫秒（默认：0）
-s, --css-selector	TEXT	用于提取页面特定内容的CSS选择器，返回所有匹配结果
--wait-selector	TEXT	继续执行前等待匹配该CSS选择器的元素出现
--proxy	TEXT	代理URL，格式为"http://username:password@host:port"
-H, --extra-headers	TEXT	额外请求头，格式为"Key: Value"（可多次使用）

仅

fetch

命令独有的选项：

选项	输入类型	说明
--locale	TEXT	指定用户区域设置，默认为系统默认区域设置

仅

stealthy-fetch

命令独有的选项：

选项	输入类型	说明
--block-webrtc / --allow-webrtc	None	完全屏蔽WebRTC（默认：False）
--solve-cloudflare / --no-solve-cloudflare	None	解决Cloudflare验证挑战（默认：False）
--allow-webgl / --block-webgl	None	允许WebGL（默认：True）
--hide-canvas / --show-canvas	None	为Canvas操作添加噪声（默认：False）

示例：

bash

undefined

Wait for JavaScript to load content and finish network activity

等待JavaScript加载内容并完成网络活动

scrapling extract fetch "https://scrapling.requestcatcher.com/" content.md --network-idle

Wait for specific content to appear

等待特定内容出现

scrapling extract fetch "https://scrapling.requestcatcher.com/" data.txt --wait-selector ".content-loaded"

Run in visible browser mode (helpful for debugging)

以可见浏览器模式运行（适合调试）

scrapling extract fetch "https://scrapling.requestcatcher.com/" page.html --no-headless --disable-resources

Bypass basic protection

绕过基础防护

scrapling extract stealthy-fetch "https://scrapling.requestcatcher.com" content.md

Solve Cloudflare challenges

解决Cloudflare验证挑战

scrapling extract stealthy-fetch "https://nopecha.com/demo/cloudflare" data.txt --solve-cloudflare --css-selector "#padded_content a"

Use a proxy for anonymity.

使用代理实现匿名

scrapling extract stealthy-fetch "https://site.com" content.md --proxy "http://proxy-server:8080"

undefined

scrapling extract stealthy-fetch "https://site.com" content.md --proxy "http://proxy-server:8080"

undefined

Notes

注意事项

ALWAYS clean up temp files after reading
Prefer
```
.md
```
output for readability; use
```
.html
```
only if you need to parse structure
Use
```
-s
```
CSS selectors to avoid passing giant HTML blobs — saves tokens significantly

Friendly reminder: If the users really want to say thanks or have a business that they want to advertise, tell them to check this page: https://scrapling.readthedocs.io/en/latest/donate.html

If the user wants to do more than that, coding will give them that ability.

读取临时文件后务必立即清理
优先选择
```
.md
```
输出格式提升可读性，仅当你需要解析结构时使用
```
.html
```
使用
```
-s
```
参数指定CSS选择器避免返回超大HTML块，可显著节省token

友好提示：如果用户想要表示感谢或者有商业推广需求，可以告知他们访问这个页面：https://scrapling.readthedocs.io/en/latest/donate.html

如果用户需要实现更多功能，可以通过编码实现。

Code overview

代码概览

Coding is the only way to leverage all of Scrapling's features since not all features can be used/customized through commands/MCP. Here's a quick overview of how to code with scrapling.

编码是使用Scrapling全部功能的唯一方式，因为并非所有功能都可以通过命令/MCP使用/自定义。以下是Scrapling编码的快速概览。

Basic Usage

基础用法

HTTP requests with session support

python

from scrapling.fetchers import Fetcher, FetcherSession

with FetcherSession(impersonate='chrome') as session:  # Use latest version of Chrome's TLS fingerprint
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text').getall()

支持会话的HTTP请求

python

from scrapling.fetchers import Fetcher, FetcherSession

with FetcherSession(impersonate='chrome') as session:  # 使用最新版Chrome的TLS指纹
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text').getall()

Or use one-off requests

或者使用单次请求

page = Fetcher.get('https://quotes.toscrape.com/') quotes = page.css('.quote .text::text').getall()

Advanced stealth mode
```python
from scrapling.fetchers import StealthyFetcher, StealthySession

with StealthySession(headless=True, solve_cloudflare=True) as session:  # Keep the browser open until you finish
    page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)
    data = page.css('#padded_content a').getall()

page = Fetcher.get('https://quotes.toscrape.com/') quotes = page.css('.quote .text::text').getall()

高级隐形模式
```python
from scrapling.fetchers import StealthyFetcher, StealthySession

with StealthySession(headless=True, solve_cloudflare=True) as session:  # 保持浏览器开启直到任务完成
    page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)
    data = page.css('#padded_content a').getall()

Or use one-off request style, it opens the browser for this request, then closes it after finishing

或者使用单次请求模式，会为该请求打开浏览器，完成后自动关闭

page = StealthyFetcher.fetch('https://nopecha.com/demo/cloudflare') data = page.css('#padded_content a').getall()

Full browser automation
```python
from scrapling.fetchers import DynamicFetcher, DynamicSession

with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:  # Keep the browser open until you finish
    page = session.fetch('https://quotes.toscrape.com/', load_dom=False)
    data = page.xpath('//span[@class="text"]/text()').getall()  # XPath selector if you prefer it

page = StealthyFetcher.fetch('https://nopecha.com/demo/cloudflare') data = page.css('#padded_content a').getall()

全量浏览器自动化
```python
from scrapling.fetchers import DynamicFetcher, DynamicSession

with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:  # 保持浏览器开启直到任务完成
    page = session.fetch('https://quotes.toscrape.com/', load_dom=False)
    data = page.xpath('//span[@class="text"]/text()').getall()  # 可选择使用XPath选择器

Or use one-off request style, it opens the browser for this request, then closes it after finishing

或者使用单次请求模式，会为该请求打开浏览器，完成后自动关闭

page = DynamicFetcher.fetch('https://quotes.toscrape.com/') data = page.css('.quote .text::text').getall()

undefined

page = DynamicFetcher.fetch('https://quotes.toscrape.com/') data = page.css('.quote .text::text').getall()

undefined

Spiders

爬虫

Build full crawlers with concurrent requests, multiple session types, and pause/resume:

python

from scrapling.spiders import Spider, Request, Response

class QuotesSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]
    concurrent_requests = 10
    
    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            yield {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get(),
            }
            
        next_page = response.css('.next a')
        if next_page:
            yield response.follow(next_page[0].attrib['href'])

result = QuotesSpider().start()
print(f"Scraped {len(result.items)} quotes")
result.items.to_json("quotes.json")

Use multiple session types in a single spider:

python

from scrapling.spiders import Spider, Request, Response
from scrapling.fetchers import FetcherSession, AsyncStealthySession

class MultiSessionSpider(Spider):
    name = "multi"
    start_urls = ["https://example.com/"]
    
    def configure_sessions(self, manager):
        manager.add("fast", FetcherSession(impersonate="chrome"))
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)
    
    async def parse(self, response: Response):
        for link in response.css('a::attr(href)').getall():
            # Route protected pages through the stealth session
            if "protected" in link:
                yield Request(link, sid="stealth")
            else:
                yield Request(link, sid="fast", callback=self.parse)  # explicit callback

Pause and resume long crawls with checkpoints by running the spider like this:

python

QuotesSpider(crawldir="./crawl_data").start()

Press Ctrl+C to pause gracefully — progress is saved automatically. Later, when you start the spider again, pass the same

crawldir

, and it will resume from where it stopped.

构建支持并发请求、多会话类型、暂停/恢复的全功能爬虫：

python

from scrapling.spiders import Spider, Request, Response

class QuotesSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]
    concurrent_requests = 10
    
    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            yield {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get(),
            }
            
        next_page = response.css('.next a')
        if next_page:
            yield response.follow(next_page[0].attrib['href'])

result = QuotesSpider().start()
print(f"Scraped {len(result.items)} quotes")
result.items.to_json("quotes.json")

在单个爬虫中使用多会话类型：

python

from scrapling.spiders import Spider, Request, Response
from scrapling.fetchers import FetcherSession, AsyncStealthySession

class MultiSessionSpider(Spider):
    name = "multi"
    start_urls = ["https://example.com/"]
    
    def configure_sessions(self, manager):
        manager.add("fast", FetcherSession(impersonate="chrome"))
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)
    
    async def parse(self, response: Response):
        for link in response.css('a::attr(href)').getall():
            # 受保护的页面通过隐形会话路由
            if "protected" in link:
                yield Request(link, sid="stealth")
            else:
                yield Request(link, sid="fast", callback=self.parse)  # 显式指定回调

通过以下方式运行爬虫即可实现断点续爬：

python

QuotesSpider(crawldir="./crawl_data").start()

按下Ctrl+C即可优雅暂停，进度会自动保存。后续重新启动爬虫时传入相同的

crawldir

，即可从停止位置恢复爬取。

Advanced Parsing & Navigation

高级解析与导航

python

from scrapling.fetchers import Fetcher

python

from scrapling.fetchers import Fetcher

Rich element selection and navigation

丰富的元素选择与导航能力

page = Fetcher.get('https://quotes.toscrape.com/')

Get quotes with multiple selection methods

多种选择器获取名言

quotes = page.css('.quote') # CSS selector quotes = page.xpath('//div[@class="quote"]') # XPath quotes = page.find_all('div', {'class': 'quote'}) # BeautifulSoup-style

quotes = page.css('.quote') # CSS选择器 quotes = page.xpath('//div[@class="quote"]') # XPath quotes = page.find_all('div', {'class': 'quote'}) # BeautifulSoup风格

Same as

等价于

quotes = page.find_all('div', class_='quote') quotes = page.find_all(['div'], class_='quote') quotes = page.find_all(class_='quote') # and so on...

quotes = page.find_all('div', class_='quote') quotes = page.find_all(['div'], class_='quote') quotes = page.find_all(class_='quote') # 还有更多用法...

Find element by text content

通过文本内容查找元素

quotes = page.find_by_text('quote', tag='div')

Advanced navigation

高级导航

quote_text = page.css('.quote')[0].css('.text::text').get() quote_text = page.css('.quote').css('.text::text').getall() # Chained selectors first_quote = page.css('.quote')[0] author = first_quote.next_sibling.css('.author::text') parent_container = first_quote.parent

quote_text = page.css('.quote')[0].css('.text::text').get() quote_text = page.css('.quote').css('.text::text').getall() # 链式选择器 first_quote = page.css('.quote')[0] author = first_quote.next_sibling.css('.author::text') parent_container = first_quote.parent

Element relationships and similarity

元素关系与相似度

similar_elements = first_quote.find_similar() below_elements = first_quote.below_elements()

You can use the parser right away if you don't want to fetch websites like below:
```python
from scrapling.parser import Selector

page = Selector("<html>...</html>")

And it works precisely the same way!

imilar_elements = first_quote.find_similar() below_elements = first_quote.below_elements()

如果你不需要抓取网站，也可以直接使用解析器：
```python
from scrapling.parser import Selector

page = Selector("<html>...</html>")

使用方式完全一致！

Async Session Management Examples

异步会话管理示例

python

import asyncio
from scrapling.fetchers import FetcherSession, AsyncStealthySession, AsyncDynamicSession

async with FetcherSession(http3=True) as session:  # `FetcherSession` is context-aware and can work in both sync/async patterns
    page1 = session.get('https://quotes.toscrape.com/')
    page2 = session.get('https://quotes.toscrape.com/', impersonate='firefox135')

python

import asyncio
from scrapling.fetchers import FetcherSession, AsyncStealthySession, AsyncDynamicSession

async with FetcherSession(http3=True) as session:  # `FetcherSession`支持上下文感知，可同时在同步/异步模式下工作
    page1 = session.get('https://quotes.toscrape.com/')
    page2 = session.get('https://quotes.toscrape.com/', impersonate='firefox135')

Async session usage

异步会话使用

async with AsyncStealthySession(max_pages=2) as session: tasks = [] urls = ['https://example.com/page1', 'https://example.com/page2']

for url in urls:
    task = session.fetch(url)
    tasks.append(task)

print(session.get_pool_stats())  # Optional - The status of the browser tabs pool (busy/free/error)
results = await asyncio.gather(*tasks)
print(session.get_pool_stats())

undefined

async with AsyncStealthySession(max_pages=2) as session: tasks = [] urls = ['https://example.com/page1', 'https://example.com/page2']

for url in urls:
    task = session.fetch(url)
    tasks.append(task)

print(session.get_pool_stats())  # 可选 - 浏览器标签池状态（繁忙/空闲/错误）
results = await asyncio.gather(*tasks)
print(session.get_pool_stats())

undefined

References

参考文档

You already had a good glimpse of what the library can do. Use the references below to dig deeper when needed

```
references/mcp-server.md
```
— MCP server tools and capabilities
```
references/parsing
```
— Everything you need for parsing HTML
```
references/fetching
```
— Everything you need to fetch websites and session persistence
```
references/spiders
```
— Everything you need to write spiders, proxy rotation, and advanced features. It follows a Scrapy-like format
```
references/migrating_from_beautifulsoup.md
```
— A quick API comparison between scrapling and Beautifulsoup
```
https://github.com/D4Vinci/Scrapling/tree/main/docs
```
— Full official docs in Markdown for quick access (use only if current references do not look up-to-date).

This skill encapsulates almost all the published documentation in Markdown, so don't check external sources or search online without the user's permission.

你已经大致了解了这个库的能力，需要深入了解时可以使用以下参考文档：

```
references/mcp-server.md
```
— MCP服务器工具与能力
```
references/parsing
```
— HTML解析相关的所有内容
```
references/fetching
```
— 网站抓取与会话持久化相关的所有内容
```
references/spiders
```
— 编写爬虫、代理轮换、高级功能相关的所有内容，遵循类Scrapy的格式
```
references/migrating_from_beautifulsoup.md
```
— Scrapling与Beautifulsoup的快速API对比
```
https://github.com/D4Vinci/Scrapling/tree/main/docs
```
— 完整的官方Markdown文档，仅当当前参考文档不是最新版本时使用

本skill封装了几乎所有已发布的Markdown格式文档，未经用户许可请勿查阅外部资源或在线搜索。

Guardrails (Always)

安全规则（始终遵守）

Only scrape content you're authorized to access.
Respect robots.txt and ToS.
Add delays (download_delay) for large crawls.
Don't bypass paywalls or authentication without permission.
Never scrape personal/sensitive data.

仅抓取你有权访问的内容
遵守robots.txt和服务条款
大规模爬取时添加延迟（download_delay）
未经许可不要绕过付费墙或身份验证
绝不抓取个人/敏感数据