Scrapling

Undetectable, adaptive, high-performance Python library for web data extraction. The first scraping library that automatically learns from website changes and survives structure updates.

When to Use

Extracting data from websites that change their HTML structure frequently
Bypassing anti-bot protections (Cloudflare Turnstile, WAFs, fingerprinting)
High-performance web data collection at scale
Replacing brittle BeautifulSoup/Scrapy selectors with adaptive element tracking
AI-assisted data extraction via the built-in MCP server (Claude/Cursor integration)
Interactive web exploration and debugging via the CLI shell

How It Works

Scrapling combines three capabilities:

Smart Fetching. Three fetcher tiers for different protection levels:
- `Fetcher`: fast HTTP with TLS fingerprinting and stealth headers
- `StealthyFetcher`: modified Firefox with fingerprint spoofing; bypasses Cloudflare
- `DynamicFetcher`: full Playwright browser automation in stealth mode
Adaptive Parsing. Tracks elements via similarity algorithms. When a website changes its structure, Scrapling automatically relocates the elements you need instead of breaking.
Developer Tools. Interactive IPython shell, CLI extraction commands, curl-to-Scrapling conversion, and an MCP server for AI agent integration.
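
To make the idea of similarity-based relocation concrete, here is a minimal conceptual sketch using only the standard library. It is not Scrapling's actual algorithm: the `ElementSnapshot` type, the four signals, their equal weighting, and the `0.5` threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional


@dataclass(frozen=True)
class ElementSnapshot:
    """Hypothetical fingerprint of an element (not a Scrapling type)."""
    tag: str
    classes: tuple
    text: str
    path: str  # slash-separated ancestor chain, e.g. "html/body/ul/li"


def _ratio(a: str, b: str) -> float:
    """String similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(saved: ElementSnapshot, candidate: ElementSnapshot) -> float:
    """Average four 0..1 signals: tag, class overlap, text, and DOM path."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    union = set(saved.classes) | set(candidate.classes)
    shared = set(saved.classes) & set(candidate.classes)
    class_score = len(shared) / len(union) if union else 1.0
    text_score = _ratio(saved.text, candidate.text)
    path_score = _ratio(saved.path, candidate.path)
    return (tag_score + class_score + text_score + path_score) / 4


def relocate(saved: ElementSnapshot,
             candidates: Iterable[ElementSnapshot],
             threshold: float = 0.5) -> Optional[ElementSnapshot]:
    """Return the most similar candidate, or None if nothing clears the threshold."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Fingerprint saved before a redesign, then candidates found afterwards.
saved = ElementSnapshot("span", ("price",), "$19.99", "html/body/div/ul/li/span")
after_redesign = [
    ElementSnapshot("a", ("nav-link",), "Home", "html/body/nav/a"),
    ElementSnapshot("span", ("price", "price--new"), "$21.49",
                    "html/body/main/section/article/span"),
    ElementSnapshot("p", ("footer-note",), "Prices include tax", "html/body/footer/p"),
]
match = relocate(saved, after_redesign)
print(match.classes if match else "no match")
```

The threshold matters as much as the scoring: without it, the function would always return something, even when the original element has been removed from the page.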

Quick Start

```bash
pip install "scrapling[all]" && scrapling install
```

```python
from scrapling.fetchers import StealthyFetcher

url = 'https://example.com'
# StealthyFetcher exposes fetch(); get() belongs to the plain HTTP Fetcher
page = StealthyFetcher.fetch(url, headless=True)  # Stealth browser fetch
products = page.css('.product', adaptive=True)    # Survives site changes

for product in products:
    print(product.css_first('.title').text)
    print(product.css_first('.price').text)
```
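
The Quick Start goes straight to the stealth tier. Since the tiers trade speed for evasion, one possible pattern is to start with the cheapest tier and escalate only when a request is blocked. The sketch below is a generic helper, not part of Scrapling: `fetch_with_escalation` and the `Tier` alias are hypothetical names, and it assumes each fetch callable returns an object with a numeric `status` attribute. The fake fetchers stand in for real ones so the example runs offline.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Sequence, Tuple

# A tier is (label, fetch_callable); the callable takes a URL and returns
# an object exposing a numeric `.status` (an assumption of this sketch).
Tier = Tuple[str, Callable[[str], object]]


def fetch_with_escalation(url: str, tiers: Sequence[Tier],
                          ok: range = range(200, 300)):
    """Try each tier in order and return (label, response) for the first success."""
    last_error: Optional[Exception] = None
    for label, fetch in tiers:
        try:
            response = fetch(url)
        except Exception as exc:  # a tier that raises also triggers escalation
            last_error = exc
            continue
        if getattr(response, "status", None) in ok:
            return label, response
    raise RuntimeError(f"all tiers failed for {url}") from last_error


# Offline stand-ins for real fetchers.
def blocked(url: str):
    return SimpleNamespace(status=403)


def passes(url: str):
    return SimpleNamespace(status=200)


label, response = fetch_with_escalation(
    "https://example.com",
    [("http", blocked), ("stealth", passes)],
)
print(label, response.status)
```

In real use the callables might wrap `Fetcher.get` and `StealthyFetcher.fetch`, for example `("stealth", lambda u: StealthyFetcher.fetch(u, headless=True))`.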

Features

Adaptive element tracking: auto-relocates elements after site structure changes via similarity algorithms
Three fetcher tiers: static HTTP, stealth Firefox, full Playwright browser automation
Anti-bot bypass: defeats Cloudflare Turnstile, WAFs, TLS fingerprinting, and browser detection
Fast: outperforms Parsel, Scrapy, and BeautifulSoup in benchmarks (roughly 668x faster than BeautifulSoup on text extraction over 5k elements, per the Performance table)
CSS + XPath selectors: plus text/regex search, BeautifulSoup-style navigation, auto-selector generation
Async support: all fetchers support async/await
Persistent sessions: `FetcherSession`, `StealthySession`, `DynamicSession` (sync and async)
Interactive shell: `scrapling shell` for live exploration, curl conversion, browser previews
CLI extraction: `scrapling extract get URL output.md --css-selector`
MCP server: AI integration for Claude, Cursor, and other MCP-compatible agents
Docker ready: `docker pull pyd4vinci/scrapling` (includes all browsers)
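
As an illustration of how async support is typically put to use, the sketch below fetches several URLs concurrently while capping the number in flight. It is a generic `asyncio` pattern rather than a Scrapling API: `fetch_all` is a hypothetical helper, and `fake_fetch` stands in for whatever async fetch callable is used so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def fetch_all(urls: Iterable[str],
                    fetch: Callable[[str], Awaitable[object]],
                    limit: int = 5) -> List[object]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


async def fake_fetch(url: str) -> str:
    """Offline stand-in for a real async fetcher."""
    await asyncio.sleep(0)
    return f"fetched {url}"


results = asyncio.run(
    fetch_all(["https://example.com/a", "https://example.com/b"], fake_fetch, limit=2)
)
print(results)
```

With Scrapling, `fetch` could plausibly be an async fetcher method such as `AsyncFetcher.get`; the cap keeps a large URL list from opening hundreds of connections or browser pages at once.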

Performance

| Test | Scrapling | BeautifulSoup | Speedup |
| --- | --- | --- | --- |
| Text extraction (5k elements) | 1.92 ms | 1283 ms | ~668x |
| Element similarity matching | 1.87 ms | N/A | N/A |
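
The source does not describe how these numbers were measured. The sketch below only shows the general shape of such a measurement and the arithmetic behind the speedup column. It times the standard library's `html.parser` on a synthetic 5,000-element page, so its timings are not comparable to the table; `TextCollector`, `best_ms`, and `speedup` are hypothetical helper names.

```python
import timeit
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text nodes of a document."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


def extract_texts(html: str) -> list:
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


def best_ms(func, *args, repeat: int = 3, number: int = 1) -> float:
    """Best wall-clock time per call, in milliseconds, over `repeat` runs."""
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(runs) / number * 1000


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Synthetic page with 5,000 list items.
html = "<ul>" + "".join(f"<li>item {i}</li>" for i in range(5000)) + "</ul>"
texts = extract_texts(html)
elapsed = best_ms(extract_texts, html)

print(f"{len(texts)} elements extracted in {elapsed:.2f} ms")
print(f"speedup from the table: ~{speedup(1283, 1.92):.0f}x")  # 1283 / 1.92 is about 668
```

Taking the minimum of several runs, rather than the mean, reduces the influence of unrelated system noise on short measurements.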

Installation Options

```bash
pip install scrapling                     # Core parser only
pip install "scrapling[fetchers]"         # + browser fetchers
scrapling install                         # Install browser engines
pip install "scrapling[all]"              # Everything
```

Source

Repository: github.com/D4Vinci/Scrapling (8k+ stars)
Documentation: scrapling.readthedocs.io
Author: D4Vinci
