Scrapling
Undetectable, adaptive, high-performance Python library for web data extraction. The first scraping library that automatically learns from website changes and survives structure updates.
When to Use
- Extracting data from websites that change their HTML structure frequently
- Bypassing anti-bot protections (Cloudflare Turnstile, WAFs, fingerprinting)
- High-performance web data collection at scale
- Replacing brittle BeautifulSoup/Scrapy selectors with adaptive element tracking
- AI-assisted data extraction via the built-in MCP server (Claude/Cursor integration)
- Interactive web exploration and debugging via the CLI shell
How It Works
Scrapling combines three capabilities:
- Smart Fetching: three fetcher tiers for different protection levels.
  - `Fetcher`: fast HTTP with TLS fingerprinting and stealth headers
  - `StealthyFetcher`: modified Firefox with fingerprint spoofing, bypasses Cloudflare
  - `DynamicFetcher`: full Playwright browser automation in stealth mode
- Adaptive Parsing: tracks elements via similarity algorithms. When a website changes its structure, Scrapling automatically relocates the elements you need instead of breaking.
- Developer Tools: interactive IPython shell, CLI extraction commands, curl-to-Scrapling conversion, and an MCP server for AI agent integration.
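To make the adaptive parsing idea concrete, here is a minimal sketch of similarity-based relocation using only the standard library. It is an illustration under stated assumptions, not Scrapling's actual algorithm: `ElementSignature`, `similarity`, `relocate`, the four compared features, and the 0.5 threshold are all hypothetical names and choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ElementSignature:
    """Hypothetical fingerprint of an element (not Scrapling's data model)."""
    tag: str
    classes: frozenset
    text: str
    path: str  # slash-separated ancestor tags, e.g. "html/body/ul/li"


def similarity(a: ElementSignature, b: ElementSignature) -> float:
    """Average four simple feature comparisons into a score in [0, 1]."""
    tag_score = 1.0 if a.tag == b.tag else 0.0
    union = a.classes | b.classes
    class_score = len(a.classes & b.classes) / len(union) if union else 1.0
    text_score = SequenceMatcher(None, a.text, b.text).ratio()
    path_score = SequenceMatcher(None, a.path, b.path).ratio()
    return (tag_score + class_score + text_score + path_score) / 4


def relocate(saved, candidates, threshold=0.5):
    """Return the candidate most similar to `saved`, or None if none is close enough."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Signature saved before the site changed.
saved = ElementSignature(
    "span", frozenset({"price"}), "$19.99", "html/body/div/ul/li/span"
)

# Elements found on the redesigned page: the price moved and gained a class.
redesigned_page = [
    ElementSignature("a", frozenset({"nav-link"}), "Home", "html/body/nav/a"),
    ElementSignature(
        "span", frozenset({"price", "sale"}), "$19.99",
        "html/body/main/section/article/span",
    ),
]

match = relocate(saved, redesigned_page)
print(match.path if match else "no match")
```

The point of the sketch is that the original selector (`.price` under `ul/li`) no longer needs to match exactly; the element is recovered because it still resembles what was saved.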
Quick Start
```bash
pip install "scrapling[all]" && scrapling install
```

```python
from scrapling.fetchers import StealthyFetcher

url = 'https://example.com'
# fetch() is StealthyFetcher's entry point; get() belongs to the plain HTTP Fetcher
page = StealthyFetcher.fetch(url, headless=True)  # Stealth browser fetch
products = page.css('.product', adaptive=True)    # Survives site changes
for product in products:
    print(product.css_first('.title').text)
    print(product.css_first('.price').text)
```
Features
- Adaptive element tracking: auto-relocates elements after site structure changes via similarity algorithms
- Three fetcher tiers: static HTTP, stealth Firefox, full Playwright browser automation
- Anti-bot bypass: defeats Cloudflare Turnstile, WAFs, TLS fingerprinting, and browser detection
- Blazing fast: outperforms Parsel, Scrapy, and BeautifulSoup in benchmarks
- CSS + XPath selectors: plus text/regex search, BeautifulSoup-style navigation, auto-selector generation
- Async support: all fetchers support async/await
- Persistent sessions: `FetcherSession`, `StealthySession`, `DynamicSession` (sync and async)
- Interactive shell: `scrapling shell` for live exploration, curl conversion, browser previews
- CLI extraction: `scrapling extract get URL output.md --css-selector`
- MCP server: AI integration for Claude, Cursor, and other MCP-compatible agents
- Docker ready: `docker pull pyd4vinci/scrapling` (includes all browsers)
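One way to use the three fetcher tiers together is to start with the cheapest and escalate only when a request is blocked. The sketch below shows that pattern with plain callables so it runs without a browser or network. `fetch_with_escalation`, `BLOCKED_STATUSES`, and the fake tiers are hypothetical, and the escalation policy itself is an assumption rather than something Scrapling prescribes.

```python
from types import SimpleNamespace

# Status codes treated as "blocked" in this sketch (an assumption, not a Scrapling rule).
BLOCKED_STATUSES = {403, 429, 503}


def fetch_with_escalation(url, tiers):
    """Try each (name, fetch) tier in order.

    Returns (name, response) for the first tier whose response is not blocked.
    Raises RuntimeError if every tier raises or returns a blocked status.
    """
    last_error = None
    for name, fetch in tiers:
        try:
            response = fetch(url)
        except Exception as exc:  # remember the failure and move to the next tier
            last_error = exc
            continue
        if getattr(response, "status", None) not in BLOCKED_STATUSES:
            return name, response
    raise RuntimeError(f"all tiers failed or were blocked for {url}") from last_error


# Stand-in tiers so the example is self-contained.
def fake_http(url):
    return SimpleNamespace(status=403)  # simulates an anti-bot block


def fake_stealth(url):
    return SimpleNamespace(status=200)  # simulates a successful fetch


tier_used, response = fetch_with_escalation(
    "https://example.com",
    [("http", fake_http), ("stealth", fake_stealth)],
)
print(tier_used, response.status)
```

With Scrapling installed, the tier list could plausibly hold the fetchers' own entry points (for example the HTTP fetcher's `get` followed by the browser fetchers' `fetch`), assuming their responses expose a `status` attribute; check the library documentation before relying on that.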
Performance
| Test | Scrapling | BeautifulSoup | Speedup |
|---|---|---|---|
| Text extraction (5k elements) | 1.92ms | 1283ms | ~668x |
| Element similarity matching | 1.87ms | N/A | — |
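The Speedup column appears to be the ratio of the two timings in the row. The sketch below checks that arithmetic and includes a small timing helper for reproducing such a comparison locally. `speedup` and `time_ms` are hypothetical helpers; the source does not describe its benchmark method, so the best-of-N approach here is an assumption.

```python
import timeit


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def time_ms(func, repeat: int = 5, number: int = 10) -> float:
    """Best-of-`repeat` average time of one call to `func`, in milliseconds."""
    best_total = min(timeit.repeat(func, repeat=repeat, number=number))
    return best_total / number * 1000


# Text extraction row: BeautifulSoup 1283 ms vs Scrapling 1.92 ms.
ratio = speedup(1283, 1.92)
print(f"~{ratio:.0f}x")  # ~668x
```

To compare two parsers on your own pages, pass each parsing call to `time_ms` as a zero-argument callable and feed the two results to `speedup`.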
Installation Options
```bash
pip install scrapling               # Core parser only
pip install "scrapling[fetchers]"   # + browser fetchers
scrapling install                   # Install browser engines
pip install "scrapling[all]"        # Everything
```
Source
- Repository: github.com/D4Vinci/Scrapling (8k+ stars)
- Documentation: scrapling.readthedocs.io
- Author: D4Vinci