Scrapling

Undetectable, adaptive, high-performance Python library for web data extraction. The first scraping library that automatically learns from website changes and survives structure updates.

When to Use


  • Extracting data from websites that change their HTML structure frequently
  • Bypassing anti-bot protections (Cloudflare Turnstile, WAFs, fingerprinting)
  • High-performance web data collection at scale
  • Replacing brittle BeautifulSoup/Scrapy selectors with adaptive element tracking
  • AI-assisted data extraction via the built-in MCP server (Claude/Cursor integration)
  • Interactive web exploration and debugging via the CLI shell

How It Works


Scrapling combines three capabilities:
  1. Smart Fetching — Three fetcher tiers for different protection levels:
    • Fetcher: fast HTTP requests with browser TLS fingerprint impersonation and stealth headers
    • StealthyFetcher: a modified Firefox with fingerprint spoofing that can bypass Cloudflare
    • DynamicFetcher: full Playwright browser automation in stealth mode
  2. Adaptive Parsing — Tracks elements via similarity algorithms. When a website changes its structure, Scrapling automatically relocates the elements you need instead of breaking.
  3. Developer Tools — Interactive IPython shell, CLI extraction commands, curl-to-Scrapling conversion, and an MCP server for AI agent integration.
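
To make the adaptive parsing idea above concrete, here is a minimal standard-library sketch of similarity-based relocation. It is an illustration only and not Scrapling's actual algorithm: the names `ElementCollector`, `parse_elements`, `signature`, and `relocate`, the 0.5 threshold, and the sample HTML are all assumptions made for this example. It flattens each element into a string (ancestor path, attributes, text) and picks the closest candidate in the changed page.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not tracked as "open".
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ElementCollector(HTMLParser):
    """Records tag, attributes, ancestor path, and direct text for each element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # currently open elements, innermost last
        self.elements = []       # every element seen, in document order

    def handle_starttag(self, tag, attrs):
        path = "/".join([el["tag"] for el in self.open_elements] + [tag])
        element = {"tag": tag, "attrs": dict(attrs), "path": path, "text": ""}
        self.elements.append(element)
        if tag not in VOID_TAGS:
            self.open_elements.append(element)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag name, if any.
        for index in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[index]["tag"] == tag:
                del self.open_elements[index:]
                break

    def handle_data(self, data):
        if self.open_elements and data.strip():
            self.open_elements[-1]["text"] += data.strip()


def parse_elements(html):
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


def signature(element):
    """Flatten an element into one string so two elements can be compared."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(element["attrs"].items()))
    return f"{element['path']} {attrs} {element['text']}"


def relocate(saved, html, threshold=0.5):
    """Return the element in `html` most similar to `saved`, or None."""
    best, best_score = None, 0.0
    target = signature(saved)
    for candidate in parse_elements(html):
        score = SequenceMatcher(None, target, signature(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


old_html = '<div class="products"><article class="product"><h3 class="title">Blue Mug</h3></article></div>'
new_html = '<section class="catalog"><div class="item-card"><h2 class="item-title">Blue Mug</h2><span class="price">$8</span></div></section>'

# "Save" the title element from the old layout, then look for it in the new one.
saved = next(el for el in parse_elements(old_html) if el["attrs"].get("class") == "title")
match = relocate(saved, new_html)
print(match["tag"], match["attrs"])
```

With these two snippets the best match should be the renamed `h2` title, even though its tag, class, and ancestors all changed. A real implementation would likely weigh more signals (siblings, position, attribute names) than this single string comparison.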

Quick Start


bash
pip install "scrapling[all]" && scrapling install
python
from scrapling.fetchers import StealthyFetcher

url = 'https://example.com'
page = StealthyFetcher.fetch(url, headless=True)  # Stealth browser fetch
products = page.css('.product', adaptive=True)    # Relocate elements if the site structure changed

for product in products:
    print(product.css_first('.title').text)
    print(product.css_first('.price').text)
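
The three fetcher tiers described under How It Works suggest a common pattern: start with the cheapest tier and escalate only when a response looks blocked. The sketch below shows that pattern with plain callables so it runs without a browser. `fetch_with_escalation`, `looks_blocked`, and the block-page markers are hypothetical names and heuristics for this example, not part of Scrapling.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A tier is a (name, fetch) pair; fetch takes a URL and returns HTML or None.
Tier = Tuple[str, Callable[[str], Optional[str]]]

# Assumed markers of a challenge or block page; tune these for real targets.
BLOCK_MARKERS = ("just a moment", "captcha", "access denied")


def looks_blocked(html: Optional[str]) -> bool:
    """Rough heuristic: empty responses and known challenge text count as blocked."""
    if not html:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def fetch_with_escalation(url: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Try each tier in order and return (tier_name, html) from the first that works."""
    failures: List[str] = []
    for name, fetch in tiers:
        try:
            html = fetch(url)
        except Exception as exc:  # a failing tier should not stop the escalation
            failures.append(f"{name}: {exc}")
            continue
        if not looks_blocked(html):
            return name, html
        failures.append(f"{name}: blocked")
    raise RuntimeError("all tiers failed: " + "; ".join(failures))


# Stand-in fetchers for demonstration; real ones would wrap the Scrapling fetchers.
def fake_http(url: str) -> str:
    return "<title>Just a moment...</title>"


def fake_stealth(url: str) -> str:
    return "<div class='product'>Blue Mug</div>"


tier_used, html = fetch_with_escalation(
    "https://example.com",
    [("http", fake_http), ("stealth", fake_stealth)],
)
print(tier_used)
```

In practice each callable would wrap one fetcher tier and return the page body. Ordering the tiers from cheapest to most expensive keeps full browser sessions for the pages that actually need them.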

Features


  • Adaptive element tracking — Auto-relocates elements after site structure changes via similarity algorithms
  • Three fetcher tiers — Static HTTP, stealth Firefox, full Playwright browser automation
  • Anti-bot bypass — Defeats Cloudflare Turnstile, WAFs, TLS fingerprinting, and browser detection
  • Blazing fast — Outperforms Parsel, Scrapy, and BeautifulSoup in benchmarks
  • CSS + XPath selectors — Plus text/regex search, BeautifulSoup-style navigation, auto-selector generation
  • Async support — All fetchers support async/await
  • Persistent sessions — FetcherSession, StealthySession, DynamicSession (sync and async)
  • Interactive shell: scrapling shell, for live exploration, curl conversion, and browser previews
  • CLI extraction: scrapling extract get URL output.md --css-selector SELECTOR
  • MCP server: AI integration for Claude, Cursor, and other MCP-compatible agents
  • Docker ready: docker pull pyd4vinci/scrapling (the image includes all browsers)
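
Since the fetchers support async/await, many pages can be fetched concurrently. The sketch below shows one way to cap that concurrency with a semaphore. `fetch_all` and `fake_fetch` are hypothetical names for this example; in real use, the `fetch` argument would be an async wrapper around whichever Scrapling fetcher you choose.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> Dict[str, str]:
    """Fetch every URL, running at most `max_concurrency` requests at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str):
        # The semaphore blocks here once max_concurrency fetches are in flight.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)


# Stand-in coroutine that records how many calls overlap.
state = {"active": 0, "peak": 0}


async def fake_fetch(url: str) -> str:
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["active"] -= 1
    return url.upper()


results = asyncio.run(
    fetch_all([f"page-{i}" for i in range(6)], fake_fetch, max_concurrency=2)
)
print(len(results), state["peak"])
```

A cap like this keeps the number of simultaneous browser sessions or connections predictable, which matters more for the browser-based tiers than for plain HTTP.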

Performance


| Test | Scrapling | BeautifulSoup | Speedup |
|---|---|---|---|
| Text extraction (5k elements) | 1.92 ms | 1283 ms | ~668x |
| Element similarity matching | 1.87 ms | N/A | |
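
The speedup figure is the baseline time divided by Scrapling's time. The sketch below reproduces that arithmetic from the two reported timings and adds a small timing helper for running a comparable measurement yourself. `speedup` and `median_ms` are hypothetical helper names, and the workload passed to `median_ms` is a placeholder, not the benchmark behind the published numbers.

```python
import timeit


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def median_ms(func, repeat: int = 5, number: int = 10) -> float:
    """Median time per call in milliseconds over `repeat` batches of `number` calls."""
    batches = sorted(timeit.repeat(func, repeat=repeat, number=number))
    return batches[len(batches) // 2] / number * 1000


# Timings reported for text extraction: BeautifulSoup 1283 ms, Scrapling 1.92 ms.
print(round(speedup(1283, 1.92)))  # 668

# Placeholder workload; swap in the parsing call you want to measure.
print(median_ms(lambda: sum(range(1000))))
```

Taking the median over several batches reduces the effect of a single slow run, which is why it is used here instead of a single measurement.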

Installation Options


bash
pip install scrapling                     # Core parser only
pip install "scrapling[fetchers]"         # + browser fetchers
scrapling install                         # Install browser engines
pip install "scrapling[all]"              # Everything

Source
