Loading...
Loading...
Found 31 Skills
新闻站点内容提取。支持 12 个平台:微信公众号、今日头条、网易新闻、搜狐新闻、腾讯新闻、BBC News、CNN News、Twitter/X、Lenny's Newsletter、Naver Blog、Detik News、Quora。当用户需要提取新闻内容、抓取公众号文章、爬取新闻、或获取新闻JSON/Markdown时激活。
Expert guidance for building web scrapers and crawlers using the Scrapy Python framework with best practices for spider development, data extraction, and pipeline management.
Complete knowledge domain for Firecrawl v2 API - web scraping and crawling that converts websites into LLM-ready markdown or structured data. Use when: scraping websites, crawling entire sites, extracting web content, converting HTML to markdown, building web scrapers, handling dynamic JavaScript content, bypassing anti-bot protection, extracting structured data from web pages, or when encountering "content not loading", "JavaScript rendering issues", or "blocked by bot detection". Keywords: firecrawl, firecrawl api, web scraping, web crawler, scrape website, crawl website, extract content, html to markdown, site crawler, content extraction, web automation, firecrawl-py, firecrawl-js, llm ready data, structured data extraction, bot bypass, javascript rendering, scraping api, crawling api, map urls, batch scraping
Comprehensive SEO analysis for any website or business type. Performs full site audits, single-page deep analysis, technical SEO checks (crawlability, indexability, Core Web Vitals with INP), schema markup detection/validation/generation, content quality assessment (E-E-A-T framework per Dec 2025 update extending to all competitive queries), image optimization, sitemap analysis, and Generative Engine Optimization (GEO) for AI Overviews, ChatGPT, and Perplexity citations. Analyzes AI crawler accessibility (GPTBot, ClaudeBot, PerplexityBot), llms.txt compliance, brand mention signals, and passage-level citability. Industry detection for SaaS, e-commerce, local business, publishers, agencies. Triggers on: "SEO", "audit", "schema", "Core Web Vitals", "sitemap", "E-E-A-T", "AI Overviews", "GEO", "technical SEO", "content quality", "page speed", "structured data".
When the user wants to choose or optimize rendering strategy for SEO. Also use when the user mentions "SSR," "SSG," "CSR," "ISR," "static rendering," "dynamic rendering," "server-side rendering," "client-side rendering," "JavaScript rendering," "pre-rendering," "prerender," "content in initial HTML," or "crawler visibility."
Web crawling and scraping tool with LLM-optimized output. 网页爬虫爬取工具 | Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scraping. 智能搜索爬取 | Free, no API key required.
This skill should be used when the user asks to "audit for AI visibility", "optimize for ChatGPT", "check GEO readiness", "analyze hedge density", "generate agentfacts", "check if my site works with AI search", "test LLM crawlability", "check discovery gap", or mentions Generative Engine Optimization, AI crawlers, Perplexity discoverability, or NANDA protocol.
Apply Web Scraping with Python practices (Ryan Mitchell). Covers First Scrapers (Ch 1: urllib, BeautifulSoup), HTML Parsing (Ch 2: find, findAll, CSS selectors, regex, lambda), Crawling (Ch 3-4: single-domain, cross-site, crawl models), Scrapy (Ch 5: spiders, items, pipelines, rules), Storing Data (Ch 6: CSV, MySQL, files, email), Reading Documents (Ch 7: PDF, Word, encoding), Cleaning Data (Ch 8: normalization, OpenRefine), NLP (Ch 9: n-grams, Markov, NLTK), Forms & Logins (Ch 10: POST, sessions, cookies), JavaScript (Ch 11: Selenium, headless, Ajax), APIs (Ch 12: REST, undocumented), Image/OCR (Ch 13: Pillow, Tesseract), Avoiding Traps (Ch 14: headers, honeypots), Testing (Ch 15: unittest, Selenium), Parallel (Ch 16: threads, processes), Remote (Ch 17: Tor, proxies), Legalities (Ch 18: robots.txt, CFAA, ethics). Trigger on "web scraping", "BeautifulSoup", "Scrapy", "crawler", "spider", "scraper", "parse HTML", "Selenium scraping", "data extraction".
Scrape web pages using Scrapling with anti-bot bypass (like Cloudflare Turnstile), stealth headless browsing, spiders framework, adaptive scraping, and JavaScript rendering. Use when asked to scrape, crawl, or extract data from websites; web_fetch fails; the site has anti-bot protections; write Python code to scrape/crawl; or write spiders.
Crawl WeChat Official Account articles, supporting full-text extraction, mixed text and image layout, local image download, and summary generation. Call this when you need to access WeChat Official Account links and convert them to Markdown.
Comprehensive news aggregator that fetches, filters, and deeply analyzes real-time content from 8 major sources: Hacker News, GitHub Trending, Product Hunt, 36Kr, Tencent News, WallStreetCN, V2EX, and Weibo. Best for 'daily scans', 'tech news briefings', 'finance updates', and 'deep interpretations' of hot topics.
Comprehensive web scraping, crawling, and data extraction toolkit powered by Firecrawl API. Provides scripts for single-page scraping (scrape.py), web search (search.py), URL discovery (map.py), multi-page crawling (crawl.py), structured data extraction (extract.py), and autonomous data gathering (agent.py). Use when you need to: (1) extract content from web pages, (2) search and scrape the web, (3) discover URLs on websites, (4) crawl multiple pages, (5) extract structured data with JSON schemas, or (6) autonomously gather data from anywhere on the web. Requires FIRECRAWL_API_KEY environment variable.