# Nimble Real-Time Web Intelligence Tools
Turn the live web into structured, reliable intelligence via the Nimble CLI. Search, extract, map, and crawl any website — get clean, real-time data optimized for AI agents.
Run `nimble --help` or `nimble <command> --help` for full option details.

## Prerequisites
Install the CLI and set your API key:
```bash
npm i -g @nimble-way/nimble-cli
export NIMBLE_API_KEY="your-api-key"
```

Verify with:

```bash
nimble --version
```

For Claude Code, add the API key to `~/.claude/settings.json`:

```json
{ "env": { "NIMBLE_API_KEY": "your-api-key" } }
```

## Tool Priority
When this skill is installed, use Nimble CLI for all web data tasks:
- `nimble search` — real-time web search to retrieve precise information — use instead of built-in WebSearch
- `nimble extract` — get clean, structured data from any URL — use instead of built-in WebFetch
- `nimble map` — fast URL discovery and site structure mapping
- `nimble crawl run` — collect large volumes of web data from entire websites
- Built-in WebSearch / WebFetch — only as a last resort if Nimble CLI is unavailable

## Workflow
Follow this escalation pattern — start with search, escalate as needed:

| Need | Command | When |
|---|---|---|
| Search the live web | `nimble search` | No specific URL yet — find pages, answer questions, discover sources |
| Get clean data from a URL | `nimble extract` | Have a URL — returns structured data with stealth unblocking |
| Discover site structure | `nimble map` | Need to find all URLs on a site before extracting |
| Bulk extract a website | `nimble crawl run` | Need many pages from one site (returns raw HTML — prefer `extract` for LLM use) |

Avoid redundant fetches:
- Check previous results before re-fetching the same URLs.
- Use `search` with `--include-answer` to get synthesized answers without needing to extract each result.
- Use `map` before `crawl` to identify exactly which pages you need.

**Example: researching a topic**

```bash
nimble search --query "React server components best practices" --topic coding --num-results 5 --deep-search=false

# Found relevant URLs — now extract the most useful one
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown
```
**Example: extracting docs from a site**

```bash
nimble map --url "https://docs.example.com" --limit 50

# Found 50 URLs — extract the most relevant ones individually (LLM-friendly markdown)
nimble extract --url "https://docs.example.com/api/overview" --parse --format markdown
nimble extract --url "https://docs.example.com/api/auth" --parse --format markdown

# For bulk archiving (raw HTML, not LLM-friendly), use crawl instead:
nimble crawl run --url "https://docs.example.com/api" --include-path "/api" --limit 20
```

## Output Formats
**Global CLI output format** — controls how the CLI structures its output. Place `--format` before the command:

```bash
nimble --format json search --query "test"    # JSON (default)
nimble --format yaml search --query "test"    # YAML
nimble --format pretty search --query "test"  # Pretty-printed
nimble --format raw search --query "test"     # Raw API response
```

**Content parsing format** — controls how page content is returned. These are command-specific flags:
- search: `--parsing-type markdown` (or `plain_text`, `simplified_html`)
- extract: `--format markdown` (or `html`) — note: this is a content format flag on extract, not the global output format

```bash
# Search with markdown content parsing
nimble search --query "test" --parsing-type markdown --deep-search=false

# Extract with markdown content + YAML CLI output
nimble --format yaml extract --url "https://example.com" --parse --format markdown
```

Use `--transform` with GJSON syntax to extract specific fields:

```bash
nimble search --query "AI news" --transform "results.#.url"
```

## Commands

### search
Accurate, real-time web search with 8 focus modes. AI agents search the live web to retrieve precise information. Run `nimble search --help` for all options.

**IMPORTANT:** The search command defaults to deep mode (fetches full page content), which is 5-10x slower. Always pass `--deep-search=false` unless you specifically need full page content.

Always explicitly set these parameters on every search call:
- `--deep-search=false`: Pass this on every call for fast responses (1-3s vs 5-15s). Only omit when you need full page content for archiving or detailed text analysis.
- `--include-answer`: Recommended on every research/exploration query. Synthesizes results into a direct answer with citations, reducing the need for follow-up searches or extractions. Only skip for URL-discovery-only queries where you just need links. Note: this is a premium feature (Enterprise plans). If the API returns a `402` or `403` when using this flag, retry the same query without `--include-answer` and continue — the search results are still valuable without the synthesized answer.
- `--topic`: Match to query type — `coding`, `news`, `academic`, etc. Default is `general`. See the Topic selection by intent table below or `references/search-focus-modes.md` for guidance.
- `--num-results`: Default `10` — balanced speed and coverage.

```bash
# Basic search (always include --deep-search=false)
nimble search --query "your query" --deep-search=false

# Coding-focused search
nimble search --query "React hooks tutorial" --topic coding --deep-search=false

# News search with time filter
nimble search --query "AI developments" --topic news --time-range week --deep-search=false

# Search with AI-generated answer summary
nimble search --query "what is WebAssembly" --include-answer --deep-search=false

# Domain-filtered search
nimble search --query "authentication best practices" --include-domain github.com --include-domain stackoverflow.com --deep-search=false

# Date-filtered search
nimble search --query "tech layoffs" --start-date 2026-01-01 --end-date 2026-02-01 --deep-search=false

# Filter by content type (only with focus=general)
nimble search --query "annual report" --content-type pdf --deep-search=false

# Control number of results
nimble search --query "Python tutorials" --num-results 15 --deep-search=false

# Deep search — ONLY when you need full page content (5-15s, much slower)
nimble search --query "machine learning" --deep-search --num-results 5
```
**Key options:**
| Flag | Description |
|------|-------------|
| `--query` | Search query string (required) |
| `--deep-search=false` | **Always pass this.** Disables full page content fetch for 5-10x faster responses |
| `--deep-search` | Enable full page content fetch (slow, 5-15s — only when needed) |
| `--topic` | Focus mode: general, coding, news, academic, shopping, social, geo, location |
| `--num-results` | Max results to return (default 10) |
| `--include-answer` | Generate AI answer summary from results |
| `--include-domain` | Only include results from these domains (repeatable, max 50) |
| `--exclude-domain` | Exclude results from these domains (repeatable, max 50) |
| `--time-range` | Recency filter: hour, day, week, month, year |
| `--start-date` | Filter results after this date (YYYY-MM-DD) |
| `--end-date` | Filter results before this date (YYYY-MM-DD) |
| `--content-type` | Filter by type: pdf, docx, xlsx, documents, spreadsheets, presentations |
| `--parsing-type` | Output format: markdown, plain_text, simplified_html |
| `--country` | Country code for localized results |
| `--locale` | Locale for language settings |
| `--max-subagents` | Max parallel subagents for shopping/social/geo modes (1-10, default 3) |
**Focus modes** (quick reference — for detailed per-mode guidance, decision tree, and combination strategies, **read `references/search-focus-modes.md`**):
| Mode | Best for |
|------|----------|
| `general` | Broad web searches (default) |
| `coding` | Programming docs, code examples, technical content |
| `news` | Current events, breaking news, recent articles |
| `academic` | Research papers, scholarly articles, studies |
| `shopping` | Product searches, price comparisons, e-commerce |
| `social` | People research, LinkedIn/X/YouTube profiles, community discussions |
| `geo` | Geographic information, regional data |
| `location` | Local businesses, place-specific queries |
**Topic selection by intent** (see `references/search-focus-modes.md` for full table):
| Query Intent | Primary Topic | Secondary (parallel) |
|---|---|---|
| Research a **person** | `social` | `general` |
| Research a **company** | `general` | `news` |
| Find **code/docs** | `coding` | — |
| Current **events** | `news` | `social` |
| Find a **product/price** | `shopping` | — |
| Find a **place/business** | `location` | `geo` |
| Find **research papers** | `academic` | — |
**Performance tips:**
- With `--deep-search=false` (FAST): 1-3 seconds, returns titles + snippets + URLs — use this 95% of the time
- Without the flag / `--deep-search` (SLOW): 5-15 seconds, returns full page content — only for archiving or full-text analysis
- Use `--include-answer` for quick synthesized insights — works great with fast mode
- Start with 5-10 results, increase only if needed
### extract
Scalable data collection with stealth unblocking. Get clean, real-time HTML and structured data from any URL. Supports JS rendering, browser emulation, and geolocation. Run `nimble extract --help` for all options.

**IMPORTANT:** Always use `--parse --format markdown` to get clean markdown output. Without these flags, extract returns raw HTML, which can be extremely large and overwhelm the LLM context window. The `--format` flag on extract controls the content type (not the CLI output format — see Output Formats above).

```bash
# Standard extraction (always use --parse --format markdown for LLM-friendly output)
nimble extract --url "https://example.com/article" --parse --format markdown

# Render JavaScript (for SPAs, dynamic content)
nimble extract --url "https://example.com/app" --render --parse --format markdown

# Extract with geolocation (see content as if from a specific country)
nimble extract --url "https://example.com" --country US --city "New York" --parse --format markdown

# Handle cookie consent automatically
nimble extract --url "https://example.com" --consent-header --parse --format markdown

# Custom browser emulation
nimble extract --url "https://example.com" --browser chrome --device desktop --os windows --parse --format markdown

# Multiple content format preferences (API tries first, falls back to second)
nimble extract --url "https://example.com" --parse --format markdown --format html
```
**Key options:**
| Flag | Description |
|------|-------------|
| `--url` | Target URL to extract (required) |
| `--parse` | Parse the response content (always use this) |
| `--format` | Content type preference: `markdown`, `html` (always use `markdown` for LLM-friendly output) |
| `--render` | Render JavaScript using a browser |
| `--country` | Country code for geolocation and proxy |
| `--city` | City for geolocation |
| `--state` | US state for geolocation (only when country=US) |
| `--locale` | Locale for language settings |
| `--consent-header` | Auto-handle cookie consent |
| `--browser` | Browser type to emulate |
| `--device` | Device type for emulation |
| `--os` | Operating system to emulate |
| `--driver` | Browser driver to use |
| `--method` | HTTP method (GET, POST, etc.) |
| `--headers` | Custom HTTP headers (key=value) |
| `--cookies` | Browser cookies |
| `--referrer-type` | Referrer policy |
| `--http2` | Use HTTP/2 protocol |
| `--request-timeout` | Timeout in milliseconds |
| `--tag` | User-defined tag for request tracking |
### map
Fast URL discovery and site structure mapping. Easily plan extraction workflows. Returns URL metadata only (URLs, titles, descriptions) — not page content. Use `extract` or `crawl` to get actual content from the discovered URLs. Run `nimble map --help` for all options.

```bash
# Map all URLs on a site (returns URLs only, not content)
nimble map --url "https://example.com"

# Limit number of URLs returned
nimble map --url "https://docs.example.com" --limit 100

# Include subdomains
nimble map --url "https://example.com" --domain-filter subdomains

# Use sitemap for discovery
nimble map --url "https://example.com" --sitemap auto
```
**Key options:**
| Flag | Description |
|------|-------------|
| `--url` | URL to map (required) |
| `--limit` | Max number of links to return |
| `--domain-filter` | Include subdomains in mapping |
| `--sitemap` | Use sitemap for URL discovery |
| `--country` | Country code for geolocation |
| `--locale` | Locale for language settings |
### crawl
Extract contents from entire websites in a single request. Collect large volumes of web data automatically. Crawl is async — you start a job, poll for completion, then retrieve the results. Run `nimble crawl run --help` for all options.

Crawl defaults:

| Setting | Default | Notes |
|---|---|---|
| `--sitemap` | `auto` | Automatically uses sitemap if available |
| `--max-discovery-depth` | `5` | How deep the crawler follows links |
| `--limit` | No limit | Always set a limit to avoid crawling entire sites |

Start a crawl:

```bash
# Crawl a site section (always set --limit)
nimble crawl run --url "https://docs.example.com" --limit 50

# Crawl with path filtering
nimble crawl run --url "https://example.com" --include-path "/docs" --include-path "/api" --limit 100

# Exclude paths
nimble crawl run --url "https://example.com" --exclude-path "/blog" --exclude-path "/archive" --limit 50

# Control crawl depth
nimble crawl run --url "https://example.com" --max-discovery-depth 3 --limit 50

# Allow subdomains and external links
nimble crawl run --url "https://example.com" --allow-subdomains --allow-external-links --limit 50

# Crawl entire domain (not just child paths)
nimble crawl run --url "https://example.com/docs" --crawl-entire-domain --limit 100

# Named crawl for tracking
nimble crawl run --url "https://example.com" --name "docs-crawl-feb-2026" --limit 200

# Use sitemap for discovery
nimble crawl run --url "https://example.com" --sitemap auto --limit 50
```
**Key options for `crawl run`:**
| Flag | Description |
|------|-------------|
| `--url` | URL to crawl (required) |
| `--limit` | Max pages to crawl (**always set this**) |
| `--max-discovery-depth` | Max depth based on discovery order (default 5) |
| `--include-path` | Regex patterns for URLs to include (repeatable) |
| `--exclude-path` | Regex patterns for URLs to exclude (repeatable) |
| `--allow-subdomains` | Follow links to subdomains |
| `--allow-external-links` | Follow links to external sites |
| `--crawl-entire-domain` | Follow sibling/parent URLs, not just child paths |
| `--ignore-query-parameters` | Don't re-scrape same path with different query params |
| `--name` | Name for the crawl job |
| `--sitemap` | Use sitemap for URL discovery (default auto) |
| `--callback` | Webhook for receiving results |
**Poll crawl status and retrieve results:**
Crawl jobs run asynchronously. After starting a crawl, poll for completion, then retrieve content using **individual task IDs** (not the crawl ID):
```bash
# 1. Start the crawl → returns a crawl_id
nimble crawl run --url "https://docs.example.com" --limit 5
# Returns: crawl_id "abc-123"

# 2. Poll status until completed → returns individual task_ids per page
nimble crawl status --id "abc-123"
# Returns: tasks: [{ task_id: "task-456" }, { task_id: "task-789" }, ...]
# Status values: running, completed, failed, terminated

# 3. Retrieve content using INDIVIDUAL task_ids (NOT the crawl_id)
nimble tasks results --task-id "task-456"
nimble tasks results --task-id "task-789"
# ⚠️ Using the crawl_id here returns 404 — you must use the per-page task_ids from step 2
```
**IMPORTANT:** `nimble tasks results` requires the **individual task IDs** from `crawl status` (each crawled page gets its own task ID), not the crawl job ID. Using the crawl ID will return a 404 error.
**Polling guidelines:**
- Poll every **15-30 seconds** for small crawls (< 50 pages)
- Poll every **30-60 seconds** for larger crawls (50+ pages)
- Stop polling after status is `completed`, `failed`, or `terminated`
- **Note:** `crawl status` may occasionally misreport individual task statuses (showing "failed" for tasks that actually succeeded). If `crawl status` shows failed tasks, try retrieving their results with `nimble tasks results` before assuming failure
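The polling guidelines above can be sketched as a small POSIX-sh helper. This is a hypothetical wrapper, not part of the CLI: it assumes you supply a command that prints the current status word on stdout (e.g. a one-line script that runs `nimble crawl status --id ...` and pulls out the status field — check the real output shape before wiring that up).

```shell
#!/bin/sh
# Sketch: poll any status-printing command until it reaches a terminal state.
# Arg 1: a command that prints one of running/completed/failed/terminated.
# Arg 2: poll interval in seconds (15-30s for small crawls, 30-60s for larger).
poll_until_done() {
  get_status=$1
  interval=${2:-30}
  while :; do
    status=$($get_status)
    case $status in
      completed|failed|terminated)
        echo "$status"       # terminal state -- stop polling and report it
        return 0
        ;;
    esac
    sleep "$interval"
  done
}
```

With the real CLI you would pass a wrapper around `nimble crawl status --id "abc-123"`; the status-field extraction step depends on the actual response shape.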
**List crawls:**
```bash
# List all crawls
nimble crawl list

# Filter by status
nimble crawl list --status running

# Paginate results
nimble crawl list --limit 10
```
**Cancel a crawl:**
```bash
nimble crawl terminate --id "crawl-task-id"
```

## Best Practices
### Search Strategy

- Always pass `--deep-search=false` — the default is deep mode (slow). Fast mode covers 95% of use cases: URL discovery, research, comparisons, answer generation
- Only use deep mode when you need full page text — archiving articles, extracting complete docs, building datasets
- Start with the right focus mode — match `--topic` to your query type (see `references/search-focus-modes.md`)
- Use `--include-answer` — get AI-synthesized insights without extracting each result. If it returns 402/403, retry without it.
- Filter domains — use `--include-domain` to target authoritative sources
- Add time filters — use `--time-range` for time-sensitive queries
### Multi-Search Strategy

When researching a topic in depth, run 2-3 searches in parallel with:
- Different topics — e.g., `social` + `general` for people research
- Different query angles — e.g., "Jane Doe current job" + "Jane Doe career history" + "Jane Doe publications"

This is faster than sequential searches and gives broader coverage. Deduplicate results by URL before extracting.
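As a sketch, the parallel pattern above looks like this in shell. The helper names (`run_queries`, `dedupe_urls`) are hypothetical, and the saved files simply hold whatever the CLI prints for each query:

```shell
#!/bin/sh
# Sketch: launch one search per query string in the background, wait for all,
# then deduplicate collected URLs before any extract step.
run_queries() {
  outdir=$(mktemp -d)
  i=0
  for q in "$@"; do
    i=$((i + 1))
    # Each angle runs concurrently; fast mode keeps each call at 1-3s.
    nimble search --query "$q" --deep-search=false --include-answer \
      > "$outdir/$i.json" &
  done
  wait                       # block until every background search returns
  echo "$outdir"             # caller reads the per-query output files here
}

dedupe_urls() {
  # One URL per line on stdin; prints each URL only the first time it appears.
  awk '!seen[$0]++'
}
```

Usage: `run_queries "Jane Doe current job" "Jane Doe career history"` runs both angles concurrently; collect the URLs from the saved output, pipe them through `dedupe_urls`, then extract.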
### Disambiguating Common Names

When searching for a person with a common name:
- Include distinguishing context in the query: company name, job title, city
- Use `--topic social` — LinkedIn results include location and current company, making disambiguation easier
- Cross-reference results across searches to confirm you're looking at the right person
### Extraction Strategy

- Always use `--parse --format markdown` — returns clean markdown instead of raw HTML, preventing context window overflow
- Try without `--render` first — it's faster for static pages
- Add `--render` for SPAs — when content is loaded by JavaScript
- Set geolocation — use `--country` to see region-specific content
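The try-static-first rule can be sketched as a fallback wrapper. Both helpers are hypothetical, and the 200-byte threshold in `needs_render` is a heuristic assumption, not documented CLI behavior:

```shell
#!/bin/sh
# Sketch: extract without --render first, retry with --render only when the
# result looks too small to be real content (likely a JS-driven SPA).
needs_render() {
  # Heuristic: treat anything under 200 bytes of markdown as a failed extract.
  [ $(printf '%s' "$1" | wc -c) -lt 200 ]
}

extract_page() {
  url=$1
  body=$(nimble extract --url "$url" --parse --format markdown)
  if needs_render "$body"; then
    # Static fetch came back nearly empty -- fall back to browser rendering.
    body=$(nimble extract --url "$url" --render --parse --format markdown)
  fi
  printf '%s\n' "$body"
}
```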
### Crawl Strategy

- Prefer `map` + `extract` over `crawl` for LLM use — crawl results return raw HTML (60-115KB per page), which overwhelms LLM context. For LLM-friendly output, use `map` to discover URLs, then `extract --parse --format markdown` on individual pages
- Use `crawl` only for bulk archiving or data pipelines — when you need raw content from many pages and will post-process it outside the LLM context
- Always set `--limit` — crawl has no default limit, so always specify one to avoid crawling entire sites
- Use path filters — `--include-path` and `--exclude-path` to target specific sections
- Name your crawls — use `--name` for easy tracking
- Retrieve with individual task IDs — `crawl status` returns per-page task IDs; use those (not the crawl ID) with `nimble tasks results --task-id`
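The map-then-extract pipeline from the first bullet can be sketched as follows. This is a hypothetical helper: the `jq` filter `.links[].url` is an assumption about the map response shape (verify against the real JSON first), and `slugify` is just a filename helper:

```shell
#!/bin/sh
# Sketch of the LLM-friendly bulk pipeline: map discovers URLs, then each page
# is extracted individually as clean markdown (instead of crawl's raw HTML).
slugify() {
  # Turn a URL into a filesystem-safe name for the per-page output file.
  printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

map_then_extract() {
  site=$1
  outdir=$2
  limit=${3:-50}
  mkdir -p "$outdir"
  # NOTE: '.links[].url' is an assumed path into the map response JSON.
  nimble map --url "$site" --limit "$limit" | jq -r '.links[].url' |
  while read -r url; do
    nimble extract --url "$url" --parse --format markdown \
      > "$outdir/$(slugify "$url").md"
  done
}
```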
## Common Recipes

### Researching a person

```bash
# Step 1: Run social + general in parallel for max coverage
nimble search --query "Jane Doe Head of Engineering" --topic social --deep-search=false --num-results 10 --include-answer
nimble search --query "Jane Doe Head of Engineering" --topic general --deep-search=false --num-results 10 --include-answer

# Step 2: Broaden with different query angles in parallel
nimble search --query "Jane Doe career history Acme Corp" --deep-search=false --include-answer
nimble search --query "Jane Doe publications blog articles" --deep-search=false --include-answer

# Step 3: Extract the most promising non-auth-walled URLs (skip LinkedIn — see Known Limitations)
nimble extract --url "https://www.companysite.com/team/jane-doe" --parse --format markdown
```
### Researching a company

```bash
# Step 1: Overview + recent news in parallel
nimble search --query "Acme Corp" --topic general --deep-search=false --include-answer
nimble search --query "Acme Corp" --topic news --time-range month --deep-search=false --include-answer

# Step 2: Extract company page
nimble extract --url "https://acme.com/about" --parse --format markdown
```
### Technical research

```bash
# Step 1: Find docs and code examples
nimble search --query "React Server Components migration guide" --topic coding --deep-search=false --include-answer

# Step 2: Extract the most relevant doc
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown
```
## Error Handling

| Error | Solution |
|---|---|
| Missing API key | Set the environment variable: `export NIMBLE_API_KEY="your-api-key"` |
| Invalid API key | Verify the API key is active at nimbleway.com |
| `402`/`403` with `--include-answer` | Premium feature not available on the current plan. Retry the same query without `--include-answer` and continue |
| Rate limited | Reduce request frequency or upgrade your API tier |
| Timeout | Ensure `--deep-search=false` is set |
| No results | Try a different `--topic` or broaden the query |
## Known Limitations

| Site | Issue | Workaround |
|---|---|---|
| LinkedIn profiles | Auth wall blocks extraction (returns redirect/JS, status 999) | Use `search --topic social` snippets instead |
| Sites behind login | Extract returns login page instead of content | No workaround — use search snippets instead |
| Heavy SPAs | Extract returns empty or minimal HTML | Add `--render` |
| Crawl results | Returns raw HTML (60-115KB per page), no markdown option | Use `extract --parse --format markdown` on individual pages |
| Crawl status | May misreport individual task statuses as "failed" when they actually succeeded | Always try `nimble tasks results` before assuming failure |