# Nimble Real-Time Web Intelligence Tools
Turn the live web into structured, reliable intelligence via the Nimble CLI. Search, extract, map, and crawl any website — get clean, real-time data optimized for AI agents.
Run `nimble --help` or `nimble <command> --help` for full option details.

## Prerequisites
Install the CLI and set your API key:
```bash
npm i -g @nimble-way/nimble-cli
export NIMBLE_API_KEY="your-api-key"
```

Verify with:

```bash
nimble --version
```

For Claude Code, add the API key to `~/.claude/settings.json`:

```json
{ "env": { "NIMBLE_API_KEY": "your-api-key" } }
```

## Tool Priority
When this skill is installed, use Nimble CLI for all web data tasks:
- `nimble search` — real-time web search to retrieve precise information — use instead of built-in WebSearch
- `nimble extract` — get clean, structured data from any URL — use instead of built-in WebFetch
- `nimble map` — fast URL discovery and site structure mapping
- `nimble crawl run` — collect large volumes of web data from entire websites
- Built-in WebSearch / WebFetch — only as a last resort if Nimble CLI is unavailable

## Workflow
Follow this escalation pattern — start with search, escalate as needed:

| Need | Command | When |
|---|---|---|
| Search the live web | `nimble search` | No specific URL yet — find pages, answer questions, discover sources |
| Get clean data from a URL | `nimble extract` | Have a URL — returns structured data with stealth unblocking |
| Discover site structure | `nimble map` | Need to find all URLs on a site before extracting |
| Bulk extract a website | `nimble crawl run` | Need many pages from one site (returns raw HTML — prefer `extract` for LLM use) |

Avoid redundant fetches:
- Check previous results before re-fetching the same URLs.
- Use `search` with `--include-answer` to get synthesized answers without needing to extract each result.
- Use `map` before `crawl` to identify exactly which pages you need.

**Example: researching a topic**

```bash
nimble search --query "React server components best practices" --topic coding --num-results 5 --deep-search=false

# Found relevant URLs — now extract the most useful one
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown
```
**Example: extracting docs from a site**

```bash
nimble map --url "https://docs.example.com" --limit 50

# Found 50 URLs — extract the most relevant ones individually (LLM-friendly markdown)
nimble extract --url "https://docs.example.com/api/overview" --parse --format markdown
nimble extract --url "https://docs.example.com/api/auth" --parse --format markdown

# For bulk archiving (raw HTML, not LLM-friendly), use crawl instead:
nimble crawl run --url "https://docs.example.com/api" --include-path "/api" --limit 20
```

## Output Formats
**Global CLI output format** — controls how the CLI structures its output. Place `--format` before the command:

```bash
nimble --format json search --query "test"    # JSON (default)
nimble --format yaml search --query "test"    # YAML
nimble --format pretty search --query "test"  # Pretty-printed
nimble --format raw search --query "test"     # Raw API response
```

**Content parsing format** — controls how page content is returned. These are command-specific flags:
- search: `--parsing-type markdown` (or `plain_text`, `simplified_html`)
- extract: `--format markdown` (or `html`) — note: this is a content format flag on extract, not the global output format

```bash
# Search with markdown content parsing
nimble search --query "test" --parsing-type markdown --deep-search=false

# Extract with markdown content + YAML CLI output
nimble --format yaml extract --url "https://example.com" --parse --format markdown
```

Use `--transform` with GJSON syntax to extract specific fields:

```bash
nimble search --query "AI news" --transform "results.#.url"
```

## Commands

### search
Accurate, real-time web search with 8 focus modes. AI agents search the live web to retrieve precise information. Run `nimble search --help` for all options.

**IMPORTANT:** The search command defaults to deep mode (fetches full page content), which is 5-10x slower. Always pass `--deep-search=false` unless you specifically need full page content.

Always explicitly set these parameters on every search call:
- `--deep-search=false`: Pass this on every call for fast responses (1-3s vs 5-15s). Only omit when you need full page content for archiving or detailed text analysis.
- `--include-answer`: Recommended on every research/exploration query. Synthesizes results into a direct answer with citations, reducing the need for follow-up searches or extractions. Only skip for URL-discovery-only queries where you just need links. Note: this is a premium feature (Enterprise plans). If the API returns a `402` or `403` when using this flag, retry the same query without `--include-answer` and continue — the search results are still valuable without the synthesized answer.
- `--topic`: Match to query type — `coding`, `news`, `academic`, etc. Default is `general`. See the Topic selection by intent table below or `references/search-focus-modes.md` for guidance.
- `--num-results`: Default `10` — balanced speed and coverage.

```bash
# Basic search (always include --deep-search=false)
nimble search --query "your query" --deep-search=false

# Coding-focused search
nimble search --query "React hooks tutorial" --topic coding --deep-search=false

# News search with time filter
nimble search --query "AI developments" --topic news --time-range week --deep-search=false

# Search with AI-generated answer summary
nimble search --query "what is WebAssembly" --include-answer --deep-search=false

# Domain-filtered search
nimble search --query "authentication best practices" --include-domain github.com --include-domain stackoverflow.com --deep-search=false

# Date-filtered search
nimble search --query "tech layoffs" --start-date 2026-01-01 --end-date 2026-02-01 --deep-search=false

# Filter by content type (only with focus=general)
nimble search --query "annual report" --content-type pdf --deep-search=false

# Control number of results
nimble search --query "Python tutorials" --num-results 15 --deep-search=false

# Deep search — ONLY when you need full page content (5-15s, much slower)
nimble search --query "machine learning" --deep-search --num-results 5
```
**Key options:**
| Flag | Description |
|------|-------------|
| `--query` | Search query string (required) |
| `--deep-search=false` | **Always pass this.** Disables full page content fetch for 5-10x faster responses |
| `--deep-search` | Enable full page content fetch (slow, 5-15s — only when needed) |
| `--topic` | Focus mode: general, coding, news, academic, shopping, social, geo, location |
| `--num-results` | Max results to return (default 10) |
| `--include-answer` | Generate AI answer summary from results |
| `--include-domain` | Only include results from these domains (repeatable, max 50) |
| `--exclude-domain` | Exclude results from these domains (repeatable, max 50) |
| `--time-range` | Recency filter: hour, day, week, month, year |
| `--start-date` | Filter results after this date (YYYY-MM-DD) |
| `--end-date` | Filter results before this date (YYYY-MM-DD) |
| `--content-type` | Filter by type: pdf, docx, xlsx, documents, spreadsheets, presentations |
| `--parsing-type` | Output format: markdown, plain_text, simplified_html |
| `--country` | Country code for localized results |
| `--locale` | Locale for language settings |
| `--max-subagents` | Max parallel subagents for shopping/social/geo modes (1-10, default 3) |
**Focus modes** (quick reference — for detailed per-mode guidance, decision tree, and combination strategies, **read `references/search-focus-modes.md`**):
| Mode | Best for |
|------|----------|
| `general` | Broad web searches (default) |
| `coding` | Programming docs, code examples, technical content |
| `news` | Current events, breaking news, recent articles |
| `academic` | Research papers, scholarly articles, studies |
| `shopping` | Product searches, price comparisons, e-commerce |
| `social` | People research, LinkedIn/X/YouTube profiles, community discussions |
| `geo` | Geographic information, regional data |
| `location` | Local businesses, place-specific queries |
**Topic selection by intent** (see `references/search-focus-modes.md` for full table):
| Query Intent | Primary Topic | Secondary (parallel) |
|---|---|---|
| Research a **person** | `social` | `general` |
| Research a **company** | `general` | `news` |
| Find **code/docs** | `coding` | — |
| Current **events** | `news` | `social` |
| Find a **product/price** | `shopping` | — |
| Find a **place/business** | `location` | `geo` |
| Find **research papers** | `academic` | — |
**Performance tips:**
- With `--deep-search=false` (FAST): 1-3 seconds, returns titles + snippets + URLs — use this 95% of the time
- Without the flag / `--deep-search` (SLOW): 5-15 seconds, returns full page content — only for archiving or full-text analysis
- Use `--include-answer` for quick synthesized insights — works great with fast mode
- Start with 5-10 results, increase only if needed
### extract
Scalable data collection with stealth unblocking. Get clean, real-time HTML and structured data from any URL. Supports JS rendering, browser emulation, and geolocation. Run `nimble extract --help` for all options.

**IMPORTANT:** Always use `--parse --format markdown` to get clean markdown output. Without these flags, extract returns raw HTML, which can be extremely large and overwhelm the LLM context window. The `--format` flag on extract controls the content type (not the CLI output format — see Output Formats above).

```bash
# Standard extraction (always use --parse --format markdown for LLM-friendly output)
nimble extract --url "https://example.com/article" --parse --format markdown

# Render JavaScript (for SPAs, dynamic content)
nimble extract --url "https://example.com/app" --render --parse --format markdown

# Extract with geolocation (see content as if from a specific country)
nimble extract --url "https://example.com" --country US --city "New York" --parse --format markdown

# Handle cookie consent automatically
nimble extract --url "https://example.com" --consent-header --parse --format markdown

# Custom browser emulation
nimble extract --url "https://example.com" --browser chrome --device desktop --os windows --parse --format markdown

# Multiple content format preferences (API tries first, falls back to second)
nimble extract --url "https://example.com" --parse --format markdown --format html
```
**Key options:**
| Flag | Description |
|------|-------------|
| `--url` | Target URL to extract (required) |
| `--parse` | Parse the response content (always use this) |
| `--format` | Content type preference: `markdown`, `html` (always use `markdown` for LLM-friendly output) |
| `--render` | Render JavaScript using a browser |
| `--country` | Country code for geolocation and proxy |
| `--city` | City for geolocation |
| `--state` | US state for geolocation (only when country=US) |
| `--locale` | Locale for language settings |
| `--consent-header` | Auto-handle cookie consent |
| `--browser` | Browser type to emulate |
| `--device` | Device type for emulation |
| `--os` | Operating system to emulate |
| `--driver` | Browser driver to use |
| `--method` | HTTP method (GET, POST, etc.) |
| `--headers` | Custom HTTP headers (key=value) |
| `--cookies` | Browser cookies |
| `--referrer-type` | Referrer policy |
| `--http2` | Use HTTP/2 protocol |
| `--request-timeout` | Timeout in milliseconds |
| `--tag` | User-defined tag for request tracking |
### map
Fast URL discovery and site structure mapping. Easily plan extraction workflows. Returns URL metadata only (URLs, titles, descriptions) — not page content. Use `extract` or `crawl` to get actual content from the discovered URLs. Run `nimble map --help` for all options.

```bash
# Map all URLs on a site (returns URLs only, not content)
nimble map --url "https://example.com"

# Limit number of URLs returned
nimble map --url "https://docs.example.com" --limit 100

# Include subdomains
nimble map --url "https://example.com" --domain-filter subdomains

# Use sitemap for discovery
nimble map --url "https://example.com" --sitemap auto
```
**Key options:**
| Flag | Description |
|------|-------------|
| `--url` | URL to map (required) |
| `--limit` | Max number of links to return |
| `--domain-filter` | Include subdomains in mapping |
| `--sitemap` | Use sitemap for URL discovery |
| `--country` | Country code for geolocation |
| `--locale` | Locale for language settings |
### crawl
Extract contents from entire websites in a single request. Collect large volumes of web data automatically. Crawl is async — you start a job, poll for completion, then retrieve the results. Run `nimble crawl run --help` for all options.

Crawl defaults:

| Setting | Default | Notes |
|---|---|---|
| `--sitemap` | `auto` | Automatically uses sitemap if available |
| `--max-discovery-depth` | `5` | How deep the crawler follows links |
| `--limit` | No limit | Always set a limit to avoid crawling entire sites |

Start a crawl:

```bash
# Crawl a site section (always set --limit)
nimble crawl run --url "https://docs.example.com" --limit 50

# Crawl with path filtering
nimble crawl run --url "https://example.com" --include-path "/docs" --include-path "/api" --limit 100

# Exclude paths
nimble crawl run --url "https://example.com" --exclude-path "/blog" --exclude-path "/archive" --limit 50

# Control crawl depth
nimble crawl run --url "https://example.com" --max-discovery-depth 3 --limit 50

# Allow subdomains and external links
nimble crawl run --url "https://example.com" --allow-subdomains --allow-external-links --limit 50

# Crawl entire domain (not just child paths)
nimble crawl run --url "https://example.com/docs" --crawl-entire-domain --limit 100

# Named crawl for tracking
nimble crawl run --url "https://example.com" --name "docs-crawl-feb-2026" --limit 200

# Use sitemap for discovery
nimble crawl run --url "https://example.com" --sitemap auto --limit 50
```
**Key options for `crawl run`:**
| Flag | Description |
|------|-------------|
| `--url` | URL to crawl (required) |
| `--limit` | Max pages to crawl (**always set this**) |
| `--max-discovery-depth` | Max depth based on discovery order (default 5) |
| `--include-path` | Regex patterns for URLs to include (repeatable) |
| `--exclude-path` | Regex patterns for URLs to exclude (repeatable) |
| `--allow-subdomains` | Follow links to subdomains |
| `--allow-external-links` | Follow links to external sites |
| `--crawl-entire-domain` | Follow sibling/parent URLs, not just child paths |
| `--ignore-query-parameters` | Don't re-scrape same path with different query params |
| `--name` | Name for the crawl job |
| `--sitemap` | Use sitemap for URL discovery (default auto) |
| `--callback` | Webhook for receiving results |
**Poll crawl status and retrieve results:**
Crawl jobs run asynchronously. After starting a crawl, poll for completion, then retrieve content using **individual task IDs** (not the crawl ID):
```bash
# 1. Start the crawl → returns a crawl_id
nimble crawl run --url "https://docs.example.com" --limit 5
# Returns: crawl_id "abc-123"

# 2. Poll status until completed → returns individual task_ids per page
nimble crawl status --id "abc-123"
# Returns: tasks: [{ task_id: "task-456" }, { task_id: "task-789" }, ...]
# Status values: running, completed, failed, terminated

# 3. Retrieve content using INDIVIDUAL task_ids (NOT the crawl_id)
nimble tasks results --task-id "task-456"
nimble tasks results --task-id "task-789"
# ⚠️ Using the crawl_id here returns 404 — you must use the per-page task_ids from step 2
```
**IMPORTANT:** `nimble tasks results` requires the **individual task IDs** from `crawl status` (each crawled page gets its own task ID), not the crawl job ID. Using the crawl ID will return a 404 error.
**Polling guidelines:**
- Poll every **15-30 seconds** for small crawls (< 50 pages)
- Poll every **30-60 seconds** for larger crawls (50+ pages)
- Stop polling after status is `completed`, `failed`, or `terminated`
- **Note:** `crawl status` may occasionally misreport individual task statuses (showing "failed" for tasks that actually succeeded). If `crawl status` shows failed tasks, try retrieving their results with `nimble tasks results` before assuming failure
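The polling guidelines above can be sketched as a small POSIX-sh helper. This is a hypothetical wrapper, not part of the CLI: it assumes you supply a command that prints the current status word on stdout (e.g. a one-line script that runs `nimble crawl status --id ...` and pulls out the status field — check the real output shape before wiring that up).

```shell
#!/bin/sh
# Sketch: poll any status-printing command until it reaches a terminal state.
# Arg 1: a command that prints one of running/completed/failed/terminated.
# Arg 2: poll interval in seconds (15-30s for small crawls, 30-60s for larger).
poll_until_done() {
  get_status=$1
  interval=${2:-30}
  while :; do
    status=$($get_status)
    case $status in
      completed|failed|terminated)
        echo "$status"       # terminal state -- stop polling and report it
        return 0
        ;;
    esac
    sleep "$interval"
  done
}
```

With the real CLI you would pass a wrapper around `nimble crawl status --id "abc-123"`; the status-field extraction step depends on the actual response shape.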
**List crawls:**
```bash
# List all crawls
nimble crawl list

# Filter by status
nimble crawl list --status running

# Paginate results
nimble crawl list --limit 10
```
**Cancel a crawl:**
```bash
nimble crawl terminate --id "crawl-task-id"
```

## Best Practices
### Search Strategy

- Always pass `--deep-search=false` — the default is deep mode (slow). Fast mode covers 95% of use cases: URL discovery, research, comparisons, answer generation
- Only use deep mode when you need full page text — archiving articles, extracting complete docs, building datasets
- Start with the right focus mode — match `--topic` to your query type (see `references/search-focus-modes.md`)
- Use `--include-answer` — get AI-synthesized insights without extracting each result. If it returns 402/403, retry without it.
- Filter domains — use `--include-domain` to target authoritative sources
- Add time filters — use `--time-range` for time-sensitive queries
### Multi-Search Strategy

When researching a topic in depth, run 2-3 searches in parallel with:
- Different topics — e.g., `social` + `general` for people research
- Different query angles — e.g., "Jane Doe current job" + "Jane Doe career history" + "Jane Doe publications"

This is faster than sequential searches and gives broader coverage. Deduplicate results by URL before extracting.
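As a sketch, the parallel pattern above looks like this in shell. The helper names (`run_queries`, `dedupe_urls`) are hypothetical, and the saved files simply hold whatever the CLI prints for each query:

```shell
#!/bin/sh
# Sketch: launch one search per query string in the background, wait for all,
# then deduplicate collected URLs before any extract step.
run_queries() {
  outdir=$(mktemp -d)
  i=0
  for q in "$@"; do
    i=$((i + 1))
    # Each angle runs concurrently; fast mode keeps each call at 1-3s.
    nimble search --query "$q" --deep-search=false --include-answer \
      > "$outdir/$i.json" &
  done
  wait                       # block until every background search returns
  echo "$outdir"             # caller reads the per-query output files here
}

dedupe_urls() {
  # One URL per line on stdin; prints each URL only the first time it appears.
  awk '!seen[$0]++'
}
```

Usage: `run_queries "Jane Doe current job" "Jane Doe career history"` runs both angles concurrently; collect the URLs from the saved output, pipe them through `dedupe_urls`, then extract.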
### Disambiguating Common Names

When searching for a person with a common name:
- Include distinguishing context in the query: company name, job title, city
- Use `--topic social` — LinkedIn results include location and current company, making disambiguation easier
- Cross-reference results across searches to confirm you're looking at the right person
### Extraction Strategy

- Always use `--parse --format markdown` — returns clean markdown instead of raw HTML, preventing context window overflow
- Try without `--render` first — it's faster for static pages
- Add `--render` for SPAs — when content is loaded by JavaScript
- Set geolocation — use `--country` to see region-specific content
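The try-static-first rule can be sketched as a fallback wrapper. Both helpers are hypothetical, and the 200-byte threshold in `needs_render` is a heuristic assumption, not documented CLI behavior:

```shell
#!/bin/sh
# Sketch: extract without --render first, retry with --render only when the
# result looks too small to be real content (likely a JS-driven SPA).
needs_render() {
  # Heuristic: treat anything under 200 bytes of markdown as a failed extract.
  [ $(printf '%s' "$1" | wc -c) -lt 200 ]
}

extract_page() {
  url=$1
  body=$(nimble extract --url "$url" --parse --format markdown)
  if needs_render "$body"; then
    # Static fetch came back nearly empty -- fall back to browser rendering.
    body=$(nimble extract --url "$url" --render --parse --format markdown)
  fi
  printf '%s\n' "$body"
}
```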
### Crawl Strategy

- Prefer `map` + `extract` over `crawl` for LLM use — crawl results return raw HTML (60-115KB per page), which overwhelms LLM context. For LLM-friendly output, use `map` to discover URLs, then `extract --parse --format markdown` on individual pages
- Use `crawl` only for bulk archiving or data pipelines — when you need raw content from many pages and will post-process it outside the LLM context
- Always set `--limit` — crawl has no default limit, so always specify one to avoid crawling entire sites
- Use path filters — `--include-path` and `--exclude-path` to target specific sections
- Name your crawls — use `--name` for easy tracking
- Retrieve with individual task IDs — `crawl status` returns per-page task IDs; use those (not the crawl ID) with `nimble tasks results --task-id`
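The map-then-extract pipeline from the first bullet can be sketched as follows. This is a hypothetical helper: the `jq` filter `.links[].url` is an assumption about the map response shape (verify against the real JSON first), and `slugify` is just a filename helper:

```shell
#!/bin/sh
# Sketch of the LLM-friendly bulk pipeline: map discovers URLs, then each page
# is extracted individually as clean markdown (instead of crawl's raw HTML).
slugify() {
  # Turn a URL into a filesystem-safe name for the per-page output file.
  printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

map_then_extract() {
  site=$1
  outdir=$2
  limit=${3:-50}
  mkdir -p "$outdir"
  # NOTE: '.links[].url' is an assumed path into the map response JSON.
  nimble map --url "$site" --limit "$limit" | jq -r '.links[].url' |
  while read -r url; do
    nimble extract --url "$url" --parse --format markdown \
      > "$outdir/$(slugify "$url").md"
  done
}
```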
## Common Recipes

### Researching a person

```bash
# Step 1: Run social + general in parallel for max coverage
nimble search --query "Jane Doe Head of Engineering" --topic social --deep-search=false --num-results 10 --include-answer
nimble search --query "Jane Doe Head of Engineering" --topic general --deep-search=false --num-results 10 --include-answer

# Step 2: Broaden with different query angles in parallel
nimble search --query "Jane Doe career history Acme Corp" --deep-search=false --include-answer
nimble search --query "Jane Doe publications blog articles" --deep-search=false --include-answer

# Step 3: Extract the most promising non-auth-walled URLs (skip LinkedIn — see Known Limitations)
nimble extract --url "https://www.companysite.com/team/jane-doe" --parse --format markdown
```
### Researching a company

```bash
# Step 1: Overview + recent news in parallel
nimble search --query "Acme Corp" --topic general --deep-search=false --include-answer
nimble search --query "Acme Corp" --topic news --time-range month --deep-search=false --include-answer

# Step 2: Extract company page
nimble extract --url "https://acme.com/about" --parse --format markdown
```
### Technical research

```bash
# Step 1: Find docs and code examples
nimble search --query "React Server Components migration guide" --topic coding --deep-search=false --include-answer

# Step 2: Extract the most relevant doc
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown
```
## Error Handling

| Error | Solution |
|---|---|
| Missing API key | Set the environment variable: `export NIMBLE_API_KEY="your-api-key"` |
| Invalid API key | Verify the API key is active at nimbleway.com |
| `402`/`403` with `--include-answer` | Premium feature not available on the current plan. Retry the same query without `--include-answer` and continue |
| Rate limited | Reduce request frequency or upgrade your API tier |
| Timeout | Ensure `--deep-search=false` is set |
| No results | Try a different `--topic` or broaden the query |
## Known Limitations

| Site | Issue | Workaround |
|---|---|---|
| LinkedIn profiles | Auth wall blocks extraction (returns redirect/JS, status 999) | Use `search --topic social` snippets instead |
| Sites behind login | Extract returns login page instead of content | No workaround — use search snippets instead |
| Heavy SPAs | Extract returns empty or minimal HTML | Add `--render` |
| Crawl results | Returns raw HTML (60-115KB per page), no markdown option | Use `extract --parse --format markdown` on individual pages |
| Crawl status | May misreport individual task statuses as "failed" when they actually succeeded | Always try `nimble tasks results` before assuming failure |