docs-manage

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Docs Manage

文档管理

Index, refresh, and remove library documentation in the local Grounded Docs store. These commands modify the index and produce plain-text status messages on stdout.

在本地Grounded Docs存储中索引、刷新和移除库文档。这些命令会修改索引，并在标准输出（stdout）上生成纯文本状态消息。

When to use

使用场景

A library is not yet indexed and you need its docs available for search.
Documentation may be stale and you want to pull in updated pages.
You want to remove a library or version from the index to free space.

某个库尚未被索引，需要使其文档可被搜索
文档可能已过时，需要获取更新后的页面
想要从索引中移除某个库或版本以释放空间

Commands

命令

scrape

Download and index documentation from a URL or local directory.

bash

npx @arabold/docs-mcp-server@latest scrape <library> <url> [options]

Flag	Alias	Default	Description
`--version <ver>`	`-v`		Library version label
`--max-pages <n>`	`-p`	config default	Maximum pages to scrape
`--max-depth <n>`	`-d`	config default	Maximum navigation depth
`--max-concurrency <n>`	`-c`	config default	Concurrent page requests
`--ignore-errors`		`true`	Continue on individual page errors
`--scope subpages\|hostname\|domain`		`subpages`	Crawling boundary
`--follow-redirects`		`true`	Follow HTTP redirects
`--no-follow-redirects`			Disable following redirects
`--scrape-mode auto\|fetch\|playwright`		`auto`	HTML processing strategy
`--include-pattern <glob>`			URL include pattern (repeatable)
`--exclude-pattern <glob>`			URL exclude pattern (repeatable, takes precedence)
`--header "Name: Value"`			Custom HTTP header (repeatable)
`--embedding-model <model>`			Embedding model configuration
`--server-url <url>`			Remote pipeline worker URL
`--clean`		`true`	Clear existing documents before scraping
`--quiet`			Suppress non-error diagnostics
`--verbose`			Enable debug logging

Examples:

bash

undefined

从URL或本地目录下载并索引文档。

bash

npx @arabold/docs-mcp-server@latest scrape <library> <url> [options]

参数	别名	默认值	说明
`--version <ver>`	`-v`		库版本标签
`--max-pages <n>`	`-p`	配置默认值	最大抓取页面数
`--max-depth <n>`	`-d`	配置默认值	最大导航深度
`--max-concurrency <n>`	`-c`	配置默认值	并发页面请求数
`--ignore-errors`		`true`	遇到单个页面错误时继续执行
`--scope subpages\|hostname\|domain`		`subpages`	爬取边界
`--follow-redirects`		`true`	跟随HTTP重定向
`--no-follow-redirects`			禁用跟随重定向
`--scrape-mode auto\|fetch\|playwright`		`auto`	HTML处理策略
`--include-pattern <glob>`			URL包含模式（可重复指定）
`--exclude-pattern <glob>`			URL排除模式（可重复指定，优先级更高）
`--header "Name: Value"`			自定义HTTP请求头（可重复指定）
`--embedding-model <model>`			嵌入模型配置
`--server-url <url>`			远程流水线工作节点URL
`--clean`		`true`	抓取前清除现有文档
`--quiet`			抑制非错误诊断信息
`--verbose`			启用调试日志

示例：

bash

undefined

Scrape React docs, version-tagged

抓取React文档并标记版本

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react --version 19.0.0

Scrape local files

抓取本地文件

npx @arabold/docs-mcp-server@latest scrape mylib file:///Users/me/docs/my-library

Scrape with depth and page limits

限制深度和页面数进行抓取

npx @arabold/docs-mcp-server@latest scrape nextjs https://nextjs.org/docs --max-pages 200 --max-depth 3

Scrape with custom headers (e.g. authentication)

使用自定义请求头抓取（如身份验证）

npx @arabold/docs-mcp-server@latest scrape internal-api https://docs.internal.com
--header "Authorization: Bearer tok_xxx"

Exclude changelog pages

排除更新日志页面

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react
--exclude-pattern "**/changelog*"


Output is a plain-text status line, e.g. `Successfully scraped 42 pages`.
Progress updates appear on stderr during the run.

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react
--exclude-pattern "**/changelog*"


输出为纯文本状态行，例如`Successfully scraped 42 pages`。运行过程中的进度更新会显示在标准错误输出（stderr）中。

refresh

Re-scrape an existing library version, skipping unchanged pages via HTTP ETags.

bash

npx @arabold/docs-mcp-server@latest refresh <library> [options]

Flag	Alias	Description
`--version <ver>`	`-v`	Version to refresh (omit for latest)
`--embedding-model <model>`		Embedding model configuration
`--server-url <url>`		Remote pipeline worker URL
`--quiet`		Suppress non-error diagnostics
`--verbose`		Enable debug logging

Example:

bash

npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0

The library and version must already be indexed. Use

scrape

for first-time indexing.

重新抓取已存在的库版本，通过HTTP ETags跳过未更改的页面。

bash

npx @arabold/docs-mcp-server@latest refresh <library> [options]

参数	别名	说明
`--version <ver>`	`-v`	要刷新的版本（省略则刷新最新版本）
`--embedding-model <model>`		嵌入模型配置
`--server-url <url>`		远程流水线工作节点URL
`--quiet`		抑制非错误诊断信息
`--verbose`		启用调试日志

示例：

bash

npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0

该库和版本必须已被索引。首次索引请使用

scrape

命令。

remove

Delete a library (or a specific version) from the index.

bash

npx @arabold/docs-mcp-server@latest remove <library> [options]

Flag	Alias	Description
`--version <ver>`	`-v`	Specific version to remove (omit to remove latest)
`--server-url <url>`		Remote pipeline worker URL
`--quiet`		Suppress non-error diagnostics
`--verbose`		Enable debug logging

Example:

bash

npx @arabold/docs-mcp-server@latest remove react --version 18.3.1

This is destructive and cannot be undone. Re-run

scrape

to re-index.

从索引中删除某个库（或特定版本）。

bash

npx @arabold/docs-mcp-server@latest remove <library> [options]

参数	别名	说明
`--version <ver>`	`-v`	要删除的特定版本（省略则删除最新版本）
`--server-url <url>`		远程流水线工作节点URL
`--quiet`		抑制非错误诊断信息
`--verbose`		启用调试日志

示例：

bash

npx @arabold/docs-mcp-server@latest remove react --version 18.3.1

此操作具有破坏性且无法撤销。如需重新索引，请重新运行

scrape

命令。

Output behaviour

输出行为

All three commands write plain-text status messages to stdout and diagnostics to stderr. The global

--output

flag is accepted but has no effect because the output is plain text, not structured data.

In non-interactive sessions, diagnostics are suppressed by default. Use

--verbose

(or set

LOG_LEVEL=INFO

) to re-enable them. Use

--quiet

to suppress all non-error diagnostics regardless of session type.

所有三个命令都会将纯文本状态消息写入标准输出（stdout），并将诊断信息写入标准错误输出（stderr）。全局

--output

参数可被接受，但不会产生效果，因为输出为纯文本而非结构化数据。

在非交互式会话中，诊断信息默认被抑制。使用

--verbose

（或设置

LOG_LEVEL=INFO

）可重新启用。无论会话类型如何，使用

--quiet

可抑制所有非错误诊断信息。

Typical workflow

典型工作流程

bash

undefined

bash

undefined

1. Index documentation for the first time

1. 首次索引文档

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react --version 19.0.0

2. Later, refresh to pick up any changes

2. 后续刷新以获取更新内容

npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0

3. Clean up old versions

3. 清理旧版本

npx @arabold/docs-mcp-server@latest remove react --version 18.3.1

undefined

npx @arabold/docs-mcp-server@latest remove react --version 18.3.1

undefined

Important notes

重要注意事项

Scraping can take time. Large documentation sites with hundreds of pages may run for several minutes. Use
```
--max-pages
```
and
```
--max-depth
```
to limit scope when you only need a subset.
Local files must use the
```
file://
```
URL scheme (e.g.
```
file:///absolute/path/to/docs
```
).
--clean
is on by default for
```
scrape
```
, meaning existing documents for the same library+version are removed before re-indexing. Pass
```
--no-clean
```
to append instead.
refresh
only works on previously indexed content. It uses HTTP ETags to skip pages that have not changed, making it much faster than a full re-scrape.

抓取可能需要时间：包含数百个页面的大型文档网站可能需要运行数分钟。当仅需要子集内容时，使用
```
--max-pages
```
和
```
--max-depth
```
限制范围。
本地文件必须使用
```
file://
```
URL协议（例如
```
file:///absolute/path/to/docs
```
）。
--clean
在
scrape
命令中默认开启，这意味着在重新索引前会移除同一库+版本的现有文档。传递
```
--no-clean
```
参数可改为追加模式。
**
```
refresh
```
**仅适用于已索引的内容。它使用HTTP ETags跳过未更改的页面，因此比完全重新抓取快得多。