docs-manage

Docs Manage

Index, refresh, and remove library documentation in the local Grounded Docs store. These commands modify the index and produce plain-text status messages on stdout.

When to use

  • A library is not yet indexed and you need its docs available for search.
  • Documentation may be stale and you want to pull in updated pages.
  • You want to remove a library or version from the index to free space.

Commands

scrape

Download and index documentation from a URL or local directory.
bash
npx @arabold/docs-mcp-server@latest scrape <library> <url> [options]

| Flag | Alias | Default | Description |
| --- | --- | --- | --- |
| `--version <ver>` | `-v` | | Library version label |
| `--max-pages <n>` | `-p` | config default | Maximum pages to scrape |
| `--max-depth <n>` | `-d` | config default | Maximum navigation depth |
| `--max-concurrency <n>` | `-c` | config default | Concurrent page requests |
| `--ignore-errors` | | `true` | Continue on individual page errors |
| `--scope subpages\|hostname\|domain` | | `subpages` | Crawling boundary |
| `--follow-redirects` | | `true` | Follow HTTP redirects |
| `--no-follow-redirects` | | | Disable following redirects |
| `--scrape-mode auto\|fetch\|playwright` | | `auto` | HTML processing strategy |
| `--include-pattern <glob>` | | | URL include pattern (repeatable) |
| `--exclude-pattern <glob>` | | | URL exclude pattern (repeatable, takes precedence) |
| `--header "Name: Value"` | | | Custom HTTP header (repeatable) |
| `--embedding-model <model>` | | | Embedding model configuration |
| `--server-url <url>` | | | Remote pipeline worker URL |
| `--clean` | | `true` | Clear existing documents before scraping |
| `--quiet` | | | Suppress non-error diagnostics |
| `--verbose` | | | Enable debug logging |

Examples:

Scrape React docs, version-tagged

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react --version 19.0.0

Scrape local files

npx @arabold/docs-mcp-server@latest scrape mylib file:///Users/me/docs/my-library

Scrape with depth and page limits

npx @arabold/docs-mcp-server@latest scrape nextjs https://nextjs.org/docs --max-pages 200 --max-depth 3

Scrape with custom headers (e.g. authentication)

npx @arabold/docs-mcp-server@latest scrape internal-api https://docs.internal.com \
  --header "Authorization: Bearer tok_xxx"

Exclude changelog pages

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react \
  --exclude-pattern "**/changelog*"

Output is a plain-text status line, e.g. `Successfully scraped 42 pages`. Progress updates appear on stderr during the run.
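
Because `--exclude-pattern` is repeatable, a small wrapper can turn a list of globs into the matching flags. The following is a minimal sketch, not part of docs-mcp-server: `scrape_excluding` and the `DRY_RUN` variable are hypothetical names introduced here, and the `**/blog/**` glob is only an illustration.

```shell
# Hypothetical helper: scrape_excluding <library> <url> [glob ...]
# Turns each glob into its own --exclude-pattern flag.
# Set DRY_RUN=1 (an assumption of this sketch) to print the command
# instead of running it.
scrape_excluding() {
  library=$1
  url=$2
  shift 2

  # Rewrite "$@" from "G1 G2 ..." to
  # "--exclude-pattern G1 --exclude-pattern G2 ...".
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" --exclude-pattern "$1"
    shift
    n=$((n - 1))
  done

  ${DRY_RUN:+echo} npx @arabold/docs-mcp-server@latest scrape "$library" "$url" "$@"
}

# Usage (prints the command because DRY_RUN is set):
#   DRY_RUN=1 scrape_excluding react https://react.dev/reference/react \
#     '**/changelog*' '**/blog/**'
```

Quoting the globs at the call site keeps the shell from expanding them before they reach the CLI.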

refresh

Re-scrape an existing library version, skipping unchanged pages via HTTP ETags.
bash
npx @arabold/docs-mcp-server@latest refresh <library> [options]

| Flag | Alias | Description |
| --- | --- | --- |
| `--version <ver>` | `-v` | Version to refresh (omit for latest) |
| `--embedding-model <model>` | | Embedding model configuration |
| `--server-url <url>` | | Remote pipeline worker URL |
| `--quiet` | | Suppress non-error diagnostics |
| `--verbose` | | Enable debug logging |

Example:
bash
npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0
The library and version must already be indexed. Use `scrape` for first-time indexing.

remove

Delete a library (or a specific version) from the index.
bash
npx @arabold/docs-mcp-server@latest remove <library> [options]

| Flag | Alias | Description |
| --- | --- | --- |
| `--version <ver>` | `-v` | Specific version to remove (omit to remove latest) |
| `--server-url <url>` | | Remote pipeline worker URL |
| `--quiet` | | Suppress non-error diagnostics |
| `--verbose` | | Enable debug logging |

Example:
bash
npx @arabold/docs-mcp-server@latest remove react --version 18.3.1
This is destructive and cannot be undone. Re-run `scrape` to re-index.

Output behaviour

All three commands write plain-text status messages to stdout and diagnostics to stderr. The global `--output` flag is accepted but has no effect because the output is plain text, not structured data.

In non-interactive sessions, diagnostics are suppressed by default. Use `--verbose` (or set `LOG_LEVEL=INFO`) to re-enable them. Use `--quiet` to suppress all non-error diagnostics regardless of session type.
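
Since the status message and the diagnostics arrive on different streams, a script can keep them apart with ordinary shell redirection. The sketch below assumes nothing about the CLI beyond the stdout/stderr split described above; `run_and_capture`, `STATUS`, and the log file name are hypothetical.

```shell
# Hypothetical helper: run_and_capture <log-file> <command> [args ...]
# Stores the command's stdout (the status message) in STATUS and appends
# its stderr (diagnostics) to <log-file>. Returns the command's exit status.
run_and_capture() {
  log_file=$1
  shift
  STATUS=$("$@" 2>>"$log_file")
}

# Usage:
#   run_and_capture refresh.log \
#     npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0
#   printf 'status: %s\n' "$STATUS"
```

Keeping diagnostics in a file leaves `STATUS` holding only what the command wrote to stdout, which is easier to log or compare in a script.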

Typical workflow

1. Index documentation for the first time

npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react --version 19.0.0

2. Later, refresh to pick up any changes

npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0

3. Clean up old versions

npx @arabold/docs-mcp-server@latest remove react --version 18.3.1
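
Because `remove` cannot be undone, it can be worth chaining the steps so the old version is deleted only after the new one has been indexed. The following is a rough sketch under two assumptions: that the CLI exits with a non-zero status when a command fails, and that word splitting behaves as in `sh` or `bash`. `docs_cli`, `upgrade_docs`, and the `DOCS_CLI` override are hypothetical names.

```shell
# Hypothetical wrapper around the CLI. DOCS_CLI is an override used only by
# this sketch (for example DOCS_CLI=echo for a dry run).
docs_cli() {
  ${DOCS_CLI:-npx @arabold/docs-mcp-server@latest} "$@"
}

# upgrade_docs <library> <url> <new-version> <old-version>
# Indexes the new version first and removes the old one only if that worked.
upgrade_docs() {
  library=$1
  url=$2
  new_version=$3
  old_version=$4

  docs_cli scrape "$library" "$url" --version "$new_version" || return 1
  docs_cli remove "$library" --version "$old_version"
}

# Usage:
#   upgrade_docs react https://react.dev/reference/react 19.0.0 18.3.1
```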

Important notes

  • Scraping can take time. Large documentation sites with hundreds of pages may run for several minutes. Use `--max-pages` and `--max-depth` to limit scope when you only need a subset.
  • Local files must use the `file://` URL scheme (e.g. `file:///absolute/path/to/docs`).
  • `--clean` is on by default for `scrape`, meaning existing documents for the same library+version are removed before re-indexing. Pass `--no-clean` to append instead.
  • `refresh` only works on previously indexed content. It uses HTTP ETags to skip pages that have not changed, making it much faster than a full re-scrape.
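
The `file://` requirement is easy to get wrong with relative paths, so a helper that resolves a directory to an absolute path first can help. This is a minimal sketch: `to_file_url` is a hypothetical name, and the function does not percent-encode spaces or other special characters in the path.

```shell
# Hypothetical helper: to_file_url <directory>
# Prints a file:// URL for an existing directory, resolving relative paths
# against the current working directory. Fails if the directory is missing.
to_file_url() {
  abs_dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || return 1
  printf 'file://%s\n' "$abs_dir"
}

# Usage:
#   npx @arabold/docs-mcp-server@latest scrape mylib "$(to_file_url ./docs)"
```

Since an absolute path already starts with `/`, prefixing it with `file://` yields the three-slash form shown in the note above.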