docs-manage
Docs Manage
Index, refresh, and remove library documentation in the local Grounded Docs
store. These commands modify the index and produce plain-text status messages
on stdout.
When to use
- A library is not yet indexed and you need its docs available for search.
- Documentation may be stale and you want to pull in updated pages.
- You want to remove a library or version from the index to free space.
Commands
scrape
Download and index documentation from a URL or local directory.
```bash
npx @arabold/docs-mcp-server@latest scrape <library> <url> [options]
```

| Flag | Alias | Default | Description |
|---|---|---|---|
| `--version` | | | Library version label |
| `--max-pages` | | config default | Maximum pages to scrape |
| `--max-depth` | | config default | Maximum navigation depth |
| | | config default | Concurrent page requests |
| | | | Continue on individual page errors |
| | | | Crawling boundary |
| | | | Follow HTTP redirects |
| | | | Disable following redirects |
| | | | HTML processing strategy |
| | | | URL include pattern (repeatable) |
| `--exclude-pattern` | | | URL exclude pattern (repeatable, takes precedence) |
| `--header` | | | Custom HTTP header (repeatable) |
| | | | Embedding model configuration |
| | | | Remote pipeline worker URL |
| `--clean` | | | Clear existing documents before scraping |
| `--quiet` | | | Suppress non-error diagnostics |
| `--verbose` | | | Enable debug logging |
Examples:

```bash
# Scrape React docs, version-tagged
npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react --version 19.0.0

# Scrape local files
npx @arabold/docs-mcp-server@latest scrape mylib file:///Users/me/docs/my-library

# Scrape with depth and page limits
npx @arabold/docs-mcp-server@latest scrape nextjs https://nextjs.org/docs --max-pages 200 --max-depth 3

# Scrape with custom headers (e.g. authentication)
npx @arabold/docs-mcp-server@latest scrape internal-api https://docs.internal.com \
  --header "Authorization: Bearer tok_xxx"

# Exclude changelog pages
npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react \
  --exclude-pattern "**/changelog*"
```

Output is a plain-text status line, e.g. `Successfully scraped 42 pages`.
Progress updates appear on stderr during the run.
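Because the status line goes to stdout and progress goes to stderr, a script can capture the two separately. The following is a minimal sketch for a POSIX shell. `run_docs` and the `DOCS_CLI` override variable are hypothetical names introduced here for illustration, not part of the tool.

```bash
# DOCS_CLI is a hypothetical override hook (handy for testing);
# it defaults to the command used throughout this page.
DOCS_CLI="${DOCS_CLI:-npx @arabold/docs-mcp-server@latest}"

# run_docs LOG_FILE ARGS...
# Runs a docs command, appending stderr (progress, diagnostics) to LOG_FILE
# and leaving stdout (the status message) for the caller to capture.
run_docs() {
  log_file="$1"
  shift
  # $DOCS_CLI is intentionally unquoted so "npx <package>" splits into words.
  $DOCS_CLI "$@" 2>>"$log_file"
}
```

For example, `status="$(run_docs scrape.log scrape react https://react.dev/reference/react --version 19.0.0)"` would leave the status line in `$status` and the progress output in `scrape.log`.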
refresh
Re-scrape an existing library version, skipping unchanged pages via HTTP ETags.
```bash
npx @arabold/docs-mcp-server@latest refresh <library> [options]
```

| Flag | Alias | Description |
|---|---|---|
| `--version` | | Version to refresh (omit for latest) |
| | | Embedding model configuration |
| | | Remote pipeline worker URL |
| `--quiet` | | Suppress non-error diagnostics |
| `--verbose` | | Enable debug logging |
Example:

```bash
npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0
```

The library and version must already be indexed. Use `scrape` for first-time
indexing.
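When several libraries are indexed, the refresh step can be scripted as a loop. This is a sketch under stated assumptions: `refresh_all` and `DOCS_CLI` are hypothetical names, the input format (one `library version` pair per line) is invented for this example, and it assumes `refresh` exits non-zero on failure, which this page does not confirm.

```bash
# DOCS_CLI is a hypothetical override hook; it defaults to the real command.
DOCS_CLI="${DOCS_CLI:-npx @arabold/docs-mcp-server@latest}"

# refresh_all: reads "library version" pairs from stdin, one per line,
# refreshes each, and returns non-zero if any refresh failed.
refresh_all() {
  failed=0
  while read -r lib ver; do
    # Skip blank lines and comment lines.
    case "$lib" in ''|'#'*) continue ;; esac
    # </dev/null keeps the command from consuming the list on stdin.
    $DOCS_CLI refresh "$lib" --version "$ver" </dev/null || failed=1
  done
  return "$failed"
}
```

Feed it a file or a here-document, e.g. `refresh_all < libraries.txt`, where `libraries.txt` is a hypothetical list you maintain yourself.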
remove
Delete a library (or a specific version) from the index.
```bash
npx @arabold/docs-mcp-server@latest remove <library> [options]
```

| Flag | Alias | Description |
|---|---|---|
| `--version` | | Specific version to remove (omit to remove latest) |
| | | Remote pipeline worker URL |
| `--quiet` | | Suppress non-error diagnostics |
| `--verbose` | | Enable debug logging |
Example:

```bash
npx @arabold/docs-mcp-server@latest remove react --version 18.3.1
```

This is destructive and cannot be undone. Re-run `scrape` to re-index.
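Since removal cannot be undone, a script may want to ask before running it. The following is a minimal sketch. `confirm_remove` and `DOCS_CLI` are hypothetical names for this example, and the prompt is not a feature of the tool itself.

```bash
# DOCS_CLI is a hypothetical override hook; it defaults to the real command.
DOCS_CLI="${DOCS_CLI:-npx @arabold/docs-mcp-server@latest}"

# confirm_remove LIBRARY VERSION
# Prompts on stderr and runs `remove` only after an explicit "y" or "Y".
confirm_remove() {
  lib="$1"
  ver="$2"
  printf 'Remove %s %s from the index? [y/N] ' "$lib" "$ver" >&2
  read -r answer || answer=""
  case "$answer" in
    y|Y) $DOCS_CLI remove "$lib" --version "$ver" </dev/null ;;
    *) echo "Skipped." >&2; return 1 ;;
  esac
}
```

Calling `confirm_remove react 18.3.1` would then run the `remove` command shown above only if the answer is `y`; any other answer leaves the index untouched and returns a non-zero status.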
Output behaviour
All three commands write plain-text status messages to stdout and
diagnostics to stderr. The global `--output` flag is accepted but has no
effect because the output is plain text, not structured data.

In non-interactive sessions, diagnostics are suppressed by default. Use
`--verbose` (or set `LOG_LEVEL=INFO`) to re-enable them. Use `--quiet` to
suppress all non-error diagnostics regardless of session type.
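In a CI job or cron script, which is a non-interactive session, diagnostics can be turned back on through the environment. The following is a sketch, where `docs_with_diagnostics` and `DOCS_CLI` are hypothetical names introduced for illustration.

```bash
# DOCS_CLI is a hypothetical override hook; it defaults to the real command.
DOCS_CLI="${DOCS_CLI:-npx @arabold/docs-mcp-server@latest}"

# docs_with_diagnostics ARGS...
# Runs a docs command with LOG_LEVEL=INFO set in that command's environment,
# so diagnostics reach stderr even without a terminal.
docs_with_diagnostics() {
  LOG_LEVEL=INFO $DOCS_CLI "$@"
}
```

A call such as `docs_with_diagnostics refresh react --version 19.0.0 2>refresh.log` would keep the status message on stdout and collect the diagnostics in `refresh.log`.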
Typical workflow
```bash
# 1. Index documentation for the first time
npx @arabold/docs-mcp-server@latest scrape react https://react.dev/reference/react --version 19.0.0

# 2. Later, refresh to pick up any changes
npx @arabold/docs-mcp-server@latest refresh react --version 19.0.0

# 3. Clean up old versions
npx @arabold/docs-mcp-server@latest remove react --version 18.3.1
```
Important notes
- Scraping can take time. Large documentation sites with hundreds of pages may run for several minutes. Use `--max-pages` and `--max-depth` to limit scope when you only need a subset.
- Local files must use the `file://` URL scheme (e.g. `file:///absolute/path/to/docs`).
- `--clean` is on by default for `scrape`, meaning existing documents for the same library+version are removed before re-indexing. Pass `--no-clean` to append instead.
- `refresh` only works on previously indexed content. It uses HTTP ETags to skip pages that have not changed, making it much faster than a full re-scrape.
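The `file://` requirement means a relative directory has to be resolved to an absolute path first. The following is a small sketch. `to_file_url` is a hypothetical helper, and it does not percent-encode spaces or other special characters in the path.

```bash
# to_file_url DIRECTORY
# Prints a file:// URL for an existing directory, resolving relative paths.
to_file_url() {
  # cd runs inside the command substitution, so the caller's working
  # directory is unchanged; the function fails if the directory is missing.
  abs_path="$(cd "$1" && pwd)" || return 1
  # An absolute path starts with "/", which yields the file:/// form.
  printf 'file://%s\n' "$abs_path"
}
```

The result can be passed straight to `scrape`, e.g. `npx @arabold/docs-mcp-server@latest scrape mylib "$(to_file_url ./docs)"`, where `./docs` stands in for your own directory.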