cyte-cli

cyte CLI Skill

Practical command guide for AI agents using the `cyte` CLI.

When to Apply

Reference this skill when:

- A user asks to extract content from a URL
- A user asks to discover links from a page
- A user asks to crawl docs/sites recursively
- A user asks for structured JSON/JSONL output for automation

Execution Modes

One-off (recommended):
```
npx cyte ...
```
PNPM one-off:
```
pnpm dlx cyte ...
```
Global install:
```
cyte ...
```

Core Commands

1. Extract Single Page

```bash
npx cyte <url>
npx cyte <url> --json
```

2. Discover Links

```bash
npx cyte links <url>
npx cyte links <url> --json
npx cyte links <url> --internal --json
npx cyte links <url> --external --json
npx cyte links <url> --internal --match <pattern> --json
```

3. Deep Crawl

```bash
npx cyte <url> --deep --depth 2
npx cyte <url> --deep --json
npx cyte <url> --deep --json --format jsonl
```

Agent Workflow

1. Ensure a URL is present (ask the user if it is missing).
2. Default to `links --json` for exploration.
3. Extract priority pages via `<url> --json`.
4. Escalate to `--deep` only when broader coverage is required.
5. Use conservative crawl defaults first:
   - `--depth 1 --concurrency 2 --delay 200`
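
The workflow above could be wrapped in a few shell helpers. This is a minimal sketch, not something shipped with cyte: the function names (`run_cyte`, `require_url`, `explore`, `extract_page`, `deep_crawl`) are hypothetical, and combining the conservative flags with `--deep --json` in a single call is an assumption based on the commands listed in this guide.

```bash
# Hypothetical helpers mirroring the workflow steps; not part of cyte itself.

# Launcher wrapper; swap the body for `cyte "$@"` with a global install.
run_cyte() { npx cyte "$@"; }

# Refuse to run without a URL so the agent can ask the user for one.
require_url() {
  if [ -z "${1:-}" ]; then
    echo "error: no URL provided; ask the user for one" >&2
    return 1
  fi
}

# Explore links first.
explore() {
  require_url "${1:-}" || return 1
  run_cyte links "$1" --json
}

# Extract a single priority page.
extract_page() {
  require_url "${1:-}" || return 1
  run_cyte "$1" --json
}

# Deep crawl only on demand, starting from the conservative defaults.
# Assumption: these flags can be combined with --deep and --json.
deep_crawl() {
  require_url "${1:-}" || return 1
  run_cyte "$1" --deep --depth 1 --concurrency 2 --delay 200 --json
}
```

After sourcing these, an agent would typically call `explore docs.example.com` first and reach for `deep_crawl` only when the link list shows that single-page extraction is not enough.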

Output Contracts

`cyte <url> --json`

```
{ url, title, markdown, links }
```

`cyte links <url> --json`

```
[{ title, url, type }]
```

`cyte <url> --deep --json`

```
{ startUrl, pagesVisited, pagesSucceeded, pagesFailed, pages }
```
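
For automation, these output shapes can be consumed with a JSON tool. The sketch below assumes `jq` is installed and that the `pagesVisited`, `pagesSucceeded`, and `pagesFailed` fields are counts. The helper names are hypothetical; only the field names come from the contracts listed here.

```bash
# Hypothetical helpers; each reads cyte JSON output from stdin. Requires jq.

# From `cyte links <url> --json`: print one URL per line.
link_urls() {
  jq -r '.[].url'
}

# From `cyte <url> --json`: print only the markdown body.
page_markdown() {
  jq -r '.markdown'
}

# From `cyte <url> --deep --json`: print a one-line crawl summary.
# Assumption: the three page fields used here are numeric counts.
crawl_summary() {
  jq -r '"\(.pagesSucceeded)/\(.pagesVisited) pages ok, \(.pagesFailed) failed"'
}
```

A pipeline such as `npx cyte links docs.example.com --internal --json | link_urls` would then yield a plain list of URLs to feed into single-page extraction.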

Behavior and Constraints

- Prefer `--json` for machine workflows.
- Bare domains are valid input (for example: `vercel.com`).
- Respect robots.txt rules by default.
- Use `--no-respect-robots` only if the user explicitly requests it.
- Deep crawl writes files to `./cyte` unless `--output` is set.
- Use `--sitemap` if regular traversal misses important pages.
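
These rules could be encoded in a small guard that builds a deep-crawl command and only adds the opt-in flags when asked. This is a sketch under stated assumptions: `deep_crawl_cmd` and its `ignore-robots` / `sitemap` keywords are hypothetical names, and `--sitemap` is assumed to be a plain switch that takes no value. It prints the command rather than running it, so it can be reviewed first.

```bash
# Hypothetical helper: prints a deep-crawl command instead of running it.
# Usage: deep_crawl_cmd <url> [ignore-robots] [sitemap]
deep_crawl_cmd() {
  if [ -z "${1:-}" ]; then
    echo "error: URL required" >&2
    return 1
  fi
  url="$1"
  shift
  cmd="npx cyte $url --deep --json"
  for opt in "$@"; do
    case "$opt" in
      # Pass this only when the user explicitly asked to ignore robots rules.
      ignore-robots) cmd="$cmd --no-respect-robots" ;;
      # Assumption: --sitemap is a switch without a value.
      sitemap) cmd="$cmd --sitemap" ;;
      *) echo "error: unknown option: $opt" >&2; return 1 ;;
    esac
  done
  echo "$cmd"
}
```

Called as `deep_crawl_cmd docs.example.com`, it leaves robots handling at its default; the robots override appears only when `ignore-robots` is passed deliberately.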

Quick Examples

```bash
# Discover docs URLs first
npx cyte links docs.example.com --internal --json

# Pull one page into JSON for an agent
npx cyte docs.example.com/getting-started --json

# Build a larger snapshot
npx cyte docs.example.com --deep --depth 2 --json
```