url-to-markdown

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

URL to Markdown

URL 转 Markdown

Fetches any URL via Chrome CDP and converts HTML to clean markdown.

通过Chrome CDP抓取任意URL，并将HTML转换为格式整洁的Markdown。

Script Directory

脚本目录

Important: All scripts are located in

scripts/

subdirectory of this skill.

Agent Execution Instructions:

Determine this SKILL.md file's directory path as
```
SKILL_DIR
```
Script path =
```
${SKILL_DIR}/scripts/<script-name>.ts
```
Replace all
```
${SKILL_DIR}
```
in this document with actual path

Script Reference:

Script	Purpose
`scripts/main.ts`	CLI entry point for URL fetching

重要提示：所有脚本均位于此skill的

scripts/

子目录中。

Agent执行说明：

确定此SKILL.md文件的目录路径为
```
SKILL_DIR
```
脚本路径 =
```
${SKILL_DIR}/scripts/<script-name>.ts
```
将本文档中所有
```
${SKILL_DIR}
```
替换为实际路径

脚本参考：

脚本	用途
`scripts/main.ts`	URL抓取的CLI入口

Features

功能特性

Chrome CDP for full JavaScript rendering
Two capture modes: auto or wait-for-user
Clean markdown output with metadata
Handles login-required pages via wait mode

基于Chrome CDP实现完整JavaScript渲染
两种捕获模式：自动模式或等待用户模式
带有元数据的整洁Markdown输出
通过等待模式处理需要登录的页面

Usage

使用方法

bash

undefined

bash

undefined

Auto mode (default) - capture when page loads

自动模式（默认）- 页面加载完成后捕获

npx -y bun ${SKILL_DIR}/scripts/main.ts <url>

Wait mode - wait for user signal before capture

等待模式 - 等待用户信号后再捕获

npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait

Save to specific file

保存到指定文件

npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md

undefined

npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md

undefined

Options

选项参数

Option	Description
`<url>`	URL to fetch
`-o <path>`	Output file path (default: auto-generated)
`--wait`	Wait for user signal before capturing
`--timeout <ms>`	Page load timeout (default: 30000)

选项	说明
`<url>`	要抓取的URL
`-o <path>`	输出文件路径（默认：自动生成）
`--wait`	捕获前等待用户信号
`--timeout <ms>`	页面加载超时时间（默认：30000）

Capture Modes

捕获模式

Auto Mode (default)

自动模式（默认）

Page loads → waits for network idle → captures immediately.

Best for:

Public pages
Static content
No login required

页面加载完成 → 等待网络空闲 → 立即捕获。

最适用于：

公开页面
静态内容
无需登录的页面

Wait Mode (

--wait

)

等待模式（

--wait

）

Page opens → user can interact (login, scroll, etc.) → user signals ready → captures.

Best for:

Login-required pages
Dynamic content needing interaction
Pages with lazy loading

Agent workflow for wait mode:

Run script with
```
--wait
```
flag

Script outputs:

Page opened. Press Enter when ready to capture...

Use
```
AskUserQuestion
```
to ask user if page is ready
When user confirms, send newline to stdin to trigger capture

页面打开 → 用户可进行交互（登录、滚动等）→ 用户确认准备就绪 → 开始捕获。

最适用于：

需要登录的页面
需要交互的动态内容
带有懒加载的页面

等待模式下的Agent工作流：

使用
```
--wait
```
参数运行脚本

脚本输出：

Page opened. Press Enter when ready to capture...

使用
```
AskUserQuestion
```
询问用户页面是否准备就绪
用户确认后，向标准输入发送换行符触发捕获

Output Format

输出格式

markdown

---
url: https://example.com/page
title: "Page Title"
description: "Meta description if available"
author: "Author if available"
published: "2024-01-01"
captured_at: "2024-01-15T10:30:00Z"
---

markdown

---
url: https://example.com/page
title: "Page Title"
description: "Meta description if available"
author: "Author if available"
published: "2024-01-01"
captured_at: "2024-01-15T10:30:00Z"
---

Page Title

Converted markdown content...

undefined

Converted markdown content...

undefined

Mode Selection Guide

模式选择指南

When user requests URL capture, help select appropriate mode:

Suggest Auto Mode when:

URL is public (no login wall visible)
Content appears static
User doesn't mention login requirements

Suggest Wait Mode when:

User mentions needing to log in
Site known to require authentication
User wants to scroll/interact before capture
Content is behind paywall

Ask user when unclear:

The page may require login or interaction before capturing.

Which mode should I use?
1. Auto - Capture immediately when loaded
2. Wait - Wait for you to interact first

当用户请求URL捕获时，帮助选择合适的模式：

建议使用自动模式的场景：

URL为公开页面（无登录墙）
内容为静态
用户未提及登录需求

建议使用等待模式的场景：

用户提到需要登录
已知该网站需要身份验证
用户希望先进行滚动/交互再捕获
内容位于付费墙之后

不确定时询问用户：

该页面可能需要登录或交互后才能捕获。

请问要使用哪种模式？
1. 自动模式 - 页面加载完成后立即捕获
2. 等待模式 - 先等待您完成交互

Output Directory

输出目录

Each capture creates a file organized by domain:

url-to-markdown/
└── <domain>/
    └── <slug>.md

Path Components:

```
<domain>
```
: Site domain (e.g.,
```
example.com
```
,
```
github.com
```
)
```
<slug>
```
: Generated from page title or URL path (kebab-case)

Slug Generation:

Extract from page title (preferred) or URL path
Convert to kebab-case, 2-6 words
Example: "Getting Started with React" →
```
getting-started-with-react
```

Conflict Resolution: If

url-to-markdown/<domain>/<slug>.md

already exists:

Append timestamp:
```
<slug>-YYYYMMDD-HHMMSS.md
```

Example:

getting-started.md

exists →

getting-started-20260118-143052.md

每次捕获都会生成一个按域名组织的文件：

url-to-markdown/
└── <domain>/
    └── <slug>.md

路径组成：

```
<domain>
```
：网站域名（例如：
```
example.com
```
、
```
github.com
```
）
```
<slug>
```
：由页面标题或URL路径生成（短横线分隔的小写格式）

Slug生成规则：

优先从页面标题提取，其次是URL路径
转换为短横线分隔的小写格式，保留2-6个单词
示例："Getting Started with React" →
```
getting-started-with-react
```

冲突解决：如果

url-to-markdown/<domain>/<slug>.md

已存在：

追加时间戳：
```
<slug>-YYYYMMDD-HHMMSS.md
```

示例：

getting-started.md

已存在 →

getting-started-20260118-143052.md

Error Handling

错误处理

Error	Resolution
Chrome not found	Install Chrome or set `URL_CHROME_PATH` env
Page timeout	Increase `--timeout` value
Capture failed	Try wait mode for complex pages
Empty content	Page may need JS rendering time

错误	解决方法
未找到Chrome	安装Chrome或设置 `URL_CHROME_PATH` 环境变量
页面超时	增大 `--timeout` 参数的值
捕获失败	尝试对复杂页面使用等待模式
内容为空	页面可能需要更多JavaScript渲染时间

Environment Variables

环境变量

Variable	Description
`URL_CHROME_PATH`	Custom Chrome executable path
`URL_DATA_DIR`	Custom data directory
`URL_CHROME_PROFILE_DIR`	Custom Chrome profile directory

变量	说明
`URL_CHROME_PATH`	自定义Chrome可执行文件路径
`URL_DATA_DIR`	自定义数据目录
`URL_CHROME_PROFILE_DIR`	自定义Chrome配置文件目录

Extension Support

扩展支持

Custom configurations via EXTEND.md.

Check paths (priority order):

.content-gen-skills/url-to-markdown/EXTEND.md

(project)

~/.content-gen-skills/url-to-markdown/EXTEND.md

(user)

If found, load before workflow. Extension content overrides defaults.

通过EXTEND.md进行自定义配置。

路径检查优先级：

.content-gen-skills/url-to-markdown/EXTEND.md

（项目级）

~/.content-gen-skills/url-to-markdown/EXTEND.md

（用户级）

如果找到该文件，将在工作流开始前加载。扩展内容将覆盖默认设置。

url-to-markdown

Original

Translation

URL to Markdown

URL 转 Markdown

Script Directory

脚本目录

Features

功能特性

Usage

使用方法

Auto mode (default) - capture when page loads

自动模式（默认）- 页面加载完成后捕获

Wait mode - wait for user signal before capture

等待模式 - 等待用户信号后再捕获

Save to specific file

保存到指定文件

Options

选项参数

Capture Modes

捕获模式

Auto Mode (default)

自动模式（默认）

Wait Mode (
`--wait`
)

等待模式（
`--wait`
）

Output Format

输出格式

Page Title

Page Title

Mode Selection Guide

模式选择指南

Output Directory

输出目录

Error Handling

错误处理

Environment Variables

环境变量

Extension Support

扩展支持

url-to-markdown

Original

Translation

URL to Markdown

URL 转 Markdown

Script Directory

脚本目录

Features

功能特性

Usage

使用方法

Auto mode (default) - capture when page loads

自动模式（默认）- 页面加载完成后捕获

Wait mode - wait for user signal before capture

等待模式 - 等待用户信号后再捕获

Save to specific file

保存到指定文件

Options

选项参数

Capture Modes

捕获模式

Auto Mode (default)

自动模式（默认）

Wait Mode (--wait)

等待模式（--wait）

Output Format

输出格式

Page Title

Page Title

Mode Selection Guide

模式选择指南

Output Directory

输出目录

Error Handling

错误处理

Environment Variables

环境变量

Extension Support

扩展支持

Wait Mode (
`--wait`
)

等待模式（
`--wait`
）