url-to-markdown

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

URL to Markdown

URL 转 Markdown

Fetches any URL via Chrome CDP and converts HTML to clean markdown.
通过Chrome CDP抓取任意URL,并将HTML转换为格式整洁的Markdown。

Script Directory

脚本目录

Important: All scripts are located in
scripts/
subdirectory of this skill.
Agent Execution Instructions:
  1. Determine this SKILL.md file's directory path as
    SKILL_DIR
  2. Script path =
    ${SKILL_DIR}/scripts/<script-name>.ts
  3. Replace all
    ${SKILL_DIR}
    in this document with actual path
Script Reference:
ScriptPurpose
scripts/main.ts
CLI entry point for URL fetching
重要提示:所有脚本均位于此skill的
scripts/
子目录中。
Agent执行说明
  1. 确定此SKILL.md文件的目录路径为
    SKILL_DIR
  2. 脚本路径 =
    ${SKILL_DIR}/scripts/<script-name>.ts
  3. 将本文档中所有
    ${SKILL_DIR}
    替换为实际路径
脚本参考
脚本用途
scripts/main.ts
URL抓取的CLI入口

Features

功能特性

  • Chrome CDP for full JavaScript rendering
  • Two capture modes: auto or wait-for-user
  • Clean markdown output with metadata
  • Handles login-required pages via wait mode
  • 基于Chrome CDP实现完整JavaScript渲染
  • 两种捕获模式:自动模式或等待用户模式
  • 带有元数据的整洁Markdown输出
  • 通过等待模式处理需要登录的页面

Usage

使用方法

bash
undefined
bash
undefined

Auto mode (default) - capture when page loads

自动模式(默认)- 页面加载完成后捕获

npx -y bun ${SKILL_DIR}/scripts/main.ts <url>
npx -y bun ${SKILL_DIR}/scripts/main.ts <url>

Wait mode - wait for user signal before capture

等待模式 - 等待用户信号后再捕获

npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> --wait

Save to specific file

保存到指定文件

npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md
undefined
npx -y bun ${SKILL_DIR}/scripts/main.ts <url> -o output.md
undefined

Options

选项参数

OptionDescription
<url>
URL to fetch
-o <path>
Output file path (default: auto-generated)
--wait
Wait for user signal before capturing
--timeout <ms>
Page load timeout (default: 30000)
选项说明
<url>
要抓取的URL
-o <path>
输出文件路径(默认:自动生成)
--wait
捕获前等待用户信号
--timeout <ms>
页面加载超时时间(默认:30000)

Capture Modes

捕获模式

Auto Mode (default)

自动模式(默认)

Page loads → waits for network idle → captures immediately.
Best for:
  • Public pages
  • Static content
  • No login required
页面加载完成 → 等待网络空闲 → 立即捕获。
最适用于:
  • 公开页面
  • 静态内容
  • 无需登录的页面

Wait Mode (
--wait
)

等待模式(
--wait

Page opens → user can interact (login, scroll, etc.) → user signals ready → captures.
Best for:
  • Login-required pages
  • Dynamic content needing interaction
  • Pages with lazy loading
Agent workflow for wait mode:
  1. Run script with
    --wait
    flag
  2. Script outputs:
    Page opened. Press Enter when ready to capture...
  3. Use
    AskUserQuestion
    to ask user if page is ready
  4. When user confirms, send newline to stdin to trigger capture
页面打开 → 用户可进行交互(登录、滚动等)→ 用户确认准备就绪 → 开始捕获。
最适用于:
  • 需要登录的页面
  • 需要交互的动态内容
  • 带有懒加载的页面
等待模式下的Agent工作流
  1. 使用
    --wait
    参数运行脚本
  2. 脚本输出:
    Page opened. Press Enter when ready to capture...
  3. 使用
    AskUserQuestion
    询问用户页面是否准备就绪
  4. 用户确认后,向标准输入发送换行符触发捕获

Output Format

输出格式

markdown
---
url: https://example.com/page
title: "Page Title"
description: "Meta description if available"
author: "Author if available"
published: "2024-01-01"
captured_at: "2024-01-15T10:30:00Z"
---
markdown
---
url: https://example.com/page
title: "Page Title"
description: "Meta description if available"
author: "Author if available"
published: "2024-01-01"
captured_at: "2024-01-15T10:30:00Z"
---

Page Title

Page Title

Converted markdown content...
undefined
Converted markdown content...
undefined

Mode Selection Guide

模式选择指南

When user requests URL capture, help select appropriate mode:
Suggest Auto Mode when:
  • URL is public (no login wall visible)
  • Content appears static
  • User doesn't mention login requirements
Suggest Wait Mode when:
  • User mentions needing to log in
  • Site known to require authentication
  • User wants to scroll/interact before capture
  • Content is behind paywall
Ask user when unclear:
The page may require login or interaction before capturing.

Which mode should I use?
1. Auto - Capture immediately when loaded
2. Wait - Wait for you to interact first
当用户请求URL捕获时,帮助选择合适的模式:
建议使用自动模式的场景
  • URL为公开页面(无登录墙)
  • 内容为静态
  • 用户未提及登录需求
建议使用等待模式的场景
  • 用户提到需要登录
  • 已知该网站需要身份验证
  • 用户希望先进行滚动/交互再捕获
  • 内容位于付费墙之后
不确定时询问用户
该页面可能需要登录或交互后才能捕获。

请问要使用哪种模式?
1. 自动模式 - 页面加载完成后立即捕获
2. 等待模式 - 先等待您完成交互

Output Directory

输出目录

Each capture creates a file organized by domain:
url-to-markdown/
└── <domain>/
    └── <slug>.md
Path Components:
  • <domain>
    : Site domain (e.g.,
    example.com
    ,
    github.com
    )
  • <slug>
    : Generated from page title or URL path (kebab-case)
Slug Generation:
  1. Extract from page title (preferred) or URL path
  2. Convert to kebab-case, 2-6 words
  3. Example: "Getting Started with React" →
    getting-started-with-react
Conflict Resolution: If
url-to-markdown/<domain>/<slug>.md
already exists:
  • Append timestamp:
    <slug>-YYYYMMDD-HHMMSS.md
  • Example:
    getting-started.md
    exists →
    getting-started-20260118-143052.md
每次捕获都会生成一个按域名组织的文件:
url-to-markdown/
└── <domain>/
    └── <slug>.md
路径组成
  • <domain>
    :网站域名(例如:
    example.com
    github.com
  • <slug>
    :由页面标题或URL路径生成(短横线分隔的小写格式)
Slug生成规则
  1. 优先从页面标题提取,其次是URL路径
  2. 转换为短横线分隔的小写格式,保留2-6个单词
  3. 示例:"Getting Started with React" →
    getting-started-with-react
冲突解决: 如果
url-to-markdown/<domain>/<slug>.md
已存在:
  • 追加时间戳:
    <slug>-YYYYMMDD-HHMMSS.md
  • 示例:
    getting-started.md
    已存在 →
    getting-started-20260118-143052.md

Error Handling

错误处理

ErrorResolution
Chrome not foundInstall Chrome or set
URL_CHROME_PATH
env
Page timeoutIncrease
--timeout
value
Capture failedTry wait mode for complex pages
Empty contentPage may need JS rendering time
错误解决方法
未找到Chrome安装Chrome或设置
URL_CHROME_PATH
环境变量
页面超时增大
--timeout
参数的值
捕获失败尝试对复杂页面使用等待模式
内容为空页面可能需要更多JavaScript渲染时间

Environment Variables

环境变量

VariableDescription
URL_CHROME_PATH
Custom Chrome executable path
URL_DATA_DIR
Custom data directory
URL_CHROME_PROFILE_DIR
Custom Chrome profile directory
变量说明
URL_CHROME_PATH
自定义Chrome可执行文件路径
URL_DATA_DIR
自定义数据目录
URL_CHROME_PROFILE_DIR
自定义Chrome配置文件目录

Extension Support

扩展支持

Custom configurations via EXTEND.md.
Check paths (priority order):
  1. .content-gen-skills/url-to-markdown/EXTEND.md
    (project)
  2. ~/.content-gen-skills/url-to-markdown/EXTEND.md
    (user)
If found, load before workflow. Extension content overrides defaults.
通过EXTEND.md进行自定义配置。
路径检查优先级
  1. .content-gen-skills/url-to-markdown/EXTEND.md
    (项目级)
  2. ~/.content-gen-skills/url-to-markdown/EXTEND.md
    (用户级)
如果找到该文件,将在工作流开始前加载。扩展内容将覆盖默认设置。