baoyu-url-to-markdown

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

URL to Markdown

URL转Markdown

Fetches any URL via

baoyu-fetch

CLI (Chrome CDP + site-specific adapters) and converts it to clean markdown.

通过

baoyu-fetch

CLI（Chrome CDP + 站点专属适配器）抓取任意URL并转换为整洁的markdown格式。

CLI Setup

CLI安装配置

Important: The CLI source is vendored in the

scripts/vendor/baoyu-fetch/

subdirectory of this skill.

Agent Execution Instructions:

Determine this SKILL.md file's directory path as
```
{baseDir}
```

CLI entry point =

{baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts

Resolve
```
${BUN_X}
```
runtime: if
```
bun
```
installed →
```
bun
```
; if
```
npx
```
available →
```
npx -y bun
```
; else suggest installing bun

${READER}

${BUN_X} {baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts

Replace all
```
${READER}
```
in this document with the resolved value

重要提示：CLI源码存放于本skill的

scripts/vendor/baoyu-fetch/

子目录下。

Agent执行说明:

确定当前SKILL.md文件的目录路径为
```
{baseDir}
```

CLI入口文件 =

{baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts

解析
```
${BUN_X}
```
运行时：如果已安装
```
bun
```
则使用
```
bun
```
；如果可用
```
npx
```
则使用
```
npx -y bun
```
；否则建议用户安装bun

${READER}

${BUN_X} {baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts

将本文档中所有
```
${READER}
```
替换为解析后的实际值

Preferences (EXTEND.md)

偏好配置 (EXTEND.md)

Check EXTEND.md existence (priority order):

bash

undefined

按优先级顺序检查EXTEND.md是否存在:

bash

undefined

macOS, Linux, WSL, Git Bash

test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project" test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg" test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"


```powershell


```powershell

PowerShell (Windows)

if (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" } $xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" } if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" } if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }


| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | User home |

| Result | Action |
|--------|--------|
| Found | Read, parse, apply settings |
| Not found | **MUST** run first-time setup (see below) — do NOT silently create defaults |

**EXTEND.md Supports**: Download media by default | Default output directory


| 路径 | 位置 |
|------|----------|
| `.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | 项目目录 |
| `$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | 用户根目录 |

| 结果 | 操作 |
|--------|--------|
| 已找到 | 读取、解析并应用配置 |
| 未找到 | **必须**执行首次配置（见下文）—— 请勿静默创建默认配置 |

**EXTEND.md支持配置项**：默认下载媒体资源 | 默认输出目录

First-Time Setup (BLOCKING)

首次配置（阻塞操作）

CRITICAL: When EXTEND.md is not found, you MUST use
AskUserQuestion
to ask the user for their preferences before creating EXTEND.md. NEVER create EXTEND.md with defaults without asking. This is a BLOCKING operation — do NOT proceed with any conversion until setup is complete.

Use

AskUserQuestion

with ALL questions in ONE call:

Question 1 — header: "Media", question: "How to handle images and videos in pages?"

"Ask each time (Recommended)" — After saving markdown, ask whether to download media
"Always download" — Always download media to local imgs/ and videos/ directories
"Never download" — Keep original remote URLs in markdown

Question 2 — header: "Output", question: "Default output directory?"

"url-to-markdown (Recommended)" — Save to ./url-to-markdown/{domain}/{slug}.md
(User may choose "Other" to type a custom path)

Question 3 — header: "Save", question: "Where to save preferences?"

"User (Recommended)" — ~/.baoyu-skills/ (all projects)
"Project" — .baoyu-skills/ (this project only)

After user answers, create EXTEND.md at the chosen location, confirm "Preferences saved to [path]", then continue.

Full reference: references/config/first-time-setup.md

关键要求：未找到EXTEND.md时，你必须使用
AskUserQuestion
在创建EXTEND.md前询问用户偏好。绝对不要未经询问就创建带默认配置的EXTEND.md。这是阻塞操作——配置完成前不要执行任何转换操作。

调用

AskUserQuestion

时需将所有问题放在单次请求中:

问题1 — 标题: "媒体资源", 问题: "如何处理页面中的图片和视频?"

"每次询问（推荐）" — 保存markdown后询问是否下载媒体资源
"始终下载" — 始终将媒体资源下载到本地imgs/和videos/目录
"从不下载" — 在markdown中保留原始远程URL

问题2 — 标题: "输出路径", 问题: "默认输出目录是?"

"url-to-markdown（推荐）" — 保存到 ./url-to-markdown/{domain}/{slug}.md
（用户可选择「其他」输入自定义路径）

问题3 — 标题: "保存位置", 问题: "偏好配置保存到哪里?"

"用户全局（推荐）" — ~/.baoyu-skills/ （对所有项目生效）
"当前项目" — .baoyu-skills/ （仅对本项目生效）

用户回答后，在选择的位置创建EXTEND.md，提示「偏好配置已保存到[路径]」，再继续后续操作。

完整参考: references/config/first-time-setup.md

Supported Keys

支持的配置项

Key	Default	Values	Description
`download_media`	`ask`	`ask` / `1` / `0`	`ask` = prompt each time, `1` = always download, `0` = never
`default_output_dir`	empty	path or empty	Default output directory (empty = `./url-to-markdown/` )

EXTEND.md → CLI mapping:

EXTEND.md key	CLI argument	Notes
`download_media: 1`	`--download-media`	Requires `--output` to be set
`default_output_dir: ./posts/`	Agent constructs `--output ./posts/{domain}/{slug}.md`	Agent generates path, not a direct CLI flag

Value priority:

CLI arguments (
```
--download-media
```
,
```
--output
```
)
EXTEND.md
Skill defaults

配置键	默认值	可选值	描述
`download_media`	`ask`	`ask` / `1` / `0`	`ask` = 每次操作前询问, `1` = 始终下载, `0` = 从不下载
`default_output_dir`	空	路径或空	默认输出目录（空 = 默认为 `./url-to-markdown/` ）

EXTEND.md → CLI参数映射:

EXTEND.md配置键	CLI参数	说明
`download_media: 1`	`--download-media`	需要同时设置 `--output`
`default_output_dir: ./posts/`	Agent自动构建 `--output ./posts/{domain}/{slug}.md`	由Agent生成路径，不是直接传入CLI的参数

配置优先级:

CLI参数 (
```
--download-media
```
,
```
--output
```
)
EXTEND.md配置
Skill默认值

Features

功能特性

Chrome CDP for full JavaScript rendering via
```
baoyu-fetch
```
CLI
Site-specific adapters: X/Twitter, YouTube, Hacker News, generic (Defuddle)
Automatic adapter selection based on URL, or force with
```
--adapter
```
Interaction gate detection: Cloudflare, reCAPTCHA, hCAPTCHA, custom challenges
Two capture modes: headless (default) or interactive with wait-for-interaction
Clean markdown output with YAML front matter
Structured JSON output available via
```
--format json
```
X/Twitter: extracts tweets, threads, and X Articles with media
YouTube: transcript/caption extraction, chapters, cover images
Hacker News: threaded comment parsing with proper nesting
Generic: Defuddle extraction with Readability fallback
Download images and videos to local directories
Chrome profile persistence for authenticated sessions
Debug artifact output for troubleshooting

基于Chrome CDP实现完整JavaScript渲染，由
```
baoyu-fetch
```
CLI提供支持
站点专属适配器：X/Twitter、YouTube、Hacker News、通用站点（Defuddle）
基于URL自动选择适配器，也可通过
```
--adapter
```
强制指定
交互门槛检测：Cloudflare、reCAPTCHA、hCAPTCHA、自定义验证
两种捕获模式：无头模式（默认）或带交互等待的可视化模式
整洁的markdown输出，自带YAML front matter
可通过
```
--format json
```
输出结构化JSON结果
X/Twitter：提取推文、讨论线程、X专栏文章及关联媒体资源
YouTube：提取字幕/文案、章节信息、封面图
Hacker News：带层级嵌套的评论线程解析
通用站点：Defuddle提取，兜底使用Readability方案
支持将图片和视频下载到本地目录
Chrome配置文件持久化，支持已登录会话抓取
调试产物输出，方便排查问题

Usage

使用方法

bash

undefined

bash

undefined

Default: headless capture, markdown to stdout

默认：无头模式捕获，markdown输出到标准输出

${READER} <url>

Save to file

保存到文件

${READER} <url> --output article.md

Save with media download

保存同时下载媒体资源

${READER} <url> --output article.md --download-media

Headless mode (explicit)

显式指定无头模式

${READER} <url> --headless --output article.md

Wait for interaction (login/CAPTCHA) — auto-detect and continue

等待交互（登录/CAPTCHA）—— 自动检测完成后继续

${READER} <url> --wait-for interaction --output article.md

Wait for interaction — manual control (Enter to continue)

等待交互——手动控制（按Enter继续）

${READER} <url> --wait-for force --output article.md

JSON output

JSON格式输出

${READER} <url> --format json --output article.json

Force specific adapter

强制使用指定适配器

${READER} <url> --adapter youtube --output transcript.md

Connect to existing Chrome

连接到已运行的Chrome实例

${READER} <url> --cdp-url http://localhost:9222 --output article.md

Debug artifacts

输出调试产物

${READER} <url> --output article.md --debug-dir ./debug/

undefined

${READER} <url> --output article.md --debug-dir ./debug/

undefined

Options

可选参数

Option	Description
`<url>`	URL to fetch
`--output <path>`	Output file path (default: stdout)
`--format <type>`	Output format: `markdown` (default) or `json`
`--json`	Shorthand for `--format json`
`--adapter <name>`	Force adapter: `x` , `youtube` , `hn` , or `generic` (default: auto-detect)
`--headless`	Force headless Chrome (no visible window)
`--wait-for <mode>`	Interaction wait mode: `none` (default), `interaction` , or `force`
`--wait-for-interaction`	Alias for `--wait-for interaction`
`--wait-for-login`	Alias for `--wait-for interaction`
`--timeout <ms>`	Page load timeout (default: 30000)
`--interaction-timeout <ms>`	Login/CAPTCHA wait timeout (default: 600000 = 10 min)
`--interaction-poll-interval <ms>`	Poll interval for interaction checks (default: 1500)
`--download-media`	Download images/videos to local `imgs/` and `videos/` , rewrite markdown links. Requires `--output`
`--media-dir <dir>`	Base directory for downloaded media (default: same as `--output` directory)
`--cdp-url <url>`	Reuse existing Chrome DevTools Protocol endpoint
`--browser-path <path>`	Custom Chrome/Chromium binary path
`--chrome-profile-dir <path>`	Chrome user data directory (default: `BAOYU_CHROME_PROFILE_DIR` env or `./baoyu-skills/chrome-profile` )
`--debug-dir <dir>`	Write debug artifacts (document.json, markdown.md, page.html, network.json)

参数	描述
`<url>`	要抓取的URL
`--output <path>`	输出文件路径（默认：标准输出）
`--format <type>`	输出格式： `markdown` （默认）或 `json`
`--json`	`--format json` 的简写
`--adapter <name>`	强制指定适配器： `x` , `youtube` , `hn` , 或 `generic` （默认：自动检测）
`--headless`	强制使用无头Chrome（无可视化窗口）
`--wait-for <mode>`	交互等待模式： `none` （默认）, `interaction` , 或 `force`
`--wait-for-interaction`	`--wait-for interaction` 的别名
`--wait-for-login`	`--wait-for interaction` 的别名
`--timeout <ms>`	页面加载超时时间（默认：30000）
`--interaction-timeout <ms>`	登录/CAPTCHA等待超时时间（默认：600000 = 10分钟）
`--interaction-poll-interval <ms>`	交互状态检测轮询间隔（默认：1500）
`--download-media`	将图片/视频下载到本地 `imgs/` 和 `videos/` 目录，重写markdown中的链接。需要同时设置 `--output`
`--media-dir <dir>`	下载媒体资源的根目录（默认：与 `--output` 目录相同）
`--cdp-url <url>`	复用已有的Chrome DevTools Protocol端点
`--browser-path <path>`	自定义Chrome/Chromium二进制文件路径
`--chrome-profile-dir <path>`	Chrome用户数据目录（默认： `BAOYU_CHROME_PROFILE_DIR` 环境变量或 `./baoyu-skills/chrome-profile` ）
`--debug-dir <dir>`	输出调试产物（document.json, markdown.md, page.html, network.json）

Capture Modes

捕获模式

Mode	Behavior	Use When
Default	Headless Chrome, auto-extract on network idle	Public pages, static content
`--headless`	Explicit headless (same as default)	Clarify intent
`--wait-for interaction`	Opens visible Chrome, auto-detects login/CAPTCHA gates, waits for them to clear, then continues	Login-required, CAPTCHA-protected
`--wait-for force`	Opens visible Chrome, auto-detects OR accepts Enter keypress to continue	Complex flows, lazy loading, paywalls

Interaction gate auto-detection:

Cloudflare Turnstile / "just a moment" pages
Google reCAPTCHA
hCaptcha
Custom challenge / verification screens

Wait-for-interaction workflow:

Run with
```
--wait-for interaction
```
→ Chrome opens visibly
CLI auto-detects login/CAPTCHA gates
User completes login or solves CAPTCHA in the browser
CLI auto-detects gate cleared → captures page
If
```
--wait-for force
```
is used, user can also press Enter to trigger capture manually

模式	行为	适用场景
默认	无头Chrome，网络空闲时自动提取内容	公开页面、静态内容
`--headless`	显式指定无头模式（与默认行为一致）	明确声明执行意图
`--wait-for interaction`	打开可视化Chrome窗口，自动检测登录/CAPTCHA门槛，等待验证完成后继续	需要登录、CAPTCHA保护的页面
`--wait-for force`	打开可视化Chrome窗口，自动检测或等待用户按Enter触发后续流程	复杂交互流程、懒加载页面、付费墙

交互门槛自动检测支持:

Cloudflare Turnstile / "请稍候"页面
Google reCAPTCHA
hCAPTCHA
自定义挑战/验证页面

交互等待工作流:

携带
```
--wait-for interaction
```
运行 → Chrome可视化窗口打开
CLI自动检测登录/CAPTCHA门槛
用户在浏览器中完成登录或完成CAPTCHA验证
CLI自动检测到门槛已清除 → 捕获页面内容
如果使用
```
--wait-for force
```
，用户也可以按Enter手动触发捕获

Agent Quality Gate

Agent质量检测门槛

CRITICAL: The agent must treat default headless capture as provisional. Some sites render differently in headless mode and can silently return low-quality content without causing the CLI to fail.

After every headless run, the agent MUST inspect the saved markdown output.

关键要求：Agent必须将默认无头模式的捕获结果视为临时结果。部分站点在无头模式下渲染结果不同，可能静默返回低质量内容但不会导致CLI执行失败。

每次无头模式运行后，Agent必须检查保存的markdown输出内容。

Quality checks the agent must perform

Agent必须执行的质量检查

Confirm the markdown title matches the target page, not a generic site shell
Confirm the body contains the expected article or page content, not just navigation, footer, or a generic error
Watch for obvious failure signs:
- ```
Application error
```
- ```
This page could not be found
```
- Login, signup, subscribe, or verification shells
- Extremely short markdown for a page that should be long-form
- Raw framework payloads or mostly boilerplate content
If the result is low quality, incomplete, or clearly wrong, do not accept the run as successful just because the CLI exited with code 0

Tip: Use

--format json

to get structured output including

status

login.state

, and

interaction

fields for programmatic quality assessment. A

"status": "needs_interaction"

response means the page requires manual interaction.

确认markdown标题与目标页面匹配，不是通用的站点框架标题
确认正文包含预期的文章或页面内容，而不只是导航、页脚或通用错误提示
注意明显的失败特征:
- ```
Application error
```
  （应用错误）
- ```
This page could not be found
```
  （页面未找到）
- 登录、注册、订阅或验证引导页面
- 本该是长内容的页面输出极短的markdown
- 原始框架 payload 或大部分是模板内容
如果结果质量低、不完整或明显错误，不要仅因为CLI退出码为0就认为运行成功

提示：使用

--format json

可以获得包含

status

、

login.state

、

interaction

字段的结构化输出，方便程序化质量评估。返回

"status": "needs_interaction"

表示页面需要手动交互。

Recovery workflow the agent must follow

Agent必须遵循的恢复工作流

First run headless (default) unless there is already a clear reason to use interaction mode
Review markdown quality immediately after the run
If the content is low quality or indicates login/CAPTCHA:
- ```
--wait-for interaction
```
  for auto-detected gates (login, CAPTCHA, Cloudflare)
- ```
--wait-for force
```
  when the page needs manual browsing, scroll loading, or complex interaction
If
```
--wait-for
```
is used, tell the user exactly what to do:
- If login is required, ask them to sign in in the browser
- If CAPTCHA appears, ask them to solve it
- If the page needs time to load, ask them to wait until content is visible
- For
```
--wait-for force
```
  : tell them to press Enter when ready

If JSON output shows

"status": "needs_interaction"

, switch to

--wait-for interaction

automatically

首次运行默认使用无头模式，除非已经有明确理由使用交互模式
运行完成后立即检查markdown质量
如果内容质量低或提示需要登录/CAPTCHA:
- 对于可自动检测的门槛（登录、CAPTCHA、Cloudflare）使用
```
--wait-for interaction
```
- 当页面需要手动浏览、滚动加载或复杂交互时使用
```
--wait-for force
```
如果使用
```
--wait-for
```
参数，明确告知用户需要执行的操作:
- 如果需要登录，提示用户在浏览器中完成登录
- 如果出现CAPTCHA，提示用户完成验证
- 如果页面需要加载时间，提示用户等待内容显示完整
- 对于
```
--wait-for force
```
  模式：告知用户准备完成后按Enter

如果JSON输出显示

"status": "needs_interaction"

，自动切换到

--wait-for interaction

模式

Output Path Generation

输出路径生成规则

The agent must construct the output file path since

baoyu-fetch

does not auto-generate paths.

Algorithm:

Determine base directory from EXTEND.md
```
default_output_dir
```
or default
```
./url-to-markdown/
```
Extract domain from URL (e.g.,
```
example.com
```
)
Generate slug from URL path or page title (kebab-case, 2-6 words)
Construct:
```
{base_dir}/{domain}/{slug}/{slug}.md
```
— each URL gets its own directory so media files stay isolated

Conflict resolution: append timestamp

{slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md

Pass the constructed path to

--output

. Media files (

--download-media

) are saved into subdirectories next to the markdown file, keeping each URL's assets self-contained.

Agent必须自行构建输出文件路径，因为

baoyu-fetch

不会自动生成路径。

生成算法:

从EXTEND.md的
```
default_output_dir
```
获取基础目录，默认使用
```
./url-to-markdown/
```
从URL提取域名（例如
```
example.com
```
）
从URL路径或页面标题生成slug（短横线分隔格式，2-6个单词）
构建路径：
```
{base_dir}/{domain}/{slug}/{slug}.md
```
—— 每个URL对应独立目录，方便媒体资源隔离

冲突处理：追加时间戳

{slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md

将构建好的路径传给

--output

参数。媒体文件（开启

--download-media

时）会保存到markdown文件同级的子目录中，保证每个URL的资源独立存储。

Output Format

输出格式

Markdown output to stdout (or file with

--output

) as clean markdown text.

JSON output (

--format json

) returns structured data including:

```
adapter
```
— which adapter handled the URL
```
status
```
—
```
"ok"
```
or
```
"needs_interaction"
```
```
login
```
— login state detection (
```
logged_in
```
,
```
logged_out
```
,
```
unknown
```
)
```
interaction
```
— interaction gate details (kind, provider, prompt)
```
document
```
— structured content (url, title, author, publishedAt, content blocks, metadata)
```
media
```
— collected media assets with url, kind, role
```
markdown
```
— converted markdown text
```
downloads
```
— media download results (when
```
--download-media
```
used)

When

--download-media

is enabled:

Images are saved to
```
imgs/
```
next to the output file (or in
```
--media-dir
```
)
Videos are saved to
```
videos/
```
next to the output file (or in
```
--media-dir
```
)
Markdown media links are rewritten to local relative paths

Markdown输出到标准输出（或通过

--output

指定的文件），为整洁的markdown文本。

JSON输出（

--format json

）返回结构化数据，包含:

```
adapter
```
—— 处理该URL的适配器
```
status
```
——
```
"ok"
```
或
```
"needs_interaction"
```
```
login
```
—— 登录状态检测结果（
```
logged_in
```
已登录,
```
logged_out
```
未登录,
```
unknown
```
未知）
```
interaction
```
—— 交互门槛详情（类型、提供商、提示）
```
document
```
—— 结构化内容（url、标题、作者、发布时间、内容块、元数据）
```
media
```
—— 采集到的媒体资源，包含url、类型、作用
```
markdown
```
—— 转换后的markdown文本
```
downloads
```
—— 媒体下载结果（开启
```
--download-media
```
时返回）

开启

--download-media

时:

图片保存到输出文件同级的
```
imgs/
```
目录（或
```
--media-dir
```
指定的目录）
视频保存到输出文件同级的
```
videos/
```
目录（或
```
--media-dir
```
指定的目录）
Markdown中的媒体链接会重写为本地相对路径

Built-in Adapters

内置适配器

Adapter	URLs	Key Features
`x`	x.com, twitter.com	Tweets, threads, X Articles, media, login detection
`youtube`	youtube.com, youtu.be	Transcript/captions, chapters, cover image, metadata
`hn`	news.ycombinator.com	Threaded comments, story metadata, nested replies
`generic`	Any URL (fallback)	Defuddle extraction, Readability fallback, auto-scroll, network idle detection

Adapter is auto-selected based on URL. Use

--adapter <name>

to override.

适配器	适配URL	核心功能
`x`	x.com, twitter.com	推文、讨论线程、X专栏文章、媒体资源、登录状态检测
`youtube`	youtube.com, youtu.be	字幕/文案、章节信息、封面图、元数据
`hn`	news.ycombinator.com	层级化评论、帖子元数据、嵌套回复
`generic`	任意URL（兜底适配）	Defuddle提取、Readability兜底、自动滚动、网络空闲检测

适配器会根据URL自动选择，可通过

--adapter <name>

强制指定。

Media Download Workflow

媒体下载工作流

Based on

download_media

setting in EXTEND.md:

Setting	Behavior
`1` (always)	Run CLI with `--download-media --output <path>`
`0` (never)	Run CLI with `--output <path>` (no media download)
`ask` (default)	Follow the ask-each-time flow below

基于EXTEND.md中的

download_media

配置执行:

配置值	行为
`1` （始终下载）	运行CLI时携带 `--download-media --output <path>` 参数
`0` （从不下载）	运行CLI时携带 `--output <path>` 参数（不下载媒体）
`ask` （默认）	遵循下文的每次询问流程

Ask-Each-Time Flow

每次询问流程

Run CLI without
```
--download-media
```
with
```
--output <path>
```
→ markdown saved
Check saved markdown for remote media URLs (
```
https://
```
in image/video links)
If no remote media found → done, no prompt needed
If remote media found → use
```
AskUserQuestion
```
:
- header: "Media", question: "Download N images/videos to local files?"
- "Yes" — Download to local directories
- "No" — Keep remote URLs
If user confirms → run CLI again with
```
--download-media --output <same-path>
```
(overwrites markdown with localized links)

运行CLI不携带
```
--download-media
```
参数，携带
```
--output <path>
```
→ 保存markdown
检查保存的markdown中是否存在远程媒体URL（图片/视频链接中包含
```
https://
```
）
如果未找到远程媒体 → 流程结束，无需提示
如果找到远程媒体 → 调用
```
AskUserQuestion
```
:
- 标题: "媒体资源", 问题: "是否下载N个图片/视频到本地?"
- "是" —— 下载到本地目录
- "否" —— 保留远程URL
如果用户确认 → 再次运行CLI携带
```
--download-media --output <相同路径>
```
（覆盖markdown，将链接替换为本地路径）

Environment Variables

环境变量

Variable	Description
`BAOYU_CHROME_PROFILE_DIR`	Chrome user data directory (can also use `--chrome-profile-dir` )

Troubleshooting: Chrome not found → use

--browser-path

. Timeout → increase

--timeout

. Login/CAPTCHA pages → use

--wait-for interaction

. Debug → use

--debug-dir

to inspect captured HTML and network logs.

变量名	描述
`BAOYU_CHROME_PROFILE_DIR`	Chrome用户数据目录（也可通过 `--chrome-profile-dir` 指定）

故障排查：找不到Chrome → 使用

--browser-path

指定路径。超时 → 调大

--timeout

参数。登录/CAPTCHA页面 → 使用

--wait-for interaction

。调试 → 使用

--debug-dir

检查捕获的HTML和网络日志。

YouTube Notes

YouTube使用注意

YouTube adapter extracts transcripts/captions automatically when available
Transcript format:
```
[MM:SS] Text segment
```
with chapter headings
Transcript availability depends on YouTube exposing a caption track. Videos with captions disabled or restricted playback may produce description-only output
Use
```
--wait-for force
```
if the page needs time to finish loading player metadata

可用时YouTube适配器会自动提取字幕/文案
字幕格式：
```
[MM:SS] 文本片段
```
，附带章节标题
字幕可用性取决于YouTube是否开放字幕轨道。关闭字幕或播放受限的视频可能仅返回描述内容
如果页面需要时间加载播放器元数据，使用
```
--wait-for force
```

X/Twitter Notes

X/Twitter使用注意

Extracts single tweets, threads, and X Articles
Auto-detects login state; if logged out and content requires auth, JSON output will show
```
"status": "needs_interaction"
```
Use
```
--wait-for interaction
```
for login-protected content

支持提取单条推文、讨论线程、X专栏文章
自动检测登录状态；如果未登录且内容需要权限，JSON输出会返回
```
"status": "needs_interaction"
```
对于登录保护的内容使用
```
--wait-for interaction
```

Hacker News Notes

Hacker News使用注意

Parses threaded comments with proper nesting and reply hierarchy
Includes story metadata (title, URL, author, score, comment count)
Shows comment deletion/dead status

解析带正确层级和回复关系的嵌套评论
包含帖子元数据（标题、URL、作者、评分、评论数）
显示评论删除/失效状态

Extension Support

扩展支持

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.

通过EXTEND.md实现自定义配置，路径和支持的选项见偏好配置章节。