baoyu-url-to-markdown
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseURL to Markdown
URL转Markdown
Fetches any URL via CLI (Chrome CDP + site-specific adapters) and converts it to clean markdown.
baoyu-fetch通过 CLI(Chrome CDP + 站点专属适配器)抓取任意URL并转换为整洁的markdown格式。
baoyu-fetchCLI Setup
CLI安装配置
Important: The CLI source is vendored in the subdirectory of this skill.
scripts/vendor/baoyu-fetch/Agent Execution Instructions:
- Determine this SKILL.md file's directory path as
{baseDir} - CLI entry point =
{baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts - Resolve runtime: if
${BUN_X}installed →bun; ifbunavailable →npx; else suggest installing bunnpx -y bun - =
${READER}${BUN_X} {baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts - Replace all in this document with the resolved value
${READER}
重要提示:CLI源码存放于本skill的子目录下。
scripts/vendor/baoyu-fetch/Agent执行说明:
- 确定当前SKILL.md文件的目录路径为
{baseDir} - CLI入口文件 =
{baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts - 解析运行时:如果已安装
${BUN_X}则使用bun;如果可用bun则使用npx;否则建议用户安装bunnpx -y bun - =
${READER}${BUN_X} {baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts - 将本文档中所有替换为解析后的实际值
${READER}
Preferences (EXTEND.md)
偏好配置 (EXTEND.md)
Check EXTEND.md existence (priority order):
bash
undefined按优先级顺序检查EXTEND.md是否存在:
bash
undefinedmacOS, Linux, WSL, Git Bash
macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"
```powershelltest -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"
```powershellPowerShell (Windows)
PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }
| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | User home |
| Result | Action |
|--------|--------|
| Found | Read, parse, apply settings |
| Not found | **MUST** run first-time setup (see below) — do NOT silently create defaults |
**EXTEND.md Supports**: Download media by default | Default output directoryif (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }
| 路径 | 位置 |
|------|----------|
| `.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | 项目目录 |
| `$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | 用户根目录 |
| 结果 | 操作 |
|--------|--------|
| 已找到 | 读取、解析并应用配置 |
| 未找到 | **必须**执行首次配置(见下文)—— 请勿静默创建默认配置 |
**EXTEND.md支持配置项**:默认下载媒体资源 | 默认输出目录First-Time Setup (BLOCKING)
首次配置(阻塞操作)
CRITICAL: When EXTEND.md is not found, you MUST use to ask the user for their preferences before creating EXTEND.md. NEVER create EXTEND.md with defaults without asking. This is a BLOCKING operation — do NOT proceed with any conversion until setup is complete.
AskUserQuestionUse with ALL questions in ONE call:
AskUserQuestionQuestion 1 — header: "Media", question: "How to handle images and videos in pages?"
- "Ask each time (Recommended)" — After saving markdown, ask whether to download media
- "Always download" — Always download media to local imgs/ and videos/ directories
- "Never download" — Keep original remote URLs in markdown
Question 2 — header: "Output", question: "Default output directory?"
- "url-to-markdown (Recommended)" — Save to ./url-to-markdown/{domain}/{slug}.md
- (User may choose "Other" to type a custom path)
Question 3 — header: "Save", question: "Where to save preferences?"
- "User (Recommended)" — ~/.baoyu-skills/ (all projects)
- "Project" — .baoyu-skills/ (this project only)
After user answers, create EXTEND.md at the chosen location, confirm "Preferences saved to [path]", then continue.
Full reference: references/config/first-time-setup.md
关键要求:未找到EXTEND.md时,你必须使用在创建EXTEND.md前询问用户偏好。绝对不要未经询问就创建带默认配置的EXTEND.md。这是阻塞操作——配置完成前不要执行任何转换操作。
AskUserQuestion调用时需将所有问题放在单次请求中:
AskUserQuestion问题1 — 标题: "媒体资源", 问题: "如何处理页面中的图片和视频?"
- "每次询问(推荐)" — 保存markdown后询问是否下载媒体资源
- "始终下载" — 始终将媒体资源下载到本地imgs/和videos/目录
- "从不下载" — 在markdown中保留原始远程URL
问题2 — 标题: "输出路径", 问题: "默认输出目录是?"
- "url-to-markdown(推荐)" — 保存到 ./url-to-markdown/{domain}/{slug}.md
- (用户可选择「其他」输入自定义路径)
问题3 — 标题: "保存位置", 问题: "偏好配置保存到哪里?"
- "用户全局(推荐)" — ~/.baoyu-skills/ (对所有项目生效)
- "当前项目" — .baoyu-skills/ (仅对本项目生效)
用户回答后,在选择的位置创建EXTEND.md,提示「偏好配置已保存到[路径]」,再继续后续操作。
完整参考: references/config/first-time-setup.md
Supported Keys
支持的配置项
| Key | Default | Values | Description |
|---|---|---|---|
| | | |
| empty | path or empty | Default output directory (empty = |
EXTEND.md → CLI mapping:
| EXTEND.md key | CLI argument | Notes |
|---|---|---|
| | Requires |
| Agent constructs | Agent generates path, not a direct CLI flag |
Value priority:
- CLI arguments (,
--download-media)--output - EXTEND.md
- Skill defaults
| 配置键 | 默认值 | 可选值 | 描述 |
|---|---|---|---|
| | | |
| 空 | 路径或空 | 默认输出目录(空 = 默认为 |
EXTEND.md → CLI参数映射:
| EXTEND.md配置键 | CLI参数 | 说明 |
|---|---|---|
| | 需要同时设置 |
| Agent自动构建 | 由Agent生成路径,不是直接传入CLI的参数 |
配置优先级:
- CLI参数 (,
--download-media)--output - EXTEND.md配置
- Skill默认值
Features
功能特性
- Chrome CDP for full JavaScript rendering via CLI
baoyu-fetch - Site-specific adapters: X/Twitter, YouTube, Hacker News, generic (Defuddle)
- Automatic adapter selection based on URL, or force with
--adapter - Interaction gate detection: Cloudflare, reCAPTCHA, hCAPTCHA, custom challenges
- Two capture modes: headless (default) or interactive with wait-for-interaction
- Clean markdown output with YAML front matter
- Structured JSON output available via
--format json - X/Twitter: extracts tweets, threads, and X Articles with media
- YouTube: transcript/caption extraction, chapters, cover images
- Hacker News: threaded comment parsing with proper nesting
- Generic: Defuddle extraction with Readability fallback
- Download images and videos to local directories
- Chrome profile persistence for authenticated sessions
- Debug artifact output for troubleshooting
- 基于Chrome CDP实现完整JavaScript渲染,由CLI提供支持
baoyu-fetch - 站点专属适配器:X/Twitter、YouTube、Hacker News、通用站点(Defuddle)
- 基于URL自动选择适配器,也可通过强制指定
--adapter - 交互门槛检测:Cloudflare、reCAPTCHA、hCAPTCHA、自定义验证
- 两种捕获模式:无头模式(默认)或带交互等待的可视化模式
- 整洁的markdown输出,自带YAML front matter
- 可通过输出结构化JSON结果
--format json - X/Twitter:提取推文、讨论线程、X专栏文章及关联媒体资源
- YouTube:提取字幕/文案、章节信息、封面图
- Hacker News:带层级嵌套的评论线程解析
- 通用站点:Defuddle提取,兜底使用Readability方案
- 支持将图片和视频下载到本地目录
- Chrome配置文件持久化,支持已登录会话抓取
- 调试产物输出,方便排查问题
Usage
使用方法
bash
undefinedbash
undefinedDefault: headless capture, markdown to stdout
默认:无头模式捕获,markdown输出到标准输出
${READER} <url>
${READER} <url>
Save to file
保存到文件
${READER} <url> --output article.md
${READER} <url> --output article.md
Save with media download
保存同时下载媒体资源
${READER} <url> --output article.md --download-media
${READER} <url> --output article.md --download-media
Headless mode (explicit)
显式指定无头模式
${READER} <url> --headless --output article.md
${READER} <url> --headless --output article.md
Wait for interaction (login/CAPTCHA) — auto-detect and continue
等待交互(登录/CAPTCHA)—— 自动检测完成后继续
${READER} <url> --wait-for interaction --output article.md
${READER} <url> --wait-for interaction --output article.md
Wait for interaction — manual control (Enter to continue)
等待交互——手动控制(按Enter继续)
${READER} <url> --wait-for force --output article.md
${READER} <url> --wait-for force --output article.md
JSON output
JSON格式输出
${READER} <url> --format json --output article.json
${READER} <url> --format json --output article.json
Force specific adapter
强制使用指定适配器
${READER} <url> --adapter youtube --output transcript.md
${READER} <url> --adapter youtube --output transcript.md
Connect to existing Chrome
连接到已运行的Chrome实例
${READER} <url> --cdp-url http://localhost:9222 --output article.md
${READER} <url> --cdp-url http://localhost:9222 --output article.md
Debug artifacts
输出调试产物
${READER} <url> --output article.md --debug-dir ./debug/
undefined${READER} <url> --output article.md --debug-dir ./debug/
undefinedOptions
可选参数
| Option | Description |
|---|---|
| URL to fetch |
| Output file path (default: stdout) |
| Output format: |
| Shorthand for |
| Force adapter: |
| Force headless Chrome (no visible window) |
| Interaction wait mode: |
| Alias for |
| Alias for |
| Page load timeout (default: 30000) |
| Login/CAPTCHA wait timeout (default: 600000 = 10 min) |
| Poll interval for interaction checks (default: 1500) |
| Download images/videos to local |
| Base directory for downloaded media (default: same as |
| Reuse existing Chrome DevTools Protocol endpoint |
| Custom Chrome/Chromium binary path |
| Chrome user data directory (default: |
| Write debug artifacts (document.json, markdown.md, page.html, network.json) |
| 参数 | 描述 |
|---|---|
| 要抓取的URL |
| 输出文件路径(默认:标准输出) |
| 输出格式: |
| |
| 强制指定适配器: |
| 强制使用无头Chrome(无可视化窗口) |
| 交互等待模式: |
| |
| |
| 页面加载超时时间(默认:30000) |
| 登录/CAPTCHA等待超时时间(默认:600000 = 10分钟) |
| 交互状态检测轮询间隔(默认:1500) |
| 将图片/视频下载到本地 |
| 下载媒体资源的根目录(默认:与 |
| 复用已有的Chrome DevTools Protocol端点 |
| 自定义Chrome/Chromium二进制文件路径 |
| Chrome用户数据目录(默认: |
| 输出调试产物(document.json, markdown.md, page.html, network.json) |
Capture Modes
捕获模式
| Mode | Behavior | Use When |
|---|---|---|
| Default | Headless Chrome, auto-extract on network idle | Public pages, static content |
| Explicit headless (same as default) | Clarify intent |
| Opens visible Chrome, auto-detects login/CAPTCHA gates, waits for them to clear, then continues | Login-required, CAPTCHA-protected |
| Opens visible Chrome, auto-detects OR accepts Enter keypress to continue | Complex flows, lazy loading, paywalls |
Interaction gate auto-detection:
- Cloudflare Turnstile / "just a moment" pages
- Google reCAPTCHA
- hCaptcha
- Custom challenge / verification screens
Wait-for-interaction workflow:
- Run with → Chrome opens visibly
--wait-for interaction - CLI auto-detects login/CAPTCHA gates
- User completes login or solves CAPTCHA in the browser
- CLI auto-detects gate cleared → captures page
- If is used, user can also press Enter to trigger capture manually
--wait-for force
| 模式 | 行为 | 适用场景 |
|---|---|---|
| 默认 | 无头Chrome,网络空闲时自动提取内容 | 公开页面、静态内容 |
| 显式指定无头模式(与默认行为一致) | 明确声明执行意图 |
| 打开可视化Chrome窗口,自动检测登录/CAPTCHA门槛,等待验证完成后继续 | 需要登录、CAPTCHA保护的页面 |
| 打开可视化Chrome窗口,自动检测或等待用户按Enter触发后续流程 | 复杂交互流程、懒加载页面、付费墙 |
交互门槛自动检测支持:
- Cloudflare Turnstile / "请稍候"页面
- Google reCAPTCHA
- hCAPTCHA
- 自定义挑战/验证页面
交互等待工作流:
- 携带运行 → Chrome可视化窗口打开
--wait-for interaction - CLI自动检测登录/CAPTCHA门槛
- 用户在浏览器中完成登录或完成CAPTCHA验证
- CLI自动检测到门槛已清除 → 捕获页面内容
- 如果使用,用户也可以按Enter手动触发捕获
--wait-for force
Agent Quality Gate
Agent质量检测门槛
CRITICAL: The agent must treat default headless capture as provisional. Some sites render differently in headless mode and can silently return low-quality content without causing the CLI to fail.
After every headless run, the agent MUST inspect the saved markdown output.
关键要求:Agent必须将默认无头模式的捕获结果视为临时结果。部分站点在无头模式下渲染结果不同,可能静默返回低质量内容但不会导致CLI执行失败。
每次无头模式运行后,Agent必须检查保存的markdown输出内容。
Quality checks the agent must perform
Agent必须执行的质量检查
- Confirm the markdown title matches the target page, not a generic site shell
- Confirm the body contains the expected article or page content, not just navigation, footer, or a generic error
- Watch for obvious failure signs:
Application errorThis page could not be found- Login, signup, subscribe, or verification shells
- Extremely short markdown for a page that should be long-form
- Raw framework payloads or mostly boilerplate content
- If the result is low quality, incomplete, or clearly wrong, do not accept the run as successful just because the CLI exited with code 0
Tip: Use to get structured output including , , and fields for programmatic quality assessment. A response means the page requires manual interaction.
--format jsonstatuslogin.stateinteraction"status": "needs_interaction"- 确认markdown标题与目标页面匹配,不是通用的站点框架标题
- 确认正文包含预期的文章或页面内容,而不只是导航、页脚或通用错误提示
- 注意明显的失败特征:
- (应用错误)
Application error - (页面未找到)
This page could not be found - 登录、注册、订阅或验证引导页面
- 本该是长内容的页面输出极短的markdown
- 原始框架 payload 或大部分是模板内容
- 如果结果质量低、不完整或明显错误,不要仅因为CLI退出码为0就认为运行成功
提示:使用可以获得包含、、字段的结构化输出,方便程序化质量评估。返回表示页面需要手动交互。
--format jsonstatuslogin.stateinteraction"status": "needs_interaction"Recovery workflow the agent must follow
Agent必须遵循的恢复工作流
- First run headless (default) unless there is already a clear reason to use interaction mode
- Review markdown quality immediately after the run
- If the content is low quality or indicates login/CAPTCHA:
- for auto-detected gates (login, CAPTCHA, Cloudflare)
--wait-for interaction - when the page needs manual browsing, scroll loading, or complex interaction
--wait-for force
- If is used, tell the user exactly what to do:
--wait-for- If login is required, ask them to sign in in the browser
- If CAPTCHA appears, ask them to solve it
- If the page needs time to load, ask them to wait until content is visible
- For : tell them to press Enter when ready
--wait-for force
- If JSON output shows , switch to
"status": "needs_interaction"automatically--wait-for interaction
- 首次运行默认使用无头模式,除非已经有明确理由使用交互模式
- 运行完成后立即检查markdown质量
- 如果内容质量低或提示需要登录/CAPTCHA:
- 对于可自动检测的门槛(登录、CAPTCHA、Cloudflare)使用
--wait-for interaction - 当页面需要手动浏览、滚动加载或复杂交互时使用
--wait-for force
- 对于可自动检测的门槛(登录、CAPTCHA、Cloudflare)使用
- 如果使用参数,明确告知用户需要执行的操作:
--wait-for- 如果需要登录,提示用户在浏览器中完成登录
- 如果出现CAPTCHA,提示用户完成验证
- 如果页面需要加载时间,提示用户等待内容显示完整
- 对于模式:告知用户准备完成后按Enter
--wait-for force
- 如果JSON输出显示,自动切换到
"status": "needs_interaction"模式--wait-for interaction
Output Path Generation
输出路径生成规则
The agent must construct the output file path since does not auto-generate paths.
baoyu-fetchAlgorithm:
- Determine base directory from EXTEND.md or default
default_output_dir./url-to-markdown/ - Extract domain from URL (e.g., )
example.com - Generate slug from URL path or page title (kebab-case, 2-6 words)
- Construct: — each URL gets its own directory so media files stay isolated
{base_dir}/{domain}/{slug}/{slug}.md - Conflict resolution: append timestamp
{slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md
Pass the constructed path to . Media files () are saved into subdirectories next to the markdown file, keeping each URL's assets self-contained.
--output--download-mediaAgent必须自行构建输出文件路径,因为不会自动生成路径。
baoyu-fetch生成算法:
- 从EXTEND.md的获取基础目录,默认使用
default_output_dir./url-to-markdown/ - 从URL提取域名(例如)
example.com - 从URL路径或页面标题生成slug(短横线分隔格式,2-6个单词)
- 构建路径:—— 每个URL对应独立目录,方便媒体资源隔离
{base_dir}/{domain}/{slug}/{slug}.md - 冲突处理:追加时间戳
{slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md
将构建好的路径传给参数。媒体文件(开启时)会保存到markdown文件同级的子目录中,保证每个URL的资源独立存储。
--output--download-mediaOutput Format
输出格式
Markdown output to stdout (or file with ) as clean markdown text.
--outputJSON output () returns structured data including:
--format json- — which adapter handled the URL
adapter - —
statusor"ok""needs_interaction" - — login state detection (
login,logged_in,logged_out)unknown - — interaction gate details (kind, provider, prompt)
interaction - — structured content (url, title, author, publishedAt, content blocks, metadata)
document - — collected media assets with url, kind, role
media - — converted markdown text
markdown - — media download results (when
downloadsused)--download-media
When is enabled:
--download-media- Images are saved to next to the output file (or in
imgs/)--media-dir - Videos are saved to next to the output file (or in
videos/)--media-dir - Markdown media links are rewritten to local relative paths
Markdown输出到标准输出(或通过指定的文件),为整洁的markdown文本。
--outputJSON输出()返回结构化数据,包含:
--format json- —— 处理该URL的适配器
adapter - ——
status或"ok""needs_interaction" - —— 登录状态检测结果(
login已登录,logged_in未登录,logged_out未知)unknown - —— 交互门槛详情(类型、提供商、提示)
interaction - —— 结构化内容(url、标题、作者、发布时间、内容块、元数据)
document - —— 采集到的媒体资源,包含url、类型、作用
media - —— 转换后的markdown文本
markdown - —— 媒体下载结果(开启
downloads时返回)--download-media
开启时:
--download-media- 图片保存到输出文件同级的目录(或
imgs/指定的目录)--media-dir - 视频保存到输出文件同级的目录(或
videos/指定的目录)--media-dir - Markdown中的媒体链接会重写为本地相对路径
Built-in Adapters
内置适配器
| Adapter | URLs | Key Features |
|---|---|---|
| x.com, twitter.com | Tweets, threads, X Articles, media, login detection |
| youtube.com, youtu.be | Transcript/captions, chapters, cover image, metadata |
| news.ycombinator.com | Threaded comments, story metadata, nested replies |
| Any URL (fallback) | Defuddle extraction, Readability fallback, auto-scroll, network idle detection |
Adapter is auto-selected based on URL. Use to override.
--adapter <name>| 适配器 | 适配URL | 核心功能 |
|---|---|---|
| x.com, twitter.com | 推文、讨论线程、X专栏文章、媒体资源、登录状态检测 |
| youtube.com, youtu.be | 字幕/文案、章节信息、封面图、元数据 |
| news.ycombinator.com | 层级化评论、帖子元数据、嵌套回复 |
| 任意URL(兜底适配) | Defuddle提取、Readability兜底、自动滚动、网络空闲检测 |
适配器会根据URL自动选择,可通过强制指定。
--adapter <name>Media Download Workflow
媒体下载工作流
Based on setting in EXTEND.md:
download_media| Setting | Behavior |
|---|---|
| Run CLI with |
| Run CLI with |
| Follow the ask-each-time flow below |
基于EXTEND.md中的配置执行:
download_media| 配置值 | 行为 |
|---|---|
| 运行CLI时携带 |
| 运行CLI时携带 |
| 遵循下文的每次询问流程 |
Ask-Each-Time Flow
每次询问流程
- Run CLI without with
--download-media→ markdown saved--output <path> - Check saved markdown for remote media URLs (in image/video links)
https:// - If no remote media found → done, no prompt needed
- If remote media found → use :
AskUserQuestion- header: "Media", question: "Download N images/videos to local files?"
- "Yes" — Download to local directories
- "No" — Keep remote URLs
- If user confirms → run CLI again with (overwrites markdown with localized links)
--download-media --output <same-path>
- 运行CLI不携带参数,携带
--download-media→ 保存markdown--output <path> - 检查保存的markdown中是否存在远程媒体URL(图片/视频链接中包含)
https:// - 如果未找到远程媒体 → 流程结束,无需提示
- 如果找到远程媒体 → 调用:
AskUserQuestion- 标题: "媒体资源", 问题: "是否下载N个图片/视频到本地?"
- "是" —— 下载到本地目录
- "否" —— 保留远程URL
- 如果用户确认 → 再次运行CLI携带(覆盖markdown,将链接替换为本地路径)
--download-media --output <相同路径>
Environment Variables
环境变量
| Variable | Description |
|---|---|
| Chrome user data directory (can also use |
Troubleshooting: Chrome not found → use . Timeout → increase . Login/CAPTCHA pages → use . Debug → use to inspect captured HTML and network logs.
--browser-path--timeout--wait-for interaction--debug-dir| 变量名 | 描述 |
|---|---|
| Chrome用户数据目录(也可通过 |
故障排查:找不到Chrome → 使用指定路径。超时 → 调大参数。登录/CAPTCHA页面 → 使用。调试 → 使用检查捕获的HTML和网络日志。
--browser-path--timeout--wait-for interaction--debug-dirYouTube Notes
YouTube使用注意
- YouTube adapter extracts transcripts/captions automatically when available
- Transcript format: with chapter headings
[MM:SS] Text segment - Transcript availability depends on YouTube exposing a caption track. Videos with captions disabled or restricted playback may produce description-only output
- Use if the page needs time to finish loading player metadata
--wait-for force
- 可用时YouTube适配器会自动提取字幕/文案
- 字幕格式:,附带章节标题
[MM:SS] 文本片段 - 字幕可用性取决于YouTube是否开放字幕轨道。关闭字幕或播放受限的视频可能仅返回描述内容
- 如果页面需要时间加载播放器元数据,使用
--wait-for force
X/Twitter Notes
X/Twitter使用注意
- Extracts single tweets, threads, and X Articles
- Auto-detects login state; if logged out and content requires auth, JSON output will show
"status": "needs_interaction" - Use for login-protected content
--wait-for interaction
- 支持提取单条推文、讨论线程、X专栏文章
- 自动检测登录状态;如果未登录且内容需要权限,JSON输出会返回
"status": "needs_interaction" - 对于登录保护的内容使用
--wait-for interaction
Hacker News Notes
Hacker News使用注意
- Parses threaded comments with proper nesting and reply hierarchy
- Includes story metadata (title, URL, author, score, comment count)
- Shows comment deletion/dead status
- 解析带正确层级和回复关系的嵌套评论
- 包含帖子元数据(标题、URL、作者、评分、评论数)
- 显示评论删除/失效状态
Extension Support
扩展支持
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
通过EXTEND.md实现自定义配置,路径和支持的选项见偏好配置章节。