url-to-markdown

URL To Markdown

Overview

Use this skill to fetch a public URL, convert it to Markdown, and save the result as a timestamped Markdown file for later agent use.
Conversion priority is fixed:
  1. https://r.jina.ai/<url> (primary)
  2. https://markdown.new/ (fallback only when r.jina.ai is unavailable)
This skill is execution-oriented. Prefer running the bundled script instead of manually recreating the workflow.
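The fixed priority can be sketched as a small dispatcher. This is a minimal illustration, not the bundled script's actual code: `ProviderUnavailable`, `jina_reader_url`, `convert`, and the two fetcher callables are hypothetical names introduced here. The fetchers are passed in as arguments, so the sketch assumes nothing about how either service is called beyond the `https://r.jina.ai/<url>` form given above.

```python
from typing import Callable, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error: a provider could not be reached or refused service."""


def jina_reader_url(url: str) -> str:
    # The primary provider is addressed as https://r.jina.ai/<url>.
    return "https://r.jina.ai/" + url


def convert(
    url: str,
    fetch_jina: Callable[[str], str],
    fetch_markdown_new: Callable[[str], str],
    force_markdown_new: bool = False,
) -> Tuple[str, str]:
    """Return (provider_name, markdown), trying providers in the fixed order."""
    if not force_markdown_new:
        try:
            return "r.jina.ai", fetch_jina(jina_reader_url(url))
        except ProviderUnavailable:
            pass  # primary is unavailable, fall through to the fallback
    return "markdown.new", fetch_markdown_new(url)
```

Returning the provider name alongside the Markdown makes it easy to report which service produced the result.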

When To Use

Use this skill when the user asks for any of the following:
  • convert a URL or webpage to Markdown
  • save an article, doc page, or blog post as .md
  • ingest a public webpage for later summarization or extraction
  • preserve page content in a machine-friendly text format
  • pull a documentation page into a local Markdown file
Do not use this skill for:
  • private pages that require browser login
  • sites the user is not authorized to access
  • tasks that require full site crawling rather than a single page fetch

Inputs

Decide these inputs before running the script:
  • url: required; must be a public URL
  • method: optional; one of auto, ai, browser; default auto; used by markdown.new fallback
  • retain_images: optional; default false; used by markdown.new fallback
  • transport: optional; one of auto, get, post; default auto; used by markdown.new fallback
  • timeout: optional; default 30
  • force_markdown_new: optional; default false; when true, skip r.jina.ai and call markdown.new directly
  • output: default ./output/ (current directory + output/); if the user explicitly provides an output path in the prompt, use that path instead
If the user does not specify these options, keep the defaults.
Output path rule:
  • Always pass --output when invoking url_to_md.py.
  • If the user prompt explicitly specifies an output path, use that exact path.
  • Otherwise use --output "output/" (relative to the current working directory).
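As a rough sketch of how these inputs map onto a command line, the helper below builds an argument list that always ends with `--output`. The function name `build_command` and the choice to pass only non-default options are assumptions made for illustration; the flag names are the ones this skill uses when invoking `url_to_md.py`.

```python
import sys
from typing import List, Optional


def build_command(
    url: str,
    output: Optional[str] = None,
    method: str = "auto",
    retain_images: bool = False,
    transport: str = "auto",
    timeout: int = 30,
    force_markdown_new: bool = False,
) -> List[str]:
    """Build an argv list for scripts/url_to_md.py, always including --output."""
    cmd = [sys.executable, "scripts/url_to_md.py", url]
    # Pass only non-default options so the command stays short.
    if method != "auto":
        cmd += ["--method", method]
    if retain_images:
        cmd.append("--retain-images")
    if transport != "auto":
        cmd += ["--transport", transport]
    if timeout != 30:
        cmd += ["--timeout", str(timeout)]
    if force_markdown_new:
        cmd.append("--force-markdown-new")
    # Output path rule: an explicit user path wins, otherwise "output/".
    cmd += ["--output", output if output else "output/"]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the skill directory.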

Run The Script

From the skill directory, run:
bash
python scripts/url_to_md.py "<url>" --output "output/"
Common variants:
bash
python scripts/url_to_md.py "<url>" --output "output/"
python scripts/url_to_md.py "<url>" --method browser --retain-images --output "output/"
python scripts/url_to_md.py "<url>" --transport post --timeout 45 --output "output/"
python scripts/url_to_md.py "<url>" --force-markdown-new --output "output/"
python scripts/url_to_md.py "<url>" --output "<user_explicit_path>"
Behavior notes:
  • By default, the script attempts r.jina.ai first.
  • If --force-markdown-new is set, the script skips r.jina.ai and uses markdown.new directly.
  • It falls back to markdown.new only when r.jina.ai is unavailable (for example, timeout, network failure, 5xx, or rate limit).
  • The skill-level default output directory is ./output/, and the invocation should always include --output.
  • If --output is a filename, the script appends a timestamp before the extension.
  • If --output is a directory, the script creates a slug-based filename with a timestamp.
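The two output-naming rules can be illustrated with a short sketch. The exact timestamp format, the slug rule, and the test for "filename versus directory" used by the bundled script are not stated here, so everything below (`slugify`, `resolve_output_path`, the `%Y%m%d-%H%M%S` stamp, and the "has a file extension" check) is an assumption chosen for illustration.

```python
import re
from datetime import datetime
from pathlib import Path
from urllib.parse import urlparse


def slugify(url: str) -> str:
    """Assumed slug rule: host plus path, lowercased, other characters to '-'."""
    parts = urlparse(url)
    raw = (parts.netloc + parts.path).lower()
    return re.sub(r"[^a-z0-9]+", "-", raw).strip("-") or "page"


def resolve_output_path(output: str, url: str, now: datetime) -> Path:
    """Return the final file path for a given --output value."""
    stamp = now.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    path = Path(output)
    # Assumption: a path with a file extension is a filename,
    # anything else is treated as a directory.
    if path.suffix:
        # Filename: insert the timestamp before the extension.
        return path.with_name(f"{path.stem}-{stamp}{path.suffix}")
    # Directory: build a slug-based filename with a timestamp.
    return path / f"{slugify(url)}-{stamp}.md"
```

One limitation of this sketch: a directory whose name contains a dot (such as `v1.2/`) would be misread as a filename, so a real implementation would likely need a stricter check.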

Required Output Behavior

Prefer producing both:
  1. A saved Markdown file.
  2. A short conversational summary.
The summary should include:
  • source URL
  • whether the conversion succeeded
  • provider used: r.jina.ai or markdown.new
  • saved file path, if a file was written
  • key options used if non-default: method, retain_images, transport, timeout

Summary Template

Use this structure:
text
Source URL: <url>
Status: success
Provider: <r.jina.ai|markdown.new>
Saved Markdown: <path>
Options: method=<value>, retain_images=<value>, transport=<value>, timeout=<value>
If defaults were used, keep Options brief.
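One way to fill in this template is sketched below. The helper name `format_summary` and the decision to render an all-defaults run as `Options: defaults` are assumptions; the template only asks that Options stay brief in that case. The default values mirror the ones listed under Inputs.

```python
from typing import Dict, Optional

# Defaults as listed in the Inputs section.
DEFAULTS = {"method": "auto", "retain_images": False, "transport": "auto", "timeout": 30}


def format_summary(
    url: str,
    provider: str,
    saved_path: str,
    options: Optional[Dict[str, object]] = None,
) -> str:
    """Render the success summary, listing only options that differ from defaults."""
    merged = {**DEFAULTS, **(options or {})}
    changed = [f"{key}={merged[key]}" for key in DEFAULTS if merged[key] != DEFAULTS[key]]
    # Assumption: "brief" is rendered as the single word "defaults".
    options_text = ", ".join(changed) if changed else "defaults"
    return "\n".join([
        f"Source URL: {url}",
        "Status: success",
        f"Provider: {provider}",
        f"Saved Markdown: {saved_path}",
        f"Options: {options_text}",
    ])
```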

Error Handling

If the script fails:
  • say that URL-to-Markdown conversion failed
  • include the main error briefly
  • do not invent page content
  • mention the likely cause when obvious: network issue, timeout, rate limit, unsupported page access
If both providers fail, report which provider failed first and which provider failed last. If a provider returns a rate-limit response, report that directly and do not claim that a retry succeeded.
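A failure report that follows these rules might be assembled as in the sketch below. The function name `format_failure`, the output wording, and the way rate limiting is detected (looking for "rate limit" or the HTTP status code 429 in the error text) are all assumptions; the bundled script's real error messages may look different.

```python
from typing import List, Tuple


def format_failure(url: str, failures: List[Tuple[str, str]]) -> str:
    """Build a failure report.

    failures is an ordered list of (provider, error message), first attempt first.
    """
    lines = ["URL-to-Markdown conversion failed.", f"Source URL: {url}"]
    if failures:
        first_provider, first_error = failures[0]
        lines.append(f"Failed first: {first_provider} ({first_error})")
    if len(failures) > 1:
        last_provider, last_error = failures[-1]
        lines.append(f"Failed last: {last_provider} ({last_error})")
    # Assumption: rate limiting is recognizable from the error text.
    if any("rate limit" in err.lower() or "429" in err for _, err in failures):
        lines.append("Likely cause: rate limit")
    return "\n".join(lines)
```

The report contains only the recorded errors, so it never includes page content that was not actually fetched.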

Notes

  • Prefer saved Markdown over raw stdout because agents can reuse local files more reliably.
  • The bundled script uses only the Python standard library.
  • The script supports both importable usage and CLI execution, but this skill should normally use the CLI path.