markdown-dot-new

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

markdown.new — Markdown for Agents

markdown.new — 面向Agent的Markdown工具

Use markdown.new instead of raw HTML when reading web content, converting files, or crawling sites. It returns clean, structured Markdown with 80% fewer tokens — no parsing required.
Preserve original content. When a user asks to "get" or "fetch" content from a URL, return the actual source material directly. Do not summarize, paraphrase, or rewrite it. Only transform the content if the user explicitly requests a summary or rewrite.
Rate limit: 500 requests per day per IP. Monitor the
x-rate-limit-remaining
response header.
读取网页内容、转换文件或抓取网站时,请使用markdown.new而非原始HTML。它返回结构清晰的干净Markdown格式,token数量减少80%——无需额外解析。
保留原始内容。当用户要求“获取”某URL的内容时,直接返回实际的源材料。不要进行总结、改写或重写。仅当用户明确要求总结或改写时,才对内容进行转换。
速率限制:每个IP每天最多500次请求。请关注
x-rate-limit-remaining
响应头。

1. URL → Markdown

1. URL → Markdown

Convert any public URL to Markdown. This is the default method for reading web pages.
Simple GET — prepend
https://markdown.new/
to the target URL:
bash
curl -s 'https://markdown.new/https://example.com'
POST with options:
bash
curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "method": "auto", "retain_images": false}'
Options:
ParameterValuesDefaultWhen to use
method
auto
,
ai
,
browser
auto
Use
browser
for JavaScript-heavy SPAs
retain_images
true
,
false
false
Set
true
when images are relevant to the task
Options also work as query parameters:
https://markdown.new/https://example.com?method=browser&retain_images=true
The response is returned as
text/markdown
with an
x-markdown-tokens
header indicating the estimated token count.
将任何公开URL转换为Markdown格式。这是读取网页的默认方法。
简单GET请求——在目标URL前添加
https://markdown.new/
bash
curl -s 'https://markdown.new/https://example.com'
带选项的POST请求
bash
curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "method": "auto", "retain_images": false}'
选项说明
参数名可选值默认值使用场景
method
auto
,
ai
,
browser
auto
针对JavaScript密集型SPA使用
browser
模式
retain_images
true
,
false
false
当任务需要图片内容时设置为
true
选项也可以作为查询参数使用:
https://markdown.new/https://example.com?method=browser&retain_images=true
响应以
text/markdown
格式返回,同时包含
x-markdown-tokens
响应头,显示预估的token数量。

2. File → Markdown

2. 文件 → Markdown

Convert documents to Markdown. Supports 20+ formats including PDF, DOCX, XLSX, XLS, ODS, ODT, CSV, JSON, XML, HTML, TXT, JPG, PNG, WebP, and SVG. Maximum file size is 10 MB.
Remote file (by URL):
bash
curl -s 'https://markdown.new/https://example.com/report.pdf'
Or via POST:
bash
curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com/report.pdf"}'
Local file upload:
bash
curl -s 'https://markdown.new/convert' \
  -F 'file=@document.pdf'
POST responses return JSON with metadata:
json
{
  "success": true,
  "url": "https://example.com/report.pdf",
  "title": "Report Title",
  "content": "# Report Title\n\n...",
  "tokens": 850
}
Upload responses nest data under a
data
key with
filename
and
file_type
fields.
将文档转换为Markdown格式。支持20多种格式,包括PDF、DOCX、XLSX、XLS、ODS、ODT、CSV、JSON、XML、HTML、TXT、JPG、PNG、WebP和SVG。最大文件大小为10 MB。
远程文件(通过URL)
bash
curl -s 'https://markdown.new/https://example.com/report.pdf'
或通过POST请求:
bash
curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com/report.pdf"}'
本地文件上传
bash
curl -s 'https://markdown.new/convert' \
  -F 'file=@document.pdf'
POST请求的响应返回包含元数据的JSON:
json
{
  "success": true,
  "url": "https://example.com/report.pdf",
  "title": "Report Title",
  "content": "# Report Title\n\n...",
  "tokens": 850
}
上传请求的响应将数据嵌套在
data
键下,包含
filename
file_type
字段。

3. Crawl → Markdown

3. 抓取 → Markdown

Crawl an entire website section and retrieve all pages as Markdown. This is an async, job-based process: start a crawl, then poll for results.
Start a crawl:
bash
curl -X POST 'https://markdown.new/crawl' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://docs.example.com", "limit": 50}'
Returns a job ID.
Check status and download results:
bash
undefined
抓取整个网站板块,并将所有页面以Markdown格式返回。这是一个异步的基于任务的流程:启动抓取任务,然后轮询获取结果。
启动抓取任务
bash
curl -X POST 'https://markdown.new/crawl' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://docs.example.com", "limit": 50}'
返回一个任务ID。
检查状态并下载结果
bash
undefined

All pages concatenated as Markdown

所有页面合并为单个Markdown文件

Per-page records as JSON

按页面记录返回JSON格式


**Crawl options:**

| Parameter         | Description                               | Default |
| ----------------- | ----------------------------------------- | ------- |
| `url`             | Starting URL (required)                   | —       |
| `limit`           | Maximum pages to crawl, 1–500             | 500     |
| `depth`           | Maximum link depth, 1–10                  | 5       |
| `render`          | Enable JavaScript rendering for SPAs      | false   |
| `source`          | URL discovery: `all`, `sitemaps`, `links` | `all`   |
| `maxAge`          | Maximum cache age in seconds, 0–604800    | 86400   |
| `includePatterns` | Only visit URLs matching these wildcards  | auto    |
| `excludePatterns` | Skip URLs matching these wildcards        | —       |

Results are stored for 14 days. Each crawl consumes 50 request units (approximately 10 crawls per day).

To cancel a crawl: `DELETE /crawl/status/{jobId}`

**抓取选项**:

| 参数名           | 描述                               | 默认值 |
| ----------------- | ----------------------------------------- | ------- |
| `url`             | 起始URL(必填)                   | —       |
| `limit`           | 最大抓取页面数,1–500             | 500     |
| `depth`           | 最大链接深度,1–10                  | 5       |
| `render`          | 为SPA启用JavaScript渲染      | false   |
| `source`          | URL发现方式:`all`, `sitemaps`, `links` | `all`   |
| `maxAge`          | 最大缓存时长(秒),0–604800    | 86400   |
| `includePatterns` | 仅访问匹配这些通配符的URL  | auto    |
| `excludePatterns` | 跳过匹配这些通配符的URL        | —       |

结果将存储14天。每次抓取消耗50个请求单位(每天约可进行10次抓取)。

取消抓取任务:`DELETE /crawl/status/{jobId}`

Quick Reference

快速参考

TaskMethod
Read a single web pageURL → Markdown (GET)
Fetch a remote PDF, DOCX, or spreadsheetFile → Markdown (GET or POST with URL)
Convert a local file on diskFile → Markdown (POST /convert with upload)
Retrieve content from multiple pages on a siteCrawl → Markdown
Handle a JS-heavy SPA that returns blank contentURL → Markdown with
method=browser
任务方法
读取单个网页URL → Markdown(GET请求)
获取远程PDF、DOCX或电子表格文件 → Markdown(GET请求或带URL的POST请求)
转换本地磁盘上的文件文件 → Markdown(上传文件至POST /convert接口)
获取网站多个页面的内容抓取 → Markdown
处理返回空白内容的JavaScript密集型SPAURL → Markdown并设置
method=browser

Key Behaviors

核心行为准则

  • Start with the simple GET.
    https://markdown.new/<url>
    auto-selects the best conversion method and handles most cases.
  • Fall back to browser rendering. If a page returns empty or incomplete content, retry with
    method=browser
    .
  • Poll crawl jobs. Crawling is asynchronous — poll the status endpoint every few seconds until the job completes.
  • Use multiple crawl entry points. Sites often organize content under different URL prefixes. If the first crawl only captures one section, start additional crawls from other entry points (e.g.,
    /reference/
    ,
    /extensions/
    ) for full coverage.
  • Leave images off by default. Only set
    retain_images=true
    when the task specifically requires image content.
  • Public URLs only. Paywalled or authenticated pages are not supported.
  • Return raw output. Save and present the markdown.new response directly. Do not rewrite, paraphrase, or summarize unless the user explicitly requests it.
  • 优先使用简单GET请求
    https://markdown.new/<url>
    会自动选择最佳转换方法,适用于大多数场景。
  • 回退到浏览器渲染。如果页面返回空内容或不完整内容,使用
    method=browser
    参数重试。
  • 轮询抓取任务状态。抓取是异步操作——每隔几秒轮询状态接口,直到任务完成。
  • 使用多个抓取入口点。网站通常会将内容组织在不同的URL前缀下。如果第一次抓取只覆盖了一个板块,从其他入口点启动额外的抓取任务(例如
    /reference/
    /extensions/
    )以实现完整覆盖。
  • 默认关闭图片保留。仅当任务明确需要图片内容时,才设置
    retain_images=true
  • 仅支持公开URL。不支持付费墙或需要认证的页面。
  • 返回原始输出。直接保存并展示markdown.new的响应结果。除非用户明确要求,否则不要重写、改写或总结内容。