markdown-dot-new

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

markdown.new — Markdown for Agents

markdown.new — 面向Agent的Markdown工具

Use markdown.new instead of raw HTML when reading web content, converting files, or crawling sites. It returns clean, structured Markdown with 80% fewer tokens — no parsing required.

Preserve original content. When a user asks to "get" or "fetch" content from a URL, return the actual source material directly. Do not summarize, paraphrase, or rewrite it. Only transform the content if the user explicitly requests a summary or rewrite.

Rate limit: 500 requests per day per IP. Monitor the

x-rate-limit-remaining

response header.

读取网页内容、转换文件或抓取网站时，请使用markdown.new而非原始HTML。它返回结构清晰的干净Markdown格式，token数量减少80%——无需额外解析。

保留原始内容。当用户要求“获取”某URL的内容时，直接返回实际的源材料。不要进行总结、改写或重写。仅当用户明确要求总结或改写时，才对内容进行转换。

速率限制：每个IP每天最多500次请求。请关注

x-rate-limit-remaining

响应头。

1. URL → Markdown

Convert any public URL to Markdown. This is the default method for reading web pages.

Simple GET — prepend

https://markdown.new/

to the target URL:

bash

curl -s 'https://markdown.new/https://example.com'

POST with options:

bash

curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "method": "auto", "retain_images": false}'

Options:

Parameter	Values	Default	When to use
`method`	`auto` , `ai` , `browser`	`auto`	Use `browser` for JavaScript-heavy SPAs
`retain_images`	`true` , `false`	`false`	Set `true` when images are relevant to the task

Options also work as query parameters:

https://markdown.new/https://example.com?method=browser&retain_images=true

The response is returned as

text/markdown

with an

x-markdown-tokens

header indicating the estimated token count.

将任何公开URL转换为Markdown格式。这是读取网页的默认方法。

简单GET请求——在目标URL前添加

https://markdown.new/

：

bash

curl -s 'https://markdown.new/https://example.com'

带选项的POST请求：

bash

curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "method": "auto", "retain_images": false}'

选项说明：

参数名	可选值	默认值	使用场景
`method`	`auto` , `ai` , `browser`	`auto`	针对JavaScript密集型SPA使用 `browser` 模式
`retain_images`	`true` , `false`	`false`	当任务需要图片内容时设置为 `true`

选项也可以作为查询参数使用：

https://markdown.new/https://example.com?method=browser&retain_images=true

响应以

text/markdown

格式返回，同时包含

x-markdown-tokens

响应头，显示预估的token数量。

2. File → Markdown

2. 文件 → Markdown

Convert documents to Markdown. Supports 20+ formats including PDF, DOCX, XLSX, XLS, ODS, ODT, CSV, JSON, XML, HTML, TXT, JPG, PNG, WebP, and SVG. Maximum file size is 10 MB.

Remote file (by URL):

bash

curl -s 'https://markdown.new/https://example.com/report.pdf'

Or via POST:

bash

curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com/report.pdf"}'

Local file upload:

bash

curl -s 'https://markdown.new/convert' \
  -F 'file=@document.pdf'

POST responses return JSON with metadata:

json

{
  "success": true,
  "url": "https://example.com/report.pdf",
  "title": "Report Title",
  "content": "# Report Title\n\n...",
  "tokens": 850
}

Upload responses nest data under a

data

key with

filename

and

file_type

fields.

将文档转换为Markdown格式。支持20多种格式，包括PDF、DOCX、XLSX、XLS、ODS、ODT、CSV、JSON、XML、HTML、TXT、JPG、PNG、WebP和SVG。最大文件大小为10 MB。

远程文件（通过URL）：

bash

curl -s 'https://markdown.new/https://example.com/report.pdf'

或通过POST请求：

bash

curl -s 'https://markdown.new/' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com/report.pdf"}'

本地文件上传：

bash

curl -s 'https://markdown.new/convert' \
  -F 'file=@document.pdf'

POST请求的响应返回包含元数据的JSON：

json

{
  "success": true,
  "url": "https://example.com/report.pdf",
  "title": "Report Title",
  "content": "# Report Title\n\n...",
  "tokens": 850
}

上传请求的响应将数据嵌套在

data

键下，包含

filename

和

file_type

字段。

3. Crawl → Markdown

3. 抓取 → Markdown

Crawl an entire website section and retrieve all pages as Markdown. This is an async, job-based process: start a crawl, then poll for results.

Start a crawl:

bash

curl -X POST 'https://markdown.new/crawl' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://docs.example.com", "limit": 50}'

Returns a job ID.

Check status and download results:

bash

undefined

抓取整个网站板块，并将所有页面以Markdown格式返回。这是一个异步的基于任务的流程：启动抓取任务，然后轮询获取结果。

启动抓取任务：

bash

curl -X POST 'https://markdown.new/crawl' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://docs.example.com", "limit": 50}'

返回一个任务ID。

检查状态并下载结果：

bash

undefined

All pages concatenated as Markdown

所有页面合并为单个Markdown文件

curl -s 'https://markdown.new/crawl/status/{jobId}'

Per-page records as JSON

按页面记录返回JSON格式

curl -s 'https://markdown.new/crawl/status/{jobId}?format=json'


**Crawl options:**

| Parameter         | Description                               | Default |
| ----------------- | ----------------------------------------- | ------- |
| `url`             | Starting URL (required)                   | —       |
| `limit`           | Maximum pages to crawl, 1–500             | 500     |
| `depth`           | Maximum link depth, 1–10                  | 5       |
| `render`          | Enable JavaScript rendering for SPAs      | false   |
| `source`          | URL discovery: `all`, `sitemaps`, `links` | `all`   |
| `maxAge`          | Maximum cache age in seconds, 0–604800    | 86400   |
| `includePatterns` | Only visit URLs matching these wildcards  | auto    |
| `excludePatterns` | Skip URLs matching these wildcards        | —       |

Results are stored for 14 days. Each crawl consumes 50 request units (approximately 10 crawls per day).

To cancel a crawl: `DELETE /crawl/status/{jobId}`

curl -s 'https://markdown.new/crawl/status/{jobId}?format=json'


**抓取选项**：

| 参数名           | 描述                               | 默认值 |
| ----------------- | ----------------------------------------- | ------- |
| `url`             | 起始URL（必填）                   | —       |
| `limit`           | 最大抓取页面数，1–500             | 500     |
| `depth`           | 最大链接深度，1–10                  | 5       |
| `render`          | 为SPA启用JavaScript渲染      | false   |
| `source`          | URL发现方式：`all`, `sitemaps`, `links` | `all`   |
| `maxAge`          | 最大缓存时长（秒），0–604800    | 86400   |
| `includePatterns` | 仅访问匹配这些通配符的URL  | auto    |
| `excludePatterns` | 跳过匹配这些通配符的URL        | —       |

结果将存储14天。每次抓取消耗50个请求单位（每天约可进行10次抓取）。

取消抓取任务：`DELETE /crawl/status/{jobId}`

Quick Reference

快速参考

Task	Method
Read a single web page	URL → Markdown (GET)
Fetch a remote PDF, DOCX, or spreadsheet	File → Markdown (GET or POST with URL)
Convert a local file on disk	File → Markdown (POST /convert with upload)
Retrieve content from multiple pages on a site	Crawl → Markdown
Handle a JS-heavy SPA that returns blank content	URL → Markdown with `method=browser`

任务	方法
读取单个网页	URL → Markdown（GET请求）
获取远程PDF、DOCX或电子表格	文件 → Markdown（GET请求或带URL的POST请求）
转换本地磁盘上的文件	文件 → Markdown（上传文件至POST /convert接口）
获取网站多个页面的内容	抓取 → Markdown
处理返回空白内容的JavaScript密集型SPA	URL → Markdown并设置 `method=browser`

Key Behaviors

核心行为准则

Start with the simple GET.
```
https://markdown.new/<url>
```
auto-selects the best conversion method and handles most cases.
Fall back to browser rendering. If a page returns empty or incomplete content, retry with
```
method=browser
```
.
Poll crawl jobs. Crawling is asynchronous — poll the status endpoint every few seconds until the job completes.
Use multiple crawl entry points. Sites often organize content under different URL prefixes. If the first crawl only captures one section, start additional crawls from other entry points (e.g.,
```
/reference/
```
,
```
/extensions/
```
) for full coverage.
Leave images off by default. Only set
```
retain_images=true
```
when the task specifically requires image content.
Public URLs only. Paywalled or authenticated pages are not supported.
Return raw output. Save and present the markdown.new response directly. Do not rewrite, paraphrase, or summarize unless the user explicitly requests it.

优先使用简单GET请求。
```
https://markdown.new/<url>
```
会自动选择最佳转换方法，适用于大多数场景。
回退到浏览器渲染。如果页面返回空内容或不完整内容，使用
```
method=browser
```
参数重试。
轮询抓取任务状态。抓取是异步操作——每隔几秒轮询状态接口，直到任务完成。
使用多个抓取入口点。网站通常会将内容组织在不同的URL前缀下。如果第一次抓取只覆盖了一个板块，从其他入口点启动额外的抓取任务（例如
```
/reference/
```
、
```
/extensions/
```
）以实现完整覆盖。
默认关闭图片保留。仅当任务明确需要图片内容时，才设置
```
retain_images=true
```
。
仅支持公开URL。不支持付费墙或需要认证的页面。
返回原始输出。直接保存并展示markdown.new的响应结果。除非用户明确要求，否则不要重写、改写或总结内容。