markdown-dot-new
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
Chinesemarkdown.new — Markdown for Agents
markdown.new — 面向Agent的Markdown工具
Use markdown.new instead of raw HTML when reading web content, converting files, or crawling sites. It returns clean, structured Markdown with 80% fewer tokens — no parsing required.
Preserve original content. When a user asks to "get" or "fetch" content from a URL, return the actual source material directly. Do not summarize, paraphrase, or rewrite it. Only transform the content if the user explicitly requests a summary or rewrite.
Rate limit: 500 requests per day per IP. Monitor the response header.
x-rate-limit-remaining读取网页内容、转换文件或抓取网站时,请使用markdown.new而非原始HTML。它返回结构清晰的干净Markdown格式,token数量减少80%——无需额外解析。
保留原始内容。当用户要求“获取”某URL的内容时,直接返回实际的源材料。不要进行总结、改写或重写。仅当用户明确要求总结或改写时,才对内容进行转换。
速率限制:每个IP每天最多500次请求。请关注响应头。
x-rate-limit-remaining1. URL → Markdown
1. URL → Markdown
Convert any public URL to Markdown. This is the default method for reading web pages.
Simple GET — prepend to the target URL:
https://markdown.new/bash
curl -s 'https://markdown.new/https://example.com'POST with options:
bash
curl -s 'https://markdown.new/' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com", "method": "auto", "retain_images": false}'Options:
| Parameter | Values | Default | When to use |
|---|---|---|---|
| | | Use |
| | | Set |
Options also work as query parameters:
https://markdown.new/https://example.com?method=browser&retain_images=trueThe response is returned as with an header indicating the estimated token count.
text/markdownx-markdown-tokens将任何公开URL转换为Markdown格式。这是读取网页的默认方法。
简单GET请求——在目标URL前添加:
https://markdown.new/bash
curl -s 'https://markdown.new/https://example.com'带选项的POST请求:
bash
curl -s 'https://markdown.new/' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com", "method": "auto", "retain_images": false}'选项说明:
| 参数名 | 可选值 | 默认值 | 使用场景 |
|---|---|---|---|
| | | 针对JavaScript密集型SPA使用 |
| | | 当任务需要图片内容时设置为 |
选项也可以作为查询参数使用:
https://markdown.new/https://example.com?method=browser&retain_images=true响应以格式返回,同时包含响应头,显示预估的token数量。
text/markdownx-markdown-tokens2. File → Markdown
2. 文件 → Markdown
Convert documents to Markdown. Supports 20+ formats including PDF, DOCX, XLSX, XLS, ODS, ODT, CSV, JSON, XML, HTML, TXT, JPG, PNG, WebP, and SVG. Maximum file size is 10 MB.
Remote file (by URL):
bash
curl -s 'https://markdown.new/https://example.com/report.pdf'Or via POST:
bash
curl -s 'https://markdown.new/' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com/report.pdf"}'Local file upload:
bash
curl -s 'https://markdown.new/convert' \
-F 'file=@document.pdf'POST responses return JSON with metadata:
json
{
"success": true,
"url": "https://example.com/report.pdf",
"title": "Report Title",
"content": "# Report Title\n\n...",
"tokens": 850
}Upload responses nest data under a key with and fields.
datafilenamefile_type将文档转换为Markdown格式。支持20多种格式,包括PDF、DOCX、XLSX、XLS、ODS、ODT、CSV、JSON、XML、HTML、TXT、JPG、PNG、WebP和SVG。最大文件大小为10 MB。
远程文件(通过URL):
bash
curl -s 'https://markdown.new/https://example.com/report.pdf'或通过POST请求:
bash
curl -s 'https://markdown.new/' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com/report.pdf"}'本地文件上传:
bash
curl -s 'https://markdown.new/convert' \
-F 'file=@document.pdf'POST请求的响应返回包含元数据的JSON:
json
{
"success": true,
"url": "https://example.com/report.pdf",
"title": "Report Title",
"content": "# Report Title\n\n...",
"tokens": 850
}上传请求的响应将数据嵌套在键下,包含和字段。
datafilenamefile_type3. Crawl → Markdown
3. 抓取 → Markdown
Crawl an entire website section and retrieve all pages as Markdown. This is an async, job-based process: start a crawl, then poll for results.
Start a crawl:
bash
curl -X POST 'https://markdown.new/crawl' \
-H 'Content-Type: application/json' \
-d '{"url": "https://docs.example.com", "limit": 50}'Returns a job ID.
Check status and download results:
bash
undefined抓取整个网站板块,并将所有页面以Markdown格式返回。这是一个异步的基于任务的流程:启动抓取任务,然后轮询获取结果。
启动抓取任务:
bash
curl -X POST 'https://markdown.new/crawl' \
-H 'Content-Type: application/json' \
-d '{"url": "https://docs.example.com", "limit": 50}'返回一个任务ID。
检查状态并下载结果:
bash
undefinedAll pages concatenated as Markdown
所有页面合并为单个Markdown文件
Per-page records as JSON
按页面记录返回JSON格式
**Crawl options:**
| Parameter | Description | Default |
| ----------------- | ----------------------------------------- | ------- |
| `url` | Starting URL (required) | — |
| `limit` | Maximum pages to crawl, 1–500 | 500 |
| `depth` | Maximum link depth, 1–10 | 5 |
| `render` | Enable JavaScript rendering for SPAs | false |
| `source` | URL discovery: `all`, `sitemaps`, `links` | `all` |
| `maxAge` | Maximum cache age in seconds, 0–604800 | 86400 |
| `includePatterns` | Only visit URLs matching these wildcards | auto |
| `excludePatterns` | Skip URLs matching these wildcards | — |
Results are stored for 14 days. Each crawl consumes 50 request units (approximately 10 crawls per day).
To cancel a crawl: `DELETE /crawl/status/{jobId}`
**抓取选项**:
| 参数名 | 描述 | 默认值 |
| ----------------- | ----------------------------------------- | ------- |
| `url` | 起始URL(必填) | — |
| `limit` | 最大抓取页面数,1–500 | 500 |
| `depth` | 最大链接深度,1–10 | 5 |
| `render` | 为SPA启用JavaScript渲染 | false |
| `source` | URL发现方式:`all`, `sitemaps`, `links` | `all` |
| `maxAge` | 最大缓存时长(秒),0–604800 | 86400 |
| `includePatterns` | 仅访问匹配这些通配符的URL | auto |
| `excludePatterns` | 跳过匹配这些通配符的URL | — |
结果将存储14天。每次抓取消耗50个请求单位(每天约可进行10次抓取)。
取消抓取任务:`DELETE /crawl/status/{jobId}`Quick Reference
快速参考
| Task | Method |
|---|---|
| Read a single web page | URL → Markdown (GET) |
| Fetch a remote PDF, DOCX, or spreadsheet | File → Markdown (GET or POST with URL) |
| Convert a local file on disk | File → Markdown (POST /convert with upload) |
| Retrieve content from multiple pages on a site | Crawl → Markdown |
| Handle a JS-heavy SPA that returns blank content | URL → Markdown with |
| 任务 | 方法 |
|---|---|
| 读取单个网页 | URL → Markdown(GET请求) |
| 获取远程PDF、DOCX或电子表格 | 文件 → Markdown(GET请求或带URL的POST请求) |
| 转换本地磁盘上的文件 | 文件 → Markdown(上传文件至POST /convert接口) |
| 获取网站多个页面的内容 | 抓取 → Markdown |
| 处理返回空白内容的JavaScript密集型SPA | URL → Markdown并设置 |
Key Behaviors
核心行为准则
- Start with the simple GET. auto-selects the best conversion method and handles most cases.
https://markdown.new/<url> - Fall back to browser rendering. If a page returns empty or incomplete content, retry with .
method=browser - Poll crawl jobs. Crawling is asynchronous — poll the status endpoint every few seconds until the job completes.
- Use multiple crawl entry points. Sites often organize content under different URL prefixes. If the first crawl only captures one section, start additional crawls from other entry points (e.g., ,
/reference/) for full coverage./extensions/ - Leave images off by default. Only set when the task specifically requires image content.
retain_images=true - Public URLs only. Paywalled or authenticated pages are not supported.
- Return raw output. Save and present the markdown.new response directly. Do not rewrite, paraphrase, or summarize unless the user explicitly requests it.
- 优先使用简单GET请求。会自动选择最佳转换方法,适用于大多数场景。
https://markdown.new/<url> - 回退到浏览器渲染。如果页面返回空内容或不完整内容,使用参数重试。
method=browser - 轮询抓取任务状态。抓取是异步操作——每隔几秒轮询状态接口,直到任务完成。
- 使用多个抓取入口点。网站通常会将内容组织在不同的URL前缀下。如果第一次抓取只覆盖了一个板块,从其他入口点启动额外的抓取任务(例如、
/reference/)以实现完整覆盖。/extensions/ - 默认关闭图片保留。仅当任务明确需要图片内容时,才设置。
retain_images=true - 仅支持公开URL。不支持付费墙或需要认证的页面。
- 返回原始输出。直接保存并展示markdown.new的响应结果。除非用户明确要求,否则不要重写、改写或总结内容。