markitdown - Document to Markdown
Convert local documents to clean Markdown. One tool for PDF, Word, Excel, PowerPoint, images, and more.
When to Use markitdown
| Use Case | Recommendation |
|---|---|
| Local files (PDF, Word, Excel) | ✅ Use markitdown - unique capability |
| Web pages | ❌ Use Jina (…) |
| Blocked/anti-bot sites | ❌ Use Firecrawl |
| OCR on images | ✅ Use markitdown |
| Audio transcription | ✅ Use markitdown |
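The decision table above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions: `pick_tool` is a hypothetical name, and it only separates URLs from local files. Whether a site blocks bots (the Firecrawl row) cannot be detected from the input string alone, so that choice is left to you.

```bash
#!/usr/bin/env bash
# pick_tool INPUT: print the tool the table above recommends for INPUT.
# Hypothetical helper; it does not detect anti-bot sites (the Firecrawl case).
pick_tool() {
  local input=$1
  case "$input" in
    http://* | https://*)
      echo "jina"            # web pages: Jina is recommended over markitdown
      ;;
    *)
      if [[ -f "$input" ]]; then
        echo "markitdown"    # local documents, images (OCR), audio
      else
        echo "unknown"       # neither a URL nor an existing local file
        return 1
      fi
      ;;
  esac
}
```

For example, `pick_tool https://example.com` prints `jina`, and `pick_tool report.pdf` prints `markitdown` when that file exists.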
Basic Usage
```bash
# Local files (primary use case)
markitdown document.pdf
markitdown report.docx
markitdown data.xlsx
markitdown slides.pptx
markitdown screenshot.png   # OCR

# URLs (works, but Jina is faster)
markitdown https://example.com

# Save output
markitdown document.pdf > document.md
```
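To convert a whole folder at once, the single-file commands above can be wrapped in a loop. This is a sketch, assuming bash and the `markitdown` CLI on your `PATH`. `convert_dir` is a hypothetical helper: it is non-recursive and only covers the document formats shown above.

```bash
#!/usr/bin/env bash
# convert_dir DIR: convert every .pdf/.docx/.xlsx/.pptx file in DIR
# (non-recursive) to a sibling .md file with the markitdown CLI.
# Hypothetical helper; extend the glob list for other formats.
convert_dir() {
  local dir=${1:-.}
  local f saved
  # Remember the caller's glob settings (shopt -p exits non-zero if an option is off).
  saved=$(shopt -p nullglob nocaseglob) || true
  shopt -s nullglob nocaseglob   # skip unmatched patterns; also match .PDF, .DOCX, ...
  for f in "$dir"/*.pdf "$dir"/*.docx "$dir"/*.xlsx "$dir"/*.pptx; do
    # "${f%.*}" drops the extension, so report.pdf becomes report.md
    if markitdown "$f" > "${f%.*}.md"; then
      echo "converted: $f"
    else
      echo "failed: $f" >&2
      rm -f "${f%.*}.md"         # do not leave an empty or partial file behind
    fi
  done
  eval "$saved"                  # restore the caller's glob settings
}
```

Run it as `convert_dir ./docs`; each source file gets a `.md` file with the same base name in the same folder.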
Supported Formats
| Format | Extensions | Notes |
|---|---|---|
| PDF | `.pdf` | Text extraction, tables |
| Word | `.docx` | Formatting preserved |
| Excel | `.xlsx` | Tables to Markdown |
| PowerPoint | `.pptx` | Slides as sections |
| Images | `.png`, `.jpg` | OCR text extraction |
| HTML | `.html` | Clean conversion |
| Audio | `.mp3`, `.wav` | Speech-to-text |
| Text | | Pass-through/structure |
| URLs | `http://`, `https://` | Works but slower than Jina |
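Before handing a file to markitdown in a script, you may want to check its extension against the formats above. This is a minimal sketch: `is_supported` is hypothetical, and its extension list is an assumption drawn mostly from this page's examples. It is not markitdown's authoritative list, so check the project's own documentation before relying on it.

```bash
#!/usr/bin/env bash
# is_supported FILE: succeed if FILE's extension is in an assumed list of
# formats markitdown handles (see the Supported Formats table).
is_supported() {
  [[ $1 == *.* ]] || return 1          # no extension at all
  local ext=${1##*.}                   # text after the last dot
  # Lowercase portably (works on bash 3.2, unlike ${ext,,}).
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf | docx | xlsx | pptx | png | jpg | html | mp3 | wav) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical use: `is_supported slides.PPTX && markitdown slides.PPTX > slides.md`.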
Benchmarked Performance (URLs)
| Tool | Avg Time | Success Rate |
|---|---|---|
| Jina | 0.5s | 10/10 |
| markitdown | 2.5s | 9/10 |
| Firecrawl | 4.5s | 10/10 |
Verdict: For URLs, use Jina. For local files, markitdown is the only one of these three tools that works.
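To get comparable numbers on your own inputs, a rough timing loop is enough. This is a sketch, assuming bash and GNU `date` (for the `%N` nanoseconds format). `avg_runtime` is a hypothetical helper, and the 10 URLs behind the table above are not listed on this page.

```bash
#!/usr/bin/env bash
# avg_runtime N CMD [ARGS...]: run CMD N times (N > 0) and print
# "<mean wall-clock seconds> <successes>/<N>". CMD's output is discarded.
# Assumes GNU date, which supports %N (nanoseconds).
avg_runtime() {
  local n=$1; shift
  local i start end total=0 ok=0
  for ((i = 0; i < n; i++)); do
    start=$(date +%s%N)
    if "$@" > /dev/null 2>&1; then
      ok=$((ok + 1))                   # count runs that exited with status 0
    fi
    end=$(date +%s%N)
    total=$((total + end - start))     # accumulate elapsed nanoseconds
  done
  # Convert total nanoseconds to mean seconds with two decimals.
  awk -v t="$total" -v n="$n" -v ok="$ok" \
    'BEGIN { printf "%.2f %d/%d\n", t / n / 1e9, ok, n }'
}
```

For example, `avg_runtime 5 markitdown https://example.com` reports the mean time and how many of the 5 runs succeeded.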
Examples
```bash
# PDF to Markdown (primary use case)
markitdown report.pdf > report.md

# Excel spreadsheet
markitdown financials.xlsx

# Image with text (OCR)
markitdown screenshot.png

# PowerPoint deck
markitdown presentation.pptx > slides.md

# Audio transcription
markitdown meeting.mp3 > transcript.md
```
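When redirecting output as in these examples, a conversion may finish without an error yet produce little or no text. A scanned PDF with no text layer is one plausible case, but that is an assumption worth checking on your own files. A hypothetical wrapper, `to_md_checked`, that flags both failures and empty results:

```bash
#!/usr/bin/env bash
# to_md_checked INPUT OUTPUT: run "markitdown INPUT > OUTPUT".
# Returns 1 if markitdown fails, 2 if OUTPUT has no non-whitespace text.
to_md_checked() {
  local input=$1 output=$2
  if ! markitdown "$input" > "$output"; then
    echo "markitdown failed on $input" >&2
    return 1
  fi
  # grep succeeds only if at least one non-whitespace character exists.
  if ! grep -q '[^[:space:]]' "$output"; then
    echo "warning: $output is empty (no extractable text in $input?)" >&2
    return 2
  fi
}
```

Distinct return codes let a batch script retry empty results with a different tool instead of treating them as hard failures.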
Comparison with Alternatives
| Task | markitdown | Alternative |
|---|---|---|
| PDF text | `markitdown report.pdf` | PyMuPDF, pdfplumber |
| Word docs | `markitdown report.docx` | python-docx |
| Excel | `markitdown data.xlsx` | pandas, openpyxl |
| OCR | `markitdown screenshot.png` | Tesseract |
| Web pages | Use Jina instead | |
markitdown's advantage: one CLI covers all of these local formats and requires no code. The alternatives are separate, format-specific tools, most of them Python libraries.