markitdown - Document to Markdown

Convert local documents to clean Markdown. One tool for PDF, Word, Excel, PowerPoint, images, and more.

When to Use markitdown

| Use Case | Recommendation |
| --- | --- |
| Local files (PDF, Word, Excel) | Use markitdown - unique capability |
| Web pages | ❌ Use Jina (r.jina.ai/) - 5x faster |
| Blocked/anti-bot sites | ❌ Use Firecrawl |
| OCR on images | Use markitdown |
| Audio transcription | Use markitdown |
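
The routing in the table can be sketched as a small shell helper. This is an illustration only: `pick_tool` is a hypothetical name, not part of markitdown, and it only separates URLs from everything else. Blocked or anti-bot sites cannot be recognised from the address alone, so choosing Firecrawl stays a manual decision.

```shell
# Hypothetical helper: print which tool the table recommends for one input.
pick_tool() {
  case "$1" in
    http://*|https://*) printf '%s\n' "jina" ;;        # web pages: Jina is faster
    *)                  printf '%s\n' "markitdown" ;;  # local files, images, audio
  esac
}
```

For example, `pick_tool report.pdf` prints `markitdown`, while `pick_tool https://example.com` prints `jina`.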

Basic Usage

    # Local files (primary use case)
    markitdown document.pdf
    markitdown report.docx
    markitdown data.xlsx
    markitdown slides.pptx
    markitdown screenshot.png  # OCR

    # URLs (works, but Jina is faster)
    markitdown https://...

    # Save output
    markitdown document.pdf > document.md
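
Because each call converts a single file and writes to standard output, converting a whole folder is a matter of looping. The sketch below is one possible way to do that, not a feature of markitdown itself: `convert_dir` is a hypothetical name, the extension list is an assumption taken from the formats named in this document, and the output is written to `NAME.EXT.md` so that `report.pdf` and `report.docx` do not overwrite each other.

```shell
# Hypothetical helper: convert every supported document in a directory.
# Output goes next to the source as NAME.EXT.md (e.g. report.pdf.md).
convert_dir() {
  dir="${1:-.}"
  for f in "$dir"/*.pdf "$dir"/*.docx "$dir"/*.xlsx "$dir"/*.pptx; do
    [ -e "$f" ] || continue            # skip patterns that matched nothing
    out="$f.md"
    if markitdown "$f" > "$out"; then
      printf 'converted: %s -> %s\n' "$f" "$out"
    else
      printf 'failed: %s\n' "$f" >&2
      rm -f "$out"                     # do not leave a partial file behind
    fi
  done
}
```

Calling `convert_dir ./docs` would process the files in `./docs`; with no argument it uses the current directory.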

Supported Formats

| Format | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Text extraction, tables |
| Word | .docx | Formatting preserved |
| Excel | .xlsx | Tables to markdown |
| PowerPoint | .pptx | Slides as sections |
| Images | .jpg, .png | OCR text extraction |
| HTML | .html | Clean conversion |
| Audio | .mp3, .wav | Speech-to-text |
| Text | .txt, .csv, .json, .xml | Pass-through/structure |
| URLs | https://... | Works but slower than Jina |

Benchmarked Performance (URLs)

| Tool | Avg Time | Success Rate |
| --- | --- | --- |
| Jina | 0.5s | 10/10 |
| markitdown | 2.5s | 9/10 |
| Firecrawl | 4.5s | 10/10 |

Verdict: For URLs, use Jina. For local files, markitdown is the only one of these three tools that applies.
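
The "5x faster" figure quoted for Jina follows directly from the averages reported here: 2.5 / 0.5 = 5. A minimal sketch of that arithmetic, using a hypothetical `speed_ratio` helper and assuming `awk` is available:

```shell
# Hypothetical helper: ratio of two average timings in seconds (slower first).
speed_ratio() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speed_ratio 2.5 0.5   # markitdown vs Jina
speed_ratio 4.5 0.5   # Firecrawl vs Jina
```

By the same calculation, Firecrawl's 4.5s average is 9 times Jina's.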

Examples

    # PDF to markdown (primary use case)
    markitdown report.pdf > report.md

    # Excel spreadsheet
    markitdown financials.xlsx

    # Image with text (OCR)
    markitdown screenshot.png

    # PowerPoint deck
    markitdown presentation.pptx > slides.md

    # Audio transcription
    markitdown meeting.mp3 > transcript.md

Comparison with Alternatives

| Task | markitdown | Alternative |
| --- | --- | --- |
| PDF text | markitdown file.pdf | PyMuPDF, pdfplumber |
| Word docs | markitdown file.docx | python-docx |
| Excel | markitdown file.xlsx | pandas, openpyxl |
| OCR | markitdown image.png | Tesseract |
| Web pages | Use Jina instead | r.jina.ai/URL (5x faster) |

markitdown's advantage: One CLI for all local document formats. No code needed.
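
For the web-page row, the `r.jina.ai/URL` pattern means prefixing the page address with the Jina Reader host. A rough sketch, assuming `curl` is installed and the reader is reached over HTTPS; the function names `jina_reader_url` and `fetch_page_md` are made up for illustration:

```shell
# Build the Jina Reader address for a page by prefixing r.jina.ai/.
jina_reader_url() {
  printf '%s\n' "https://r.jina.ai/$1"
}

# Hypothetical wrapper: fetch a page as Markdown through Jina Reader.
# -f fails on HTTP errors, -sS stays quiet but still reports errors,
# -L follows redirects.
fetch_page_md() {
  curl -fsSL "$(jina_reader_url "$1")"
}
```

Usage would look like `fetch_page_md https://example.com > page.md`, mirroring how markitdown output is redirected to a file.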