markdown-converter
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseMarkdown Converter
Markdown 转换器
Convert files to Markdown using — no installation required.
uvx markitdown使用将文件转换为Markdown格式——无需安装。
uvx markitdownBasic Usage
基本用法
bash
undefinedbash
undefinedConvert to stdout
转换后输出到标准输出
uvx markitdown input.pdf
uvx markitdown input.pdf
Save to file
保存到文件
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md
From stdin
从标准输入读取
cat input.pdf | uvx markitdown
undefinedcat input.pdf | uvx markitdown
undefinedSupported Formats
支持的格式
- Documents: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- Web/Data: HTML, CSV, JSON, XML
- Media: Images (EXIF + OCR), Audio (EXIF + transcription)
- Other: ZIP (iterates contents), YouTube URLs, EPub
- 文档类:PDF、Word(.docx)、PowerPoint(.pptx)、Excel(.xlsx、.xls)
- 网页/数据类:HTML、CSV、JSON、XML
- 媒体类:图片(支持EXIF + OCR)、音频(支持EXIF + 转录)
- 其他类:ZIP(遍历压缩包内容)、YouTube链接、EPub
Options
选项
bash
-o OUTPUT # Output file
-x EXTENSION # Hint file extension (for stdin)
-m MIME_TYPE # Hint MIME type
-c CHARSET # Hint charset (e.g., UTF-8)
-d # Use Azure Document Intelligence
-e ENDPOINT # Document Intelligence endpoint
--use-plugins # Enable 3rd-party plugins
--list-plugins # Show installed pluginsbash
-o OUTPUT # 输出文件
-x EXTENSION # 指定文件扩展名提示(针对标准输入)
-m MIME_TYPE # 指定MIME类型提示
-c CHARSET # 指定字符编码(例如:UTF-8)
-d # 使用Azure Document Intelligence
-e ENDPOINT # Document Intelligence服务端点
--use-plugins # 启用第三方插件
--list-plugins # 显示已安装的插件Examples
示例
bash
undefinedbash
undefinedConvert Word document
转换Word文档
uvx markitdown report.docx -o report.md
uvx markitdown report.docx -o report.md
Convert Excel spreadsheet
转换Excel表格
uvx markitdown data.xlsx > data.md
uvx markitdown data.xlsx > data.md
Convert PowerPoint presentation
转换PowerPoint演示文稿
uvx markitdown slides.pptx -o slides.md
uvx markitdown slides.pptx -o slides.md
Convert with file type hint (for stdin)
针对标准输入指定文件类型提示
cat document | uvx markitdown -x .pdf > output.md
cat document | uvx markitdown -x .pdf > output.md
Use Azure Document Intelligence for better PDF extraction
使用Azure Document Intelligence提升PDF提取效果
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
undefineduvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
undefinedNotes
注意事项
- Output preserves document structure: headings, tables, lists, links
- First run caches dependencies; subsequent runs are faster
- For complex PDFs with poor extraction, use with Azure Document Intelligence
-d
- 输出内容会保留文档结构:标题、表格、列表、链接
- 首次运行会缓存依赖项;后续运行速度更快
- 对于提取难度大的复杂PDF,可使用选项搭配Azure Document Intelligence
-d