markdown-converter

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Markdown Converter

Markdown 转换器

Convert files to Markdown using

uvx markitdown

— no installation required.

使用

uvx markitdown

将文件转换为Markdown格式——无需安装。

Basic Usage

基本用法

bash

undefined

bash

undefined

Convert to stdout

转换后输出到标准输出

uvx markitdown input.pdf

Save to file

保存到文件

uvx markitdown input.pdf -o output.md uvx markitdown input.docx > output.md

From stdin

从标准输入读取

cat input.pdf | uvx markitdown

undefined

cat input.pdf | uvx markitdown

undefined

Supported Formats

支持的格式

Documents: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
Web/Data: HTML, CSV, JSON, XML
Media: Images (EXIF + OCR), Audio (EXIF + transcription)
Other: ZIP (iterates contents), YouTube URLs, EPub

文档类：PDF、Word（.docx）、PowerPoint（.pptx）、Excel（.xlsx、.xls）
网页/数据类：HTML、CSV、JSON、XML
媒体类：图片（支持EXIF + OCR）、音频（支持EXIF + 转录）
其他类：ZIP（遍历压缩包内容）、YouTube链接、EPub

Options

选项

bash

-o OUTPUT      # Output file
-x EXTENSION   # Hint file extension (for stdin)
-m MIME_TYPE   # Hint MIME type
-c CHARSET     # Hint charset (e.g., UTF-8)
-d             # Use Azure Document Intelligence
-e ENDPOINT    # Document Intelligence endpoint
--use-plugins  # Enable 3rd-party plugins
--list-plugins # Show installed plugins

bash

-o OUTPUT      # 输出文件
-x EXTENSION   # 指定文件扩展名提示（针对标准输入）
-m MIME_TYPE   # 指定MIME类型提示
-c CHARSET     # 指定字符编码（例如：UTF-8）
-d             # 使用Azure Document Intelligence
-e ENDPOINT    # Document Intelligence服务端点
--use-plugins  # 启用第三方插件
--list-plugins # 显示已安装的插件

Examples

示例

bash

undefined

bash

undefined

Convert Word document

转换Word文档

uvx markitdown report.docx -o report.md

Convert Excel spreadsheet

转换Excel表格

uvx markitdown data.xlsx > data.md

Convert PowerPoint presentation

转换PowerPoint演示文稿

uvx markitdown slides.pptx -o slides.md

Convert with file type hint (for stdin)

针对标准输入指定文件类型提示

cat document | uvx markitdown -x .pdf > output.md

Use Azure Document Intelligence for better PDF extraction

使用Azure Document Intelligence提升PDF提取效果

uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"

undefined

uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"

undefined

Notes

注意事项

Output preserves document structure: headings, tables, lists, links
First run caches dependencies; subsequent runs are faster
For complex PDFs with poor extraction, use
```
-d
```
with Azure Document Intelligence

输出内容会保留文档结构：标题、表格、列表、链接
首次运行会缓存依赖项；后续运行速度更快
对于提取难度大的复杂PDF，可使用
```
-d
```
选项搭配Azure Document Intelligence