mistral-ocr

Mistral OCR

Extract text from images and PDFs using Mistral's dedicated OCR API. No external dependencies required.

Requirements

This skill requires a Mistral API key. If you don't have one, follow the guide in reference/getting-started.md.

API Key

The user must provide their Mistral API key. Ask for it if not available.

Option 1 (Recommended for AI agents): User provides key directly in message:

"Use this Mistral key: aBc123XyZ..."
"Convert this PDF to markdown, my API key is aBc123XyZ..."

Option 2: Environment variable `$MISTRAL_API_KEY`

Option 3: Claude Code settings (`~/.claude/settings.json`)

If no key is available, guide the user to get one at console.mistral.ai.
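
The lookup order above can be sketched as a small shell function. This is a minimal sketch, not part of the skill itself: `resolve_mistral_key` is a hypothetical helper name, and it covers only Option 1 (a key passed in directly) and Option 2 (the environment variable). Reading the key from Claude Code settings is left out because the layout of that file is not described here.

```shell
# Hypothetical helper: print the Mistral API key to use, or fail with guidance.
# $1 (optional) is a key the user supplied directly in their message.
resolve_mistral_key() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"                 # Option 1: key given in the message
  elif [ -n "${MISTRAL_API_KEY:-}" ]; then
    printf '%s\n' "$MISTRAL_API_KEY"   # Option 2: environment variable
  else
    echo "No Mistral API key found; get one at console.mistral.ai" >&2
    return 1
  fi
}

# Example use (commented out so that sourcing this block has no side effects):
# MISTRAL_API_KEY=$(resolve_mistral_key "$KEY_FROM_MESSAGE") || exit 1
```

A key given in the message takes precedence over the environment variable, matching the order in which the options are listed.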

API Endpoint

Use the dedicated OCR endpoint for all document processing:

POST https://api.mistral.ai/v1/ocr

Model: `mistral-ocr-latest`

Features

1. PDF → Markdown (Direct, no conversion needed!)

```bash
curl -s "https://api.mistral.ai/v1/ocr" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr-latest",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/document.pdf"
    }
  }'
```

2. Image → Text

Works with JPG, PNG, WEBP, GIF:

```bash
curl -s "https://api.mistral.ai/v1/ocr" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr-latest",
    "document": {
      "type": "image_url",
      "image_url": "https://example.com/image.jpg"
    }
  }'
```

3. Local Files (Base64 Data URL)

For local PDFs or images, encode as base64 and use a data URL.

ALWAYS use curl (works on all platforms including Windows via Git Bash):

```bash
# For local PDF
BASE64=$(base64 -w0 document.pdf)
curl -s "https://api.mistral.ai/v1/ocr" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "mistral-ocr-latest", "document": { "type": "document_url", "document_url": "data:application/pdf;base64,'"$BASE64"'" } }'

# For local images (PNG, JPG, etc.)
BASE64=$(base64 -w0 image.png)
curl -s "https://api.mistral.ai/v1/ocr" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "mistral-ocr-latest", "document": { "type": "image_url", "image_url": "data:image/png;base64,'"$BASE64"'" } }'
```


**MIME types:**
- PDF: `data:application/pdf;base64,...`
- PNG: `data:image/png;base64,...`
- JPG: `data:image/jpeg;base64,...`
- WEBP: `data:image/webp;base64,...`
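
Picking the prefix by hand is easy to get wrong, so the mapping above could be wrapped in a helper along these lines. This is only a sketch: `data_url_prefix` and `ocr_document_type` are hypothetical names, the choice is made from the file extension alone, and only the four types listed above are covered.

```shell
# Hypothetical helper: print the data-URL prefix for a local file, based on its extension.
data_url_prefix() {
  # Take the text after the last dot and lowercase it, so "scan.PDF" matches "pdf".
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)      echo "data:application/pdf;base64," ;;
    png)      echo "data:image/png;base64," ;;
    jpg|jpeg) echo "data:image/jpeg;base64," ;;
    webp)     echo "data:image/webp;base64," ;;
    *)        echo "unsupported file type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the "type" field the request expects for that file
# (document_url for PDFs, image_url for images, as in the curl examples).
ocr_document_type() {
  case "$(data_url_prefix "$1")" in
    data:application/pdf*) echo "document_url" ;;
    data:image/*)          echo "image_url" ;;
    *)                     return 1 ;;
  esac
}
```

For example, `data_url_prefix photo.jpeg` prints `data:image/jpeg;base64,` and `ocr_document_type photo.jpeg` prints `image_url`.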


4. Structured JSON Output

For invoices, forms, and tables, ask for JSON in a follow-up request or use Document AI annotations.

Response Format

The API returns JSON in which each page carries its extracted text as markdown:

```json
{
  "pages": [
    {
      "index": 0,
      "markdown": "# Document Title\n\nExtracted content here...",
      "images": [],
      "tables": [],
      "dimensions": {"dpi": 200, "height": 842, "width": 595}
    }
  ],
  "model": "mistral-ocr-latest",
  "usage_info": {"pages_processed": 1, "doc_size_bytes": 12345}
}
```

Workflow

When the user requests OCR of an image or PDF:

1. **Get API key** - Ask user if not in environment
2. Determine input type (URL or local file)
3. For local files, ALWAYS use temp file approach (avoids "Argument list too long" error):

```bash
# Cross-platform temp directory
TMPDIR="${TMPDIR:-${TEMP:-/tmp}}"

# Step 1: Encode file to base64
base64 -w0 "document.pdf" > "$TMPDIR/b64.txt"

# Step 2: Create JSON request file
echo '{"model":"mistral-ocr-latest","document":{"type":"document_url","document_url":"data:application/pdf;base64,'$(cat "$TMPDIR/b64.txt")'"}}' > "$TMPDIR/request.json"

# Step 3: Call API with -d @file (use actual key, not variable)
curl -s "https://api.mistral.ai/v1/ocr" \
  -H "Authorization: Bearer YOUR_API_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d @"$TMPDIR/request.json" > "$TMPDIR/response.json"

# Step 4: Extract markdown with node (NOT jq - not available on all systems)
node -e "const fs=require('fs'); const r=JSON.parse(fs.readFileSync('$TMPDIR/response.json')); console.log(r.pages.map(p=>p.markdown).join('\n\n---\n\n'))"
```


4. **Save to .md file** using Write tool
5. Confirm file location to user

IMPORTANT: Cross-Platform Compatibility

- ALWAYS use curl (works on Windows via Git Bash)
- ALWAYS use `-d @file` for request body (handles large files)
- NEVER use jq - use node instead to parse JSON
- Use `${TMPDIR:-${TEMP:-/tmp}}` for temp files (works on all systems)
- Copy response.json to user directory before parsing with node on Windows
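
One more portability detail may be worth guarding against: `-w0` is a GNU `base64` option, and some other `base64` implementations may reject it. The sketch below is an assumption-labeled workaround rather than part of the skill; `encode_base64` is a hypothetical helper that tries `-w0` first and otherwise strips the line breaks itself.

```shell
# Hypothetical helper: base64-encode a file onto a single line.
encode_base64() {
  # GNU base64 accepts -w0 (no line wrapping). If this base64 rejects the flag,
  # fall back to plain base64 and remove the newlines with tr.
  base64 -w0 "$1" 2>/dev/null || base64 < "$1" | tr -d '\n'
}

# Example use with the temp-file workflow (commented out; needs a real document.pdf):
# TMPDIR="${TMPDIR:-${TEMP:-/tmp}}"
# encode_base64 "document.pdf" > "$TMPDIR/b64.txt"
```

Either branch yields the same single-line output, so the rest of the temp-file workflow is unchanged.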

Usage Examples

When the user says:

| User Request | Action |
| --- | --- |
| "Convert this PDF to markdown" | OCR the PDF, save as .md file |
| "Extract text from this image" | OCR the image, return text |
| "Give me a .md of this document" | OCR and save as .md file |
| "What does this PDF say?" | OCR and summarize content |
| "OCR this receipt" | Extract text, optionally structure as JSON |

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| 401 Unauthorized | Invalid API key | Verify key, guide to getting-started.md |
| 400 Bad Request | Invalid document | Check format and URL accessibility |
| 3310 File fetch error | URL not accessible | Use base64 for local files |
| Rate limit | Too many requests | Wait and retry |
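
The "wait and retry" advice can be sketched in shell as follows. Everything here is illustrative: `action_for_status` and `with_retry` are hypothetical helpers, `RETRY_DELAY` is an invented knob, and treating HTTP 429 as the rate-limit status is an assumption, since the table does not give a code for it. The 3310 error is not handled because it is not an HTTP status.

```shell
# Hypothetical helper: map an HTTP status code to the action suggested in the table.
# Treating 429 as the rate-limit status is an assumption.
action_for_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "check-api-key" ;;
    400) echo "check-document" ;;
    429) echo "retry" ;;
    *)   echo "unknown" ;;
  esac
}

# Hypothetical helper: run a command that prints an HTTP status, retrying on "retry".
# $1 = maximum attempts; the remaining arguments are the command to run.
# RETRY_DELAY (seconds, default 1) is doubled after each rate-limited attempt.
with_retry() {
  attempts="$1"; shift
  delay="${RETRY_DELAY:-1}"
  try=1
  while :; do
    status=$("$@")
    if [ "$(action_for_status "$status")" != "retry" ] || [ "$try" -ge "$attempts" ]; then
      printf '%s\n' "$status"
      # Succeed only when the final status maps to "ok".
      [ "$(action_for_status "$status")" = "ok" ]
      return
    fi
    sleep "$delay"
    delay=$((delay * 2))
    try=$((try + 1))
  done
}

# Example (commented out; needs a key and a request file). curl's -o saves the body
# and -w '%{http_code}' prints only the status, which with_retry inspects:
# with_retry 3 curl -s -o "$TMPDIR/response.json" -w '%{http_code}' \
#   "https://api.mistral.ai/v1/ocr" \
#   -H "Authorization: Bearer $MISTRAL_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d @"$TMPDIR/request.json"
```

Doubling the delay keeps repeated attempts from hitting the API in quick succession; a fixed delay would also satisfy the table's advice.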

Supported Formats

| Format | Support |
| --- | --- |
| PDF | ✅ Direct (no conversion) |
| PNG | ✅ Direct |
| JPG/JPEG | ✅ Direct |
| WEBP | ✅ Direct |
| GIF | ✅ Direct |

No external dependencies required! Unlike other OCR solutions, Mistral OCR handles PDFs directly without needing pdftoppm, ImageMagick, or any other tools.

Pricing

As of 2025, Mistral OCR pricing:

- $2 per 1,000 pages
- 50% discount with Batch API

Check current rates at mistral.ai/pricing
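
For a rough budget check, the quoted rates can be turned into a one-line estimate. This is a sketch based only on the 2025 figures above, which may be out of date; `estimate_ocr_cost` is a hypothetical helper name.

```shell
# Hypothetical helper: estimate OCR cost in US dollars from the rates quoted above.
# $1 = page count; $2 = "batch" to apply the 50% Batch API discount.
estimate_ocr_cost() {
  awk -v pages="$1" -v mode="${2:-standard}" 'BEGIN {
    cost = pages * 2 / 1000              # $2 per 1,000 pages
    if (mode == "batch") cost = cost / 2 # 50% discount with Batch API
    printf "%.2f\n", cost
  }'
}

estimate_ocr_cost 2500         # prints 5.00
estimate_ocr_cost 2500 batch   # prints 2.50
```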

References

Getting Started - How to get your API key
PDF to Markdown - PDF conversion examples
Output Formats - JSON, Markdown, plain text
Step-by-Step Guide - Complete tutorial with examples

Skill by Parlamento AI
