glmocr-table

GLM-OCR Table Recognition Skill

Extract tables from images and PDFs and convert them to Markdown format using the Zhipu GLM-OCR layout parsing API.

When to Use

- Extract tables from images or scanned documents
- Convert table images to Markdown or another editable format
- Recognize complex tables with merged cells
- Parse financial statements, invoices, and reports that contain tables
- User mentions "extract table", "recognize table", "表格识别", "提取表格", "表格OCR", "表格转文字"

Key Features

- Complex table support: handles merged cells, nested tables, and multi-row headers
- Markdown output: tables are returned as clean Markdown that is easy to edit and convert
- Multi-page PDF: supports batch extraction from multi-page PDF documents
- Local file & URL: accepts both local files and remote URLs
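Because the output is plain Markdown, converting it to another editable format can be done with a few lines of post-processing. The sketch below is not part of the skill; `markdown_table_to_rows` and `rows_to_csv` are hypothetical helper names, and the parser only handles simple pipe tables (no escaped pipes, no merged-cell markup).

```python
import csv
import io


def markdown_table_to_rows(markdown: str) -> list[list[str]]:
    """Split a simple Markdown pipe table into rows of cell strings."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = "| Column 1 | Column 2 |\n|----------|----------|\n| Data     | Data     |"
print(rows_to_csv(markdown_table_to_rows(sample)))
```

The resulting CSV text can be opened in a spreadsheet application; tables with merged cells would likely need extra handling.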

Resource Links

| Resource | Link |
|----------|------|
| Get API Key | Zhipu Open Platform API Keys (https://bigmodel.cn/usercenter/proj-mgmt/apikeys) |
| API Docs | Layout Parsing |

Prerequisites

API Key Setup (Required)

This script reads the key from the `ZHIPU_API_KEY` environment variable. Reusing the same key across Zhipu skills is optional.

Get Key: Visit the Zhipu Open Platform API Keys page (https://bigmodel.cn/usercenter/proj-mgmt/apikeys) to create or copy your key.

Setup options (choose one):

Global config (recommended): Set once in `openclaw.json` under `env.vars`, and all Zhipu skills will share it:

```json
{
  "env": {
    "vars": {
      "ZHIPU_API_KEY": "your-api-key"
    }
  }
}
```

Skill-level config: Set for this skill only in `openclaw.json`:

```json
{
  "skills": {
    "entries": {
      "glmocr-table": {
        "env": {
          "ZHIPU_API_KEY": "your-api-key"
        }
      }
    }
  }
}
```

Shell environment variable: Add to `~/.zshrc`:

```bash
export ZHIPU_API_KEY="your-api-key"
```

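💡 If you have already configured a key for other Zhipu skills (such as `glmocr`, `glmv-caption`, or `glm-image-generation`), they share the same `ZHIPU_API_KEY`, so there is no need to configure it again.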

Security & Transparency

Environment variables used:
- `ZHIPU_API_KEY` (required)
- `GLM_OCR_TIMEOUT` (optional, timeout in seconds)

Fixed endpoint: https://open.bigmodel.cn/api/paas/v4/layout_parsing

No custom API URL override: this avoids accidental key exfiltration via redirected endpoints.

Raw upstream response is not returned by default: use `--include-raw` only when needed for debugging.
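To make the points above concrete, here is a minimal sketch of how a script could read these two variables while keeping the endpoint fixed. It is not the actual code of `glm_ocr_cli.py`; `load_settings` is a hypothetical name and the 60-second default timeout is an assumption, since the real default is not documented here.

```python
import os

# Fixed official endpoint: deliberately a constant, not read from the environment.
ENDPOINT = "https://open.bigmodel.cn/api/paas/v4/layout_parsing"
DEFAULT_TIMEOUT = 60.0  # assumption: the script's real default is not documented


def load_settings(env=None):
    """Return the API key, timeout and endpoint, or raise if the key is missing."""
    env = os.environ if env is None else env
    api_key = (env.get("ZHIPU_API_KEY") or "").strip()
    if not api_key:
        raise RuntimeError(
            "ZHIPU_API_KEY not configured. Get your API key at: "
            "https://bigmodel.cn/usercenter/proj-mgmt/apikeys"
        )
    raw_timeout = env.get("GLM_OCR_TIMEOUT")
    timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT
    return {"api_key": api_key, "timeout": timeout, "endpoint": ENDPOINT}


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"ZHIPU_API_KEY": "your-api-key", "GLM_OCR_TIMEOUT": "30"})
print(settings["timeout"], settings["endpoint"])
```

Passing the environment as a parameter keeps the function easy to test without touching real credentials.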

⛔ MANDATORY RESTRICTIONS ⛔

- ONLY use the GLM-OCR API: execute the script `python scripts/glm_ocr_cli.py`
- NEVER parse tables yourself: do NOT try to extract tables using built-in vision or any other method
- NEVER offer alternatives: do NOT suggest "I can try to recognize it" or similar
- IF the API fails: display the error message and STOP immediately
- NO fallback methods: do NOT attempt table extraction any other way

📋 Output Display Rules

After running the script, present the OCR result clearly and safely.

- Show the extracted table Markdown (the `text` field) in full
- Summarization is allowed, but do not hide important extraction failures
- If `layout_details` contains table-related entries, you may highlight them
- If the result file is saved, tell the user the file path
- Show the raw upstream response only when explicitly requested or when debugging (`--include-raw`)

How to Use

Extract from URL

```bash
python scripts/glm_ocr_cli.py --file-url "https://example.com/table.png"
```

Extract from Local File

```bash
python scripts/glm_ocr_cli.py --file /path/to/table.png
```

Save Result to File

```bash
python scripts/glm_ocr_cli.py --file table.png --output result.json --pretty
```

Include Raw Upstream Response (Debug Only)

```bash
python scripts/glm_ocr_cli.py --file table.png --output result.json --include-raw
```

CLI Reference

```bash
python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty] [--include-raw]
```

| Parameter | Required | Description |
|-----------|----------|-------------|
| `--file-url` | One of the two | URL of the image/PDF |
| `--file` | One of the two | Local file path of the image/PDF |
| `--output`, `-o` | No | Save result JSON to file |
| `--pretty` | No | Pretty-print JSON output |
| `--include-raw` | No | Include raw upstream API response in the `result` field (debug only) |
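When the CLI is driven from another Python program, the argument list can be assembled from the flags documented above. This is a sketch only: `build_command` is a hypothetical helper, and treating any source that starts with `http://` or `https://` as a URL is an assumption about how callers would choose between `--file-url` and `--file`.

```python
import sys


def build_command(source, output=None, pretty=False, include_raw=False,
                  script="scripts/glm_ocr_cli.py"):
    """Build the argument list for glm_ocr_cli.py from the documented flags."""
    # Assumption: http(s) sources go to --file-url, everything else to --file.
    is_url = source.startswith(("http://", "https://"))
    cmd = [sys.executable, script, "--file-url" if is_url else "--file", source]
    if output:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    if include_raw:
        cmd.append("--include-raw")  # debug only
    return cmd


cmd = build_command("table.png", output="result.json", pretty=True)
print(" ".join(cmd))
# To actually run it (requires the script and ZHIPU_API_KEY to be available):
#   import subprocess
#   subprocess.run(cmd, check=False)
```

Building a list rather than a shell string avoids quoting problems with paths that contain spaces.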

Response Format

```json
{
  "ok": true,
  "text": "| Column 1 | Column 2 |\n|----------|----------|\n| Data     | Data     |",
  "layout_details": [...],
  "result": null,
  "error": null,
  "source": "/path/to/file",
  "source_type": "file",
  "raw_result_included": false
}
```

Key fields:

- `ok`: whether extraction succeeded
- `text`: extracted text in Markdown (use this for display)
- `layout_details`: layout analysis details
- `error`: error details on failure
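A consumer of this JSON might check `ok` before using `text`. The sketch below assumes only the field names shown in the response example; `extract_table_markdown` is a hypothetical helper, and the exact shape of `error` (string or object) is not documented here, so it is simply formatted into the message.

```python
import json


def extract_table_markdown(result_json: str) -> str:
    """Return the Markdown in `text`, or raise if the result reports a failure."""
    result = json.loads(result_json)
    if not result.get("ok"):
        # Surface the error as-is instead of hiding the failure.
        raise RuntimeError(f"GLM-OCR extraction failed: {result.get('error')}")
    return result.get("text") or ""


sample = json.dumps({
    "ok": True,
    "text": "| Column 1 | Column 2 |\n|----------|----------|\n| Data     | Data     |",
    "layout_details": [],
    "result": None,
    "error": None,
    "source": "/path/to/file",
    "source_type": "file",
    "raw_result_included": False,
})
print(extract_table_markdown(sample))
```

The same function works on a file written with `--output`, for example by passing it the file's contents.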

Error Handling

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show the exact error to the user and guide them through configuration

Authentication failed (401/403): API key invalid or expired → reconfigure the key

Rate limit (429): quota exhausted → tell the user to wait and retry later

File not found: local file missing → check the path
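The status-code cases above can be expressed as a small lookup. This is an illustrative sketch, not the skill's implementation: `guidance_for_status` is a hypothetical name, and it assumes the caller has an HTTP status code at hand, which the documented `error` field may or may not expose.

```python
# User-facing guidance per HTTP status, taken from the cases listed above.
GUIDANCE = {
    401: "API key invalid or expired: reconfigure ZHIPU_API_KEY.",
    403: "API key invalid or expired: reconfigure ZHIPU_API_KEY.",
    429: "Quota exhausted: wait before retrying.",
}


def guidance_for_status(status: int) -> str:
    """Map an HTTP status code to the guidance that should be shown to the user."""
    # Any other failure: display the error message and stop, with no fallback.
    return GUIDANCE.get(status, "Show the error message and stop.")


for code in (401, 429, 500):
    print(code, guidance_for_status(code))
```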
