glmv-pdf-to-ppt

PDF → HTML PPT Skill


Convert any PDF into a multi-slide HTML presentation. Pages are converted to images at 120 DPI and read sequentially to understand the content. A structured `outline.json` is then saved, images are cropped locally (no cloud upload), slides are rendered one by one, and finally a `summary.md` is generated.

Scripts are in: `{SKILL_DIR}/scripts/`


Dependencies


Python packages (install once):

bash

pip install pymupdf pillow

System tools: `curl` (pre-installed on macOS/Linux).


When to Use


Trigger when the user asks to make slides or a presentation from a PDF — phrases like: "make a PPT from a PDF", "convert PDF to slides", "create a presentation from this paper", "根据pdf做ppt", "根据论文做幻灯片", "做PPT", "做幻灯片", "生成演示文稿", "把这个pdf转成ppt", or any similar intent in Chinese or English.


Output Directory Convention


All output goes under `{WORKSPACE}/ppt/<pdf_stem>_<timestamp>/`:

ppt/
└── <pdf_stem>_<timestamp>/
    ├── outline.json        ← structured slide plan (SlidesPlan schema)
    ├── crops/              ← locally-saved cropped images
    │   ├── slide3_method_crop.png
    │   └── slide5_results_crop.png
    ├── slide_01.html
    ├── slide_02.html
    ├── ...
    └── summary.md          ← final summary document

- `<pdf_stem>` = PDF filename without extension
- `<timestamp>` = format `YYYYMMDD_HHMMSS` (e.g. `20240119_143022`)
- Cropped images go in the `crops/` subfolder
- Each slide HTML references images via the relative path `crops/<name>_crop.png`


Input


`$ARGUMENTS` is the path to the PDF file (local) or an HTTP/HTTPS URL.

- If the user provides a URL: download with curl first, then convert
- If the user provides a local PDF path: convert directly
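The branch between the two input kinds can be sketched in Python. This is only an illustration: `classify_input` and the `"document"` fallback stem are hypothetical names introduced here, not part of the skill's scripts. Taking the stem from the URL path rather than the raw string keeps query strings out of the output directory name.

```python
import os
from urllib.parse import urlparse


def classify_input(arg: str) -> dict:
    """Decide whether the argument is a URL to download or a local PDF path.

    Hypothetical helper for illustration; the skill itself only says to
    download URLs with curl and to convert local paths directly.
    """
    parsed = urlparse(arg)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, derive the stem from the URL path so "?query" parts are ignored.
    source = parsed.path if is_url else arg
    stem = os.path.splitext(os.path.basename(source))[0] or "document"
    return {"is_url": is_url, "pdf_stem": stem}


if __name__ == "__main__":
    print(classify_input("https://example.com/papers/attention.pdf?download=1"))
    print(classify_input("/home/user/report.pdf"))
```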


Workflow


Phase 0 — Create Output Directory


Compute the output path:

python

import os, datetime
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "ppt", f"{pdf_stem}_{timestamp}")

Create it immediately:

bash

mkdir -p "<out_dir>/crops"

Record `out_dir` and use it for all subsequent phases.


Phase 1 — Convert PDF Pages to Images (DPI 120)


If the input is a URL, download it first:

bash

pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"

Then convert (pass either the downloaded path or the original local path):

bash

python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120

Outputs JSON to stdout:

json

[{"page": 1, "path": "/abs/path/page_001.png"}, ...]

Parse and store the full `page → path` map. These local paths are used for viewing pages and as the `--path` input to `crop.py`.
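A minimal sketch of building that map, assuming the script's stdout has exactly the JSON shape shown above. `build_page_map` is a hypothetical helper name, and the sample string stands in for real script output.

```python
import json


def build_page_map(stdout_text: str) -> dict:
    """Turn the JSON list printed by pdf_to_images.py into {page: path}."""
    entries = json.loads(stdout_text)
    return {int(entry["page"]): entry["path"] for entry in entries}


if __name__ == "__main__":
    # Sample text mirroring the documented output format.
    sample = (
        '[{"page": 1, "path": "/abs/path/page_001.png"},'
        ' {"page": 2, "path": "/abs/path/page_002.png"}]'
    )
    page_map = build_page_map(sample)
    for page in sorted(page_map):
        print(page, page_map[page])
```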


Phase 2 — Read All Pages in Order


View all page images sequentially before planning anything. Your goal here is pure understanding — absorb the full structure, content, figures, and arguments of the document.

While reading, note:

- What figures, charts, or tables appear on which pages
- The overall arc (intro → method → results → conclusion for papers; or logical structure for other doc types)
- Candidate visuals worth cropping for slides (page number + rough region)

Do NOT plan or write slides yet — just read and understand all pages first.


Phase 3 — Plan Outline & Save outline.json


After reading all pages, plan 8–15 slides (adapt freely for non-academic documents).

| Slide | Typical purpose |
|---|---|
| 1 | Title, authors, affiliation, venue/year |
| 2 | Motivation / Problem statement |
| 3 | Related Work (brief) |
| 4–N-2 | Method / Core contributions (one concept per slide) |
| N-1 | Results & Experiments |
| N | Conclusion & Future Work |

For each slide that needs a visual, identify:

- Which page it comes from (the local page path from Phase 1)
- A description of what the visual shows and why it belongs on this slide

Save the outline as `<out_dir>/outline.json` using exactly this schema:

json

{
  "presentation_title": "Paper Title Here",
  "lang": "Chinese",
  "total_slides": 10,
  "slides_plan": [
    {
      "slide_index": 1,
      "title": "Slide Title",
      "main_content": "Key points and text content for this slide",
      "template_id": null,
      "required_crops": [
        {
          "url": "<page_image_url_from_phase1>",
          "visual_description": "Figure 3: architecture diagram showing encoder-decoder",
          "usage_reason": "Illustrates the core model structure for slide 4"
        }
      ]
    }
  ]
}

Field notes:

- `lang`: `"Chinese"` or `"English"`, matching the PDF language
- `template_id`: always `null`
- `required_crops`: empty array `[]` if this slide needs no images
- `url` in each crop: the local file path of the source page image (the `path` field from Phase 1); this is what `crop.py` opens and crops from
- `visual_description`: what the visual shows, including the figure/table number if available
- `usage_reason`: why this visual belongs on this particular slide
- For images that need cropping, note the approximate region; exact crop boxes are determined in Phase 4

Write `outline.json` to `<out_dir>/outline.json` using the Write tool.
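Before moving on, it can help to sanity-check the saved outline. The sketch below infers its checks from the example and field notes above, since the full SlidesPlan schema is not reproduced in this document. In particular, the requirement that `slide_index` runs 1..N in order is an assumption, and `validate_outline` is a hypothetical helper.

```python
REQUIRED_TOP = ("presentation_title", "lang", "total_slides", "slides_plan")
REQUIRED_CROP = ("url", "visual_description", "usage_reason")


def validate_outline(outline: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks OK."""
    problems = [f"missing top-level field: {k}" for k in REQUIRED_TOP if k not in outline]
    if problems:
        return problems
    if outline["lang"] not in ("Chinese", "English"):
        problems.append("lang must be 'Chinese' or 'English'")
    slides = outline["slides_plan"]
    if outline["total_slides"] != len(slides):
        problems.append("total_slides does not match the number of slides")
    for pos, slide in enumerate(slides, start=1):
        # Assumption: slide_index is 1-based and sequential.
        if slide.get("slide_index") != pos:
            problems.append(f"slide at position {pos}: unexpected slide_index")
        if slide.get("template_id") is not None:
            problems.append(f"slide {pos}: template_id must be null")
        crops = slide.get("required_crops")
        if not isinstance(crops, list):
            problems.append(f"slide {pos}: required_crops must be a list")
            continue
        for crop in crops:
            for key in REQUIRED_CROP:
                if key not in crop:
                    problems.append(f"slide {pos}: crop is missing '{key}'")
    return problems
```

Typical use would be `json.load` on `<out_dir>/outline.json` followed by `validate_outline(...)`, fixing the file if any problems are reported.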


Phase 4 — Crop Required Images (Grounding + Subagent)


IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.

IMPORTANT: You MUST use the provided `{SKILL_DIR}/scripts/crop.py` script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.

Read `outline.json`. Collect all crops needed, then launch one subagent per source page (which amounts to one per crop when every crop comes from a different page). The subagent uses grounding-style localization: it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.

Use the Agent tool like this:

Agent tool call:
  description: "Grounding crop page N"
  prompt: |
    You are a visual grounding and cropping assistant. Your task is to precisely
    locate specified visual elements in a page image and crop them out.

    ## Grounding method

    Use visual grounding to locate each target:
    1. Read the source image using the Read tool to view it
    2. Identify the target element described below
    3. Determine its bounding box as normalized coordinates in the 0–999 range:
       - 0 = left/top edge of the image
       - 999 = right/bottom edge of the image
       - These are thousandths, NOT pixels, NOT percentages (0–100)
       - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
       - Example: [0, 0, 500, 500] = top-left quarter of the image
    4. Be precise: tightly bound the target element with a small margin (~10–20 units)
       around it. Do NOT crop too wide or too narrow.

    ## Source image
    <page_image_path>

    ## Crops needed

    For each crop below, first do grounding (locate the element), then crop:

    1. Name: "slide<N>_<descriptive_name>"
       Target: "<visual_description from outline.json>"
       Context: "<usage_reason from outline.json>"

    ## Crop command

    After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
    ```bash
    python <SKILL_DIR>/scripts/crop.py \
        --path "<page_image_path>" \
        --box X1 Y1 X2 Y2 \
        --name "<crop_name>" \
        --out-dir "<out_dir>/crops"
    ```

    ## Verification

    After each crop, READ the output image to visually verify the correct region
    was captured. If the crop missed the target or is too wide/narrow, adjust the
    coordinates and re-run crop.py.

    ## Output

    Report the final results as a list:
    - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]

Replace `<page_image_path>`, `<SKILL_DIR>`, `<out_dir>`, and crop details with actual values from your context.
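To make the 0–999 convention concrete, here is the arithmetic as a standalone sketch. It is for understanding only: all real cropping still goes through `crop.py`, whose internal conversion is not shown in this document. The linear scaling by 999 (so that 999 lands exactly on the right/bottom edge) is an assumption, and `to_pixel_box` is a hypothetical name.

```python
def _scale(value: int, size: int) -> int:
    """Map a 0-999 coordinate onto a pixel axis of the given size (assumed linear)."""
    return round(value / 999 * size)


def to_pixel_box(box, width: int, height: int):
    """Convert [x1, y1, x2, y2] in normalized 0-999 units to pixel coordinates."""
    x1, y1, x2, y2 = box
    if not all(0 <= v <= 999 for v in box):
        raise ValueError("coordinates must lie in the 0-999 range")
    if x1 >= x2 or y1 >= y2:
        raise ValueError("box must satisfy x1 < x2 and y1 < y2")
    return (_scale(x1, width), _scale(y1, height), _scale(x2, width), _scale(y2, height))


if __name__ == "__main__":
    # The full-image box on a 1280x720 page image.
    print(to_pixel_box([0, 0, 999, 999], 1280, 720))
```

A common failure mode is a subagent answering in percentages (0–100); under this mapping such a box would cover only the top-left tenth of the page, which the verification step in the prompt should catch.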

The crop.py script outputs JSON:

{"path": "/abs/path/slide3_method_crop.png"}

Collect results from all subagents and build the mapping `slide_index → [crop filename, ...]` to reference in HTML. The filename will be `<name>_crop.png`.
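One way to build that mapping is to parse the slide number out of each crop filename. This sketch assumes every crop was named `slide<N>_<descriptive_name>` as in the subagent prompt, so the file on disk is `slide<N>_<descriptive_name>_crop.png`. `group_crops_by_slide` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches e.g. "slide3_method_crop.png" and captures the slide number.
CROP_NAME = re.compile(r"^slide(\d+)_.+_crop\.png$")


def group_crops_by_slide(filenames):
    """Return {slide_index: [crop filename, ...]}, skipping non-matching names."""
    mapping = defaultdict(list)
    for name in filenames:
        match = CROP_NAME.match(name)
        if match:
            mapping[int(match.group(1))].append(name)
    return dict(mapping)


if __name__ == "__main__":
    print(group_crops_by_slide(["slide3_method_crop.png", "slide5_results_crop.png"]))
```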

Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.


Phase 5 — Measure Cropped Image Dimensions


After cropping, get pixel dimensions:

bash

python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"

Use aspect ratios to pick each slide's layout:

| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | `text + image` side by side, with `max-height: 600px` on the image |
| 0.7 – 1.3 (square-ish) | `text + image`, image takes ~50% width |
| 1.3 – 2.0 (wide) | Image on top or bottom, text above/below |
| > 2.0 (very wide, e.g. tables) | `full-image`: spans the full 1280px width, caption below |
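The table can be read as a simple threshold function. In the sketch below the widest category is tested first, so a very wide image is not caught by the plain "wide" rule. The returned strings are informal labels chosen for illustration, and `pick_layout` is a hypothetical helper.

```python
def pick_layout(width: int, height: int) -> str:
    """Suggest a layout from a crop's pixel size, following the table's thresholds."""
    aspect = width / height
    if aspect > 2.0:
        return "full-image"            # very wide, e.g. tables
    if aspect > 1.3:
        return "image-top-or-bottom"   # wide figure, text above or below
    if aspect >= 0.7:
        return "text + image (50% width)"
    return "text + image (max-height 600px)"  # tall and narrow


if __name__ == "__main__":
    for size in [(500, 1000), (800, 800), (1500, 1000), (2400, 800)]:
        print(size, pick_layout(*size))
```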


Phase 6 — Generate Slides One by One


For each slide, write the HTML, save it to a temp file, then call `generate_slide.py`.

Step A — Write HTML to `/tmp/slide_N.html`

- All `<img src="...">` must use relative paths: `crops/<name>_crop.png`
- Do NOT use absolute paths or URLs for cropped images
- Navigation is click-area based, so no buttons are needed:
  - Clicking the left half of the slide navigates to the previous slide
  - Clicking the right half of the slide navigates to the next slide
  - On slide 1, clicking the left half does nothing; on the last slide, clicking the right half does nothing
  - Keyboard `←` / `→` arrows also navigate
  - Implement with two transparent `<div>` overlays covering each half, positioned absolute over the slide canvas
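The edge cases for the first and last slide come down to computing which neighbor files exist. A minimal sketch, assuming slides are named `slide_NN.html` with two-digit zero padding as in the directory listing (whether `generate_slide.py` uses the same padding beyond 99 slides is not stated). `slide_filename` and `nav_targets` are hypothetical helpers.

```python
def slide_filename(index: int) -> str:
    """File name for a 1-based slide index, e.g. 3 -> 'slide_03.html'."""
    return f"slide_{index:02d}.html"


def nav_targets(index: int, total: int):
    """Return (previous_href, next_href); None marks an inert click area."""
    if not 1 <= index <= total:
        raise ValueError("index must be between 1 and total")
    prev_href = slide_filename(index - 1) if index > 1 else None
    next_href = slide_filename(index + 1) if index < total else None
    return prev_href, next_href


if __name__ == "__main__":
    print(nav_targets(1, 10))
    print(nav_targets(10, 10))
```

The two hrefs would then be wired to the left and right overlays and to the arrow-key handler, with `None` meaning the handler does nothing.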

Step B — Save slide:

bash

python {SKILL_DIR}/scripts/generate_slide.py \
    --html-file /tmp/slide_N.html \
    --index N \
    --total <total> \
    --title "<presentation title>" \
    --out-dir "<out_dir>/"

Repeat until all slides are saved.


Phase 7 — Generate summary.md


Write `<out_dir>/summary.md` in the same language as the slides (`lang` from `outline.json`).

Include:

- Document title and basic info (authors, venue, year if applicable)
- Brief abstract/overview (2–3 sentences)
- Per-slide breakdown table: slide number, title, 1–2 sentence summary
- Main contributions or takeaways (bullet list)
- Link to `slide_01.html` to open the first slide

Example structure:

markdown


[Presentation Title]

来源 / Source: [PDF filename] | 语言 / Language: Chinese | 幻灯片数 / Slides: 10

摘要

[2-3 sentence overview]

幻灯片概览

| # | 标题 | 主要内容 |
|---|---|---|
| 1 | 标题页 | ... |
| ... | | |

主要贡献

📂 打开演示文稿

▶ 开始播放

---

HTML Slide Spec


Each slide is a standalone HTML file: a full `<html>…</html>` document with embedded CSS only.

Canvas: fixed `1280 × 720 px`, `overflow: hidden`; nothing scrolls.

Consistent design across all slides:

- Choose a visual style that fits the document's domain and tone; no fixed palette or font is required
- If the user specifies a style, follow it exactly; otherwise infer from the content (e.g. an ML paper → clean modern; a historical report → editorial serif; a product pitch → bold and branded)
- Same fonts, colors, and spacing system applied uniformly to every slide
- Every slide shows: slide title, page counter (bottom-right corner), presentation title (subtle footer)

Navigation on each slide:

- Two transparent click areas cover the full slide height: left 50% → previous slide, right 50% → next slide
- On slide 1 the left area is inert; on the last slide the right area is inert
- Keyboard `←` / `→` arrows also navigate
- No visible buttons needed; optionally show a subtle `‹` / `›` hint at the edges that fades in on hover

Layout patterns:

- `title-card`: centered hero, large title, authors/venue below
- `text-only`: structured bullet points, max 5–6 items, generous whitespace
- `text + image`: image right or left, text opposite
- `full-image`: image fills canvas, minimal text overlay
- `grid`: 2×2 or 3-column figures with captions

Images:

- Use relative paths: `crops/<name>_crop.png`
- Add `style="object-fit: contain; max-width: 100%; max-height: 100%;"`
- Add captions below in small italic text

Do NOT:

- Use external JS frameworks or icon CDNs
- Use placeholder/stock images; use only images cropped from the PDF
- Generate generic purple-gradient-on-white slides
- Let content overflow the 720px height


Quality Checklist


- Output directory named `<pdf_stem>_<timestamp>/`
- `outline.json` saved with valid SlidesPlan schema
- All crops saved to `crops/` (local only, no cloud upload)
- Each slide fits within 1280×720, nothing overflows
- Consistent theme across all slides
- Crop images referenced via relative path `crops/<name>_crop.png`
- Slide number and presentation title visible on every slide
- Left/right click-area navigation works, keyboard arrows work
- `summary.md` written in the correct language, links to `slide_01.html`
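Some of these items can be checked mechanically. The sketch below covers only the file-level ones (expected files present, image references relative and resolvable); visual items such as overflow and theme consistency still need a manual look. `check_output_dir` is a hypothetical helper and the regex is a deliberately simple approximation of HTML parsing.

```python
import os
import re

# Simple pattern for <img ... src="..."> with a double-quoted src attribute.
IMG_SRC = re.compile(r'<img[^>]*\bsrc="([^"]+)"', re.IGNORECASE)


def check_output_dir(out_dir: str) -> list:
    """Return a list of problems found in a finished output directory."""
    problems = []
    for required in ("outline.json", "summary.md", "crops"):
        if not os.path.exists(os.path.join(out_dir, required)):
            problems.append(f"missing {required}")
    slides = sorted(f for f in os.listdir(out_dir) if re.fullmatch(r"slide_\d+\.html", f))
    if not slides:
        problems.append("no slide_NN.html files found")
    for slide in slides:
        with open(os.path.join(out_dir, slide), encoding="utf-8") as fh:
            html = fh.read()
        for src in IMG_SRC.findall(html):
            if not src.startswith("crops/"):
                problems.append(f"{slide}: image src is not under crops/: {src}")
            elif not os.path.isfile(os.path.join(out_dir, src)):
                problems.append(f"{slide}: referenced crop does not exist: {src}")
    return problems
```

Calling `check_output_dir(out_dir)` after the last phase and getting an empty list means the file-level checks passed.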


Language


Match the PDF language. Chinese PDF → Chinese slides and summary. English → English. No mixing.
