glmv-pdf-to-ppt

PDF → HTML PPT Skill

Convert any PDF into a multi-slide HTML presentation. Pages are rendered to images at 120 DPI and read sequentially to understand the content. A structured outline.json is then saved, images are cropped locally (no cloud upload), slides are rendered one by one, and finally summary.md is generated.
Scripts are in: {SKILL_DIR}/scripts/

Dependencies

Python packages (install once):
bash
pip install pymupdf pillow
System tools: curl (pre-installed on macOS/Linux).

When to Use

Trigger when the user asks to make slides or a presentation from a PDF — phrases like: "make a PPT from a PDF", "convert PDF to slides", "create a presentation from this paper", "根据pdf做ppt", "根据论文做幻灯片", "做PPT", "做幻灯片", "生成演示文稿", "把这个pdf转成ppt", or any similar intent in Chinese or English.

Output Directory Convention

All output goes under {WORKSPACE}/ppt/<pdf_stem>_<timestamp>/:
ppt/
└── <pdf_stem>_<timestamp>/
    ├── outline.json        ← structured slide plan (SlidesPlan schema)
    ├── crops/              ← locally-saved cropped images
    │   ├── slide3_method_crop.png
    │   └── slide5_results_crop.png
    ├── slide_01.html
    ├── slide_02.html
    ├── ...
    └── summary.md          ← final summary document
  • <pdf_stem> = PDF filename without extension
  • <timestamp> = format YYYYMMDD_HHMMSS (e.g. 20240119_143022)
  • Cropped images go in the crops/ subfolder
  • Each slide HTML references images via the relative path crops/<name>_crop.png

Input

$ARGUMENTS is the path to a local PDF file or an HTTP/HTTPS URL.
  • If user provides a URL: download with curl first, then convert
  • If user provides a local PDF path: convert directly
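The URL-versus-local decision above can be sketched in Python. This is only an illustration: the helper names classify_input and download_target are hypothetical and are not part of the skill's scripts.

```python
import os
from urllib.parse import urlparse


def classify_input(arg):
    """Return "url" for HTTP/HTTPS inputs, otherwise "local"."""
    scheme = urlparse(arg).scheme.lower()
    return "url" if scheme in ("http", "https") else "local"


def download_target(url, tmp_dir="/tmp"):
    """Choose a local file name for a URL from its last path segment.

    Assumption: query strings are dropped and a .pdf suffix is enforced.
    """
    name = os.path.basename(urlparse(url).path) or "download.pdf"
    if not name.lower().endswith(".pdf"):
        name += ".pdf"
    return os.path.join(tmp_dir, name)


print(classify_input("https://example.com/a/paper.pdf?x=1"))  # url
print(classify_input("/home/user/paper.pdf"))                 # local
```

Parsing the URL first keeps a query string out of the saved file name, which a plain basename call on the raw URL would not do.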

Workflow

Phase 0 — Create Output Directory

Compute the output path:
python
import os, datetime
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "ppt", f"{pdf_stem}_{timestamp}")
Create it immediately:
bash
mkdir -p "<out_dir>/crops"
Record out_dir and use it for all subsequent phases.

Phase 1 — Convert PDF Pages to Images (DPI 120)

If the input is a URL, download it first:
bash
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
Then convert (pass either the downloaded path or the original local path):
bash
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
Outputs JSON to stdout:
json
[{"page": 1, "path": "/abs/path/page_001.png"}, ...]
Parse and store the full page → path map. These local paths are used for viewing pages and as the --path input to crop.py.
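A minimal sketch of building the page → path map, assuming the script's stdout has exactly the JSON shape shown above. The function name parse_page_map is hypothetical.

```python
import json


def parse_page_map(stdout_text):
    """Turn the JSON list printed by pdf_to_images.py into {page: path}."""
    entries = json.loads(stdout_text)
    return {int(entry["page"]): entry["path"] for entry in entries}


# Sample stdout in the documented format (paths are placeholders).
sample = (
    '[{"page": 1, "path": "/abs/path/page_001.png"},'
    ' {"page": 2, "path": "/abs/path/page_002.png"}]'
)
page_map = parse_page_map(sample)
print(page_map[2])  # /abs/path/page_002.png
```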

Phase 2 — Read All Pages in Order

View all page images sequentially before planning anything. Your goal here is pure understanding — absorb the full structure, content, figures, and arguments of the document.
While reading, note:
  • What figures, charts, or tables appear on which pages
  • The overall arc (intro → method → results → conclusion for papers; or logical structure for other doc types)
  • Candidate visuals worth cropping for slides (page number + rough region)
Do NOT plan or write slides yet — just read and understand all pages first.

Phase 3 — Plan Outline & Save outline.json

After reading all pages, plan 8–15 slides (adapt freely for non-academic documents).
| Slide | Typical purpose |
|---|---|
| 1 | Title, authors, affiliation, venue/year |
| 2 | Motivation / Problem statement |
| 3 | Related Work (brief) |
| 4–N-2 | Method / Core contributions (one concept per slide) |
| N-1 | Results & Experiments |
| N | Conclusion & Future Work |
For each slide that needs a visual, identify:
  • Which page it comes from (the local page path from Phase 1)
  • A description of what the visual shows and why it belongs on this slide
Save the outline as <out_dir>/outline.json using exactly this schema:
json
{
  "presentation_title": "Paper Title Here",
  "lang": "Chinese",
  "total_slides": 10,
  "slides_plan": [
    {
      "slide_index": 1,
      "title": "Slide Title",
      "main_content": "Key points and text content for this slide",
      "template_id": null,
      "required_crops": [
        {
          "url": "<page_image_path_from_phase1>",
          "visual_description": "Figure 3: architecture diagram showing encoder-decoder",
          "usage_reason": "Illustrates the core model structure for slide 4"
        }
      ]
    }
  ]
}
Field notes:
  • lang: "Chinese" or "English"; match the PDF language
  • template_id: always null
  • required_crops: empty array [] if this slide needs no images
  • url in each crop: the local file path of the source page image (the path field from Phase 1); this is what crop.py will open and crop from
  • visual_description: what the visual shows, including figure/table number if available
  • usage_reason: why this visual belongs on this particular slide
  • For images that need cropping, note the approximate region; exact crop boxes are determined in Phase 4
Write outline.json using the Write tool to <out_dir>/outline.json.
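Before moving on, the saved outline can be sanity-checked. The sketch below is not part of the skill's scripts, and check_outline is a hypothetical name. It also assumes two rules the schema only implies: total_slides equals the number of entries in slides_plan, and slide_index counts up from 1.

```python
TOP_KEYS = {"presentation_title", "lang", "total_slides", "slides_plan"}
SLIDE_KEYS = {"slide_index", "title", "main_content", "template_id", "required_crops"}
CROP_KEYS = {"url", "visual_description", "usage_reason"}


def check_outline(outline):
    """Return a list of problems; an empty list means no issue was found."""
    missing = TOP_KEYS - set(outline)
    if missing:
        return ["missing top-level keys: %s" % sorted(missing)]
    problems = []
    if outline["lang"] not in ("Chinese", "English"):
        problems.append("lang must be 'Chinese' or 'English'")
    slides = outline["slides_plan"]
    if outline["total_slides"] != len(slides):  # assumed rule
        problems.append("total_slides does not match len(slides_plan)")
    for pos, slide in enumerate(slides, start=1):
        lacking = SLIDE_KEYS - set(slide)
        if lacking:
            problems.append("slide %d: missing %s" % (pos, sorted(lacking)))
            continue
        if slide["slide_index"] != pos:  # assumed rule: 1-based, consecutive
            problems.append("slide %d: unexpected slide_index" % pos)
        if slide["template_id"] is not None:
            problems.append("slide %d: template_id must be null" % pos)
        for crop in slide["required_crops"]:
            gaps = CROP_KEYS - set(crop)
            if gaps:
                problems.append("slide %d: crop missing %s" % (pos, sorted(gaps)))
    return problems
```

Load the file with json.load and call check_outline on the result; fix anything it reports before starting Phase 4.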

Phase 4 — Crop Required Images (Grounding + Subagent)

IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.
IMPORTANT: You MUST use the provided {SKILL_DIR}/scripts/crop.py script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.
Read outline.json. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization: it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.
Use the Agent tool like this:
Agent tool call:
  description: "Grounding crop page N"
  prompt: |
    You are a visual grounding and cropping assistant. Your task is to precisely
    locate specified visual elements in a page image and crop them out.

    ## Grounding method

    Use visual grounding to locate each target:
    1. Read the source image using the Read tool to view it
    2. Identify the target element described below
    3. Determine its bounding box as normalized coordinates in the 0–999 range:
       - 0 = left/top edge of the image
       - 999 = right/bottom edge of the image
       - These are thousandths, NOT pixels, NOT percentages (0–100)
       - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
       - Example: [0, 0, 500, 500] = top-left quarter of the image
    4. Be precise: tightly bound the target element with a small margin (~10–20 units)
       around it. Do NOT crop too wide or too narrow.

    ## Source image
    <page_image_path>

    ## Crops needed

    For each crop below, first do grounding (locate the element), then crop:

    1. Name: "slide<N>_<descriptive_name>"
       Target: "<visual_description from outline.json>"
       Context: "<usage_reason from outline.json>"

    ## Crop command

    After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
    ```bash
    python <SKILL_DIR>/scripts/crop.py \
        --path "<page_image_path>" \
        --box X1 Y1 X2 Y2 \
        --name "<crop_name>" \
        --out-dir "<out_dir>/crops"
    ```

    ## Verification

    After each crop, READ the output image to visually verify the correct region
    was captured. If the crop missed the target or is too wide/narrow, adjust the
    coordinates and re-run crop.py.

    ## Output

    Report the final results as a list:
    - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
Replace <page_image_path>, <SKILL_DIR>, <out_dir>, and the crop details with actual values from your context.
The crop.py script outputs JSON:
{"path": "/abs/path/slide3_method_crop.png"}
Collect results from all subagents and build the mapping slide_index → [crop filename, ...] to reference in HTML. The filename will be <name>_crop.png.
Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
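To build intuition for the 0–999 coordinate system, the arithmetic can be sketched as below. This only illustrates the mapping and is not a replacement for crop.py, which remains the only permitted way to crop. It assumes 999 maps to the far edge of the image; the real script may scale or round differently. The names box_to_pixels and pad_box are hypothetical.

```python
def box_to_pixels(box, width, height):
    """Map a normalized [x1, y1, x2, y2] box (0-999) to pixel bounds.

    Assumption: coordinates are scaled by size / 999, so 999 is the far edge.
    """
    x1, y1, x2, y2 = box
    if not all(0 <= v <= 999 for v in box):
        raise ValueError("coordinates must be in the 0-999 range")
    if x1 >= x2 or y1 >= y2:
        raise ValueError("box must satisfy x1 < x2 and y1 < y2")
    sx, sy = width / 999, height / 999
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))


def pad_box(box, margin=15):
    """Add a small margin in normalized units, clamped to the 0-999 range."""
    x1, y1, x2, y2 = box
    return [max(0, x1 - margin), max(0, y1 - margin),
            min(999, x2 + margin), min(999, y2 + margin)]


# A US Letter page rendered at 120 DPI is 1020 x 1320 pixels.
print(box_to_pixels([0, 0, 999, 999], 1020, 1320))  # (0, 0, 1020, 1320)
print(pad_box([5, 100, 990, 400]))                  # [0, 85, 999, 415]
```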

Phase 5 — Measure Cropped Image Dimensions

After cropping, get pixel dimensions:
bash
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
Use aspect ratios (width / height) to pick each slide's layout:
| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | text + image side-by-side, with max-height: 600px on the image |
| 0.7 – 1.3 (square-ish) | text + image, image takes ~50% width |
| 1.3 – 2.0 (wide) | Image on top or bottom, text above/below |
| > 2.0 (very wide, e.g. tables) | full-image, spans the full 1280px width, caption below |
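The aspect-ratio rules can be expressed as a small lookup. This is a sketch, not part of the skill: the function name pick_layout and the label "image top/bottom" are hypothetical, and boundary values (exactly 1.3 or 2.0) are assigned to the narrower bucket as an assumption.

```python
def pick_layout(aspect):
    """Return (layout, hint) for a crop with the given width/height ratio."""
    if aspect < 0.7:
        return ("text + image", "side-by-side, image max-height 600px")
    if aspect <= 1.3:
        return ("text + image", "image takes about 50% of the width")
    if aspect <= 2.0:
        return ("image top/bottom", "text above or below the image")
    return ("full-image", "image spans the full 1280px width, caption below")


# Example with the kind of dictionary the measuring snippet prints.
sizes = {"slide3_method_crop.png": {"width": 900, "height": 400, "aspect": 2.25}}
for name, info in sizes.items():
    print(name, pick_layout(info["aspect"])[0])  # slide3_method_crop.png full-image
```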

Phase 6 — Generate Slides One by One

For each slide, write the HTML, save it to a temp file, then call generate_slide.py.
Step A: write the HTML to /tmp/slide_N.html
  • All <img src="..."> must use relative paths: crops/<name>_crop.png
  • Do NOT use absolute paths or URLs for cropped images
  • Navigation is click-area based, so no buttons are needed:
    • Clicking the left half of the slide navigates to the previous slide
    • Clicking the right half of the slide navigates to the next slide
    • On slide 1, clicking the left half does nothing; on the last slide, clicking the right half does nothing
    • Keyboard ← / → arrows also navigate
    • Implement with two transparent <div> overlays, one covering each half, positioned absolute over the slide canvas
Step B: save the slide
bash
python {SKILL_DIR}/scripts/generate_slide.py \
    --html-file /tmp/slide_N.html \
    --index N \
    --total <total> \
    --title "<presentation title>" \
    --out-dir "<out_dir>/"
Repeat until all slides are saved.
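The click-area navigation from Step A could be generated with a helper like the one below. It is a sketch under assumptions: the nav_snippet name, the CSS class names, and the inline script are illustrative, and it is not known whether generate_slide.py already injects any navigation of its own. Slide file names follow the slide_01.html pattern from the output directory layout.

```python
def slide_filename(index):
    """File name for a 1-based slide index, e.g. 3 -> slide_03.html."""
    return f"slide_{index:02d}.html"


def nav_snippet(index, total):
    """Return HTML for two transparent click areas plus arrow-key handling.

    An empty href makes that side inert (first and last slide).
    """
    prev_href = slide_filename(index - 1) if index > 1 else ""
    next_href = slide_filename(index + 1) if index < total else ""
    return f"""
<div class="nav nav-prev" data-href="{prev_href}"
     style="position:absolute;top:0;left:0;width:50%;height:100%;"></div>
<div class="nav nav-next" data-href="{next_href}"
     style="position:absolute;top:0;left:50%;width:50%;height:100%;"></div>
<script>
  const go = (href) => {{ if (href) window.location.href = href; }};
  document.querySelectorAll(".nav").forEach((el) =>
    el.addEventListener("click", () => go(el.dataset.href)));
  document.addEventListener("keydown", (e) => {{
    if (e.key === "ArrowLeft") go("{prev_href}");
    if (e.key === "ArrowRight") go("{next_href}");
  }});
</script>
"""


print(slide_filename(3))  # slide_03.html
```

The snippet is meant to sit inside the slide canvas element, which needs position: relative for the absolute overlays to line up with it.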

Phase 7 — Generate summary.md

Write <out_dir>/summary.md in the same language as the slides (lang from outline.json).
Include:
  • Document title and basic info (authors, venue, year if applicable)
  • Brief abstract/overview (2–3 sentences)
  • Per-slide breakdown table: slide number, title, 1–2 sentence summary
  • Main contributions or takeaways (bullet list)
  • Link to slide_01.html to open the first slide
Example structure (shown for a Chinese-language deck; the section headings read Abstract, Slide overview, Main contributions, and Open the presentation, and the link text reads Start playback):
markdown
# [Presentation Title]

来源 / Source: [PDF filename] | 语言 / Language: Chinese | 幻灯片数 / Slides: 10

## 摘要

[2-3 sentence overview]

## 幻灯片概览

| # | 标题 | 主要内容 |
|---|------|----------|
| 1 | 标题页 | ... |
| ... | | |

## 主要贡献

- ...
- ...

## 📂 打开演示文稿

[▶ 开始播放](slide_01.html)

---
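The per-slide breakdown table in summary.md could be derived from outline.json along these lines. This is a sketch with assumptions: slide_table is a hypothetical helper, the English header labels are invented for illustration, and main_content is used as the summary column even though the summary may be written separately.

```python
HEADERS = {
    "Chinese": ("#", "标题", "主要内容"),
    "English": ("#", "Title", "Summary"),  # assumed English labels
}


def slide_table(outline):
    """Build a Markdown table with one row per entry in slides_plan."""
    head = HEADERS.get(outline.get("lang"), HEADERS["English"])
    lines = ["| " + " | ".join(head) + " |", "|---|---|---|"]
    for slide in outline["slides_plan"]:
        cells = (str(slide["slide_index"]), slide["title"], slide["main_content"])
        # Escape pipes and flatten newlines so each slide stays on one row.
        cells = [c.replace("|", "\\|").replace("\n", " ") for c in cells]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


demo = {
    "lang": "Chinese",
    "slides_plan": [
        {"slide_index": 1, "title": "标题页", "main_content": "论文标题与作者"},
    ],
}
print(slide_table(demo))
```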

HTML Slide Spec

Each slide is a standalone HTML file: a full <html>…</html> document with embedded CSS only.
Canvas: fixed 1280 × 720 px, overflow: hidden; nothing scrolls.
Consistent design across all slides:
  • Choose a visual style that fits the document's domain and tone; no fixed palette or font is required
  • If the user specifies a style, follow it exactly; otherwise infer from the content (e.g. a ML paper → clean modern; a historical report → editorial serif; a product pitch → bold and branded)
  • Same fonts, colors, and spacing system applied uniformly to every slide
  • Every slide shows: slide title, page counter (bottom-right corner), presentation title (subtle footer)
Navigation on each slide:
  • Two transparent click areas cover the full slide height: left 50% → previous slide, right 50% → next slide
  • On slide 1 the left area is inert; on the last slide the right area is inert
  • Keyboard ← / → arrows also navigate
  • No visible buttons needed; optionally show a subtle previous/next hint at the edges that fades in on hover
Layout patterns:
  • title-card: centered hero, large title, authors/venue below
  • text-only: structured bullet points, max 5–6 items, generous whitespace
  • text + image: image right or left, text opposite
  • full-image: image fills canvas, minimal text overlay
  • grid: 2×2 or 3-column figures with captions
Images:
  • Use relative paths: crops/<name>_crop.png
  • Add style="object-fit: contain; max-width: 100%; max-height: 100%;"
  • Add captions below in small italic text
Do NOT:
  • Use external JS frameworks or icon CDNs
  • Use placeholder/stock images; use only images cropped from the PDF
  • Generate generic purple-gradient-on-white slides
  • Let content overflow the 720px height

Quality Checklist

  • Output directory named <pdf_stem>_<timestamp>/
  • outline.json saved with valid SlidesPlan schema
  • All crops saved to crops/ (local only, no cloud upload)
  • Each slide fits within 1280×720, nothing overflows
  • Consistent theme across all slides
  • Crop images referenced via relative path crops/<name>_crop.png
  • Slide number and presentation title visible on every slide
  • Left/right click-area navigation works, keyboard arrows work
  • summary.md written in the correct language, links to slide_01.html

Language

Match the PDF language. Chinese PDF → Chinese slides and summary. English → English. No mixing.