glmv-pdf-to-ppt
PDF → HTML PPT Skill
Convert any PDF into a multi-slide HTML presentation. Pages are rendered to images at 120 DPI and read sequentially to understand the content. A structured `outline.json` is then saved, images are cropped locally (no cloud upload), slides are rendered one by one, and finally a `summary.md` is generated.
Scripts are in: `{SKILL_DIR}/scripts/`
Dependencies
Python packages (install once):
bash
pip install pymupdf pillow
System tools: `curl` (pre-installed on macOS/Linux).
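A quick preflight check can confirm these dependencies before the workflow starts. The sketch below is an illustration and not part of the skill's scripts: the helper name `missing_dependencies` is hypothetical, and it assumes PyMuPDF is importable as `fitz` or `pymupdf` and Pillow as `PIL`.

```python
import importlib.util
import shutil


def missing_dependencies(modules=(("fitz", "pymupdf"), ("PIL",)), tools=("curl",)):
    """Return the module groups and command-line tools that cannot be found.

    Each entry in `modules` is a tuple of alternative import names; the
    group counts as present if any one of them is importable.
    """
    missing = []
    for names in modules:
        if not any(importlib.util.find_spec(name) is not None for name in names):
            missing.append("/".join(names))
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("all dependencies found" if not problems else f"missing: {problems}")
```

An empty list means the `pip install` step and `curl` are both in place.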
When to Use
Trigger when the user asks to make slides or a presentation from a PDF — phrases like:
"make a PPT from a PDF", "convert PDF to slides", "create a presentation from this paper", "根据pdf做ppt", "根据论文做幻灯片", "做PPT", "做幻灯片", "生成演示文稿", "把这个pdf转成ppt", or any similar intent in Chinese or English.
Output Directory Convention
All output goes under `{WORKSPACE}/ppt/<pdf_stem>_<timestamp>/`:
ppt/
└── <pdf_stem>_<timestamp>/
    ├── outline.json          ← structured slide plan (SlidesPlan schema)
    ├── crops/                ← locally-saved cropped images
    │   ├── slide3_method_crop.png
    │   └── slide5_results_crop.png
    ├── slide_01.html
    ├── slide_02.html
    ├── ...
    └── summary.md            ← final summary document
- `<pdf_stem>` = PDF filename without extension
- `<timestamp>` = format `YYYYMMDD_HHMMSS` (e.g. `20240119_143022`)
- Cropped images go in the `crops/` subfolder
- Each slide HTML references images via relative path `crops/<name>.png`
Input
`$ARGUMENTS`
- If the user provides a URL: download it with curl first, then convert
- If the user provides a local PDF path: convert directly
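The URL-versus-local-path decision above could be made along these lines. This is a minimal sketch under stated assumptions: `classify_input` is a hypothetical helper, only `http`/`https` inputs are treated as URLs, the `/tmp/<pdf_stem>.pdf` target follows the download step in Phase 1, and the `download.pdf` fallback name is invented for URLs whose path has no filename.

```python
import os
from urllib.parse import unquote, urlparse


def classify_input(argument):
    """Return ("url", download_path) or ("local", absolute_pdf_path).

    For URLs the stem is taken from the URL path only, so a query string
    such as "?download=1" does not end up in the local filename.
    """
    parsed = urlparse(argument)
    if parsed.scheme in ("http", "https"):
        name = os.path.basename(unquote(parsed.path)) or "download.pdf"
        stem = name[:-4] if name.lower().endswith(".pdf") else name
        return "url", f"/tmp/{stem}.pdf"
    return "local", os.path.abspath(os.path.expanduser(argument))


print(classify_input("https://example.com/papers/attention.pdf?download=1"))
# ('url', '/tmp/attention.pdf')
```

Taking the stem from the parsed URL path is a deliberate difference from a plain `basename` on the raw argument, which would keep any query string.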
Workflow
Phase 0 — Create Output Directory
Compute the output path:
python
import os, datetime
# pdf_path is the local PDF; workspace is the {WORKSPACE} root
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "ppt", f"{pdf_stem}_{timestamp}")
Create it immediately:
bash
mkdir -p "<out_dir>/crops"
Record `out_dir` and use it for all subsequent phases.
Phase 1 — Convert PDF Pages to Images (DPI 120)
If the input is a URL, download it first:
bash
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
Then convert (pass either the downloaded path or the original local path):
bash
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
The script prints JSON to stdout:
json
[{"page": 1, "path": "/abs/path/page_001.png"}, ...]
Parse and store the full `page → path` map. These local paths are used for viewing pages and as the `--path` input to `crop.py`.
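Parsing that JSON into a lookup table might look like the following. It is a minimal sketch: `parse_page_map` is a hypothetical helper, and the sample string simply extends the output format shown above with a made-up second page.

```python
import json


def parse_page_map(stdout_text):
    """Turn the script's JSON array into a {page_number: image_path} dict."""
    entries = json.loads(stdout_text)
    page_map = {int(entry["page"]): entry["path"] for entry in entries}
    if not page_map:
        raise ValueError("pdf_to_images.py reported no pages")
    return page_map


# Sample data in the documented format (paths are illustrative only).
sample = (
    '[{"page": 1, "path": "/abs/path/page_001.png"},'
    ' {"page": 2, "path": "/abs/path/page_002.png"}]'
)
pages = parse_page_map(sample)
print(pages[2])  # /abs/path/page_002.png
```

Keying the map by integer page number makes it easy to look up the source image when a crop is later described as "Figure 3 on page 5".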
Phase 2 — Read All Pages in Order
View all page images sequentially before planning anything. Your goal here is pure understanding — absorb the full structure, content, figures, and arguments of the document.
While reading, note:
- What figures, charts, or tables appear on which pages
- The overall arc (intro → method → results → conclusion for papers; or logical structure for other doc types)
- Candidate visuals worth cropping for slides (page number + rough region)
Do NOT plan or write slides yet — just read and understand all pages first.
Phase 3 — Plan Outline & Save outline.json
After reading all pages, plan 8–15 slides (adapt freely for non-academic documents).
| Slide | Typical purpose |
|---|---|
| 1 | Title, authors, affiliation, venue/year |
| 2 | Motivation / Problem statement |
| 3 | Related Work (brief) |
| 4–N-2 | Method / Core contributions (one concept per slide) |
| N-1 | Results & Experiments |
| N | Conclusion & Future Work |
For each slide that needs a visual, identify:
- Which page it comes from (the local page path from Phase 1)
- A description of what the visual shows and why it belongs on this slide
Save the outline as `<out_dir>/outline.json` using exactly this schema:
json
{
  "presentation_title": "Paper Title Here",
  "lang": "Chinese",
  "total_slides": 10,
  "slides_plan": [
    {
      "slide_index": 1,
      "title": "Slide Title",
      "main_content": "Key points and text content for this slide",
      "template_id": null,
      "required_crops": [
        {
          "url": "<page_image_url_from_phase1>",
          "visual_description": "Figure 3: architecture diagram showing encoder-decoder",
          "usage_reason": "Illustrates the core model structure for slide 4"
        }
      ]
    }
  ]
}
Field notes:
- `lang`: `"Chinese"` or `"English"`; match the PDF language
- `template_id`: always `null`
- `required_crops`: empty array `[]` if this slide needs no images
- `url` in each crop: despite the key name, this is the local file path of the source page image (from the Phase 1 `path` field); it is what crop.py will open and crop from
- `visual_description`: what the visual shows, including figure/table number if available
- `usage_reason`: why this visual belongs on this particular slide
- For images that need cropping, note the approximate region; exact crop boxes are determined in Phase 4
Write `outline.json` to `<out_dir>/outline.json` using the Write tool.
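Before moving on, the saved outline can be checked against the field notes. The sketch below is one possible reading of the schema, not the SlidesPlan definition itself: `validate_outline` is a hypothetical helper, and it assumes that `total_slides` equals the number of entries in `slides_plan` and that `slide_index` counts up from 1 (the abbreviated example above shows only one of its ten slides).

```python
def validate_outline(outline):
    """Return a list of problems found in an outline dict (empty if none)."""
    problems = []
    for key in ("presentation_title", "lang", "total_slides", "slides_plan"):
        if key not in outline:
            problems.append(f"missing top-level field: {key}")
    if problems:
        return problems
    if outline["lang"] not in ("Chinese", "English"):
        problems.append("lang must be 'Chinese' or 'English'")
    slides = outline["slides_plan"]
    if outline["total_slides"] != len(slides):
        problems.append("total_slides does not match len(slides_plan)")
    slide_keys = ("slide_index", "title", "main_content", "template_id", "required_crops")
    for pos, slide in enumerate(slides, start=1):
        label = f"slide {pos}"
        for key in slide_keys:
            if key not in slide:
                problems.append(f"{label}: missing field {key}")
        if slide.get("slide_index") != pos:
            problems.append(f"{label}: slide_index should be {pos}")
        if slide.get("template_id") is not None:
            problems.append(f"{label}: template_id must be null")
        for crop in slide.get("required_crops") or []:
            for key in ("url", "visual_description", "usage_reason"):
                if not crop.get(key):
                    problems.append(f"{label}: crop is missing {key}")
    return problems
```

Calling `validate_outline(json.load(open(path)))` on the saved file and getting an empty list back is a reasonable gate before Phase 4.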
Phase 4 — Crop Required Images (Grounding + Subagent)
IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.
IMPORTANT: You MUST use the provided `{SKILL_DIR}/scripts/crop.py` script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.
Read `outline.json`. Collect all crops needed, then launch one subagent per source page (which amounts to one per crop when every crop comes from a different page). The subagent uses grounding-style localization: it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.
Use the Agent tool like this:
Agent tool call:
description: "Grounding crop page N"
prompt: |
You are a visual grounding and cropping assistant. Your task is to precisely
locate specified visual elements in a page image and crop them out.
## Grounding method
Use visual grounding to locate each target:
1. Read the source image using the Read tool to view it
2. Identify the target element described below
3. Determine its bounding box as normalized coordinates in the 0–999 range:
- 0 = left/top edge of the image
- 999 = right/bottom edge of the image
- These are thousandths, NOT pixels, NOT percentages (0–100)
- Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
- Example: [0, 0, 500, 500] = top-left quarter of the image
4. Be precise: tightly bound the target element with a small margin (~10–20 units)
around it. Do NOT crop too wide or too narrow.
## Source image
<page_image_path>
## Crops needed
For each crop below, first do grounding (locate the element), then crop:
1. Name: "slide<N>_<descriptive_name>"
Target: "<visual_description from outline.json>"
Context: "<usage_reason from outline.json>"
## Crop command
After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
```bash
python <SKILL_DIR>/scripts/crop.py \
--path "<page_image_path>" \
--box X1 Y1 X2 Y2 \
--name "<crop_name>" \
--out-dir "<out_dir>/crops"
```
## Verification
After each crop, READ the output image to visually verify the correct region
was captured. If the crop missed the target or is too wide/narrow, adjust the
coordinates and re-run crop.py.
## Output
Report the final results as a list:
- crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
Replace `<page_image_path>`, `<SKILL_DIR>`, `<out_dir>`, and the crop details with actual values from your context.
The crop.py script outputs JSON:
{"path": "/abs/path/slide3_method_crop.png"}
Collect results from all subagents and build the mapping `slide_index → [crop filename, ...]` to reference in HTML. The filename will be `<name>_crop.png`.
Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
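For intuition about the 0–999 convention used in the subagent prompt, the arithmetic can be sketched as below. This only illustrates the coordinate math and is not a replacement for crop.py, whose internals are not shown in this document; `box_to_pixels` is a hypothetical name, and dividing by 999 is an assumption drawn from the statement that 999 marks the right/bottom edge.

```python
def box_to_pixels(box, width, height):
    """Map a [x1, y1, x2, y2] box in 0–999 units onto pixel coordinates."""
    x1, y1, x2, y2 = box
    if not all(0 <= value <= 999 for value in box):
        raise ValueError("coordinates must lie in the 0–999 range")
    if x1 >= x2 or y1 >= y2:
        raise ValueError("box must satisfy x1 < x2 and y1 < y2")
    # Scale each coordinate so that 999 lands on the far edge of the image.
    return (
        round(x1 * width / 999),
        round(y1 * height / 999),
        round(x2 * width / 999),
        round(y2 * height / 999),
    )


print(box_to_pixels([0, 0, 999, 999], 1000, 1400))  # (0, 0, 1000, 1400)
```

The range check mirrors the prompt's warning that the values are thousandths, not pixels and not 0–100 percentages, so a box such as `[0, 0, 1280, 720]` is rejected rather than silently scaled.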
Phase 5 — Measure Cropped Image Dimensions
After cropping, get pixel dimensions:
bash
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
Use aspect ratios to pick each slide's layout:
| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | Text + image side by side |
| 0.7 – 1.3 (square-ish) | Text + image side by side; image takes about 50% of the width |
| 1.3 – 2.0 (wide) | Image on top or bottom, text below or above |
| > 2.0 (very wide, e.g. tables) | Full-image layout: image spans the full 1280px width, caption below |
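The aspect-ratio thresholds in the table can be turned into a small lookup. This is a sketch of one reading of the table: `pick_layout` and the four layout labels are hypothetical short names, and the handling of ratios exactly at 1.3 or 2.0 is an assumption.

```python
def pick_layout(width, height):
    """Choose a layout label from a crop's pixel dimensions."""
    aspect = width / height
    if aspect < 0.7:
        return "side-by-side-tall"   # tall/narrow image next to the text
    if aspect <= 1.3:
        return "side-by-side-half"   # square-ish image, roughly half the width
    if aspect <= 2.0:
        return "stacked"             # wide image above or below the text
    return "full-width"              # very wide image, e.g. a table


print(pick_layout(1200, 400))  # full-width
```

Feeding in the `width` and `height` values printed by the measurement command gives each slide a layout choice before any HTML is written.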
Phase 6 — Generate Slides One by One
For each slide, write the HTML, save it to a temp file, then call `generate_slide.py`.
Step A — Write HTML to `/tmp/slide_N.html`
- All `<img src="...">` must use relative paths: `crops/<name>_crop.png`
- Do NOT use absolute paths or URLs for cropped images
- Navigation is click-area based; no buttons needed:
  - Clicking the left half of the slide navigates to the previous slide
  - Clicking the right half of the slide navigates to the next slide
  - On slide 1, clicking the left half does nothing; on the last slide, clicking the right half does nothing
  - Keyboard `←` / `→` arrows also navigate
  - Implement with two transparent `<div>` overlays, one covering each half, positioned absolutely over the slide canvas
Step B — Save slide:
bash
python {SKILL_DIR}/scripts/generate_slide.py \
  --html-file /tmp/slide_N.html \
  --index N \
  --total <total> \
  --title "<presentation title>" \
  --out-dir "<out_dir>/"
Repeat until all slides are saved.
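The click-area navigation described in Step A could be generated per slide along these lines. This is a minimal sketch, not the behaviour of generate_slide.py: `nav_overlay_html` is a hypothetical helper, it assumes slides are saved as `slide_NN.html` with two-digit numbering as in the output directory layout, and it assumes the enclosing slide canvas is a positioned element so the absolute overlays line up with it.

```python
def nav_overlay_html(index, total):
    """Return two overlay <div>s plus a keyboard handler for slide `index`."""
    prev_href = f"slide_{index - 1:02d}.html" if index > 1 else ""
    next_href = f"slide_{index + 1:02d}.html" if index < total else ""

    def half(side, href):
        # An empty href leaves the overlay in place but inert.
        click = " onclick=\"location.href='%s'\"" % href if href else ""
        cursor = "pointer" if href else "default"
        return (
            f'<div style="position:absolute;top:0;{side}:0;width:50%;'
            f'height:100%;cursor:{cursor};"{click}></div>'
        )

    script = (
        "<script>document.addEventListener('keydown', function (e) {"
        f"var prev = '{prev_href}', next = '{next_href}';"
        "if (e.key === 'ArrowLeft' && prev) location.href = prev;"
        "if (e.key === 'ArrowRight' && next) location.href = next;"
        "});</script>"
    )
    return half("left", prev_href) + half("right", next_href) + script


print(nav_overlay_html(1, 3))
```

The returned markup would be placed inside the slide canvas, after the visible content, so the transparent overlays sit on top and receive the clicks.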
Phase 7 — Generate summary.md
Write `<out_dir>/summary.md` in the same language as the slides (`lang` from `outline.json`).
Include:
- Document title and basic info (authors, venue, year if applicable)
- Brief abstract/overview (2–3 sentences)
- Per-slide breakdown table: slide number, title, 1–2 sentence summary
- Main contributions or takeaways (bullet list)
- Link to `slide_01.html` to open the first slide
Example structure:
markdown
# [Presentation Title]
Source: [PDF filename] | Language: English | Slides: 10
## Abstract
[2-3 sentence overview]
## Slide Overview
| # | Title | Main content |
|---|---|---|
| 1 | Title page | ... |
| ... | ... | ... |
## Main Contributions
- ...
## 📂 Open Presentation
[▶ Start](slide_01.html)
---
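Most of the summary can be assembled from `outline.json` itself. The sketch below is illustrative only: `build_summary` is a hypothetical helper, the heading levels and English labels are assumptions (a Chinese deck would need Chinese labels), and the abstract and takeaways are passed in because the outline does not contain them.

```python
def build_summary(outline, pdf_name, abstract, takeaways):
    """Assemble summary.md text from an outline dict, using English labels."""
    lines = [
        f"# {outline['presentation_title']}",
        f"Source: {pdf_name} | Language: {outline['lang']} | Slides: {outline['total_slides']}",
        "## Abstract",
        abstract,
        "## Slide Overview",
        "| # | Title | Main content |",
        "|---|---|---|",
    ]
    for slide in outline["slides_plan"]:
        cells = [str(slide["slide_index"]), slide["title"], slide["main_content"]]
        # Escape pipes so slide text cannot break the table row.
        lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    lines.append("## Main Contributions")
    lines.extend(f"- {item}" for item in takeaways)
    lines.append("[▶ Start](slide_01.html)")
    return "\n".join(lines) + "\n"
```

The returned string can be written straight to `<out_dir>/summary.md`; the relative link works because the summary sits in the same directory as the slides.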
HTML Slide Spec
Each slide is a standalone HTML file: a full `<html>…</html>` document with embedded CSS only.
Canvas: fixed `1280 × 720 px`, `overflow: hidden`; nothing scrolls.
Consistent design across all slides:
- Choose a visual style that fits the document's domain and tone — no fixed palette or font required
- If the user specifies a style, follow it exactly; otherwise infer from the content (e.g. an ML paper → clean modern; a historical report → editorial serif; a product pitch → bold and branded)
- Same fonts, colors, and spacing system applied uniformly to every slide
- Every slide shows: slide title, page counter (bottom-right corner), presentation title (subtle footer)
Navigation on each slide:
- Two transparent click areas cover the full slide height: left 50% → previous slide, right 50% → next slide
- On slide 1 the left area is inert; on the last slide the right area is inert
- Keyboard `←` / `→` arrows also navigate
- No visible buttons needed; optionally show a subtle `‹` / `›` hint at the edges that fades in on hover
Layout patterns:
- `title-card`: centered hero, large title, authors/venue below
- `text-only`: structured bullet points, max 5–6 items, generous whitespace
- `text + image`: image right or left, text opposite
- `full-image`: image fills canvas, minimal text overlay
- `grid`: 2×2 or 3-column figures with captions
Images:
- Use relative paths: `crops/<name>_crop.png`
- Add `style="object-fit: contain; max-width: 100%; max-height: 100%;"`
- Add captions below in small italic text
Do NOT:
- Use external JS frameworks or icon CDNs
- Use placeholder/stock images; use only images cropped from the PDF
- Generate generic purple-gradient-on-white slides
- Let content overflow the 720px height
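The canvas and footer requirements above can be captured in a reusable page shell. This is a sketch under stated assumptions: `slide_shell` and its class names are hypothetical, the font sizes and offsets are placeholder styling, and generate_slide.py may already add some of these elements itself.

```python
from html import escape


def slide_shell(body_html, slide_title, index, total, deck_title):
    """Wrap slide body markup in a fixed 1280×720 canvas with footer and counter."""
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{escape(slide_title)}</title>
<style>
  body {{ margin: 0; }}
  .canvas {{ position: relative; width: 1280px; height: 720px; overflow: hidden; }}
  .footer {{ position: absolute; left: 24px; bottom: 12px; font-size: 12px; opacity: 0.6; }}
  .counter {{ position: absolute; right: 24px; bottom: 12px; font-size: 12px; }}
</style>
</head>
<body>
<div class="canvas">
  <h1>{escape(slide_title)}</h1>
  {body_html}
  <div class="footer">{escape(deck_title)}</div>
  <div class="counter">{index} / {total}</div>
</div>
</body>
</html>
"""


page = slide_shell("<p>Key point</p>", "Method & Data", 4, 10, "Paper Title Here")
```

Titles are HTML-escaped because they come from the PDF and may contain characters such as `&` or `<`; the body is inserted as-is since it is already markup.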
Quality Checklist
- Output directory named `<pdf_stem>_<timestamp>/`
- `outline.json` saved with valid SlidesPlan schema
- All crops saved to `crops/` (local only, no cloud upload)
- Each slide fits within 1280×720, nothing overflows
- Consistent theme across all slides
- Crop images referenced via relative path `crops/<name>_crop.png`
- Slide number and presentation title visible on every slide
- Left/right click-area navigation works, keyboard arrows work
- `summary.md` written in the correct language, links to `slide_01.html`
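Some of these checks can be automated from the files alone; visual items such as overflow and theme consistency still need a human or model to look at the slides. The sketch below is an assumption-laden illustration: `check_output_dir` is a hypothetical helper, and it assumes slides are named `slide_NN.html` with two-digit numbering and that image tags use double-quoted `src` attributes.

```python
import json
import os
import re


def check_output_dir(out_dir):
    """Return checklist failures that can be detected from the files alone."""
    failures = []
    outline_path = os.path.join(out_dir, "outline.json")
    if not os.path.isfile(outline_path):
        return ["outline.json is missing"]
    with open(outline_path, encoding="utf-8") as fh:
        total = json.load(fh)["total_slides"]
    if not os.path.isdir(os.path.join(out_dir, "crops")):
        failures.append("crops/ is missing")
    for i in range(1, total + 1):
        name = f"slide_{i:02d}.html"
        slide_path = os.path.join(out_dir, name)
        if not os.path.isfile(slide_path):
            failures.append(f"{name} is missing")
            continue
        with open(slide_path, encoding="utf-8") as fh:
            html = fh.read()
        # Every image must be a relative crops/ path that exists on disk.
        for src in re.findall(r'<img[^>]*\bsrc="([^"]+)"', html):
            if not src.startswith("crops/"):
                failures.append(f"{name}: non-relative image path {src}")
            elif not os.path.isfile(os.path.join(out_dir, src)):
                failures.append(f"{name}: missing image {src}")
    summary_path = os.path.join(out_dir, "summary.md")
    if not os.path.isfile(summary_path):
        failures.append("summary.md is missing")
    else:
        with open(summary_path, encoding="utf-8") as fh:
            if "slide_01.html" not in fh.read():
                failures.append("summary.md does not link to slide_01.html")
    return failures
```

An empty list covers the file-level items on the checklist.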
Language
Match the PDF language. Chinese PDF → Chinese slides and summary. English → English. No mixing.