Convert a PDF (research paper, report, or any other document) into a polished multi-slide HTML presentation, along with a structured outline (JSON) and a summary (Markdown). Trigger this skill when the user asks, in Chinese or English, to make slides or a PPT from a PDF.
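Install:

```bash
npx skill4agent add zai-org/glm-skills glmv-pdf-to-ppt
```

Besides the HTML slides, the skill produces `outline.json` and `summary.md`. Helper scripts live in `{SKILL_DIR}/scripts/`. Requirements: `pip install pymupdf pillow`, plus `curl` for remote PDFs.

All output goes under `{WORKSPACE}/ppt/`:

```
ppt/
└── <pdf_stem>_<timestamp>/
    ├── outline.json        ← structured slide plan (SlidesPlan schema)
    ├── crops/              ← locally-saved cropped images
    │   ├── slide3_method_crop.png
    │   └── slide5_results_crop.png
    ├── slide_01.html
    ├── slide_02.html
    ├── ...
    └── summary.md          ← final summary document
```

`<pdf_stem>` is the PDF filename without its extension; `<timestamp>` has the format `YYYYMMDD_HHMMSS` (e.g. `20240119_143022`). Cropped images are stored in `crops/` (paths of the form `crops/<name>.png`).

Create the output directory for the input PDF (`$ARGUMENTS`):

```python
import os, datetime
pdf_path = "/path/to/paper.pdf"   # placeholder: the local PDF from $ARGUMENTS
workspace = "/path/to/workspace"  # placeholder: the {WORKSPACE} directory
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "ppt", f"{pdf_stem}_{timestamp}")
```

```bash
mkdir -p "<out_dir>/crops"
```

Use `out_dir` for every later step. If `$ARGUMENTS` is a URL, download the PDF first:

```bash
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
```

Render every page to an image:

```bash
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
```

The script prints a JSON list such as `[{"page": 1, "path": "/abs/path/page_001.png"}, ...]`. Keep this `page → path` mapping: each `path` is the value later passed to `crop.py` via `--path`.

Plan the slides. A typical structure:

| Slide | Typical purpose |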
|---|---|
| 1 | Title, authors, affiliation, venue/year |
| 2 | Motivation / Problem statement |
| 3 | Related Work (brief) |
| 4–N-2 | Method / Core contributions (one concept per slide) |
| N-1 | Results & Experiments |
| N | Conclusion & Future Work |
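As a rough sketch, the table can be turned into a default role for each slide of an N-slide deck. `plan_roles` is a hypothetical helper, not one of the skill's scripts, and it assumes N ≥ 6 so that the "4 to N-2" method range holds at least one slide:

```python
def plan_roles(n: int) -> list[str]:
    """Map slide indices 1..n to the typical roles in the table (hypothetical helper)."""
    if n < 6:
        raise ValueError("the typical structure needs at least 6 slides")
    roles = ["Title", "Motivation / Problem statement", "Related Work"]
    # Slides 4..n-2 each cover one method concept / core contribution (n - 5 slides).
    roles += [f"Method / Core contribution {i}" for i in range(1, n - 4)]
    roles += ["Results & Experiments", "Conclusion & Future Work"]
    return roles


for index, role in enumerate(plan_roles(8), start=1):
    print(index, role)
```

The roles are only defaults; a paper with a single dominant figure or a long results section may justify a different split.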
Write the plan to `<out_dir>/outline.json`:

```json
{
  "presentation_title": "Paper Title Here",
  "lang": "Chinese",
  "total_slides": 10,
  "slides_plan": [
    {
      "slide_index": 1,
      "title": "Slide Title",
      "main_content": "Key points and text content for this slide",
      "template_id": null,
      "required_crops": [
        {
          "url": "<page_image_url_from_phase1>",
          "visual_description": "Figure 3: architecture diagram showing encoder-decoder",
          "usage_reason": "Illustrates the core model structure for slide 4"
        }
      ]
    }
  ]
}
```

- `lang`: `"Chinese"` or `"English"`.
- `template_id`: `null`.
- `required_crops`: `[]` if the slide needs no figure.
- `url`: the page image `path` from the page-rendering step.
- `visual_description` / `usage_reason`: what to crop and why; both are handed to the cropping agent.

After saving `<out_dir>/outline.json`, crop the figures with `{SKILL_DIR}/scripts/crop.py`. For each page image referenced in `outline.json`, dispatch an Agent tool call like this:
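A quick structural check before cropping can catch a malformed plan early. The sketch below is a hypothetical helper, not part of the skill; it checks only the fields shown in the example and assumes `slide_index` runs 1..`total_slides` with one `slides_plan` entry per slide:

```python
import json

REQUIRED_SLIDE_KEYS = {"slide_index", "title", "main_content", "template_id", "required_crops"}
REQUIRED_CROP_KEYS = {"url", "visual_description", "usage_reason"}


def check_outline(plan: dict) -> list[str]:
    """Return problems found in a SlidesPlan dict; an empty list means it looks OK."""
    problems = []
    if plan.get("lang") not in ("Chinese", "English"):
        problems.append(f"lang must be 'Chinese' or 'English', got {plan.get('lang')!r}")
    slides = plan.get("slides_plan", [])
    if plan.get("total_slides") != len(slides):
        problems.append("total_slides does not match len(slides_plan)")
    for expected, slide in enumerate(slides, start=1):
        missing = REQUIRED_SLIDE_KEYS - set(slide)
        if missing:
            problems.append(f"slide {expected}: missing {sorted(missing)}")
        if slide.get("slide_index") != expected:
            problems.append(f"slide {expected}: slide_index is {slide.get('slide_index')!r}")
        for crop in slide.get("required_crops", []):
            gap = REQUIRED_CROP_KEYS - set(crop)
            if gap:
                problems.append(f"slide {expected}: crop missing {sorted(gap)}")
    return problems


# A one-slide plan parsed from JSON text (JSON null becomes Python None).
example = json.loads(
    '{"presentation_title": "T", "lang": "English", "total_slides": 1, '
    '"slides_plan": [{"slide_index": 1, "title": "T", "main_content": "Intro", '
    '"template_id": null, "required_crops": []}]}'
)
print(check_outline(example))  # → []
```

In practice the dict would come from `json.load` on `<out_dir>/outline.json`.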
````yaml
description: "Grounding crop page N"
prompt: |
  You are a visual grounding and cropping assistant. Your task is to precisely
  locate specified visual elements in a page image and crop them out.

  ## Grounding method
  Use visual grounding to locate each target:
  1. Read the source image using the Read tool to view it
  2. Identify the target element described below
  3. Determine its bounding box as normalized coordinates in the 0–999 range:
     - 0 = left/top edge of the image
     - 999 = right/bottom edge of the image
     - These are thousandths, NOT pixels, NOT percentages (0–100)
     - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
     - Example: [0, 0, 500, 500] = top-left quarter of the image
  4. Be precise: tightly bound the target element with a small margin (~10–20 units)
     around it. Do NOT crop too wide or too narrow.

  ## Source image
  <page_image_path>

  ## Crops needed
  For each crop below, first do grounding (locate the element), then crop:
  1. Name: "slide<N>_<descriptive_name>"
     Target: "<visual_description from outline.json>"
     Context: "<usage_reason from outline.json>"

  ## Crop command
  After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
  ```bash
  python <SKILL_DIR>/scripts/crop.py \
    --path "<page_image_path>" \
    --box X1 Y1 X2 Y2 \
    --name "<crop_name>" \
    --out-dir "<out_dir>/crops"
  ```

  ## Verification
  After each crop, READ the output image to visually verify the correct region
  was captured. If the crop missed the target or is too wide/narrow, adjust the
  coordinates and re-run crop.py.

  ## Output
  Report the final results as a list:
  - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
````

Replace `<page_image_path>`, `<SKILL_DIR>`, and `<out_dir>` with real paths before dispatching. `crop.py` prints the saved file, e.g. `{"path": "/abs/path/slide3_method_crop.png"}`; crops are saved as `<name>_crop.png`. Collect the results into a `slide_index → [crop filename, ...]` map.

Then check each crop's size and aspect ratio:

```bash
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
```

Pick each slide's layout from its crop's aspect ratio:

| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | |
| 0.7 – 1.3 (square-ish) | |
| > 1.3 (wide) | Image on top or bottom, text above/below |
| > 2.0 (very wide, e.g. tables) | |
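To relate the grounding boxes to this table, here is a sketch that converts a normalized 0–999 box (the convention in the grounding prompt) into pixel coordinates for a page image, then classifies the crop's aspect ratio into the table's bands. The function names are hypothetical, and scaling by `size / 999` is an assumption about the convention; `crop.py` may round differently:

```python
def box_to_pixels(box, width, height):
    """Convert a normalized [x1, y1, x2, y2] box (0-999 units) to pixel coordinates.

    Assumption: 999 maps to the full image width/height.
    """
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= 999 and 0 <= y1 < y2 <= 999):
        raise ValueError(f"invalid normalized box: {box}")
    return (
        round(x1 * width / 999),
        round(y1 * height / 999),
        round(x2 * width / 999),
        round(y2 * height / 999),
    )


def aspect_band(w, h):
    """Classify width/height into the aspect-ratio bands of the layout table."""
    aspect = w / h
    if aspect > 2.0:  # test the very-wide band first, since > 2.0 is also > 1.3
        return "very wide"
    if aspect > 1.3:
        return "wide"
    if aspect >= 0.7:
        return "square-ish"
    return "tall/narrow"


# A 1020 x 1320 px page image (a US Letter page rendered at 120 DPI).
x1, y1, x2, y2 = box_to_pixels([80, 120, 920, 400], 1020, 1320)
print((x1, y1, x2, y2))               # → (82, 159, 939, 529)
print(aspect_band(x2 - x1, y2 - y1))  # → very wide
```

The band order matters: checking `> 1.3` first would label every table-shaped crop as merely "wide".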
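For each slide, write its HTML to `/tmp/slide_N.html` (figures referenced as `<img src="crops/<name>_crop.png">`), then pass it to `generate_slide.py`:

```bash
python {SKILL_DIR}/scripts/generate_slide.py \
  --html-file /tmp/slide_N.html \
  --index N \
  --total <total> \
  --title "<presentation title>" \
  --out-dir "<out_dir>/"
```

Finally, write `<out_dir>/summary.md` in the presentation's `lang` (from `outline.json`), with a link to `slide_01.html`. Template for a Chinese deck:

```markdown
# [Presentation Title]

> **来源 / Source:** [PDF filename] | **语言 / Language:** Chinese | **幻灯片数 / Slides:** 10

## 摘要

[2-3 sentence overview]

## 幻灯片概览

| # | 标题 | 主要内容 |
|---|------|---------|
| 1 | 标题页 | ... |
...

## 主要贡献

- ...

## 📂 打开演示文稿

[▶ 开始播放](slide_01.html)
```

Heading translations: 摘要 = Summary, 幻灯片概览 = Slide overview, 标题 = Title, 主要内容 = Main content, 标题页 = Title page, 主要贡献 = Main contributions, 打开演示文稿 = Open the presentation, 开始播放 = Start presentation. For an English deck, write the same sections in English.

Slide design rules:

- Each slide is a complete `<html>…</html>` page on a fixed `1280 × 720 px` canvas with `overflow: hidden`.
- Navigation: `←` / `→` and `‹` / `›`.
- Layout types: `title-card`, `text-only`, `text + image`, `full-image`, `grid`.
- Images: `crops/<name>_crop.png` with `style="object-fit: contain; max-width: 100%; max-height: 100%;"`.

Final check: `<pdf_stem>_<timestamp>/` contains `outline.json`, `crops/` (holding the `crops/<name>_crop.png` files), the slide HTML files, and `summary.md`, which links to `slide_01.html`.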
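The final check can be scripted. A minimal sketch, assuming slide files follow the `slide_NN.html` pattern from the directory tree and crops end in `_crop.png`; `check_output_dir` is a hypothetical helper, not one of the skill's scripts:

```python
import json
import os
import re


def check_output_dir(out_dir: str) -> list[str]:
    """Return missing or inconsistent items in a finished <pdf_stem>_<timestamp>/ directory."""
    if not os.path.isdir(out_dir):
        return [f"missing directory {out_dir}"]
    problems = []
    for name in ("outline.json", "summary.md", "slide_01.html"):
        if not os.path.isfile(os.path.join(out_dir, name)):
            problems.append(f"missing {name}")
    crops_dir = os.path.join(out_dir, "crops")
    if not os.path.isdir(crops_dir):
        problems.append("missing crops/")
    else:
        for f in os.listdir(crops_dir):
            if f.endswith(".png") and not f.endswith("_crop.png"):
                problems.append(f"crop not named <name>_crop.png: {f}")
    # Count slide files like slide_01.html, slide_02.html, ...
    slides = [f for f in os.listdir(out_dir) if re.fullmatch(r"slide_\d{2}\.html", f)]
    outline_path = os.path.join(out_dir, "outline.json")
    if os.path.isfile(outline_path):
        with open(outline_path, encoding="utf-8") as fh:
            total = json.load(fh).get("total_slides")
        if total != len(slides):
            problems.append(f"outline says {total} slides, found {len(slides)} slide files")
    summary_path = os.path.join(out_dir, "summary.md")
    if os.path.isfile(summary_path):
        with open(summary_path, encoding="utf-8") as fh:
            if "slide_01.html" not in fh.read():
                problems.append("summary.md does not link to slide_01.html")
    return problems
```

Usage: `print(check_output_dir("<out_dir>"))` with the real output directory; an empty list means every checklist item was found.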