# PDF → Academic Project Website Skill
Convert a research paper or technical document PDF into a polished single-page project website — the kind used for NeurIPS/CVPR/ICLR paper releases. Pages are converted locally at DPI 120, a structured `outline.json` is saved, images are cropped locally, and the final page is saved with `generate_web.py`.

Scripts are in: `{SKILL_DIR}/scripts/`

## Dependencies
Python packages (install once):

```bash
pip install pymupdf pillow
```

System tools: `curl` (pre-installed on macOS/Linux).

## When to Use
Trigger when the user asks to create a webpage or project page from a PDF — phrases like:
"make a project page from a PDF", "create a paper website", "build an academic website for this paper", "论文主页", "做项目主页", "根据pdf做网页", "把论文做成主页", or any similar intent in Chinese or English.
## Output Directory Convention
All output goes under `{WORKSPACE}/web/<pdf_stem>_<timestamp>/`:

```
web/
└── <pdf_stem>_<timestamp>/
    ├── outline.json   ← structured web plan (WebPlan schema)
    ├── crops/         ← locally-saved cropped images
    │   ├── fig_arch_crop.png
    │   ├── table_results_crop.png
    │   └── ...
    └── index.html     ← the website
```

- `<pdf_stem>` = PDF filename without extension
- `<timestamp>` = `YYYYMMDD_HHMMSS` format
- HTML references images via relative path `crops/<name>_crop.png`
## Input

`$ARGUMENTS`

- If user provides a URL: download with curl first, then convert
- If user provides a local PDF path: convert directly
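This branching can be sketched with a tiny classifier (a sketch only; `input_kind` is an illustrative helper, not one of the skill's scripts):

```python
from urllib.parse import urlparse

def input_kind(argument: str) -> str:
    """Classify $ARGUMENTS: 'url' means download with curl first, 'local' means convert directly."""
    return "url" if urlparse(argument).scheme in ("http", "https") else "local"

print(input_kind("https://arxiv.org/pdf/1706.03762"), input_kind("/home/me/paper.pdf"))
```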
## Workflow
### Phase 0 — Create Output Directory

```python
import os, datetime
pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "web", f"{pdf_stem}_{timestamp}")
```

```bash
mkdir -p "<out_dir>/crops"
```

### Phase 1 — Convert PDF Pages to Images (DPI 120)
If the input is a URL, download it first:

```bash
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
```

Then convert (pass either the downloaded path or the original local path):

```bash
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
```

Outputs JSON to stdout:

```json
[{"page": 1, "path": "/abs/path/page_001.png"}, ...]
```

Parse and store the full `page → path` map.
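Once the converter's stdout is captured, the map is one dict comprehension away (the `stdout` string here is a stand-in for the script's real output, in the documented shape):

```python
import json

# Example stdout from pdf_to_images.py, in the documented [{"page": ..., "path": ...}] shape.
stdout = '[{"page": 1, "path": "/abs/path/page_001.png"}, {"page": 2, "path": "/abs/path/page_002.png"}]'

# Build the page → path map used by later phases.
page_to_path = {entry["page"]: entry["path"] for entry in json.loads(stdout)}
print(page_to_path)
```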
### Phase 2 — Read All Pages in Order
View all page images sequentially before planning. Goal: pure understanding of the document's content, figures, and structure.
While reading, note:
- Title, authors, affiliations, venue, year
- Abstract text (verbatim)
- Key contributions
- Paper/Code/Dataset links (arXiv, GitHub, etc.)
- Figures, tables, diagrams — which pages, rough regions
- Teaser/hero figure if present
Do NOT plan sections yet — read everything first.
### Phase 3 — Plan Sections & Save outline.json
Plan the website sections. Standard structure for academic papers (adapt as needed):

| Section | Purpose |
|---|---|
| `hero` | Title, authors, venue badge, link buttons |
| `abstract` | Full abstract text |
| `contributions` | 3–5 key contribution cards |
| `method` | Architecture figure + method explanation |
| `results` | Quantitative table + qualitative figures |
| `conclusion` | Brief conclusion |
| `citation` | BibTeX block |

For each section that needs an image, identify:

- Which page it comes from (the local page path from Phase 1)
- A description of what the visual shows and why it belongs in this section

Save as `<out_dir>/outline.json` using exactly this schema:

```json
{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One", "Author Two"],
  "sections_plan": [
    {
      "section_index": 1,
      "section_id": "hero",
      "title": "Hero",
      "content": "Title, authors, venue, teaser figure description",
      "required_images": [
        {
          "url": "<local_page_path_from_phase1>",
          "visual_description": "Figure 1: teaser showing input-output examples",
          "usage_reason": "Hero section visual to immediately show the paper's output"
        }
      ]
    }
  ]
}
```

Field notes:

- `lang`: `"English"` or `"Chinese"` — match the PDF language
- `required_images`: empty array `[]` if section needs no images
- `url`: the local file path of the source page (the `path` field from Phase 1)
- For images that need cropping, note the approximate region — exact crop boxes are determined in Phase 4

Write `outline.json` using the Write tool to `<out_dir>/outline.json`.
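A saved plan can be sanity-checked against this shape before moving on (a minimal sketch, not a full validator; the embedded JSON is the example above, and the required keys are exactly those shown in the schema):

```python
import json

# Minimal sanity check of an outline.json against the WebPlan shape above.
plan = json.loads("""{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One"],
  "sections_plan": [
    {"section_index": 1, "section_id": "hero", "title": "Hero",
     "content": "Title, authors, venue",
     "required_images": []}
  ]
}""")

assert plan["lang"] in ("English", "Chinese")
for sec in plan["sections_plan"]:
    for key in ("section_index", "section_id", "title", "content", "required_images"):
        assert key in sec, f"missing {key} in section {sec.get('section_id')}"
print("outline OK:", [s["section_id"] for s in plan["sections_plan"]])
```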
### Phase 4 — Crop Required Images (Grounding + Subagent)
IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.

IMPORTANT: You MUST use the provided script `{SKILL_DIR}/scripts/crop.py` for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.

Read `outline.json`. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization — it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.

Use the Agent tool like this:

````
Agent tool call:
description: "Grounding crop page N"
prompt: |
  You are a visual grounding and cropping assistant. Your task is to precisely
  locate specified visual elements in a page image and crop them out.

  ## Grounding method
  Use visual grounding to locate each target:
  1. Read the source image using the Read tool to view it
  2. Identify the target element described below
  3. Determine its bounding box as normalized coordinates in the 0–999 range:
     - 0 = left/top edge of the image
     - 999 = right/bottom edge of the image
     - These are thousandths, NOT pixels, NOT percentages (0–100)
     - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
     - Example: [0, 0, 500, 500] = top-left quarter of the image
  4. Be precise: tightly bound the target element with a small margin (~10–20 units)
     around it. Do NOT crop too wide or too narrow.

  ## Source image
  <page_image_path>

  ## Crops needed
  For each crop below, first do grounding (locate the element), then crop:
  1. Name: "<descriptive_name>"
     Target: "<visual_description from outline.json>"
     Context: "<usage_reason from outline.json>"

  ## Crop command
  After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
  ```bash
  python <SKILL_DIR>/scripts/crop.py \
    --path "<page_image_path>" \
    --box X1 Y1 X2 Y2 \
    --name "<crop_name>" \
    --out-dir "<out_dir>/crops"
  ```

  ## Verification
  After each crop, READ the output image to visually verify the correct region
  was captured. If the crop missed the target or is too wide/narrow, adjust the
  coordinates and re-run crop.py.

  ## Output
  Report the final results as a list:
  - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
````

Replace `<page_image_path>`, `<SKILL_DIR>`, `<out_dir>`, and the crop details with actual values from your context.

The crop.py script outputs JSON:

```
{"path": "/abs/path/<name>_crop.png"}
```

Collect results from all subagents and build the `section_id → [crop filename, ...]` mapping to reference in HTML.

Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
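For intuition about the 0–999 convention, here is how such a box maps to pixel coordinates (illustration only; crop.py itself performs this conversion, so `norm_box_to_pixels` is a hypothetical helper, not part of the skill):

```python
def norm_box_to_pixels(box, width, height):
    """Convert a [x1, y1, x2, y2] box in 0-999 thousandths to pixel coordinates."""
    x1, y1, x2, y2 = box
    sx, sy = width / 999, height / 999   # 999 maps to the right/bottom edge
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))

# [0, 0, 500, 500] is roughly the top-left quarter of the page image
print(norm_box_to_pixels([0, 0, 500, 500], 1200, 1600))
```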
### Phase 5 — Measure Cropped Image Dimensions

```bash
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
```

| Aspect ratio | Layout recommendation |
|---|---|
| < 0.7 (tall/narrow) | |
| 0.7 – 1.3 (square-ish) | |
| > 1.3 (wide) | Full-width, |
| > 2.0 (very wide, e.g. tables) | Full-width with horizontal scroll fallback |
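The buckets in the table can be encoded as a small helper (the bucket labels are illustrative names, not classes defined by this skill):

```python
def layout_bucket(aspect: float) -> str:
    """Map a crop's aspect ratio (w/h) to the aspect-ratio buckets in the table above."""
    if aspect < 0.7:
        return "tall-narrow"
    if aspect <= 1.3:
        return "square-ish"
    if aspect <= 2.0:
        return "full-width"
    return "full-width-scroll"   # very wide (e.g. tables): horizontal scroll fallback

print([layout_bucket(a) for a in (0.5, 1.0, 1.8, 3.2)])
```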
### Phase 6 — Generate the Single-Page HTML

Step A — Write HTML to `/tmp/website.html`

- All `<img src="...">` must use relative paths: `crops/<name>_crop.png`
- Do NOT use absolute paths

Step B — Save:

```bash
python {SKILL_DIR}/scripts/generate_web.py \
  --html-file /tmp/website.html \
  --title "<paper title>" \
  --out-dir "<out_dir>/"
```

## HTML Spec
A single self-contained HTML file — embedded CSS, minimal vanilla JS only. No external JS frameworks. Google Fonts CDN is fine.
Page layout:

- Max content width: `900px`, centered, comfortable side padding
- Sticky top nav with section anchor links + smooth scroll
- Looks good at 1200px wide; readable at 768px

Typography:

- Two Google Fonts: one for headings, one for body/UI
- Body: 17–18px, line-height 1.7
- Strong heading hierarchy (h1 >> h2 >> h3)

Visual style:

- If the user specifies a style, follow it exactly
- Otherwise, infer an appropriate aesthetic from the paper's domain and tone (e.g. CV/ML paper → clean modern academic; systems paper → dark technical; humanities → warm editorial serif)
- Define colors and fonts as CSS variables; no fixed palette or font choices are required

Section guidelines:

`hero`
- Large title (2–3rem), authors list with affiliation superscripts, venue badge pill
- Link buttons: `[📄 Paper] [💻 Code] [🗄️ Dataset]` — grey out if no URL
- Teaser figure below (if found)

`abstract`
- Verbatim text with subtle left border accent

`contributions`
- Cards in a 2–3 column CSS grid, each with Unicode symbol + heading + description

`method`
- Full-width architecture figure (`<figure><img><figcaption>`) + prose explanation

`results`
- Quantitative table as real `<table>` — use actual numbers from the PDF, best numbers bolded
- Qualitative figures in a grid (2–4 images with captions)

`conclusion`
- 2–3 paragraphs

`citation`
- BibTeX block in `<pre><code>`, reconstructed from PDF metadata
- "Copy" button using vanilla JS `navigator.clipboard`

Images:

- All `<img>` use relative paths: `crops/<name>_crop.png`
- Add `loading="lazy"` and descriptive `alt`
- Wrap in `<figure>` with `<figcaption>`

Animations (subtle only):

- Fade-in on scroll via `IntersectionObserver` + CSS transitions
- Hover states on buttons/cards
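The relative-path rule for images can be spot-checked on the generated HTML (the HTML fragment below is illustrative; in practice the input would be the contents of `index.html`):

```python
import re

# Illustrative fragment of a generated page.
html = '''
<figure><img src="crops/fig_arch_crop.png" loading="lazy" alt="Architecture overview"><figcaption>Figure 2</figcaption></figure>
<figure><img src="crops/table_results_crop.png" loading="lazy" alt="Main results table"><figcaption>Table 1</figcaption></figure>
'''

# Every <img> must use a relative crops/<name>_crop.png path, never an absolute one.
srcs = re.findall(r'<img[^>]*\bsrc="([^"]+)"', html)
bad = [s for s in srcs if not s.startswith("crops/")]
print(len(srcs), bad)
```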
## Quality Checklist

- Output directory named `<pdf_stem>_<timestamp>/`
- `outline.json` saved with valid WebPlan schema
- All crops saved to `crops/` (local only)
- All metadata (title, authors, venue, year) from the PDF
- Abstract is verbatim
- Quantitative table has real numbers from the paper
- All crop images referenced via `crops/<name>_crop.png`
- BibTeX block accurate and copyable
- Nav anchors scroll to correct sections
- `generate_web.py` called and confirmed success
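Several checklist items are mechanically checkable (a sketch only; `check_output` is an illustrative helper, not one of the skill's scripts):

```python
import os, tempfile

def check_output(out_dir):
    """Flag missing pieces of the expected output layout (mirrors the checklist above)."""
    problems = []
    for required in ("outline.json", "index.html"):
        if not os.path.isfile(os.path.join(out_dir, required)):
            problems.append(f"{required} missing")
    crops = os.path.join(out_dir, "crops")
    if not (os.path.isdir(crops) and any(f.endswith("_crop.png") for f in os.listdir(crops))):
        problems.append("no crops saved under crops/")
    return problems

# Demonstrate on a throwaway skeleton that mimics <out_dir>.
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "crops"))
    for name in ("outline.json", "index.html"):
        open(os.path.join(d, name), "w").close()
    open(os.path.join(d, "crops", "fig_arch_crop.png"), "w").close()
    print(check_output(d))
```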
## Language

Match the PDF language. English paper → English website. Chinese paper → Chinese. No mixing.