glmv-pdf-to-web


PDF → Academic Project Website Skill

Convert a research paper or technical document PDF into a polished single-page project website — the kind used for NeurIPS/CVPR/ICLR paper releases. Pages are converted locally at DPI 120, a structured outline.json is saved, images are cropped locally, and the final page is saved with generate_web.py.

Scripts are in: {SKILL_DIR}/scripts/

Dependencies

Python packages (install once):

```bash
pip install pymupdf pillow
```

System tools: curl (pre-installed on macOS/Linux).

When to Use

Trigger when the user asks to create a webpage or project page from a PDF — phrases like: "make a project page from a PDF", "create a paper website", "build an academic website for this paper", "论文主页", "做项目主页", "根据pdf做网页", "把论文做成主页", or any similar intent in Chinese or English.

Output Directory Convention

All output goes under {WORKSPACE}/web/<pdf_stem>_<timestamp>/:

```
web/
└── <pdf_stem>_<timestamp>/
    ├── outline.json        ← structured web plan (WebPlan schema)
    ├── crops/              ← locally-saved cropped images
    │   ├── fig_arch_crop.png
    │   ├── table_results_crop.png
    │   └── ...
    └── index.html          ← the website
```

  • <pdf_stem> = PDF filename without extension
  • <timestamp> = format YYYYMMDD_HHMMSS
  • HTML references images via relative path crops/<name>_crop.png

Input

$ARGUMENTS is the path to the PDF file (local) or an HTTP/HTTPS URL.
  • If the user provides a URL: download with curl first, then convert
  • If the user provides a local PDF path: convert directly
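The URL-or-path decision can be sketched in Python; `is_url` is an illustrative helper, not part of the skill's scripts:

```python
# Illustrative helper: decide whether $ARGUMENTS is a URL or a local path.
def is_url(arg: str) -> bool:
    return arg.startswith(("http://", "https://"))

print(is_url("https://example.com/paper.pdf"))  # URL: download with curl first
print(is_url("/home/user/paper.pdf"))           # local path: convert directly
```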


Workflow

Phase 0 — Create Output Directory

```python
import os, datetime

pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "web", f"{pdf_stem}_{timestamp}")
```

```bash
mkdir -p "<out_dir>/crops"
```

Phase 1 — Convert PDF Pages to Images (DPI 120)

If the input is a URL, download it first:

```bash
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
```

Then convert (pass either the downloaded path or the original local path):

```bash
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
```

Outputs JSON to stdout:

```json
[{"page": 1, "path": "/abs/path/page_001.png"}, ...]
```

Parse and store the full page → path map.
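Parsing that stdout into the map can be sketched as follows; the sample JSON string is illustrative, mirroring the format above:

```python
import json

# Sample stdout from pdf_to_images.py (illustrative paths, not real files).
stdout = '[{"page": 1, "path": "/abs/path/page_001.png"}, {"page": 2, "path": "/abs/path/page_002.png"}]'

pages = json.loads(stdout)
page_to_path = {p["page"]: p["path"] for p in pages}  # page number → image path
print(page_to_path[1])  # /abs/path/page_001.png
```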

Phase 2 — Read All Pages in Order

View all page images sequentially before planning. Goal: pure understanding of the document's content, figures, and structure.
While reading, note:
  • Title, authors, affiliations, venue, year
  • Abstract text (verbatim)
  • Key contributions
  • Paper/Code/Dataset links (arXiv, GitHub, etc.)
  • Figures, tables, diagrams — which pages, rough regions
  • Teaser/hero figure if present
Do NOT plan sections yet — read everything first.

Phase 3 — Plan Sections & Save outline.json

Plan the website sections. Standard structure for academic papers (adapt as needed):

| section_id    | Purpose                                   |
|---------------|-------------------------------------------|
| hero          | Title, authors, venue badge, link buttons |
| abstract      | Full abstract text                        |
| contributions | 3–5 key contribution cards                |
| method        | Architecture figure + method explanation  |
| results       | Quantitative table + qualitative figures  |
| conclusion    | Brief conclusion                          |
| citation      | BibTeX block                              |

For each section that needs an image, identify:
  • Which page it comes from (the local page path from Phase 1)
  • A description of what the visual shows and why it belongs in this section

Save as <out_dir>/outline.json using exactly this schema:

```json
{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One", "Author Two"],
  "sections_plan": [
    {
      "section_index": 1,
      "section_id": "hero",
      "title": "Hero",
      "content": "Title, authors, venue, teaser figure description",
      "required_images": [
        {
          "url": "<local_page_path_from_phase1>",
          "visual_description": "Figure 1: teaser showing input-output examples",
          "usage_reason": "Hero section visual to immediately show the paper's output"
        }
      ]
    }
  ]
}
```

Field notes:
  • lang: "Chinese" or "English" — match the PDF language
  • required_images: empty array [] if the section needs no images
  • url: the local file path of the source page (from the Phase 1 path field)
  • For images that need cropping, note the approximate region — exact crop boxes are determined in Phase 4

Write outline.json using the Write tool to <out_dir>/outline.json.
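Before writing the file, a quick structural check can catch missing keys early. `validate_webplan` is an illustrative sketch, not one of the skill's scripts:

```python
# Illustrative sanity check for the WebPlan schema (not part of {SKILL_DIR}/scripts/).
def validate_webplan(plan: dict) -> list:
    errors = []
    for key in ("project_title", "lang", "authors", "sections_plan"):
        if key not in plan:
            errors.append(f"missing top-level key: {key}")
    for sec in plan.get("sections_plan", []):
        for key in ("section_index", "section_id", "title", "content", "required_images"):
            if key not in sec:
                errors.append(f"section {sec.get('section_id', '?')}: missing {key}")
        for img in sec.get("required_images", []):
            for key in ("url", "visual_description", "usage_reason"):
                if key not in img:
                    errors.append(f"image in {sec.get('section_id', '?')}: missing {key}")
    return errors

plan = {
    "project_title": "Paper Title",
    "lang": "English",
    "authors": ["Author One"],
    "sections_plan": [
        {"section_index": 1, "section_id": "hero", "title": "Hero",
         "content": "Title, authors, venue", "required_images": []},
    ],
}
print(validate_webplan(plan))  # []  (structurally valid)
```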

Phase 4 — Crop Required Images (Grounding + Subagent)

IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.
IMPORTANT: You MUST use the provided {SKILL_DIR}/scripts/crop.py script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.
Read outline.json. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization — it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.
Use the Agent tool like this:
Agent tool call:
  description: "Grounding crop page N"
  prompt: |
    You are a visual grounding and cropping assistant. Your task is to precisely
    locate specified visual elements in a page image and crop them out.

    ## Grounding method

    Use visual grounding to locate each target:
    1. Read the source image using the Read tool to view it
    2. Identify the target element described below
    3. Determine its bounding box as normalized coordinates in the 0–999 range:
       - 0 = left/top edge of the image
       - 999 = right/bottom edge of the image
       - These are thousandths, NOT pixels, NOT percentages (0–100)
       - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
       - Example: [0, 0, 500, 500] = top-left quarter of the image
    4. Be precise: tightly bound the target element with a small margin (~10–20 units)
       around it. Do NOT crop too wide or too narrow.

    ## Source image
    <page_image_path>

    ## Crops needed

    For each crop below, first do grounding (locate the element), then crop:

    1. Name: "<descriptive_name>"
       Target: "<visual_description from outline.json>"
       Context: "<usage_reason from outline.json>"

    ## Crop command

    After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
    ```bash
    python <SKILL_DIR>/scripts/crop.py \
        --path "<page_image_path>" \
        --box X1 Y1 X2 Y2 \
        --name "<crop_name>" \
        --out-dir "<out_dir>/crops"
    ```

    ## Verification

    After each crop, READ the output image to visually verify the correct region
    was captured. If the crop missed the target or is too wide/narrow, adjust the
    coordinates and re-run crop.py.

    ## Output

    Report the final results as a list:
    - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
Replace <page_image_path>, <SKILL_DIR>, <out_dir>, and the crop details with actual values from your context.
The crop.py script outputs JSON: {"path": "/abs/path/<name>_crop.png"}
Collect results from all subagents and build the mapping section_id → [crop filename, ...] to reference in HTML.
Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
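For intuition, here is a sketch of how the 0–999 normalized convention is assumed to map to pixel coordinates; crop.py itself is the authoritative implementation, and `norm_box_to_pixels` is a hypothetical helper:

```python
# Sketch of the assumed 0–999 → pixel mapping (crop.py is authoritative).
def norm_box_to_pixels(box, width, height):
    x1, y1, x2, y2 = box
    scale_x, scale_y = width / 999, height / 999
    return (round(x1 * scale_x), round(y1 * scale_y),
            round(x2 * scale_x), round(y2 * scale_y))

# [0, 0, 999, 999] spans the full page, whatever its pixel size:
print(norm_box_to_pixels([0, 0, 999, 999], 1000, 800))  # (0, 0, 1000, 800)
```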

Phase 5 — Measure Cropped Image Dimensions

```bash
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
```
| Aspect ratio                   | Layout recommendation                      |
|--------------------------------|--------------------------------------------|
| < 0.7 (tall/narrow)            | max-width: 400–500px, centered             |
| 0.7–1.3 (square-ish)           | max-width: 600–700px                       |
| > 1.3 (wide)                   | Full-width, max-width: 100%                |
| > 2.0 (very wide, e.g. tables) | Full-width with horizontal scroll fallback |
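These recommendations can be expressed as a small helper; `layout_for_aspect` is illustrative and simply encodes the aspect-ratio thresholds listed above:

```python
# Illustrative mapping from crop aspect ratio to a layout recommendation.
def layout_for_aspect(aspect: float) -> str:
    if aspect > 2.0:
        return "full-width with horizontal scroll fallback"
    if aspect > 1.3:
        return "full-width, max-width: 100%"
    if aspect >= 0.7:
        return "max-width: 600-700px"
    return "max-width: 400-500px, centered"

print(layout_for_aspect(2.5))  # full-width with horizontal scroll fallback
print(layout_for_aspect(0.5))  # max-width: 400-500px, centered
```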

Phase 6 — Generate the Single-Page HTML

Step A — Write HTML to /tmp/website.html
  • All <img src="..."> must use relative paths: crops/<name>_crop.png
  • Do NOT use absolute paths

Step B — Save:

```bash
python {SKILL_DIR}/scripts/generate_web.py \
    --html-file /tmp/website.html \
    --title "<paper title>" \
    --out-dir "<out_dir>/"
```

HTML Spec

A single self-contained HTML file — embedded CSS, minimal vanilla JS only. No external JS frameworks. Google Fonts CDN is fine.

Page layout:
  • Max content width: 900px, centered, comfortable side padding
  • Sticky top nav with section anchor links + smooth scroll
  • Looks good at 1200px wide; readable at 768px

Typography:
  • Two Google Fonts: one for headings, one for body/UI
  • Body: 17–18px, line-height 1.7
  • Strong heading hierarchy (h1 >> h2 >> h3)

Visual style:
  • If the user specifies a style, follow it exactly
  • Otherwise, infer an appropriate aesthetic from the paper's domain and tone (e.g. CV/ML paper → clean modern academic; systems paper → dark technical; humanities → warm editorial serif)
  • Define colors and fonts as CSS variables; no fixed palette or font choices are required

Section guidelines:

hero:
  • Large title (2–3rem), authors list with affiliation superscripts, venue badge pill
  • Link buttons: [📄 Paper] [💻 Code] [🗄️ Dataset] — grey out if no URL
  • Teaser figure below (if found)

abstract:
  • Verbatim text with subtle left border accent

contributions:
  • Cards in a 2–3 column CSS grid, each with Unicode symbol + heading + description

method:
  • Full-width architecture figure (<figure><img><figcaption>) + prose explanation

results:
  • Quantitative table as a real <table> — use actual numbers from the PDF, best numbers bolded
  • Qualitative figures in a grid (2–4 images with captions)

conclusion:
  • 2–3 paragraphs

citation:
  • BibTeX block reconstructed from PDF metadata, wrapped in <pre><code>
  • "Copy" button using navigator.clipboard vanilla JS

Images:
  • All <img> use relative paths: crops/<name>_crop.png
  • Add loading="lazy" and descriptive alt
  • Wrap in <figure> with <figcaption>

Animations (subtle only):
  • Fade-in on scroll via IntersectionObserver + CSS transitions
  • Hover states on buttons/cards


Quality Checklist

  • Output directory named <pdf_stem>_<timestamp>/
  • outline.json saved with valid WebPlan schema
  • All crops saved to crops/ (local only)
  • All metadata (title, authors, venue, year) from the PDF
  • Abstract is verbatim
  • Quantitative table has real numbers from the paper
  • All crop images referenced via crops/<name>_crop.png
  • BibTeX block accurate and copyable
  • Nav anchors scroll to correct sections
  • generate_web.py called and confirmed success

Language

Match the PDF language. English paper → English website. Chinese paper → Chinese. No mixing.