glmv-pdf-to-web


PDF → Academic Project Website Skill

Convert a research paper or technical document PDF into a polished single-page project website — the kind used for NeurIPS/CVPR/ICLR paper releases. Pages are converted locally at DPI 120, a structured outline.json is saved, images are cropped locally, and the final page is saved with generate_web.py.

Scripts are in: {SKILL_DIR}/scripts/

Dependencies

Python packages (install once):

```bash
pip install pymupdf pillow
```

System tools: curl (pre-installed on macOS/Linux).

When to Use

Trigger when the user asks to create a webpage or project page from a PDF — phrases like: "make a project page from a PDF", "create a paper website", "build an academic website for this paper", "论文主页", "做项目主页", "根据pdf做网页", "把论文做成主页", or any similar intent in Chinese or English.

Output Directory Convention

All output goes under {WORKSPACE}/web/<pdf_stem>_<timestamp>/:

```
web/
└── <pdf_stem>_<timestamp>/
    ├── outline.json        ← structured web plan (WebPlan schema)
    ├── crops/              ← locally-saved cropped images
    │   ├── fig_arch_crop.png
    │   ├── table_results_crop.png
    │   └── ...
    └── index.html          ← the website
```

  • <pdf_stem> = PDF filename without extension
  • <timestamp> = format YYYYMMDD_HHMMSS
  • HTML references images via relative path crops/<name>_crop.png

Input

$ARGUMENTS is the path to the PDF file (local) or an HTTP/HTTPS URL.
  • If the user provides a URL: download with curl first, then convert
  • If the user provides a local PDF path: convert directly
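The URL-or-path decision can be sketched in Python; `is_url` is an illustrative helper, not part of the skill's scripts:

```python
# Illustrative helper: decide whether $ARGUMENTS is a URL or a local path.
def is_url(arg: str) -> bool:
    return arg.startswith(("http://", "https://"))

print(is_url("https://example.com/paper.pdf"))  # URL: download with curl first
print(is_url("/home/user/paper.pdf"))           # local path: convert directly
```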


Workflow

Phase 0 — Create Output Directory

```python
import os, datetime

pdf_stem = os.path.splitext(os.path.basename(pdf_path))[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = os.path.join(workspace, "web", f"{pdf_stem}_{timestamp}")
```

```bash
mkdir -p "<out_dir>/crops"
```

Phase 1 — Convert PDF Pages to Images (DPI 120)

If the input is a URL, download it first:

```bash
pdf_stem=$(basename "$ARGUMENTS" .pdf)
curl -L -o "/tmp/${pdf_stem}.pdf" "$ARGUMENTS"
```

Then convert (pass either the downloaded path or the original local path):

```bash
python {SKILL_DIR}/scripts/pdf_to_images.py "<pdf_path>" --dpi 120
```

Outputs JSON to stdout:

```json
[{"page": 1, "path": "/abs/path/page_001.png"}, ...]
```

Parse and store the full page → path map.
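Parsing that stdout into the map can be sketched as follows; the sample JSON string is illustrative, mirroring the format above:

```python
import json

# Sample stdout from pdf_to_images.py (illustrative paths, not real files).
stdout = '[{"page": 1, "path": "/abs/path/page_001.png"}, {"page": 2, "path": "/abs/path/page_002.png"}]'

pages = json.loads(stdout)
page_to_path = {p["page"]: p["path"] for p in pages}  # page number → image path
print(page_to_path[1])  # /abs/path/page_001.png
```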

Phase 2 — Read All Pages in Order

View all page images sequentially before planning. Goal: pure understanding of the document's content, figures, and structure.
While reading, note:
  • Title, authors, affiliations, venue, year
  • Abstract text (verbatim)
  • Key contributions
  • Paper/Code/Dataset links (arXiv, GitHub, etc.)
  • Figures, tables, diagrams — which pages, rough regions
  • Teaser/hero figure if present
Do NOT plan sections yet — read everything first.

Phase 3 — Plan Sections & Save outline.json

Plan the website sections. Standard structure for academic papers (adapt as needed):

| section_id    | Purpose                                   |
|---------------|-------------------------------------------|
| hero          | Title, authors, venue badge, link buttons |
| abstract      | Full abstract text                        |
| contributions | 3–5 key contribution cards                |
| method        | Architecture figure + method explanation  |
| results       | Quantitative table + qualitative figures  |
| conclusion    | Brief conclusion                          |
| citation      | BibTeX block                              |

For each section that needs an image, identify:
  • Which page it comes from (the local page path from Phase 1)
  • A description of what the visual shows and why it belongs in this section

Save as <out_dir>/outline.json using exactly this schema:

```json
{
  "project_title": "Paper Title",
  "lang": "English",
  "authors": ["Author One", "Author Two"],
  "sections_plan": [
    {
      "section_index": 1,
      "section_id": "hero",
      "title": "Hero",
      "content": "Title, authors, venue, teaser figure description",
      "required_images": [
        {
          "url": "<local_page_path_from_phase1>",
          "visual_description": "Figure 1: teaser showing input-output examples",
          "usage_reason": "Hero section visual to immediately show the paper's output"
        }
      ]
    }
  ]
}
```

Field notes:
  • lang: "Chinese" or "English" — match the PDF language
  • required_images: empty array [] if the section needs no images
  • url: the local file path of the source page (from the Phase 1 path field)
  • For images that need cropping, note the approximate region — exact crop boxes are determined in Phase 4

Write outline.json using the Write tool to <out_dir>/outline.json.
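Before writing the file, a quick structural check can catch missing keys early. `validate_webplan` is an illustrative sketch, not one of the skill's scripts:

```python
# Illustrative sanity check for the WebPlan schema (not part of {SKILL_DIR}/scripts/).
def validate_webplan(plan: dict) -> list:
    errors = []
    for key in ("project_title", "lang", "authors", "sections_plan"):
        if key not in plan:
            errors.append(f"missing top-level key: {key}")
    for sec in plan.get("sections_plan", []):
        for key in ("section_index", "section_id", "title", "content", "required_images"):
            if key not in sec:
                errors.append(f"section {sec.get('section_id', '?')}: missing {key}")
        for img in sec.get("required_images", []):
            for key in ("url", "visual_description", "usage_reason"):
                if key not in img:
                    errors.append(f"image in {sec.get('section_id', '?')}: missing {key}")
    return errors

plan = {
    "project_title": "Paper Title",
    "lang": "English",
    "authors": ["Author One"],
    "sections_plan": [
        {"section_index": 1, "section_id": "hero", "title": "Hero",
         "content": "Title, authors, venue", "required_images": []},
    ],
}
print(validate_webplan(plan))  # []  (structurally valid)
```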

Phase 4 — Crop Required Images (Grounding + Subagent)

IMPORTANT: You MUST delegate ALL cropping to a clean subagent using the Agent tool. By this phase your context is very long (all page images + outline), which degrades visual coordinate accuracy. A fresh subagent with only the target image produces much more precise coordinates.
IMPORTANT: You MUST use the provided {SKILL_DIR}/scripts/crop.py script for ALL image cropping. Do NOT write your own cropping code, do NOT use PIL/Pillow directly, do NOT use any other method.
Read outline.json. Collect all crops needed, then launch one subagent per source page (or one per crop if pages differ). The subagent uses grounding-style localization — it views the image, locates the target element, and outputs a precise bounding box in normalized 0–999 coordinates.
Use the Agent tool like this:
Agent tool call:
  description: "Grounding crop page N"
  prompt: |
    You are a visual grounding and cropping assistant. Your task is to precisely
    locate specified visual elements in a page image and crop them out.

    ## Grounding method

    Use visual grounding to locate each target:
    1. Read the source image using the Read tool to view it
    2. Identify the target element described below
    3. Determine its bounding box as normalized coordinates in the 0–999 range:
       - 0 = left/top edge of the image
       - 999 = right/bottom edge of the image
       - These are thousandths, NOT pixels, NOT percentages (0–100)
       - Format: [x1, y1, x2, y2] where (x1,y1) is top-left, (x2,y2) is bottom-right
       - Example: [0, 0, 500, 500] = top-left quarter of the image
    4. Be precise: tightly bound the target element with a small margin (~10–20 units)
       around it. Do NOT crop too wide or too narrow.

    ## Source image
    <page_image_path>

    ## Crops needed

    For each crop below, first do grounding (locate the element), then crop:

    1. Name: "<descriptive_name>"
       Target: "<visual_description from outline.json>"
       Context: "<usage_reason from outline.json>"

    ## Crop command

    After determining the bounding box [X1, Y1, X2, Y2] for each target, run:
    ```bash
    python <SKILL_DIR>/scripts/crop.py \
        --path "<page_image_path>" \
        --box X1 Y1 X2 Y2 \
        --name "<crop_name>" \
        --out-dir "<out_dir>/crops"
    ```

    ## Verification

    After each crop, READ the output image to visually verify the correct region
    was captured. If the crop missed the target or is too wide/narrow, adjust the
    coordinates and re-run crop.py.

    ## Output

    Report the final results as a list:
    - crop_name: <name>, file: <output_filename>, box: [X1, Y1, X2, Y2]
Replace <page_image_path>, <SKILL_DIR>, <out_dir>, and the crop details with actual values from your context.
The crop.py script outputs JSON: {"path": "/abs/path/<name>_crop.png"}
Collect results from all subagents and build the mapping section_id → [crop filename, ...] to reference in HTML.
Launch subagents for independent pages in parallel when possible. Wait for all to complete before proceeding.
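For intuition, here is a sketch of how the 0–999 normalized convention is assumed to map to pixel coordinates; crop.py itself is the authoritative implementation, and `norm_box_to_pixels` is a hypothetical helper:

```python
# Sketch of the assumed 0–999 → pixel mapping (crop.py is authoritative).
def norm_box_to_pixels(box, width, height):
    x1, y1, x2, y2 = box
    scale_x, scale_y = width / 999, height / 999
    return (round(x1 * scale_x), round(y1 * scale_y),
            round(x2 * scale_x), round(y2 * scale_y))

# [0, 0, 999, 999] spans the full page, whatever its pixel size:
print(norm_box_to_pixels([0, 0, 999, 999], 1000, 800))  # (0, 0, 1000, 800)
```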

Phase 5 — Measure Cropped Image Dimensions

```bash
python3 -c "
from PIL import Image; import os, json
d = '<out_dir>/crops'
sizes = {}
for f in sorted(os.listdir(d)):
    if f.endswith('.png'):
        w, h = Image.open(os.path.join(d, f)).size
        sizes[f] = {'width': w, 'height': h, 'aspect': round(w/h, 2)}
print(json.dumps(sizes, indent=2))
"
```
| Aspect ratio                   | Layout recommendation                      |
|--------------------------------|--------------------------------------------|
| < 0.7 (tall/narrow)            | max-width: 400–500px, centered             |
| 0.7–1.3 (square-ish)           | max-width: 600–700px                       |
| > 1.3 (wide)                   | Full-width, max-width: 100%                |
| > 2.0 (very wide, e.g. tables) | Full-width with horizontal scroll fallback |
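These recommendations can be expressed as a small helper; `layout_for_aspect` is illustrative and simply encodes the aspect-ratio thresholds listed above:

```python
# Illustrative mapping from crop aspect ratio to a layout recommendation.
def layout_for_aspect(aspect: float) -> str:
    if aspect > 2.0:
        return "full-width with horizontal scroll fallback"
    if aspect > 1.3:
        return "full-width, max-width: 100%"
    if aspect >= 0.7:
        return "max-width: 600-700px"
    return "max-width: 400-500px, centered"

print(layout_for_aspect(2.5))  # full-width with horizontal scroll fallback
print(layout_for_aspect(0.5))  # max-width: 400-500px, centered
```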

Phase 6 — Generate the Single-Page HTML

Step A — Write HTML to /tmp/website.html
  • All <img src="..."> must use relative paths: crops/<name>_crop.png
  • Do NOT use absolute paths

Step B — Save:

```bash
python {SKILL_DIR}/scripts/generate_web.py \
    --html-file /tmp/website.html \
    --title "<paper title>" \
    --out-dir "<out_dir>/"
```

HTML Spec

A single self-contained HTML file — embedded CSS, minimal vanilla JS only. No external JS frameworks. Google Fonts CDN is fine.

Page layout:
  • Max content width: 900px, centered, comfortable side padding
  • Sticky top nav with section anchor links + smooth scroll
  • Looks good at 1200px wide; readable at 768px

Typography:
  • Two Google Fonts: one for headings, one for body/UI
  • Body: 17–18px, line-height 1.7
  • Strong heading hierarchy (h1 >> h2 >> h3)

Visual style:
  • If the user specifies a style, follow it exactly
  • Otherwise, infer an appropriate aesthetic from the paper's domain and tone (e.g. CV/ML paper → clean modern academic; systems paper → dark technical; humanities → warm editorial serif)
  • Define colors and fonts as CSS variables; no fixed palette or font choices are required

Section guidelines:

hero:
  • Large title (2–3rem), authors list with affiliation superscripts, venue badge pill
  • Link buttons: [📄 Paper] [💻 Code] [🗄️ Dataset] — grey out if no URL
  • Teaser figure below (if found)

abstract:
  • Verbatim text with subtle left border accent

contributions:
  • Cards in a 2–3 column CSS grid, each with Unicode symbol + heading + description

method:
  • Full-width architecture figure (<figure><img><figcaption>) + prose explanation

results:
  • Quantitative table as a real <table> — use actual numbers from the PDF, best numbers bolded
  • Qualitative figures in a grid (2–4 images with captions)

conclusion:
  • 2–3 paragraphs

citation:
  • BibTeX block reconstructed from PDF metadata, wrapped in <pre><code>
  • "Copy" button using navigator.clipboard vanilla JS

Images:
  • All <img> use relative paths: crops/<name>_crop.png
  • Add loading="lazy" and descriptive alt
  • Wrap in <figure> with <figcaption>

Animations (subtle only):
  • Fade-in on scroll via IntersectionObserver + CSS transitions
  • Hover states on buttons/cards


Quality Checklist

  • Output directory named <pdf_stem>_<timestamp>/
  • outline.json saved with valid WebPlan schema
  • All crops saved to crops/ (local only)
  • All metadata (title, authors, venue, year) from the PDF
  • Abstract is verbatim
  • Quantitative table has real numbers from the paper
  • All crop images referenced via crops/<name>_crop.png
  • BibTeX block accurate and copyable
  • Nav anchors scroll to correct sections
  • generate_web.py called and confirmed success

Language

Match the PDF language. English paper → English website. Chinese paper → Chinese. No mixing.