arxiv-figures

arXiv Figure Optimizer

Purpose

Analyze, optimize, and convert figures in a TeX/LaTeX project to meet arXiv requirements and size constraints. Produces correctly formatted, efficiently compressed figures that compile without errors.
Companion skills:
  • `arxiv-preflight`: full submission validation
  • `arxiv-package`: tarball packaging

Format Rules

By Processor

| Processor | Accepted | Rejected |
|-----------|----------|----------|
| LaTeX (DVI mode) | `.ps`, `.eps` | `.pdf`, `.png`, `.jpg` |
| PDFLaTeX | `.pdf`, `.png`, `.jpg` | `.ps`, `.eps` |

By Content Type

| Content | Optimal Format | Rationale |
|---------|----------------|-----------|
| Photographs | JPEG | Lossy compression suits continuous tone |
| Line drawings / diagrams | PDF (vector) | Scalable, sharp at any resolution |
| Plots with text labels | PDF (vector) | Text remains crisp and searchable |
| Screenshots / raster art | PNG | Lossless compression for sharp edges |
| Mixed photo + text | PNG or PDF | Depends on dominant content |

Workflow

1. Inventory

Scan the project for all figures:
  • Parse `\includegraphics` calls from all `.tex` files
  • Identify the TeX processor (DVI vs PDFLaTeX) from the document preamble or build config
  • For each figure: record path, format, file size, and dimensions (pixels or vector bounds)
  • Flag missing figures, wrong-format figures, and oversized figures
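The inventory step can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not a full TeX parser: it resolves references relative to the project root (as if compiling there), ignores `\graphicspath` and macros that wrap `\includegraphics`, and uses a heuristic processor guess. The function names (`inventory`, `guess_processor`) and the extension search order are hypothetical choices; pixel dimensions are left to the analysis step.

```python
import os
import re
from pathlib import Path

# \includegraphics[options]{path}: the optional [...] argument is skipped
INCLUDE_RE = re.compile(r"\\includegraphics\s*(?:\[[^\]]*\])?\s*\{([^}]+)\}")
# Extensions tried, in this assumed order, when a reference has no extension
CANDIDATE_EXTS = (".pdf", ".png", ".jpg", ".jpeg", ".eps", ".ps")


def strip_comments(tex: str) -> str:
    """Remove %-comments while keeping escaped \\% characters."""
    return re.sub(r"(?<!\\)%.*", "", tex)


def guess_processor(tex: str) -> str:
    """Rough heuristic: PostScript-only packages or a dvips option imply DVI mode."""
    if re.search(r"\\usepackage(?:\[[^\]]*\])?\{(?:pstricks|psfrag)\}", tex) or "dvips" in tex:
        return "latex"
    return "pdflatex"


def inventory(project_dir: str) -> list[dict]:
    """List every \\includegraphics reference with its resolved file, format, and size."""
    root = Path(project_dir)
    records = []
    for tex_file in sorted(root.rglob("*.tex")):
        tex = strip_comments(tex_file.read_text(errors="replace"))
        for ref in INCLUDE_RE.findall(tex):
            base = root / ref.strip()  # assumes compilation from the project root
            candidates = [base] if base.suffix else [base.with_suffix(e) for e in CANDIDATE_EXTS]
            found = next((c for c in candidates if c.is_file()), None)
            records.append({
                "tex": os.path.relpath(tex_file, root),
                "ref": ref.strip(),
                "path": os.path.relpath(found, root) if found else None,
                "format": found.suffix.lower() if found else None,
                "bytes": found.stat().st_size if found else 0,
                "missing": found is None,
            })
    return records
```

Records with `missing` set to `True` correspond to the "missing figures" flag; wrong-format and oversized checks can then run over the same list.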

2. Analyze

For each figure, determine:
  1. Format compliance: does the format match the processor?
  2. File size: flag individual figures >2MB and a total >15MB
  3. Resolution: for PNG/JPEG, flag images >34 megapixels (arXiv warning threshold since Feb 2026)
  4. Content type: photograph vs diagram vs plot (determines the optimal format)
  5. Redundant metadata: for PNG, ICC profiles, alpha channels, EXIF, interlacing
  6. EPS efficiency: verbose PostScript from plotting programs (common with matplotlib, R, MATLAB)
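The size, resolution, and format checks reduce to a few comparisons. This sketch assumes binary megabytes (1 MB = 1024 × 1024 bytes) and 1 megapixel = 10^6 pixels, reads PNG dimensions from the IHDR chunk with `struct`, and takes figure records as plain dicts with `path`, `format`, `bytes`, and optional `width`/`height` keys (a hypothetical shape). JPEG dimension parsing is omitted.

```python
import struct

MB = 1024 * 1024
MAX_FIGURE_BYTES = 2 * MB    # per-figure flag threshold
MAX_TOTAL_BYTES = 15 * MB    # whole-project flag threshold
MAX_MEGAPIXELS = 34          # raster resolution flag threshold

ALLOWED = {
    "latex": {".ps", ".eps"},
    "pdflatex": {".pdf", ".png", ".jpg", ".jpeg"},
}


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk that must follow the PNG signature."""
    if data[:8] != b"\x89PNG\r\n\x1a\n" or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def analyze(figures: list[dict], processor: str) -> list[tuple[str, str]]:
    """Return (path, problem) pairs for format, size, and resolution violations."""
    allowed = ALLOWED[processor]
    flags = []
    for fig in figures:
        if fig["format"] and fig["format"] not in allowed:
            flags.append((fig["path"], "format not accepted by " + processor))
        if fig["bytes"] > MAX_FIGURE_BYTES:
            flags.append((fig["path"], "larger than 2 MB"))
        w, h = fig.get("width"), fig.get("height")
        if w and h and w * h > MAX_MEGAPIXELS * 1_000_000:
            flags.append((fig["path"], "above 34 megapixels"))
    if sum(f["bytes"] for f in figures) > MAX_TOTAL_BYTES:
        flags.append(("<total>", "figures exceed 15 MB in total"))
    return flags
```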

3. Optimize

Apply transformations in order of impact:

**Format Conversion** (when the format violates processor requirements)
```bash
# EPS → PDF (for PDFLaTeX)
epstopdf figure.eps
# or
ps2pdf -dEPSCrop figure.eps figure.pdf

# PDF/PNG/JPG → EPS (for DVI mode)
convert figure.png figure.eps
# ImageMagick rasterizes PDF input; to keep a vector PDF as vectors, use poppler's pdftops
pdftops -eps figure.pdf figure.eps
```
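When many figures need converting, the choice of command can be scripted. This sketch returns an argument list suitable for `subprocess.run(cmd, check=True)`; the function name is hypothetical, and it assumes Ghostscript's `ps2pdf`, ImageMagick's `convert`, and poppler's `pdftops` are on `PATH`. It sends PDF input to `pdftops -eps` rather than ImageMagick because ImageMagick rasterizes PDFs.

```python
from __future__ import annotations

from pathlib import Path

POSTSCRIPT_EXTS = {".eps", ".ps"}
RASTER_EXTS = {".png", ".jpg", ".jpeg"}


def conversion_command(path: str, processor: str) -> list[str] | None:
    """Return argv converting `path` into a format `processor` accepts, or None."""
    p = Path(path)
    ext = p.suffix.lower()
    if processor == "pdflatex" and ext in POSTSCRIPT_EXTS:
        # Ghostscript: -dEPSCrop crops the output page to the EPS BoundingBox
        return ["ps2pdf", "-dEPSCrop", str(p), str(p.with_suffix(".pdf"))]
    if processor == "latex" and ext == ".pdf":
        # pdftops keeps vector content; ImageMagick would rasterize it
        return ["pdftops", "-eps", str(p), str(p.with_suffix(".eps"))]
    if processor == "latex" and ext in RASTER_EXTS:
        # raster input stays raster; ImageMagick wraps it in an EPS file
        return ["convert", str(p), str(p.with_suffix(".eps"))]
    return None  # already compatible, or an extension this sketch does not handle
```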

**Size Reduction: Vector Figures**
```bash
# Distill verbose EPS
eps2eps input.eps output.eps
# or convert to PDF
ps2pdf -dEPSCrop input.eps output.pdf
```

**Size Reduction: Raster Figures**
```bash
# Strip PNG metadata, remove alpha (composited onto the background color, white by default),
# and use maximum compression
convert input.png -strip -alpha remove -define png:compression-level=9 output.png

# Reduce oversized PNG resolution (keep ≤300 DPI at print size);
# quote the geometry so the shell does not treat ">" as a redirection
convert input.png -resize '3000x3000>' -strip output.png

# JPEG quality optimization (80-90 is visually lossless for most content)
convert input.jpg -quality 85 -strip output.jpg

# Downsample oversized JPEG
convert input.jpg -resize '3000x3000>' -quality 85 -strip output.jpg
```
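The `3000x3000>` geometry asks ImageMagick to shrink an image to fit inside a 3000 × 3000 box, preserving aspect ratio, and to leave smaller images untouched; 3000 px is 10 inches at 300 DPI. The arithmetic below sketches how to pick a tighter box from a figure's printed width. The function names are hypothetical, and ImageMagick's own rounding may differ from `round()` by a pixel.

```python
def max_pixels(print_width_in: float, dpi: int = 300) -> int:
    """Largest useful pixel width for a figure printed `print_width_in` inches wide."""
    return round(print_width_in * dpi)


def shrink_to_fit(width: int, height: int, box_w: int, box_h: int) -> tuple[int, int]:
    """Mimic the 'WxH>' geometry: scale down to fit the box, keeping the aspect ratio,
    and never enlarge an image that already fits."""
    if width <= box_w and height <= box_h:
        return width, height
    scale = min(box_w / width, box_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a figure printed 6.5 inches wide (an assumed column width) needs at most `max_pixels(6.5)` = 1950 px across at 300 DPI, and `shrink_to_fit(6000, 4000, 3000, 3000)` gives `(3000, 2000)`.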

**PNG Optimization** (avoid arXiv warnings)
- Remove palette indexing if unnecessary
- Remove alpha channel if background is solid
- Strip ICC color profiles
- Remove metadata chunks
- Disable interlacing
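Before stripping anything, it can help to see which of these items a PNG actually contains. This sketch walks the PNG chunk list with `struct` (standard library only) and reads the color type and interlace byte from IHDR. The function name and the set of ancillary chunks treated as metadata are assumptions, and chunk CRCs are not verified.

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks treated here as strippable metadata (an assumed, editable list)
METADATA_CHUNKS = {b"iCCP", b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}


def png_report(data: bytes) -> dict:
    """Summarize alpha, palette, interlacing, and metadata chunks of a PNG byte string."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG file")
    pos, chunks = len(PNG_SIG), []
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        chunks.append((ctype, data[pos + 8:pos + 8 + length]))
        pos += 12 + length  # length (4) + type (4) + data + CRC (4)
        if ctype == b"IEND":
            break
    if not chunks or chunks[0][0] != b"IHDR":
        raise ValueError("IHDR must be the first chunk")
    ihdr = chunks[0][1]
    color_type, interlace = ihdr[9], ihdr[12]  # byte offsets within IHDR data
    types = [c for c, _ in chunks]
    return {
        # color types 4 (gray + alpha) and 6 (RGBA) carry alpha; tRNS adds transparency
        "has_alpha": color_type in (4, 6) or b"tRNS" in types,
        "palette": color_type == 3,
        "interlaced": interlace == 1,
        "metadata_chunks": sorted({c.decode("ascii") for c in types if c in METADATA_CHUNKS}),
    }
```

Usage might look like `png_report(Path("fig.png").read_bytes())`, with `Path` from `pathlib`.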

**EPS BoundingBox Fix** (prevents `Missing number, treated as zero`)
- Verify `%%BoundingBox` appears near top of file, not only at end
- If only `%%BoundingBox: (atend)`, extract actual values and place at top
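The BoundingBox fix can be scripted as a text edit. This sketch assumes the header contains the DSC comment `%%BoundingBox: (atend)` and that the integer values appear later in the file (typically after `%%Trailer`); it copies them into the header line and leaves the trailer copy in place. The function name is hypothetical, and EPS files with embedded binary data should be read in a byte-preserving way (for example `encoding="latin-1"`) before a string-based edit like this.

```python
import re

# A numeric %%BoundingBox line: four integers (llx lly urx ury)
BBOX_RE = re.compile(r"^%%BoundingBox:\s*(-?\d+\s+-?\d+\s+-?\d+\s+-?\d+)\s*$", re.M)


def hoist_bounding_box(eps: str) -> str:
    """Replace a header '%%BoundingBox: (atend)' with the numeric box found later."""
    lines = eps.splitlines(keepends=True)
    atend = next(
        (i for i, line in enumerate(lines)
         if line.startswith("%%BoundingBox:") and "(atend)" in line),
        None,
    )
    if atend is None:
        return eps  # numbers are already up front (or there is no box at all)
    match = BBOX_RE.search("".join(lines[atend + 1:]))
    if match is None:
        raise ValueError("no numeric %%BoundingBox after (atend)")
    newline = "\r\n" if lines[atend].endswith("\r\n") else "\n"
    lines[atend] = "%%BoundingBox: " + " ".join(match.group(1).split()) + newline
    return "".join(lines)
```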

4. Update TeX Source

If figures were renamed or reformatted:
  1. Update `\includegraphics` paths
  2. Remove explicit extensions where possible (lets each processor pick a format it accepts)
  3. Verify `\graphicspath` settings if used
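Removing extensions can be done with a regular expression over the `.tex` sources. This sketch handles only the direct `\includegraphics[...]{name.ext}` form (no wrapper macros or `\graphicspath` logic), and the function names are hypothetical. It skips file names that would still contain a dot after the extension is removed, since such names can confuse graphicx's extension detection.

```python
import re

# \includegraphics[opts]{stem.ext} with a common figure extension
INCLUDE_EXT_RE = re.compile(
    r"(\\includegraphics\s*(?:\[[^\]]*\])?\s*\{)([^}]+?)\.(?:pdf|png|jpe?g|eps|ps)(\})",
    re.IGNORECASE,
)


def _drop_extension(m: re.Match) -> str:
    stem = m.group(2)
    if "." in stem.rsplit("/", 1)[-1]:
        return m.group(0)  # extra dots could be misread as an extension; leave as-is
    return m.group(1) + stem + m.group(3)


def strip_graphics_extensions(tex: str) -> str:
    """Drop figure extensions from \\includegraphics arguments in a .tex source string."""
    return INCLUDE_EXT_RE.sub(_drop_extension, tex)
```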

5. Verify

After optimization:
  1. Attempt local compilation to verify all figures render
  2. Compare visual output of optimized vs original figures
  3. Report size reduction per figure and total
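Rendering failures usually show up in the LaTeX log. Assuming the project was compiled non-interactively (for example `pdflatex -interaction=nonstopmode main.tex`, which writes `main.log`), this sketch scans the log for a few figure-related error messages. The pattern list is an assumption and not exhaustive, and `Missing number, treated as zero` has causes other than a bad BoundingBox.

```python
import re

# Log messages that usually mean a figure did not render (assumed list, not exhaustive)
FIGURE_ERROR_PATTERNS = [
    re.compile(r"^! LaTeX Error: File `[^']+' not found.*$", re.M),
    re.compile(r"^! LaTeX Error: Unknown graphics extension:.*$", re.M),
    re.compile(r"^! Missing number, treated as zero.*$", re.M),
]


def figure_errors(log_text: str) -> list[str]:
    """Return log lines matching any of the figure-related error patterns."""
    hits = []
    for pattern in FIGURE_ERROR_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(log_text))
    return hits
```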

6. Report

```markdown
## Figure Optimization Report

**Processor**: [detected]
**Total figures**: [count]
**Size before**: [total MB]
**Size after**: [total MB]
**Reduction**: [percentage]

### Changes Made
| Figure | Original | Optimized | Size Before | Size After | Action |
|--------|----------|-----------|-------------|------------|--------|
| fig1 | fig1.eps | fig1.pdf | 12.3 MB | 0.4 MB | EPS→PDF conversion |
| fig2 | fig2.png | fig2.png | 8.1 MB | 1.2 MB | Strip metadata, downsample |

### Warnings
[Any remaining issues, e.g., figures still above thresholds]
```
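The report can be generated from per-figure measurements. This sketch assumes sizes have already been collected in megabytes and fills the template's fields; the function name and row keys are hypothetical. Reduction is computed as (before - after) / before.

```python
def format_report(processor: str, rows: list[dict], warnings: tuple = ()) -> str:
    """Render the optimization report as Markdown.

    rows: dicts with keys figure, original, optimized, before_mb, after_mb, action.
    """
    before = sum(r["before_mb"] for r in rows)
    after = sum(r["after_mb"] for r in rows)
    reduction = 100 * (before - after) / before if before else 0.0
    out = [
        "## Figure Optimization Report",
        "",
        f"**Processor**: {processor}",
        f"**Total figures**: {len(rows)}",
        f"**Size before**: {before:.1f} MB",
        f"**Size after**: {after:.1f} MB",
        f"**Reduction**: {reduction:.1f}%",
        "",
        "### Changes Made",
        "| Figure | Original | Optimized | Size Before | Size After | Action |",
        "|--------|----------|-----------|-------------|------------|--------|",
    ]
    for r in rows:
        out.append(
            f"| {r['figure']} | {r['original']} | {r['optimized']} | "
            f"{r['before_mb']:.1f} MB | {r['after_mb']:.1f} MB | {r['action']} |"
        )
    out += ["", "### Warnings"]
    out += [f"- {w}" for w in warnings] or ["None"]
    return "\n".join(out)
```

With the two example rows from the template, the totals come to 20.4 MB before and 1.6 MB after, a reduction of about 92.2%.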

Tools Reference

| Tool | Install | Use Case |
|------|---------|----------|
| ImageMagick (`convert`; `magick` in ImageMagick 7) | System package | Format conversion, resizing, stripping |
| Ghostscript (`ps2pdf`, `eps2eps`) | System package | EPS/PS optimization and conversion |
| `epstopdf` | TeX Live | EPS → PDF conversion |
| `pdfcrop` | TeX Live | Trim PDF whitespace |
| `optipng` | System package | PNG lossless optimization |
| `pngquant` | System package | PNG lossy size reduction |
| `jpegoptim` | System package | JPEG lossless optimization |

Core Principles

  • Never degrade visual quality below what is readable in print. Optimization means removing waste (metadata, unnecessary resolution, verbose encoding), not destroying information.
  • Match format to processor. A figure in the wrong format blocks compilation, so this is the highest-priority fix.
  • Preserve vector where possible. Converting vector to raster is a one-way quality loss. Only do this when the vector version is pathologically large (>10MB) and cannot be distilled.
  • Report everything. The user decides which optimizations to accept. Show before/after sizes and explain each transformation.