arxiv-figures

arXiv Figure Optimizer

Purpose

Analyze, optimize, and convert figures in a TeX/LaTeX project to meet arXiv requirements and size constraints. Produces correctly formatted, efficiently compressed figures that compile without errors.

Companion skills:

`arxiv-preflight`: full submission validation
`arxiv-package`: tarball packaging

Format Rules

By Processor

Processor	Accepted	Rejected
LaTeX (DVI mode)	`.ps` , `.eps`	`.pdf` , `.png` , `.jpg`
PDFLaTeX	`.pdf` , `.png` , `.jpg`	`.ps` , `.eps`
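
The table can be turned into a small compliance check. The following is a minimal sketch, not part of the original skill: the function name `format_ok` and the processor labels `latex` and `pdflatex` are assumptions chosen for illustration.

```bash
# Hypothetical helper: succeeds when FILE's extension is accepted by PROCESSOR.
# PROCESSOR is "latex" (DVI mode) or "pdflatex"; both labels are illustrative.
format_ok() {
  local processor="$1" file="$2" ext
  # Lower-case the extension so FIG.PNG and fig.png are treated alike.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$processor:$ext" in
    latex:ps | latex:eps) return 0 ;;
    pdflatex:pdf | pdflatex:png | pdflatex:jpg) return 0 ;;
    *) return 1 ;;
  esac
}
```

A call such as `format_ok pdflatex fig1.eps || echo "fig1.eps needs conversion"` would then flag a figure that blocks compilation.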

By Content Type

Content	Optimal Format	Rationale
Photographs	JPEG	Lossy compression suits continuous tone
Line drawings / diagrams	PDF (vector)	Scalable, sharp at any resolution
Plots with text labels	PDF (vector)	Text remains crisp and searchable
Screenshots / raster art	PNG	Lossless compression for sharp edges
Mixed photo + text	PNG or PDF	Depends on dominant content

Workflow

1. Inventory

Scan the project for all figures:

Parse `\includegraphics` calls from all `.tex` files
Identify the TeX processor (DVI vs PDFLaTeX) from document preamble or build config
For each figure: record path, format, file size, dimensions (pixels or vector bounds)
Flag missing figures, wrong-format figures, oversized figures
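
The parsing step could look roughly like the sketch below. It is illustrative only: `list_figures` is a hypothetical name, and the sketch assumes each `\includegraphics` call sits on a single line and does not skip calls inside `%` comments.

```bash
# Hypothetical helper: print the unique figure paths referenced by
# \includegraphics in every .tex file under ROOT (default: current directory).
list_figures() {
  local root="${1:-.}"
  # Match \includegraphics, an optional *, an optional [options] group,
  # then the {path} argument; sed keeps only the text inside the last braces.
  grep -rhoE --include='*.tex' \
    '\\includegraphics\*?(\[[^]]*])?\{[^}]*\}' "$root" |
    sed 's/.*{\([^}]*\)}$/\1/' |
    sort -u
}
```

Each printed path can then be checked for existence, format, and size. Paths written without an extension still need to be resolved against the files on disk.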

2. Analyze

For each figure, determine:

Format compliance — does the format match the processor?
File size — flag individual figures >2MB, total >15MB
Resolution — PNG/JPEG: flag >34 Megapixels (arXiv warning threshold since Feb 2026)
Content type — photograph vs diagram vs plot (determines optimal format)
Redundant metadata — PNG: ICC profiles, alpha channels, EXIF, interlacing
EPS efficiency — verbose PostScript from plotting programs (common with matplotlib, R, MATLAB)
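
Two of these checks, file size and PNG resolution, need only standard tools. The sketch below is an assumption-laden illustration: the function names are hypothetical, 1 MB is taken as 1024*1024 bytes, and `png_pixels` trusts that its input really is a PNG without verifying the signature.

```bash
# Print "OVERSIZED <file> <bytes>" for each file over 2 MB and
# "TOTAL_OVER <bytes>" if all files together exceed 15 MB.
check_sizes() {
  local per_file=$((2 * 1024 * 1024)) overall=$((15 * 1024 * 1024))
  local total=0 size f
  for f in "$@"; do
    size=$(( $(wc -c < "$f") ))   # arithmetic expansion drops wc's padding
    total=$((total + size))
    [ "$size" -gt "$per_file" ] && echo "OVERSIZED $f $size"
  done
  [ "$total" -gt "$overall" ] && echo "TOTAL_OVER $total"
  return 0
}

# Print width*height of a PNG. The IHDR chunk stores width and height as
# big-endian 32-bit integers at byte offsets 16 and 20.
png_pixels() {
  # od prints the 8 bytes as decimals; the unquoted expansion splits
  # them into $1..$8 on purpose.
  # shellcheck disable=SC2046
  set -- $(od -An -tu1 -j16 -N8 "$1")
  local w=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  local h=$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))
  echo $(( w * h ))
}
```

For example, `[ "$(png_pixels fig2.png)" -gt 34000000 ] && echo "fig2.png exceeds 34 MP"` would flag a PNG above the resolution threshold, reading 34 megapixels as 34,000,000 pixels.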

3. Optimize

Apply transformations in order of impact:

**Format Conversion** (when format violates processor requirements)
```bash
# EPS → PDF (for PDFLaTeX)
epstopdf figure.eps
# or
ps2pdf -dEPSCrop figure.eps figure.pdf

# PDF/PNG/JPG → EPS (for DVI mode)
convert figure.png figure.eps
```

**Size Reduction: Vector Figures**
```bash
# Distill verbose EPS
eps2eps input.eps output.eps
# or convert to PDF
ps2pdf -dEPSCrop input.eps output.pdf
```

**Size Reduction: Raster Figures**
```bash
# Strip PNG metadata, remove alpha, optimize compression
convert input.png -strip -alpha remove -define png:compression-level=9 output.png

# Reduce oversized PNG resolution (keep ≤300 DPI at print size).
# The geometry is quoted: an unquoted > would be parsed as a shell redirection.
convert input.png -resize '3000x3000>' -strip output.png

# JPEG quality optimization (80-90 is visually lossless for most content)
convert input.jpg -quality 85 -strip output.jpg

# Downsample oversized JPEG
convert input.jpg -resize '3000x3000>' -quality 85 -strip output.jpg
```

**PNG Optimization** (avoid arXiv warnings)
- Remove palette indexing if unnecessary
- Remove alpha channel if background is solid
- Strip ICC color profiles
- Remove metadata chunks
- Disable interlacing

**EPS BoundingBox Fix** (prevents `Missing number, treated as zero`)
- Verify `%%BoundingBox` appears near top of file, not only at end
- If only `%%BoundingBox: (atend)`, extract actual values and place at top
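
The BoundingBox fix can be sketched with `grep` and `awk`. This is a hedged illustration rather than the skill's actual procedure: `fix_bbox` is a hypothetical name, the sketch handles only `%%BoundingBox` (not `%%HiResBoundingBox`), and it assumes the concrete values appear on their own line later in the file.

```bash
# Hypothetical helper: copy IN to OUT, replacing a "%%BoundingBox: (atend)"
# header placeholder with the concrete values found later in the file.
fix_bbox() {
  local in="$1" out="$2" bbox
  # The real values are the last %%BoundingBox line that is not "(atend)".
  bbox=$(grep '^%%BoundingBox:' "$in" | grep -v '(atend)' | tail -n 1)
  if [ -z "$bbox" ]; then
    echo "no concrete %%BoundingBox found in $in" >&2
    return 1
  fi
  # Put the values where the placeholder was and drop the now-redundant
  # trailer copy. Files without a placeholder pass through unchanged.
  awk -v bbox="$bbox" '
    /^%%BoundingBox:[ \t]*\(atend\)/ { print bbox; replaced = 1; next }
    replaced && /^%%BoundingBox:/    { next }
    { print }
  ' "$in" > "$out"
}
```

Writing to a separate output file keeps the original intact, so the two versions can still be compared during verification.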

4. Update TeX Source

4. 更新TeX源文件

If figures were renamed or reformatted:

Update `\includegraphics` paths
Remove explicit extensions where possible (allows processor flexibility)
Verify `\graphicspath` settings if used
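
Removing explicit extensions can be done with a single `sed` substitution. The sketch below is an assumption, not the skill's own implementation: `strip_extensions` is a hypothetical name, the match is case-sensitive, and it expects each `\includegraphics` call on one line.

```bash
# Hypothetical helper: drop a trailing .pdf/.png/.jpg/.eps/.ps from the path
# argument of \includegraphics. Reads the named files, or stdin if none given,
# and writes the result to stdout.
strip_extensions() {
  sed -E 's/(\\includegraphics\*?(\[[^]]*])?\{[^}]*)\.(pdf|png|jpg|eps|ps)\}/\1}/g' "$@"
}
```

For instance, `strip_extensions main.tex > main.stripped.tex` leaves the source untouched until the result has been reviewed. Paths that contain other dots (such as `plot.v2.pdf`) are better left with their extension, since TeX may misread the remaining suffix.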

5. Verify

After optimization:

Attempt local compilation to verify all figures render
Compare visual output of optimized vs original figures
Report size reduction per figure and total
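
The per-figure size reduction can be computed from the two file sizes. A minimal sketch, with `size_reduction` as a hypothetical name and the output format chosen only for illustration:

```bash
# Hypothetical helper: print the byte sizes of ORIGINAL and OPTIMIZED and the
# percentage saved.
size_reduction() {
  local before after
  before=$(( $(wc -c < "$1") ))
  after=$(( $(wc -c < "$2") ))
  awk -v b="$before" -v a="$after" 'BEGIN {
    pct = (b > 0) ? (b - a) * 100 / b : 0   # avoid dividing by zero
    printf "%d -> %d bytes (%.1f%% smaller)\n", b, a, pct
  }'
}
```

Calling `size_reduction fig2.orig.png fig2.png` for each figure yields the numbers needed for the report.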

6. Report

Figure Optimization Report

Processor: [detected]
Total figures: [count]
Size before: [total MB]
Size after: [total MB]
Reduction: [percentage]

Changes Made

Figure	Original	Optimized	Size Before	Size After	Action
fig1	fig1.eps	fig1.pdf	12.3 MB	0.4 MB	EPS→PDF conversion
fig2	fig2.png	fig2.png	8.1 MB	1.2 MB	Strip metadata, downsample

Warnings

[Any remaining issues, e.g., figures still above thresholds]

Tools Reference

Tool	Install	Use Case
ImageMagick ( `convert` )	System package	Format conversion, resizing, stripping
Ghostscript ( `ps2pdf` , `eps2eps` )	System package	EPS/PS optimization and conversion
`epstopdf`	TeX Live	EPS → PDF conversion
`pdfcrop`	TeX Live	Trim PDF whitespace
`optipng`	System package	PNG lossless optimization
`pngquant`	System package	PNG lossy size reduction
`jpegoptim`	System package	JPEG lossless optimization
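
Before running any optimization it may help to confirm which of these tools are installed. A small sketch, with `missing_tools` as a hypothetical helper name:

```bash
# Hypothetical helper: print "missing: <name>" for every command that is not
# found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "missing: $tool"
  done
}
```

A typical call would be `missing_tools convert ps2pdf eps2eps epstopdf pdfcrop optipng pngquant jpegoptim`. Empty output means everything in the table is available.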

Core Principles

Never degrade visual quality below print-readable. Optimization means removing waste (metadata, unnecessary resolution, verbose encoding), not destroying information.
Match format to processor. A figure in the wrong format blocks compilation. This is the highest priority fix.
Preserve vector where possible. Converting vector to raster is a one-way quality loss. Only do this when the vector version is pathologically large (>10MB) and cannot be distilled.
Report everything. The user decides which optimizations to accept. Show before/after sizes and explain each transformation.
