Paper Illustration: Multi-Stage Claude-Supervised Figure Generation

Generate publication-quality illustrations using a multi-stage workflow with Claude as the STRICT supervisor/reviewer.

Core Design Philosophy

┌──────────────────────────────────────────────────────────────────────────┐
│                    MULTI-STAGE ITERATIVE WORKFLOW                        │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   User Request                                                           │
│       │                                                                  │
│       ▼                                                                  │
│   ┌─────────────┐                                                        │
│   │   Claude    │ ◄─── Step 1: Parse request, create initial prompt     │
│   │  (Planner)  │                                                        │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │   Gemini    │ ◄─── Step 2: Optimize layout description               │
│   │ (gemini-3-pro)│      - Refine component positioning                    │
│   │  Layout     │      - Optimize spacing and grouping                   │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │   Gemini    │ ◄─── Step 3: CVPR/NeurIPS style verification          │
│   │ (gemini-3-pro)│      - Check color palette compliance                  │
│   │  Style      │      - Verify arrow and font standards                 │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │ Paperbanana │ ◄─── Step 4: Render final image                       │
│   │ (gemini-3-  │      - High-quality image generation                   │
│   │ pro-image)  │      - Internal codename: Nano Banana Pro              │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │   Claude    │ ◄─── Step 5: STRICT visual review + SCORE (1-10)      │
│   │  (Reviewer) │      - Verify EVERY arrow direction                    │
│   │   STRICT!   │      - Verify EVERY block content                      │
│   └──────┬──────┘      - Verify aesthetics & visual appeal               │
│          │                                                               │
│          ▼                                                               │
│   Score ≥ 9? ──YES──► Accept & Output                                    │
│          │                                                               │
│          NO                                                              │
│          │                                                               │
│          ▼                                                               │
│   Generate SPECIFIC improvement feedback ──► Loop back to Step 2        │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
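The control flow in the diagram can be sketched in Python as follows. This is a minimal sketch, not the skill's actual implementation: the four stage callables are hypothetical stand-ins for Steps 2-5, and returning the best-scoring image when the rounds run out is an assumption, since the diagram does not say what happens in that case.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 5   # value from the Constants section
TARGET_SCORE = 9     # value from the Constants section


def run_workflow(
    initial_prompt: str,
    optimize_layout: Callable[[str], str],           # Step 2 (hypothetical callable)
    verify_style: Callable[[str], str],              # Step 3 (hypothetical callable)
    render: Callable[[str, int], str],               # Step 4, returns an image path
    review: Callable[[str], Tuple[int, List[str]]],  # Step 5, returns (score, feedback)
) -> Tuple[str, int, int]:
    """Run Steps 2-5 until the score reaches TARGET_SCORE or rounds run out.

    Returns (best_image_path, best_score, rounds_used).
    """
    prompt = initial_prompt
    best_path, best_score = "", 0
    for iteration in range(1, MAX_ITERATIONS + 1):
        layout = optimize_layout(prompt)
        spec = verify_style(layout)
        image_path = render(spec, iteration)
        score, feedback = review(image_path)
        if score > best_score:
            best_path, best_score = image_path, score
        if score >= TARGET_SCORE:
            return best_path, best_score, iteration
        # Feed the specific feedback into the next round (loop back to Step 2).
        prompt = initial_prompt + "\n\nFix these issues:\n" + "\n".join(
            f"- {item}" for item in feedback
        )
    # Assumption: fall back to the best attempt when no round reached the target.
    return best_path, best_score, MAX_ITERATIONS
```

Passing the stages in as callables keeps the loop testable without any API calls.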

Constants

IMAGE_MODEL = gemini-3-pro-image-preview (Paperbanana / Nano Banana Pro, for image rendering)
REASONING_MODEL = gemini-3-pro-preview (Gemini, for layout optimization and style checking)
MAX_ITERATIONS = 5 (maximum refinement rounds)
TARGET_SCORE = 9 (minimum acceptable score on the 1-10 scale; raised for quality)
OUTPUT_DIR = figures/ai_generated/ (output directory)
API_KEY_ENV = GEMINI_API_KEY (environment variable holding the API key)
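For scripts that drive the workflow from Python, the constants above could be gathered in one place. The sketch below is illustrative only: the `FigureConfig` class and its `api_key` helper are hypothetical names, and only the values come from this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureConfig:
    """Hypothetical container for the constants listed above."""
    image_model: str = "gemini-3-pro-image-preview"
    reasoning_model: str = "gemini-3-pro-preview"
    max_iterations: int = 5
    target_score: int = 9
    output_dir: str = "figures/ai_generated/"
    api_key_env: str = "GEMINI_API_KEY"

    def api_key(self) -> str:
        """Return the key from the environment, or raise if it is missing."""
        key = os.environ.get(self.api_key_env, "")
        if not key:
            raise RuntimeError(f"{self.api_key_env} is not set")
        return key
```

Freezing the dataclass prevents a later stage from silently lowering `target_score` mid-run.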

CVPR/ICLR/NeurIPS Top-Tier Conference Style Guide

What "CVPR Style" Actually Means:

Visual Standards

Clean white background — No decorative patterns or gradients (unless subtle)
Sans-serif fonts — Arial, Helvetica, or Computer Modern; minimum 14pt
Subtle color palette — Not rainbow colors; use 3-5 coordinated colors
Print-friendly — Must be readable in grayscale (many reviewers print papers)
Professional borders — Thin (2-3px), solid colors, not flashy

Layout Standards

Horizontal flow — Left-to-right is the standard for pipelines
Clear grouping — Use subtle background boxes to group related modules
Consistent sizing — Similar components should have similar sizes
Balanced whitespace — Not cramped, not sparse

Arrow Standards (MOST CRITICAL)

Thick strokes — 4-6px minimum (thin arrows disappear when printed)
Clear arrowheads — Large, filled triangular heads
Dark colors — Black or dark gray (#333333); avoid colored arrows
Labeled — Every arrow should indicate what data flows through it
No crossings — Reorganize layout to avoid arrow crossings
CORRECT DIRECTION — Arrows must point to the RIGHT target!
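Several of these arrow rules can be checked mechanically before any image is rendered. The sketch below assumes a hypothetical `Arrow` record describing each planned connection; it covers endpoints, stroke width, color, and labels, but does not attempt to detect crossings, which depend on the final geometry.

```python
from dataclasses import dataclass
from typing import List

# Black and the dark grays named in this guide.
DARK_ARROW_COLORS = {"#000000", "#333333", "#1F2937"}
MIN_STROKE_PX = 4  # lower bound of the 4-6px rule


@dataclass
class Arrow:
    """Hypothetical description of one planned connection."""
    source: str
    target: str
    label: str
    stroke_px: float = 5.0
    color: str = "#333333"


def check_arrows(arrows: List[Arrow], components: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for a in arrows:
        name = f"{a.source} -> {a.target}"
        if a.source not in components or a.target not in components:
            problems.append(f"{name}: unknown endpoint")
        if a.source == a.target:
            problems.append(f"{name}: arrow points back to its own source")
        if a.stroke_px < MIN_STROKE_PX:
            problems.append(f"{name}: stroke {a.stroke_px}px is thinner than {MIN_STROKE_PX}px")
        if a.color.upper() not in DARK_ARROW_COLORS:
            problems.append(f"{name}: color {a.color} is not black or dark gray")
        if not a.label.strip():
            problems.append(f"{name}: missing data-flow label")
    return problems
```

Running this on the planned connections catches specification mistakes early; whether the rendered image honors the specification still has to be verified visually.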

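Visual Appeal (Research Style: Professional Academic Style)

Goal: strike a balance, neither too conservative nor too flashy.

✅ Visual elements to include:

Subtle gradient fills: light-to-dark within a single hue, not multicolor
Rounded corners: rounded rectangles (6-10px radius), modern without being exaggerated
Clear visual hierarchy: distinguish levels through size and color depth
Consistent color coding: one unified scheme of 3-4 main colors
Internal structure: show sub-components inside large modules (e.g., the layer structure inside an Encoder)
Professional typography: clear labels with an appropriate hierarchy of font sizes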

✅ Recommended palette (academic and professional):

Inputs: soft greens (#10B981 / #34D399)
Encoders: professional blues (#2563EB / #3B82F6)
Fusion: elegant purples (#7C3AED / #8B5CF6)
Outputs: warm oranges (#EA580C / #F97316)
Arrows: black or dark gray (#333333 / #1F2937)
Background: pure white (#FFFFFF), no patterns
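Because figures must stay readable in grayscale, it can help to estimate how each palette color prints. The sketch below uses the Rec. 601 luma weights as an approximation of grayscale conversion; the `min_gap` threshold of 20 is a hypothetical value chosen for illustration, not a standard.

```python
PALETTE = {
    "inputs": "#10B981",
    "encoders": "#2563EB",
    "fusion": "#7C3AED",
    "outputs": "#EA580C",
}


def gray_level(hex_color: str) -> float:
    """Approximate grayscale value (0-255) using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def close_pairs(palette: dict, min_gap: float = 20.0) -> list:
    """Return role pairs whose gray levels differ by less than min_gap."""
    names = sorted(palette)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(gray_level(palette[a]) - gray_level(palette[b])) < min_gap
    ]


for role, color in PALETTE.items():
    print(f"{role:9s} {color} gray={gray_level(color):.0f}")
print("Hard to tell apart in grayscale:", close_pairs(PALETTE))
```

Under this approximation the blue and purple fills land close together, as do the green and orange, so labels, borders, and position should carry the distinction in print rather than fill color alone.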

❌ Over-decoration to avoid:

❌ Rainbow color schemes
❌ Heavy drop shadows
❌ 3D effects / perspective
❌ Excessive multicolor gradients
❌ Clip art / cartoon icons
❌ Decorative patterns in background
❌ Glowing effects
❌ Too many small icons

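✓ The ideal visual result:

Looks professional and clear at first glance
Moderately eye-catching without being distracting
Matches the aesthetic standards of CVPR/NeurIPS papers
Print-friendly (still legible in grayscale)
Reads like a carefully designed academic figure, not a PPT template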

What to AVOID (CRITICAL)

❌ Rainbow color schemes (too many colors)
❌ Thin, hairline arrows (arrows must be THICK)
❌ Unlabeled connections
❌ Plain boring rectangles (add some visual interest)
❌ Over-decorated with shadows/glows/icons (too flashy)
❌ Small text that's unreadable when printed
❌ WRONG arrow directions — This is UNACCEPTABLE!

Scope

Figure Type	Quality	Examples
Architecture diagrams	Excellent	Model architecture, pipeline, encoder-decoder
Method illustrations	Excellent	Conceptual diagrams, algorithm flowcharts
Conceptual figures	Good	Comparison diagrams, taxonomy trees

Not for: statistical plots (use /paper-figure) or photo-realistic images.

Workflow: MUST EXECUTE ALL STEPS

Step 0: Pre-flight Check

```bash
# Check API key
if [ -z "$GEMINI_API_KEY" ]; then
    echo "ERROR: GEMINI_API_KEY not set"
    echo "Get your key from: https://aistudio.google.com/app/apikey"
    echo "Set it: export GEMINI_API_KEY='your-key'"
    exit 1
fi

# Create output directory
mkdir -p figures/ai_generated
```

Step 1: Claude Plans the Figure (YOU ARE HERE)

CRITICAL: Claude must first analyze the user's request and create a detailed prompt.

Parse the input: $ARGUMENTS

Claude's task:

Understand what figure the user wants
Identify all components, connections, data flow
Create a detailed, structured prompt for Gemini
Include style requirements AND visual appeal requirements

Prompt Template for Claude to generate:

Create a PROFESSIONAL, VISUALLY APPEALING publication-quality academic diagram following CVPR/ICLR/NeurIPS standards.

Visual Style: Academic Professional Style

Goal: balance, neither too conservative nor too flashy.

DO:

Subtle gradients: soft gradients within one hue (e.g., #2563EB → #3B82F6), not multicolor
Rounded corners: rounded rectangles (6-10px) for a modern look
Clear visual hierarchy: distinguish levels through size and color depth
Internal structure: show sub-component structure inside large modules
Consistent color coding: one unified 3-4 color scheme
Professional polish: refined but not exaggerated

DON'T:

❌ Rainbow/multi-color gradients
❌ Heavy drop shadows
❌ 3D effects / perspective
❌ Glowing effects
❌ Excessive decorative icons
❌ Plain boring rectangles

Ideal result:

Like a carefully designed architecture diagram in a top-conference paper: professional, clear, and moderately eye-catching.

Figure Type

[Architecture Diagram / Pipeline / Comparison / etc.]

Components to Include (BE SPECIFIC ABOUT CONTENT)

[Component 1]:
- Label: "[exact text]"
- Sub-label: "[smaller text below]"
- Position: [left/center/right, top/middle/bottom]
- Style: [border color, fill, internal structure]

Layout

Direction: [left-to-right / top-to-bottom]
Spacing: [tight / normal / loose]
Grouping: [how components should be grouped]

Connections (BE EXPLICIT ABOUT DIRECTION)

EXACT arrow specifications:

[Component A] → [Component B]: Arrow goes FROM A TO B, label it "[data type]"
[Component C] → [Component D]: Arrow goes FROM C TO D, label it "[data type]"
...

VERIFY: Each arrow must point to the CORRECT target!

Style Requirements (CVPR/ICLR/NeurIPS Standard)

Visual Style

Color palette: Professional academic colors
- Inputs: Green (#10B981)
- Encoders: Blue (#2563EB)
- Fusion modules: Purple (#7C3AED)
- Outputs: Orange (#EA580C)
Font: Sans-serif (Arial/Helvetica), minimum 14pt, bold for labels
Background: Clean white, no patterns
Blocks: Rounded rectangles (8-12px radius), subtle gradient fill, colored border (2-3px)
Subtle shadows for depth effect
Print-friendly (must work in grayscale)

CRITICAL: Arrow & Data Flow Requirements

ALL arrows must be VERY THICK - minimum 5-6px stroke width
ALL arrows must have CLEAR arrowheads - large, visible triangular heads
ALL arrows must be BLACK or DARK GRAY - not colored
Label EVERY arrow with what data flows through it
VERIFY arrow direction - each arrow MUST point to the correct target
No ambiguous connections - every arrow should have a clear source and destination

Logic Clarity Requirements

Data flow must be immediately obvious - viewer should understand the pipeline in 5 seconds
No crossing arrows - reorganize layout to avoid arrow crossings
Consistent direction - maintain left-to-right or top-to-bottom flow throughout
Group related components - use subtle background boxes or spacing to group modules
Clear hierarchy - main components larger, sub-components smaller

Additional Requirements

[Any specific requirements from user]
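Filling the template from structured data, rather than free text, makes it harder to mention an arrow whose endpoints were never declared. The following is a minimal sketch under that assumption; `build_prompt` and its argument shapes are hypothetical, and it emits only a subset of the template sections above.

```python
from typing import Dict, List, Tuple


def build_prompt(
    figure_type: str,
    components: List[Dict[str, str]],
    connections: List[Tuple[str, str, str]],
    direction: str = "left-to-right",
) -> str:
    """Assemble a structured figure prompt; raise if an arrow names an unknown component."""
    names = {c["label"] for c in components}
    lines = [
        "Create a PROFESSIONAL, VISUALLY APPEALING publication-quality academic "
        "diagram following CVPR/ICLR/NeurIPS standards.",
        "",
        f"Figure Type: {figure_type}",
        "",
        "Components to Include:",
    ]
    for c in components:
        lines.append(f'- Label: "{c["label"]}" | Position: {c.get("position", "unspecified")}')
    lines += ["", f"Layout direction: {direction}", "", "Connections:"]
    for src, dst, data in connections:
        if src not in names or dst not in names:
            raise ValueError(f"arrow {src} -> {dst} references an unknown component")
        # Spell the direction out in words as well as with the arrow symbol.
        lines.append(f'- {src} → {dst}: Arrow goes FROM {src} TO {dst}, label it "{data}"')
    return "\n".join(lines)
```

For example, `build_prompt("Pipeline", [{"label": "Encoder"}, {"label": "Decoder"}], [("Encoder", "Decoder", "features")])` yields a prompt whose connection line states the direction twice, once symbolically and once in words.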

Step 2: Gemini Layout Optimization (gemini-3-pro)

Claude sends the initial prompt to Gemini (gemini-3-pro) for layout optimization.

```bash
#!/bin/bash
# Step 2: Optimize layout using Gemini (gemini-3-pro-preview)
# This step refines component positioning and spacing

set -e

OUTPUT_DIR="figures/ai_generated"
mkdir -p "$OUTPUT_DIR"

API_KEY="${GEMINI_API_KEY}"
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=$API_KEY"

# The initial prompt from Claude
INITIAL_PROMPT='[Claude fills in the detailed prompt here]'

# Layout optimization request
LAYOUT_REQUEST="You are an expert in academic figure layout design for CVPR/NeurIPS papers.

Analyze this figure request and provide an OPTIMIZED LAYOUT DESCRIPTION:

$INITIAL_PROMPT

Provide:

1. Optimized Component Positions: Exact positions (left/center/right, top/middle/bottom) for each component
2. Spacing Recommendations: Specific spacing between components
3. Grouping Strategy: Which components should be visually grouped together
4. Arrow Routing: Optimal paths for arrows to avoid crossings
5. Visual Hierarchy: Size recommendations for main vs sub-components

Output a DETAILED layout specification that will be used for rendering."

# Build JSON payload. The prompt is passed through the environment and
# serialized by json.dump, so quotes or backslashes in it cannot break the script.
export LAYOUT_REQUEST
python3 << 'PYTHON'
import json, os
payload = {
    "contents": [{"parts": [{"text": os.environ["LAYOUT_REQUEST"]}]}]
}
with open("/tmp/gemini_layout_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("Layout request created")
PYTHON

# Call Gemini gemini-3-pro-preview for layout optimization (DIRECT connection, no proxy)
RESPONSE=$(curl -s --max-time 90 \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_layout_request.json)

# Extract layout description
LAYOUT_DESCRIPTION=$(echo "$RESPONSE" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    print(data['candidates'][0]['content']['parts'][0]['text'])
except (ValueError, KeyError, IndexError, TypeError) as e:
    # Fail loudly so an error message is never saved as the layout
    sys.exit(f'Error extracting layout: {e}')
")

echo "=== Layout Optimization Complete ==="
echo "$LAYOUT_DESCRIPTION"
echo "$LAYOUT_DESCRIPTION" > "$OUTPUT_DIR/layout_description.txt"
```
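Figure prompts routinely contain quotes, backslashes, and multi-line text, so the request body is best produced by a JSON serializer rather than by string interpolation. The sketch below shows one way to do that in Python; `build_payload` is a hypothetical helper, and the `generationConfig` field mirrors the one used by the image-rendering step in this document.

```python
import json


def build_payload(prompt: str, want_image: bool = False) -> str:
    """Serialize a generateContent request body; json.dumps handles all escaping."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    if want_image:
        # Same setting the rendering step in this document sends.
        payload["generationConfig"] = {"responseModalities": ["TEXT", "IMAGE"]}
    return json.dumps(payload, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    tricky = "label '''A''' \\ \"B\"\nsecond line"
    body = build_payload(tricky)
    # Round-tripping shows the prompt survives serialization unchanged.
    print(json.loads(body)["contents"][0]["parts"][0]["text"] == tricky)
```

The returned string can be written to the request file that `curl -d @file` reads.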

Step 3: Gemini Style Verification (gemini-3-pro)

Claude sends the optimized layout to Gemini for CVPR/NeurIPS style verification.

```bash
#!/bin/bash
# Step 3: Verify and enhance style compliance using Gemini (gemini-3-pro-preview)

set -e

API_KEY="${GEMINI_API_KEY}"
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=$API_KEY"

# Read layout from previous step
LAYOUT=$(cat figures/ai_generated/layout_description.txt)

# Style verification request
STYLE_REQUEST="You are a CVPR/NeurIPS paper figure reviewer specializing in visual standards.

Review and ENHANCE this figure specification for top-tier conference compliance:

$LAYOUT

Ensure compliance with:

1. Color Palette: Use professional academic colors (green for inputs, blue for encoders, purple for fusion, orange for outputs)
2. Arrow Standards: Thick (5-6px), black/dark gray, clear arrowheads, all labeled
3. Font Standards: Sans-serif, minimum 14pt, readable in print
4. Visual Appeal (academic style):
   - ✅ Subtle same-color gradients, rounded corners (6-10px), internal structure visible
   - ❌ NO heavy shadows, NO glowing effects, NO rainbow gradients

Output an ENHANCED figure specification with explicit style instructions for rendering."

# Build JSON payload. The prompt is passed through the environment and
# serialized by json.dump, so quotes or backslashes in it cannot break the script.
export STYLE_REQUEST
python3 << 'PYTHON'
import json, os
payload = {
    "contents": [{"parts": [{"text": os.environ["STYLE_REQUEST"]}]}]
}
with open("/tmp/gemini_style_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("Style request created")
PYTHON

# Call Gemini gemini-3-pro-preview for style verification (DIRECT connection, no proxy)
RESPONSE=$(curl -s --max-time 90 \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_style_request.json)

# Extract style-enhanced specification
STYLE_SPEC=$(echo "$RESPONSE" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    print(data['candidates'][0]['content']['parts'][0]['text'])
except (ValueError, KeyError, IndexError, TypeError) as e:
    # Fail loudly so an error message is never saved as the style spec
    sys.exit(f'Error extracting style spec: {e}')
")

echo "=== Style Verification Complete ==="
echo "$STYLE_SPEC"
echo "$STYLE_SPEC" > "figures/ai_generated/style_spec.txt"
```
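Both text stages read the model's reply from the same nested path, `candidates[0].content.parts[].text`. A small helper can centralize that lookup and raise on failure, so that an error message is never written to disk as if it were a specification. This is a sketch: `extract_text` is a hypothetical name, and the response shape is the one the scripts in this document already assume.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Join all text parts of the first candidate; raise instead of returning an error string."""
    if "error" in response:
        raise RuntimeError(f"API error: {response['error']}")
    try:
        parts = response["candidates"][0]["content"]["parts"]
    except (KeyError, IndexError, TypeError) as exc:
        raise RuntimeError("response has no candidate content") from exc
    text = "".join(p["text"] for p in parts if "text" in p)
    if not text.strip():
        raise RuntimeError("response contained no text parts")
    return text
```

Joining every text part, rather than reading only the first, guards against replies that arrive split across several parts.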

Step 4: Paperbanana Image Rendering (gemini-3-pro-image-preview)

Claude sends the optimized, style-verified specification to Paperbanana for rendering.

```bash
#!/bin/bash
# Step 4: Render image using Paperbanana (gemini-3-pro-image-preview)
# Internal codename: Nano Banana Pro
# Use DIRECT connection (no proxy) - proxy causes SSL errors

set -e

OUTPUT_DIR="figures/ai_generated"
mkdir -p "$OUTPUT_DIR"

API_KEY="${GEMINI_API_KEY}"
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY"

# Read the style-enhanced specification from previous step
STYLE_SPEC=$(cat figures/ai_generated/style_spec.txt)

# Add rendering instructions
RENDER_PROMPT="Render a publication-quality academic diagram based on this specification:

$STYLE_SPEC

RENDERING REQUIREMENTS:

- Output a clean, professional diagram suitable for CVPR/NeurIPS submission
- Use vector-quality rendering with sharp edges and clear text
- Ensure all elements are properly aligned and spaced
- The diagram should be immediately understandable at a glance"

# Build JSON payload using Python for proper escaping.
# The prompt is read from the environment instead of being pasted into the source.
export RENDER_PROMPT
python3 << 'PYTHON'
import json, os
payload = {
    "contents": [{"parts": [{"text": os.environ["RENDER_PROMPT"]}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
with open("/tmp/gemini_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("JSON payload created")
PYTHON

# Call Paperbanana API WITHOUT proxy (direct connection works better)
RESPONSE_FILE="/tmp/gemini_response.json"
curl -s --max-time 180 \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_request.json \
  -o "$RESPONSE_FILE"

# Check for error
if grep -q '"error"' "$RESPONSE_FILE"; then
    echo "API Error:"
    python3 -m json.tool "$RESPONSE_FILE" 2>/dev/null || cat "$RESPONSE_FILE"
    exit 1
fi

# Extract and save image.
# The response is read from a file because the heredoc occupies stdin,
# so a response piped into python3 would never reach the script.
python3 - "$RESPONSE_FILE" << 'PYTHON'
import sys, json, base64
from pathlib import Path

output_dir = Path("figures/ai_generated")
with open(sys.argv[1]) as f:
    data = json.load(f)

try:
    parts = data['candidates'][0]['content']['parts']
    iteration = 1  # Claude increments this each iteration

    for part in parts:
        if 'text' in part:
            print(f"\n[Paperbanana]: {part['text'][:200]}...")
        elif 'inlineData' in part:
            img_data = base64.b64decode(part['inlineData']['data'])
            img_path = output_dir / f"figure_v{iteration}.png"
            with open(img_path, "wb") as f:
                f.write(img_data)
            print(f"\n✅ Image saved: {img_path}")
            print(f"   Size: {len(img_data)/1024:.1f} KB")

except Exception as e:
    print(f"Parse error: {e}")
    print(f"Raw response: {str(data)[:500]}")
PYTHON
```
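The image-saving logic can also be written as a standalone function that takes the iteration number as a parameter, which makes the `figure_v{N}.png` naming easy to exercise without calling the API. This is a sketch under stated assumptions: `save_images` is a hypothetical helper, the `.png` extension assumes the model returns PNG data, and the numeric suffix for additional images in one response is an illustrative choice to avoid overwriting.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List


def save_images(response: Dict[str, Any], output_dir: str, iteration: int) -> List[Path]:
    """Decode every inlineData part and write it as figure_v{iteration}[_k].png."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = response["candidates"][0]["content"]["parts"]
    saved: List[Path] = []
    for part in parts:
        if "inlineData" not in part:
            continue  # skip text parts
        # First image keeps the plain name; later ones get _2, _3, ...
        suffix = "" if not saved else f"_{len(saved) + 1}"
        path = out / f"figure_v{iteration}{suffix}.png"
        path.write_bytes(base64.b64decode(part["inlineData"]["data"]))
        saved.append(path)
    return saved
```

An empty return value signals that the response held no image, which the caller can treat as a failed render rather than proceeding to review.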

Step 5: Claude STRICT Visual Review & Scoring (MANDATORY)

Claude MUST read the generated image and perform a STRICT review:

Visual Analysis: What does the image show in detail?
Strengths: What's good about it?
STRICT Verification: Check EVERY item below
Score: Rate 1-10 (10 = perfect) — BE STRICT!

STRICT Review Template:

Claude's STRICT Review of Figure v{N}

What I See

[Describe the generated image in DETAIL - every block, every arrow]

Strengths

[Strength 1]
[Strength 2]

═══════════════════════════════════════════════════════════════

STRICT VERIFICATION CHECKLIST (ALL must pass for score ≥ 9)

═══════════════════════════════════════════════════════════════

A. Arrow Correctness Verification (CRITICAL - any failure = score ≤ 6)

Check EACH arrow:

Arrow 1: [Source] → [Target] — Does it point to the CORRECT target?
Arrow 2: [Source] → [Target] — Does it point to the CORRECT target?
Arrow 3: [Source] → [Target] — Does it point to the CORRECT target?
Arrow 4: [Source] → [Target] — Does it point to the CORRECT target?
Arrow 5: [Source] → [Target] — Does it point to the CORRECT target?
Arrow 6: [Source] → [Target] — Does it point to the CORRECT target?

B. Block Content Verification (any failure = score ≤ 7)

Check EACH block:

Block 1 "[Name]": Has correct label? Has sub-label? Content correct?
Block 2 "[Name]": Has correct label? Has sub-label? Content correct?
Block 3 "[Name]": Has correct label? Has sub-label? Content correct?
Block 4 "[Name]": Has correct label? Has sub-label? Content correct?
Block 5 "[Name]": Has correct label? Has sub-label? Content correct?
Block 6 "[Name]": Has correct label? Has sub-label? Content correct?
Block 7 "[Name]": Has correct label? Has sub-label? Content correct?

C. Arrow Visibility (any failure = score ≤ 7)

ALL arrows are THICK (≥5px visible stroke)
ALL arrows have CLEAR arrowheads (large triangular heads)
ALL arrows are BLACK or DARK GRAY (not light colors)
NO arrows are too thin or invisible

D. Arrow Labels (any failure = score ≤ 7)

EVERY arrow has a text label
Labels are readable (not too small)
Labels correctly describe the data flowing

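E. Visual Appeal (Balanced Academic Style) (any failure = score ≤ 8)

Moderate visual appeal: subtle gradients or rounded corners, nothing exaggerated
Not plain boxes: shows some sense of design
Not over-decorated: no heavy shadows, glow effects, or rainbow color schemes
Professional academic style: looks like a figure in a CVPR paper, not a PPT template
Internal structure visible: large modules show their sub-component structure
Color palette: 3-4 harmonious colors, neither rainbow nor pure black and white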

E2. Visual Appeal - RED FLAGS (immediate score ≤ 7 if found)

NO heavy drop shadows (too flashy)
NO glowing effects (too flashy)
NO rainbow gradients (unprofessional)
NO excessive decorative icons (distracting)

F. Layout & Flow (any failure = score ≤ 7)

Clean horizontal left-to-right flow
No arrow crossings
Data flow traceable in 5 seconds
Balanced spacing (not cramped, not sparse)

G. Style Compliance

CVPR/NeurIPS professional style
Color palette appropriate (not rainbow)
Font readable
Print-friendly (grayscale test)

═══════════════════════════════════════════════════════════════

Issues Found (BE SPECIFIC)

[Issue 1]: [EXACTLY what is wrong] → [How to fix]
[Issue 2]: [EXACTLY what is wrong] → [How to fix]
[Issue 3]: [EXACTLY what is wrong] → [How to fix]

Score: X/10

STRICT Score Breakdown Guide:

10: Perfect. No issues. Publication-ready masterpiece. Visual style is perfectly balanced.
9: Excellent. Minor issues that don't affect understanding. Can be used as is.
8: Good but has noticeable issues. A figure that is visually too plain or too flashy needs improvement.
7: Usable but has clear problems. Arrows or content have issues.
6: Has arrow direction errors OR missing major components.
1-5: Major issues. Unacceptable.

Visual Style Scoring:

Too flashy: heavy shadows, glow effects, rainbow color schemes → score ≤ 7
Too plain: pure black-and-white boxes, no visual design at all → score ≤ 8
Balanced: moderate gradients, rounded corners, clear hierarchy → score 9-10

Verdict

[ ] ACCEPT (score ≥ 9 AND all critical checks pass)
[ ] REFINE (score < 9 OR any critical check fails)

If REFINE: List the EXACT issues that must be fixed
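
The score caps in the checklist can be applied mechanically once the reviewer has noted which sections failed. The following is a minimal Python sketch, not part of the original workflow: the names `SCORE_CAPS`, `CRITICAL_SECTIONS`, `capped_score`, and `verdict` are hypothetical, and treating only section A as critical is an assumption based on the "CRITICAL" marker in its heading. Section G states no cap, so it is left out of the table.

```python
# Hypothetical helper: caps mirror the checklist headings above.
SCORE_CAPS = {
    "A": 6,   # arrow correctness
    "B": 7,   # block content
    "C": 7,   # arrow visibility
    "D": 7,   # arrow labels
    "E": 8,   # visual appeal
    "E2": 7,  # visual appeal red flags
    "F": 7,   # layout and flow
}

# Assumption: only section A is labelled CRITICAL in the checklist.
CRITICAL_SECTIONS = {"A"}


def capped_score(raw_score, failed_sections):
    """Return the raw score lowered to the strictest cap among failed sections."""
    caps = [SCORE_CAPS[s] for s in failed_sections if s in SCORE_CAPS]
    return min([raw_score, *caps])


def verdict(raw_score, failed_sections):
    """Return "ACCEPT" only if the capped score is >= 9 and no critical section failed."""
    score = capped_score(raw_score, failed_sections)
    critical_ok = not (CRITICAL_SECTIONS & set(failed_sections))
    return "ACCEPT" if score >= 9 and critical_ok else "REFINE"
```

For instance, a reviewer who gives a raw 10 but records a failure in section E ends up at 8, which forces REFINE.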

Step 6: Decision Point

IF score >= 9 AND all critical checks pass:
    → Accept figure, generate LaTeX snippet, DONE
ELSE IF iteration < MAX_ITERATIONS:
    → Generate SPECIFIC improvement prompt based on EXACT issues
    → Go to Step 2 (Gemini Layout) with refined prompt
ELSE:
    → Max iterations reached, show best version
    → Ask user if they want to continue or accept
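
The decision rule above can be sketched in Python as follows. This is an illustrative sketch only: `decide` and `best_iteration` are hypothetical names, and `max_iterations` is passed in as a parameter because its value is defined elsewhere in the document.

```python
def decide(score, critical_ok, iteration, max_iterations):
    """Map a review result to the next action in the Step 6 decision point."""
    if score >= 9 and critical_ok:
        return "accept"      # generate the LaTeX snippet and stop
    if iteration < max_iterations:
        return "refine"      # build an improvement prompt, go back to Step 2
    return "ask_user"        # show the best version, let the user choose


def best_iteration(review_history):
    """Pick the iteration to show when the limit is reached.

    review_history is a list of (iteration, score) pairs. Ties on score
    go to the later iteration, which is an assumption of this sketch.
    """
    return max(review_history, key=lambda r: (r[1], r[0]))[0]
```

Keeping the decision a pure function of the review result makes it easy to log and to replay against a saved review history.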

Step 7: Generate Improvement Prompt (for refinement)

Claude generates a TARGETED improvement prompt that lists the EXACT issues:

Refine this academic diagram. This is iteration {N}.

═══════════════════════════════════════════════════════════════

CRITICAL: Fix These EXACT Issues (from previous review)

═══════════════════════════════════════════════════════════════

Arrow Direction Errors (MUST FIX):

EXACT issue: Arrow from [A] to [B] is pointing to wrong target. It should point to [C] instead.

Missing Arrow Labels (MUST FIX):

Arrow from [A] to [B] is missing label "[data type]"
...

Block Content Issues (MUST FIX):

Block "[Name]" has wrong label. Should be "[correct label]"
...

Visual Appeal Issues (SHOULD FIX):

Blocks are too plain. Add [gradients/shadows/internal structure]
...

Keep These Good Elements:

[What to preserve from previous version]

Generate the improved figure with ALL issues fixed.
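
The template above could be filled in programmatically from the issues recorded during review. The following Python sketch assumes the issues arrive as a dict mapping each category heading to a list of issue strings. The function name `build_improvement_prompt`, the `- ` bullet prefix, and the rule width are illustrative choices, not part of the original workflow.

```python
RULE = "═" * 63  # separator line; the width is arbitrary in this sketch


def build_improvement_prompt(iteration, issues, keep):
    """Assemble the refinement prompt from categorized issues.

    issues: dict of category heading -> list of exact issue descriptions
    keep:   list of elements to preserve from the previous version
    """
    lines = [
        f"Refine this academic diagram. This is iteration {iteration}.",
        "",
        RULE,
        "CRITICAL: Fix These EXACT Issues (from previous review)",
        RULE,
        "",
    ]
    for heading, items in issues.items():
        if not items:
            continue  # skip categories with nothing to fix
        lines.append(f"{heading}:")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    lines.append("Keep These Good Elements:")
    lines.extend(f"- {item}" for item in keep)
    lines.append("")
    lines.append("Generate the improved figure with ALL issues fixed.")
    return "\n".join(lines)
```

Empty categories are omitted so the renderer only sees problems that still exist.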

Step 8: Final Output

When the figure is accepted (score ≥ 9), generate the following LaTeX snippet:

% === AI-Generated Figure ===
\begin{figure*}[t]
    \centering
    \includegraphics[width=0.95\textwidth]{figures/ai_generated/figure_final.png}
    \caption{[Caption based on user's original request].}
    \label{fig:[label]}
\end{figure*}
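
As one possible way to produce this snippet, the Python sketch below fills the caption and label into the same template. The function name `latex_figure_snippet` and its parameters are hypothetical; the default image path and width are copied from the snippet above.

```python
def latex_figure_snippet(
    caption,
    label,
    image_path="figures/ai_generated/figure_final.png",
    width=0.95,
):
    """Return the figure* environment shown above with caption and label filled in."""
    return "\n".join([
        "% === AI-Generated Figure ===",
        r"\begin{figure*}[t]",
        r"    \centering",
        rf"    \includegraphics[width={width}\textwidth]{{{image_path}}}",
        rf"    \caption{{{caption}}}",
        rf"    \label{{fig:{label}}}",
        r"\end{figure*}",
    ])
```

The returned string can be written to `latex_include.tex`. The caption is inserted verbatim, so any LaTeX special characters in it must already be escaped by the caller.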

Key Rules (MUST FOLLOW - STRICT)

NEVER skip the review step — Always read and STRICTLY score the image
NEVER accept score < 9 — Keep refining until excellence
VERIFY EVERY ARROW DIRECTION — Wrong direction = automatic fail (score ≤ 6)
VERIFY EVERY BLOCK CONTENT — Wrong content = automatic fail (score ≤ 7)
BE SPECIFIC in feedback — "Arrow from A to B points to wrong target C" not "arrow is wrong"
SAVE all iterations — Keep version history for comparison
Claude is the STRICT boss — Accept only excellence, not "good enough"
ARROW CORRECTNESS IS NON-NEGOTIABLE — Any wrong arrow direction = reject
VISUAL APPEAL MATTERS — Plain boring figures = score ≤ 8
Target score is 9 — Not 8, not "good enough"
USE MULTI-STAGE WORKFLOW — Claude → Gemini Layout → Gemini Style → Paperbanana → Claude Review
USE CORRECT MODELS — gemini-3-pro for reasoning, gemini-3-pro-image-preview for rendering

Output Structure

figures/ai_generated/
├── layout_description.txt  # Step 2: Gemini layout optimization output
├── style_spec.txt          # Step 3: Gemini style verification output
├── figure_v1.png           # Iteration 1 (Paperbanana render)
├── figure_v2.png           # Iteration 2
├── figure_v3.png           # Iteration 3
├── figure_final.png        # Accepted version (copy of best, score ≥ 9)
├── latex_include.tex       # LaTeX snippet
└── review_log.json         # All review scores and STRICT feedback
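
A minimal Python sketch of how the version history and review log in this layout might be maintained is shown below. The file names come from the tree above. The function names `record_review` and `finalize` and the JSON schema (a list of entries with `iteration`, `image`, `score`, `issues`) are assumptions, since the format of `review_log.json` is not specified here.

```python
import json
import shutil
from pathlib import Path


def record_review(out_dir, iteration, score, issues):
    """Append one review entry to review_log.json and return the full log."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    log_path = out_dir / "review_log.json"
    if log_path.exists():
        log = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        log = []
    log.append({
        "iteration": iteration,
        "image": f"figure_v{iteration}.png",
        "score": score,
        "issues": issues,
    })
    log_path.write_text(
        json.dumps(log, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return log


def finalize(out_dir, iteration):
    """Copy the accepted iteration to figure_final.png and return its path."""
    out_dir = Path(out_dir)
    src = out_dir / f"figure_v{iteration}.png"
    dst = out_dir / "figure_final.png"
    shutil.copyfile(src, dst)
    return dst
```

Copying the accepted file keeps every `figure_vN.png` in place for later comparison.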

Model Summary

Stage	Model	Purpose
Step 1	Claude	Parse request, create initial prompt
Step 2	gemini-3-pro	Layout optimization (positioning, spacing, grouping)
Step 3	gemini-3-pro	CVPR/NeurIPS style verification
Step 4	gemini-3-pro-image-preview (Paperbanana)	High-quality image rendering
Step 5	Claude	STRICT visual review and scoring
