paper-illustration
Paper Illustration: Multi-Stage Claude-Supervised Figure Generation
Generate publication-quality illustrations using a multi-stage workflow with Claude as the STRICT supervisor/reviewer.
Core Design Philosophy
┌──────────────────────────────────────────────────────────────────────────┐
│                      MULTI-STAGE ITERATIVE WORKFLOW                      │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│    User Request                                                          │
│          │                                                               │
│          ▼                                                               │
│  ┌────────────────┐                                                      │
│  │     Claude     │ ◄─── Step 1: Parse request, create initial prompt    │
│  │   (Planner)    │                                                      │
│  └───────┬────────┘                                                      │
│          │                                                               │
│          ▼                                                               │
│  ┌────────────────┐                                                      │
│  │     Gemini     │ ◄─── Step 2: Optimize layout description             │
│  │ (gemini-3-pro) │              - Refine component positioning          │
│  │     Layout     │              - Optimize spacing and grouping         │
│  └───────┬────────┘                                                      │
│          │                                                               │
│          ▼                                                               │
│  ┌────────────────┐                                                      │
│  │     Gemini     │ ◄─── Step 3: CVPR/NeurIPS style verification         │
│  │ (gemini-3-pro) │              - Check color palette compliance        │
│  │     Style      │              - Verify arrow and font standards       │
│  └───────┬────────┘                                                      │
│          │                                                               │
│          ▼                                                               │
│  ┌────────────────┐                                                      │
│  │  Paperbanana   │ ◄─── Step 4: Render final image                      │
│  │   (gemini-3-   │              - High-quality image generation         │
│  │   pro-image)   │              - Internal codename: Nano Banana Pro    │
│  └───────┬────────┘                                                      │
│          │                                                               │
│          ▼                                                               │
│  ┌────────────────┐                                                      │
│  │     Claude     │ ◄─── Step 5: STRICT visual review + SCORE (1-10)     │
│  │   (Reviewer)   │              - Verify EVERY arrow direction          │
│  │    STRICT!     │              - Verify EVERY block content            │
│  └───────┬────────┘              - Verify aesthetics & visual appeal     │
│          │                                                               │
│          ▼                                                               │
│     Score ≥ 9? ──YES──► Accept & Output                                  │
│          │                                                               │
│         NO                                                               │
│          │                                                               │
│          ▼                                                               │
│  Generate SPECIFIC improvement feedback ──► Loop back to Step 2          │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
Constants
- IMAGE_MODEL = `gemini-3-pro-image-preview`: Paperbanana (Nano Banana Pro) for image rendering
- REASONING_MODEL = `gemini-3-pro-preview`: Gemini for layout optimization and style checking
- MAX_ITERATIONS = 5: Maximum refinement rounds
- TARGET_SCORE = 9: Minimum acceptable score (1-10), raised for quality
- OUTPUT_DIR = `figures/ai_generated/`: Output directory
- API_KEY_ENV = `GEMINI_API_KEY`: Environment variable
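MAX_ITERATIONS and TARGET_SCORE together imply a bounded refine-and-review loop. The following is a minimal Python sketch of that control flow, not the skill's actual implementation. The `render` and `review` callables are hypothetical stand-ins for Steps 2-4 and Step 5, and returning `None` when the iteration budget runs out is an assumption, since the source does not say what happens in that case.

```python
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 5  # maximum refinement rounds (from the constants)
TARGET_SCORE = 9    # minimum acceptable score on the 1-10 scale


def refine_until_accepted(
    render: Callable[[Optional[str]], str],
    review: Callable[[str], Tuple[int, str]],
) -> Tuple[Optional[str], List[int]]:
    """Run render -> review rounds until the score reaches TARGET_SCORE.

    `render` (hypothetical) receives the previous feedback, or None on the
    first round, and returns the path of the generated figure.
    `review` (hypothetical) returns (score, feedback) for that figure.
    """
    feedback: Optional[str] = None
    scores: List[int] = []
    for _ in range(MAX_ITERATIONS):
        figure_path = render(feedback)
        score, feedback = review(figure_path)
        scores.append(score)
        if score >= TARGET_SCORE:
            return figure_path, scores  # accepted
    # Assumption: no figure is accepted once the budget is exhausted
    return None, scores
```

Keeping the score history makes it easy to see whether successive rounds are actually converging.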
CVPR/ICLR/NeurIPS Top-Tier Conference Style Guide
What "CVPR Style" Actually Means:
Visual Standards
- Clean white background: no decorative patterns or gradients (unless subtle)
- Sans-serif fonts: Arial, Helvetica, or Computer Modern Sans; minimum 14pt
- Subtle color palette: no rainbow colors; use 3-5 coordinated colors
- Print-friendly: must be readable in grayscale (many reviewers print papers)
- Professional borders: thin (2-3px), solid colors, not flashy
Layout Standards
- Horizontal flow — Left-to-right is the standard for pipelines
- Clear grouping — Use subtle background boxes to group related modules
- Consistent sizing — Similar components should have similar sizes
- Balanced whitespace — Not cramped, not sparse
Arrow Standards (MOST CRITICAL)
- Thick strokes — 4-6px minimum (thin arrows disappear when printed)
- Clear arrowheads — Large, filled triangular heads
- Dark colors — Black or dark gray (#333333); avoid colored arrows
- Labeled — Every arrow should indicate what data flows through it
- No crossings — Reorganize layout to avoid arrow crossings
- CORRECT DIRECTION — Arrows must point to the RIGHT target!
Visual Appeal (Professional Academic Style)
Goal: strike a balance, neither too conservative nor too flashy.
✅ Visual elements to include:
- Subtle gradient fills: gentle light-to-dark gradients within a single hue, not multicolor effects
- Rounded corners: rounded rectangles (6-10px radius), modern but not exaggerated
- Clear visual hierarchy: distinguish levels through size and color depth
- Consistent color coding: one unified palette (3-4 main colors)
- Internal structure: show sub-components inside large modules (e.g., the layer structure inside an Encoder)
- Professional typography: clear labels with an appropriate hierarchy of font sizes
✅ Recommended palette (academic, professional):
- Inputs: soft greens (#10B981 / #34D399)
- Encoders: professional blues (#2563EB / #3B82F6)
- Fusion: elegant purples (#7C3AED / #8B5CF6)
- Outputs: warm oranges (#EA580C / #F97316)
- Arrows: black or dark gray (#333333 / #1F2937)
- Background: pure white (#FFFFFF), no patterns
❌ Over-decoration to avoid:
- ❌ Rainbow color schemes
- ❌ Heavy drop shadows
- ❌ 3D effects / perspective
- ❌ Excessive multicolor gradients
- ❌ Clip art / cartoon icons
- ❌ Decorative patterns in background
- ❌ Glowing effects
- ❌ Too many small icons
✓ The ideal visual result:
- Looks professional and clear at first glance
- Moderately appealing without being attention-grabbing
- Meets the aesthetic standards of CVPR/NeurIPS papers
- Print-friendly (still clearly legible in grayscale)
- Looks like a carefully designed academic figure, not a PPT template
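One way to sanity-check the print-friendly requirement is to convert the recommended palette to gray levels. The sketch below uses the common Rec. 601 luma weights (0.299, 0.587, 0.114). That choice is an assumption of this example, because the guide does not say how grayscale readability should be measured. `gray_level` and `palette_gray_levels` are illustrative names.

```python
from typing import Dict

# Primary fill colors from the recommended palette
PALETTE: Dict[str, str] = {
    "Inputs": "#10B981",
    "Encoders": "#2563EB",
    "Fusion": "#7C3AED",
    "Outputs": "#EA580C",
}


def gray_level(hex_color: str) -> float:
    """Approximate gray level (0-255) of a hex color using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def palette_gray_levels(palette: Dict[str, str]) -> Dict[str, int]:
    """Map each role to its rounded gray level."""
    return {role: round(gray_level(color)) for role, color in palette.items()}


print(palette_gray_levels(PALETTE))
```

Under these weights the Encoder blue and Fusion purple land only a couple of gray levels apart, and the Input green and Output orange are also close. In a grayscale print, borders, labels, and internal structure may therefore need to carry the distinction rather than fill color alone.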
What to AVOID (CRITICAL)
- ❌ Rainbow color schemes (too many colors)
- ❌ Thin, hairline arrows (arrows must be THICK)
- ❌ Unlabeled connections
- ❌ Plain boring rectangles (add some visual interest)
- ❌ Over-decorated with shadows/glows/icons (too flashy)
- ❌ Small text that's unreadable when printed
- ❌ WRONG arrow directions — This is UNACCEPTABLE!
Scope
| Figure Type | Quality | Examples |
|---|---|---|
| Architecture diagrams | Excellent | Model architecture, pipeline, encoder-decoder |
| Method illustrations | Excellent | Conceptual diagrams, algorithm flowcharts |
| Conceptual figures | Good | Comparison diagrams, taxonomy trees |
Not for: Statistical plots (use `/paper-figure`), photo-realistic images
Workflow: MUST EXECUTE ALL STEPS
Step 0: Pre-flight Check
```bash
# Check API key
if [ -z "$GEMINI_API_KEY" ]; then
  echo "ERROR: GEMINI_API_KEY not set"
  echo "Get your key from: https://aistudio.google.com/app/apikey"
  echo "Set it: export GEMINI_API_KEY='your-key'"
  exit 1
fi

# Create output directory
mkdir -p figures/ai_generated
```
Step 1: Claude Plans the Figure (YOU ARE HERE)
CRITICAL: Claude must first analyze the user's request and create a detailed prompt.
Parse the input: $ARGUMENTS
Claude's task:
- Understand what figure the user wants
- Identify all components, connections, data flow
- Create a detailed, structured prompt for Gemini
- Include style requirements AND visual appeal requirements
Prompt Template for Claude to generate:
Create a PROFESSIONAL, VISUALLY APPEALING publication-quality academic diagram following CVPR/ICLR/NeurIPS standards.
Visual Style: Academic Professional Style
Goal: balance, neither too conservative nor too flashy
DO:
- Subtle gradients: gentle same-hue gradients (e.g., #2563EB → #3B82F6), not multicolor effects
- Rounded corners: rounded rectangles (6-10px), modern look
- Clear visual hierarchy: distinguish levels through size and shade
- Internal structure: show sub-component structure inside large modules
- Consistent color coding: one unified 3-4 color scheme
- Professional polish: refined but not exaggerated
DON'T:
- ❌ Rainbow/multi-color gradients
- ❌ Heavy drop shadows
- ❌ 3D effects / perspective
- ❌ Glowing effects
- ❌ Excessive decorative icons
- ❌ Plain boring rectangles
Ideal result:
Like a carefully designed architecture diagram in a top-conference paper: professional, clear, and moderately appealing
Figure Type
[Architecture Diagram / Pipeline / Comparison / etc.]
Components to Include (BE SPECIFIC ABOUT CONTENT)
- [Component 1]:
  - Label: "[exact text]"
  - Sub-label: "[smaller text below]"
  - Position: [left/center/right, top/middle/bottom]
  - Style: [border color, fill, internal structure]
Layout
- Direction: [left-to-right / top-to-bottom]
- Spacing: [tight / normal / loose]
- Grouping: [how components should be grouped]
Connections (BE EXPLICIT ABOUT DIRECTION)
EXACT arrow specifications:
- [Component A] → [Component B]: Arrow goes FROM A TO B, label it "[data type]"
- [Component C] → [Component D]: Arrow goes FROM C TO D, label it "[data type]"
...
VERIFY: Each arrow must point to the CORRECT target!
Style Requirements (CVPR/ICLR/NeurIPS Standard)
Visual Style
- Color palette: Professional academic colors
  - Inputs: Green (#10B981)
  - Encoders: Blue (#2563EB)
  - Fusion modules: Purple (#7C3AED)
  - Outputs: Orange (#EA580C)
- Font: Sans-serif (Arial/Helvetica), minimum 14pt, bold for labels
- Background: Clean white, no patterns
- Blocks: Rounded rectangles (6-10px radius), subtle gradient fill, colored border (2-3px)
- Subtle shadows for depth effect
- Print-friendly (must work in grayscale)
CRITICAL: Arrow & Data Flow Requirements
- ALL arrows must be VERY THICK - minimum 5-6px stroke width
- ALL arrows must have CLEAR arrowheads - large, visible triangular heads
- ALL arrows must be BLACK or DARK GRAY - not colored
- Label EVERY arrow with what data flows through it
- VERIFY arrow direction - each arrow MUST point to the correct target
- No ambiguous connections - every arrow should have a clear source and destination
Logic Clarity Requirements
- Data flow must be immediately obvious - viewer should understand the pipeline in 5 seconds
- No crossing arrows - reorganize layout to avoid arrow crossings
- Consistent direction - maintain left-to-right or top-to-bottom flow throughout
- Group related components - use subtle background boxes or spacing to group modules
- Clear hierarchy - main components larger, sub-components smaller
Additional Requirements
[Any specific requirements from user]
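The connection lines of the prompt template are easy to get wrong when written by hand. The following Python sketch renders them from structured data and rejects unlabeled arrows. The `Connection` type and `format_connections` helper are illustrative names and are not part of the skill.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    label: str  # what data flows through the arrow


def format_connections(connections: List[Connection]) -> str:
    """Render the 'EXACT arrow specifications' lines of the prompt template."""
    lines = ["EXACT arrow specifications:"]
    for c in connections:
        # The style guide requires every arrow to be labeled
        if not c.label.strip():
            raise ValueError(f"arrow {c.source} -> {c.target} has no label")
        lines.append(
            f"- {c.source} → {c.target}: Arrow goes FROM {c.source} "
            f'TO {c.target}, label it "{c.label}"'
        )
    lines.append("VERIFY: Each arrow must point to the CORRECT target!")
    return "\n".join(lines)
```

Stating source and target twice per line is deliberate: it mirrors the template's redundancy, which is there to reduce reversed arrows in the rendered figure.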
Step 2: Gemini Layout Optimization (gemini-3-pro)
Claude sends the initial prompt to Gemini (gemini-3-pro) for layout optimization.
```bash
#!/bin/bash
# Step 2: Optimize layout using Gemini gemini-3-pro
# This step refines component positioning and spacing
set -e

OUTPUT_DIR="figures/ai_generated"
mkdir -p "$OUTPUT_DIR"

API_KEY="${GEMINI_API_KEY}"
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=$API_KEY"

# The initial prompt from Claude
INITIAL_PROMPT='[Claude fills in the detailed prompt here]'

# Layout optimization request
LAYOUT_REQUEST="You are an expert in academic figure layout design for CVPR/NeurIPS papers.

Analyze this figure request and provide an OPTIMIZED LAYOUT DESCRIPTION:

$INITIAL_PROMPT

Provide:
- Optimized Component Positions: Exact positions (left/center/right, top/middle/bottom) for each component
- Spacing Recommendations: Specific spacing between components
- Grouping Strategy: Which components should be visually grouped together
- Arrow Routing: Optimal paths for arrows to avoid crossings
- Visual Hierarchy: Size recommendations for main vs sub-components

Output a DETAILED layout specification that will be used for rendering."

# Build JSON payload. The prompt is passed through the environment so that
# quotes or backslashes in it cannot break the Python source.
export LAYOUT_REQUEST
python3 << 'PYTHON'
import json
import os

payload = {
    "contents": [{"parts": [{"text": os.environ["LAYOUT_REQUEST"]}]}]
}
with open("/tmp/gemini_layout_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("Layout request created")
PYTHON

# Call Gemini gemini-3-pro-preview for layout optimization (DIRECT connection, no proxy)
RESPONSE=$(curl -s --max-time 90 --noproxy '*' \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_layout_request.json)

# Extract layout description; exit non-zero if the response has no text part
LAYOUT_DESCRIPTION=$(echo "$RESPONSE" | python3 -c "
import sys, json
data = json.load(sys.stdin)
try:
    print(data['candidates'][0]['content']['parts'][0]['text'])
except (KeyError, IndexError, TypeError):
    sys.exit('Error extracting layout: ' + json.dumps(data)[:500])
")

echo "=== Layout Optimization Complete ==="
echo "$LAYOUT_DESCRIPTION"
echo "$LAYOUT_DESCRIPTION" > "$OUTPUT_DIR/layout_description.txt"
```
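The extraction step indexes straight into the first text part of the first candidate. A slightly more defensive Python sketch follows. It assumes the `candidates` / `content` / `parts` response shape that the script relies on, plus an `error` object with a `message` field for failures. `extract_text` is a hypothetical helper name.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Join all text parts of the first candidate, or raise a descriptive error."""
    err = response.get("error")
    if err:
        # Assumption: API errors arrive as {"error": {"message": ...}}
        message = err.get("message", err) if isinstance(err, dict) else err
        raise RuntimeError(f"API error: {message}")

    candidates = response.get("candidates") or []
    if not candidates:
        raise ValueError("response contains no candidates")

    parts = candidates[0].get("content", {}).get("parts", [])
    texts = [part["text"] for part in parts if "text" in part]
    if not texts:
        raise ValueError("first candidate contains no text part")
    return "".join(texts)
```

Raising instead of printing a placeholder keeps an error string from being saved as if it were a layout description and passed on to the next step.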
Step 3: Gemini Style Verification (gemini-3-pro)
Claude sends the optimized layout to Gemini for CVPR/NeurIPS style verification.
```bash
#!/bin/bash
# Step 3: Verify and enhance style compliance using Gemini gemini-3-pro
set -e

API_KEY="${GEMINI_API_KEY}"
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=$API_KEY"

# Read layout from previous step
LAYOUT=$(cat figures/ai_generated/layout_description.txt)

# Style verification request
STYLE_REQUEST="You are a CVPR/NeurIPS paper figure reviewer specializing in visual standards.

Review and ENHANCE this figure specification for top-tier conference compliance:

$LAYOUT

Ensure compliance with:
- Color Palette: Use professional academic colors (green for inputs, blue for encoders, purple for fusion, orange for outputs)
- Arrow Standards: Thick (5-6px), black/dark gray, clear arrowheads, all labeled
- Font Standards: Sans-serif, minimum 14pt, readable in print
- Visual Appeal (academic research style):
  - ✅ Subtle same-color gradients, rounded corners (6-10px), internal structure visible
  - ❌ NO heavy shadows, NO glowing effects, NO rainbow gradients

Output an ENHANCED figure specification with explicit style instructions for rendering."

# Build JSON payload. The prompt is passed through the environment so that
# quotes or backslashes in it cannot break the Python source.
export STYLE_REQUEST
python3 << 'PYTHON'
import json
import os

payload = {
    "contents": [{"parts": [{"text": os.environ["STYLE_REQUEST"]}]}]
}
with open("/tmp/gemini_style_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("Style request created")
PYTHON

# Call Gemini gemini-3-pro-preview for style verification (DIRECT connection, no proxy)
RESPONSE=$(curl -s --max-time 90 --noproxy '*' \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_style_request.json)

# Extract style-enhanced specification; exit non-zero if the response has no text part
STYLE_SPEC=$(echo "$RESPONSE" | python3 -c "
import sys, json
data = json.load(sys.stdin)
try:
    print(data['candidates'][0]['content']['parts'][0]['text'])
except (KeyError, IndexError, TypeError):
    sys.exit('Error extracting style spec: ' + json.dumps(data)[:500])
")

echo "=== Style Verification Complete ==="
echo "$STYLE_SPEC"
echo "$STYLE_SPEC" > "figures/ai_generated/style_spec.txt"
```
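Steps 2 and 3 send the same request body with different prompts, and Step 4 adds a `generationConfig` entry to request image output. The Python sketch below captures that shared shape. `build_payload` is a hypothetical helper, and the field names are the ones the scripts in this workflow already use.

```python
import json
from typing import Any, Dict


def build_payload(prompt: str, want_image: bool = False) -> Dict[str, Any]:
    """Build a generateContent request body for a text or text+image response."""
    payload: Dict[str, Any] = {"contents": [{"parts": [{"text": prompt}]}]}
    if want_image:
        # Same setting the rendering step passes to request an image back
        payload["generationConfig"] = {"responseModalities": ["TEXT", "IMAGE"]}
    return payload


def dump_payload(prompt: str, path: str, want_image: bool = False) -> None:
    """Write the payload to disk; json.dump handles all quoting and escaping."""
    with open(path, "w") as f:
        json.dump(build_payload(prompt, want_image), f, indent=2)
```

Letting `json.dump` serialize the prompt avoids hand-escaping quotes, backslashes, or newlines that a model-written specification may contain.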
Step 4: Paperbanana Image Rendering (gemini-3-pro-image-preview)
Claude sends the optimized, style-verified specification to Paperbanana for rendering.
```bash
#!/bin/bash
# Step 4: Render image using Paperbanana (gemini-3-pro-image-preview)
# Internal codename: Nano Banana Pro
# Use DIRECT connection (no proxy) - proxy causes SSL errors
set -e

OUTPUT_DIR="figures/ai_generated"
mkdir -p "$OUTPUT_DIR"

API_KEY="${GEMINI_API_KEY}"
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY"

# Read the style-enhanced specification from previous step
STYLE_SPEC=$(cat figures/ai_generated/style_spec.txt)

# Add rendering instructions
RENDER_PROMPT="Render a publication-quality academic diagram based on this specification:

$STYLE_SPEC

RENDERING REQUIREMENTS:
- Output a clean, professional diagram suitable for CVPR/NeurIPS submission
- Use vector-quality rendering with sharp edges and clear text
- Ensure all elements are properly aligned and spaced
- The diagram should be immediately understandable at a glance"

# Build JSON payload using Python for proper escaping
export RENDER_PROMPT
python3 << 'PYTHON'
import json
import os

payload = {
    "contents": [{"parts": [{"text": os.environ["RENDER_PROMPT"]}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
with open("/tmp/gemini_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("JSON payload created")
PYTHON

# Call Paperbanana API WITHOUT proxy (direct connection works better).
# The response goes to a file because it carries a large base64-encoded image.
curl -s --max-time 180 --noproxy '*' \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_request.json \
  -o /tmp/gemini_response.json

# Check for error
if grep -q '"error"' /tmp/gemini_response.json; then
  echo "API Error:"
  python3 -m json.tool /tmp/gemini_response.json 2>/dev/null || cat /tmp/gemini_response.json
  exit 1
fi

# Extract and save image. The response path is passed as an argument because
# the heredoc occupies stdin, so the script cannot also read a piped response.
python3 - /tmp/gemini_response.json << 'PYTHON'
import sys, json, base64
from pathlib import Path

output_dir = Path("figures/ai_generated")
with open(sys.argv[1]) as f:
    data = json.load(f)

try:
    parts = data['candidates'][0]['content']['parts']
    iteration = 1  # Claude increments this each iteration
    for part in parts:
        if 'text' in part:
            print(f"\n[Paperbanana]: {part['text'][:200]}...")
        elif 'inlineData' in part:
            img_data = base64.b64decode(part['inlineData']['data'])
            img_path = output_dir / f"figure_v{iteration}.png"
            with open(img_path, "wb") as f:
                f.write(img_data)
            print(f"\n✅ Image saved: {img_path}")
            print(f"   Size: {len(img_data)/1024:.1f} KB")
except Exception as e:
    print(f"Parse error: {e}")
    print(f"Raw response: {str(data)[:500]}")
PYTHON
```
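The rendering script hard-codes `iteration = 1` and relies on Claude to bump it each round. One possible alternative, shown only as a sketch, derives the next version number from the files already in the output directory. `next_figure_path` is a hypothetical helper and is not part of the skill.

```python
import re
from pathlib import Path

_VERSION_RE = re.compile(r"figure_v(\d+)\.png")


def next_figure_path(output_dir: Path) -> Path:
    """Return output_dir/figure_v{N}.png, with N one above the highest existing version."""
    versions = [
        int(match.group(1))
        for path in output_dir.glob("figure_v*.png")
        if (match := _VERSION_RE.fullmatch(path.name))
    ]
    return output_dir / f"figure_v{max(versions, default=0) + 1}.png"
```

With this approach earlier iterations are never overwritten, which keeps every reviewed version available for comparison.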
Step 5: Claude STRICT Visual Review & Scoring (MANDATORY)
Claude MUST read the generated image and perform a STRICT review:
- Visual Analysis: What does the image show in detail?
- Strengths: What's good about it?
- STRICT Verification: Check EVERY item below
- Score: Rate 1-10 (10 = perfect) — BE STRICT!
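The review template attaches a score ceiling to each checklist section; for example, any arrow-direction failure caps the score at 6. The Python sketch below shows one way those ceilings could combine. It assumes the final score is the reviewer's raw score limited by the lowest ceiling among the failed sections. The source lists the ceilings but does not spell out this combination rule, and `capped_score` is an illustrative name.

```python
from typing import Iterable

TARGET_SCORE = 9

# Score ceilings per checklist section, as listed in the review template
SCORE_CAPS = {
    "A": 6,   # arrow correctness
    "B": 7,   # block content
    "C": 7,   # arrow visibility
    "D": 7,   # arrow labels
    "E": 8,   # visual appeal
    "E2": 7,  # visual appeal red flags
    "F": 7,   # layout & flow
}


def capped_score(raw_score: int, failed_sections: Iterable[str]) -> int:
    """Limit the raw 1-10 score by the strictest ceiling among failed sections."""
    caps = [SCORE_CAPS[section] for section in failed_sections]
    return min([raw_score, *caps])


def is_accepted(raw_score: int, failed_sections: Iterable[str]) -> bool:
    """A figure is accepted only if its capped score reaches TARGET_SCORE."""
    return capped_score(raw_score, failed_sections) >= TARGET_SCORE
```

Because every ceiling is below 9, a single failed section is enough to force another iteration under this rule.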
STRICT Review Template:
markdown
undefinedClaude必须查看生成的图像并执行严格审核:
- 视觉分析:详细描述图像内容
- 优势:图像的可取之处
- 严格验证:检查以下每一项
- 评分:1-10分(10分=完美)——必须严格评分!
严格审核模板:
markdown
undefinedClaude's STRICT Review of Figure v{N}
Claude's STRICT Review of Figure v{N}
What I See
What I See
[Describe the generated image in DETAIL - every block, every arrow]
[Describe the generated image in DETAIL - every block, every arrow]
Strengths
Strengths
- [Strength 1]
- [Strength 2]
- [Strength 1]
- [Strength 2]
═══════════════════════════════════════════════════════════════
═══════════════════════════════════════════════════════════════
STRICT VERIFICATION CHECKLIST (ALL must pass for score ≥ 9)
STRICT VERIFICATION CHECKLIST (ALL must pass for score ≥ 9)
═══════════════════════════════════════════════════════════════
═══════════════════════════════════════════════════════════════
A. Arrow Correctness Verification (CRITICAL - any failure = score ≤ 6)
A. Arrow Correctness Verification (CRITICAL - any failure = score ≤ 6)
Check EACH arrow:
- Arrow 1: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 2: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 3: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 4: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 5: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 6: [Source] → [Target] — Does it point to the CORRECT target?
Check EACH arrow:
- Arrow 1: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 2: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 3: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 4: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 5: [Source] → [Target] — Does it point to the CORRECT target?
- Arrow 6: [Source] → [Target] — Does it point to the CORRECT target?
B. Block Content Verification (any failure = score ≤ 7)
B. Block Content Verification (any failure = score ≤ 7)
Check EACH block:
- Block 1 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 2 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 3 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 4 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 5 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 6 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 7 "[Name]": Has correct label? Has sub-label? Content correct?
Check EACH block:
- Block 1 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 2 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 3 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 4 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 5 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 6 "[Name]": Has correct label? Has sub-label? Content correct?
- Block 7 "[Name]": Has correct label? Has sub-label? Content correct?
C. Arrow Visibility (any failure = score ≤ 7)
- ALL arrows are THICK (≥5px visible stroke)
- ALL arrows have CLEAR arrowheads (large triangular heads)
- ALL arrows are BLACK or DARK GRAY (not light colors)
- NO arrows are too thin or invisible
D. Arrow Labels (any failure = score ≤ 7)
- EVERY arrow has a text label
- Labels are readable (not too small)
- Labels correctly describe the data flowing
E. Visual Appeal (Balanced Academic Style) (any failure = score ≤ 8)
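- Moderately eye-catching: subtle gradients or rounded corners, nothing exaggerated
- Not plain boxes: shows some sense of design
- Not over-decorated: no heavy shadows, glow effects, or rainbow color schemes
- Professional academic style: looks like a figure in a CVPR paper, not a PPT template
- Internal structure visible: large modules show their sub-component structure
- Color palette: 3-4 harmonious colors, neither rainbow nor pure black and white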
E2. Visual Appeal - RED FLAGS (immediate score ≤ 7 if found)
- NO heavy drop shadows (too flashy)
- NO glowing effects (too flashy)
- NO rainbow gradients (unprofessional)
- NO excessive decorative icons (distracting)
F. Layout & Flow (any failure = score ≤ 7)
- Clean horizontal left-to-right flow
- No arrow crossings
- Data flow traceable in 5 seconds
- Balanced spacing (not cramped, not sparse)
G. Style Compliance
- CVPR/NeurIPS professional style
- Color palette appropriate (not rainbow)
- Font readable
- Print-friendly (grayscale test)
═══════════════════════════════════════════════════════════════
Issues Found (BE SPECIFIC)
- [Issue 1]: [EXACTLY what is wrong] → [How to fix]
- [Issue 2]: [EXACTLY what is wrong] → [How to fix]
- [Issue 3]: [EXACTLY what is wrong] → [How to fix]
Score: X/10
STRICT Score Breakdown Guide:
- 10: Perfect. No issues. Publication-ready masterpiece. Visual style is perfectly balanced.
- 9: Excellent. Minor issues that don't affect understanding. Usable as-is.
- 8: Good but has noticeable issues. A figure that is visually too plain or too flashy needs improvement.
- 7: Usable but has clear problems with arrows or content.
- 6: Has arrow direction errors OR missing major components.
- 1-5: Major issues. Unacceptable.
Visual Style Scoring:
- Too flashy: heavy shadows, glow effects, rainbow color schemes → score ≤ 7
- Too plain: pure black-and-white boxes, no visual design at all → score ≤ 8
- Balanced: moderate gradients, rounded corners, clear hierarchy → score 9-10
Verdict
[ ] ACCEPT (score ≥ 9 AND all critical checks pass)
[ ] REFINE (score < 9 OR any critical check fails)
If REFINE: List the EXACT issues that must be fixed
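The score ceilings attached to each checklist section can be applied mechanically once the reviewer has listed which checks failed. The following is a minimal sketch of that idea, not part of the original workflow: the check names, `SCORE_CAPS`, `capped_score`, and `verdict` are hypothetical, and the ceilings are copied from the section headings and Key Rules of this document.

```python
# Score ceilings per failed check, as stated in the review checklist.
# The key names are hypothetical labels chosen for this sketch.
SCORE_CAPS = {
    "arrow_direction": 6,   # wrong arrow direction or missing major component
    "block_content": 7,
    "arrow_visibility": 7,
    "arrow_labels": 7,
    "visual_appeal": 8,     # too plain
    "visual_red_flags": 7,  # too flashy
    "layout_flow": 7,
}
ACCEPT_THRESHOLD = 9


def capped_score(raw_score: int, failed_checks: set[str]) -> int:
    """Clamp the reviewer's raw score to the lowest ceiling among failed checks."""
    ceiling = min((SCORE_CAPS[name] for name in failed_checks), default=10)
    return max(1, min(raw_score, ceiling))


def verdict(raw_score: int, failed_checks: set[str]) -> str:
    """Return "ACCEPT" only when the capped score is >= 9 and no check failed."""
    score = capped_score(raw_score, failed_checks)
    if score >= ACCEPT_THRESHOLD and not failed_checks:
        return "ACCEPT"
    return "REFINE"
```

For example, `verdict(10, {"arrow_labels"})` yields `"REFINE"` because the missing label caps the score at 7, regardless of how good the figure otherwise looks.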
Step 6: Decision Point
IF score >= 9 AND all critical checks pass:
→ Accept figure, generate LaTeX snippet, DONE
ELSE IF iteration < MAX_ITERATIONS:
→ Generate SPECIFIC improvement prompt based on EXACT issues
→ Go to Step 2 (Gemini Layout) with refined prompt
ELSE:
→ Max iterations reached, show best version
→ Ask user if they want to continue or accept
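The decision logic above can be sketched as a loop. This is an illustration under stated assumptions rather than the workflow's actual implementation: `render`, `review`, and `improve` are hypothetical callables standing in for Steps 2-4, Step 5, and Step 7, the `Review` tuple is invented for the sketch, and the value of `MAX_ITERATIONS` is assumed because the document does not state it.

```python
from typing import Callable, NamedTuple

MAX_ITERATIONS = 5  # assumed value; the document does not give the limit


class Review(NamedTuple):
    score: int          # 1-10 score from the strict review
    critical_ok: bool   # True if every critical check passed
    issues: list[str]   # exact issues to fix in the next iteration


def refine_until_accepted(
    prompt: str,
    render: Callable[[str], str],                # hypothetical: Steps 2-4, returns an image path
    review: Callable[[str], Review],             # hypothetical: Step 5
    improve: Callable[[str, Review, int], str],  # hypothetical: Step 7
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[str, Review, bool]:
    """Return (image, review, accepted); on exhaustion, the best-scoring version."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    best = None
    for iteration in range(1, max_iterations + 1):
        image = render(prompt)
        result = review(image)
        if best is None or result.score > best[1].score:
            best = (image, result)
        if result.score >= 9 and result.critical_ok:
            return image, result, True
        if iteration < max_iterations:
            # Build the refined prompt for the next pass through Step 2.
            prompt = improve(prompt, result, iteration + 1)
    return best[0], best[1], False
```

When the third element of the result is `False`, the caller is expected to show the best version and ask the user whether to continue or accept it.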
Step 7: Generate Improvement Prompt (for refinement)
Claude generates a TARGETED improvement prompt that lists the EXACT issues:
Refine this academic diagram. This is iteration {N}.
═══════════════════════════════════════════════════════════════
CRITICAL: Fix These EXACT Issues (from previous review)
═══════════════════════════════════════════════════════════════
Arrow Direction Errors (MUST FIX):
- EXACT issue: Arrow from [A] to [B] is pointing to wrong target. It should point to [C] instead.
Missing Arrow Labels (MUST FIX):
- Arrow from [A] to [B] is missing label "[data type]"
- ...
Block Content Issues (MUST FIX):
- Block "[Name]" has wrong label. Should be "[correct label]"
- ...
Visual Appeal Issues (SHOULD FIX):
- Blocks are too plain. Add [subtle gradients/rounded corners/internal structure]
- ...
Keep These Good Elements:
- [What to preserve from previous version]
Generate the improved figure with ALL issues fixed.
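Filling in the template above can be automated once the review issues are grouped by category. The sketch below is one possible way to do it, assuming the issues arrive as plain lists of strings; the function name `build_improvement_prompt` and its parameters are hypothetical, and empty sections are simply omitted.

```python
def build_improvement_prompt(
    iteration: int,
    arrow_errors: list[str],
    missing_labels: list[str],
    block_issues: list[str],
    visual_issues: list[str],
    keep: list[str],
) -> str:
    """Assemble the Step 7 refinement prompt from categorized review issues."""
    rule = "═" * 63  # separator line; the width is arbitrary
    sections = [
        ("Arrow Direction Errors (MUST FIX):", arrow_errors),
        ("Missing Arrow Labels (MUST FIX):", missing_labels),
        ("Block Content Issues (MUST FIX):", block_issues),
        ("Visual Appeal Issues (SHOULD FIX):", visual_issues),
        ("Keep These Good Elements:", keep),
    ]
    lines = [
        f"Refine this academic diagram. This is iteration {iteration}.",
        rule,
        "CRITICAL: Fix These EXACT Issues (from previous review)",
        rule,
    ]
    for title, items in sections:
        if items:  # skip categories with nothing to report
            lines.append(title)
            lines.extend(f"- {item}" for item in items)
    lines.append("Generate the improved figure with ALL issues fixed.")
    return "\n".join(lines)
```

Keeping the issue strings verbatim from the review preserves the "BE SPECIFIC" requirement: the renderer sees the exact source, target, and correction for each problem.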
Step 8: Final Output
When figure is accepted (score ≥ 9):
latex
% === AI-Generated Figure ===
\begin{figure*}[t]
\centering
\includegraphics[width=0.95\textwidth]{figures/ai_generated/figure_final.png}
\caption{[Caption based on user's original request].}
\label{fig:[label]}
\end{figure*}
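The snippet above can be generated from the accepted figure's path, caption, and label. This is a small sketch, assuming the caller supplies all three values; `latex_snippet` is a hypothetical helper name, and the output mirrors the template shown in this step.

```python
def latex_snippet(image_path: str, caption: str, label: str) -> str:
    """Return a figure* environment for the accepted image."""
    return "\n".join([
        "% === AI-Generated Figure ===",
        r"\begin{figure*}[t]",
        r"    \centering",
        # Doubled braces emit literal { } around the interpolated value.
        rf"    \includegraphics[width=0.95\textwidth]{{{image_path}}}",
        rf"    \caption{{{caption}}}",
        rf"    \label{{fig:{label}}}",
        r"\end{figure*}",
    ]) + "\n"
```

The returned string can be written to `latex_include.tex` in the output directory.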
Key Rules (MUST FOLLOW - STRICT)
- NEVER skip the review step — Always read and STRICTLY score the image
- NEVER accept score < 9 — Keep refining until excellence
- VERIFY EVERY ARROW DIRECTION — Wrong direction = automatic fail (score ≤ 6)
- VERIFY EVERY BLOCK CONTENT — Wrong content = automatic fail (score ≤ 7)
- BE SPECIFIC in feedback — "Arrow from A to B points to wrong target C" not "arrow is wrong"
- SAVE all iterations — Keep version history for comparison
- Claude is the STRICT boss — Accept only excellence, not "good enough"
- ARROW CORRECTNESS IS NON-NEGOTIABLE — Any wrong arrow direction = reject
- VISUAL APPEAL MATTERS — Plain boring figures = score ≤ 8
- Target score is 9 — Not 8, not "good enough"
- USE MULTI-STAGE WORKFLOW — Claude → Gemini Layout → Gemini Style → Paperbanana → Claude Review
- USE CORRECT MODELS — gemini-3-pro for reasoning, gemini-3-pro-image-preview for rendering
Output Structure
figures/ai_generated/
├── layout_description.txt # Step 2: Gemini layout optimization output
├── style_spec.txt # Step 3: Gemini style verification output
├── figure_v1.png # Iteration 1 (Paperbanana render)
├── figure_v2.png # Iteration 2
├── figure_v3.png # Iteration 3
├── figure_final.png # Accepted version (copy of best, score ≥ 9)
├── latex_include.tex # LaTeX snippet
└── review_log.json # All review scores and STRICT feedback
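One way to maintain this directory is to record every iteration as it is reviewed. The sketch below assumes a simple list-of-objects layout for `review_log.json`; the document does not specify the schema, so the field names and the `record_iteration` helper are hypothetical.

```python
import json
import shutil
from pathlib import Path


def record_iteration(out_dir: str, iteration: int, rendered_png: str,
                     score: int, issues: list[str]) -> Path:
    """Save the render as figure_v{N}.png and append the review to review_log.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # rendered_png must be a different file from the versioned target.
    versioned = out / f"figure_v{iteration}.png"
    shutil.copyfile(rendered_png, versioned)

    log_path = out / "review_log.json"
    log = json.loads(log_path.read_text(encoding="utf-8")) if log_path.exists() else []
    log.append({"iteration": iteration, "file": versioned.name,
                "score": score, "issues": issues})
    log_path.write_text(json.dumps(log, indent=2, ensure_ascii=False), encoding="utf-8")

    if score >= 9:
        # The accepted version is a copy of the passing iteration.
        shutil.copyfile(versioned, out / "figure_final.png")
    return versioned
```

Every version is kept on disk, so earlier iterations remain available for comparison even after a later one is accepted.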
Model Summary
| Stage | Model | Purpose |
|---|---|---|
| Step 1 | Claude | Parse request, create initial prompt |
| Step 2 | gemini-3-pro | Layout optimization (positioning, spacing, grouping) |
| Step 3 | gemini-3-pro | CVPR/NeurIPS style verification |
| Step 4 | gemini-3-pro-image-preview (Paperbanana) | High-quality image rendering |
| Step 5 | Claude | STRICT visual review and scoring |