paper-illustration

Paper Illustration: Multi-Stage Claude-Supervised Figure Generation

Generate publication-quality illustrations using a multi-stage workflow with Claude as the STRICT supervisor/reviewer.

Core Design Philosophy

┌──────────────────────────────────────────────────────────────────────────┐
│                    MULTI-STAGE ITERATIVE WORKFLOW                        │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   User Request                                                           │
│       │                                                                  │
│       ▼                                                                  │
│   ┌─────────────┐                                                        │
│   │   Claude    │ ◄─── Step 1: Parse request, create initial prompt     │
│   │  (Planner)  │                                                        │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │   Gemini    │ ◄─── Step 2: Optimize layout description               │
│   │ (gemini-3-pro)│      - Refine component positioning                    │
│   │  Layout     │      - Optimize spacing and grouping                   │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │   Gemini    │ ◄─── Step 3: CVPR/NeurIPS style verification          │
│   │ (gemini-3-pro)│      - Check color palette compliance                  │
│   │  Style      │      - Verify arrow and font standards                 │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │ Paperbanana │ ◄─── Step 4: Render final image                       │
│   │ (gemini-3-  │      - High-quality image generation                   │
│   │ pro-image)  │      - Internal codename: Nano Banana Pro              │
│   └──────┬──────┘                                                        │
│          │                                                               │
│          ▼                                                               │
│   ┌─────────────┐                                                        │
│   │   Claude    │ ◄─── Step 5: STRICT visual review + SCORE (1-10)      │
│   │  (Reviewer) │      - Verify EVERY arrow direction                    │
│   │   STRICT!   │      - Verify EVERY block content                      │
│   └──────┬──────┘      - Verify aesthetics & visual appeal               │
│          │                                                               │
│          ▼                                                               │
│   Score ≥ 9? ──YES──► Accept & Output                                    │
│          │                                                               │
│          NO                                                              │
│          │                                                               │
│          ▼                                                               │
│   Generate SPECIFIC improvement feedback ──► Loop back to Step 2        │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Constants

  • IMAGE_MODEL = gemini-3-pro-image-preview (Paperbanana, codename Nano Banana Pro, for image rendering)
  • REASONING_MODEL = gemini-3-pro-preview (Gemini, for layout optimization and style checking)
  • MAX_ITERATIONS = 5 (maximum refinement rounds)
  • TARGET_SCORE = 9 (minimum acceptable score on the 1-10 scale; RAISED FOR QUALITY)
  • OUTPUT_DIR = figures/ai_generated/ (output directory)
  • API_KEY_ENV = GEMINI_API_KEY (environment variable holding the API key)
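
The way MAX_ITERATIONS and TARGET_SCORE drive the accept-or-loop decision in the workflow diagram can be sketched as below. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `render` and `review` are hypothetical callables standing in for Steps 2-4 and Step 5, and the `Attempt` record is introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 5   # maximum refinement rounds (from the constants list)
TARGET_SCORE = 9     # minimum acceptable review score on the 1-10 scale


@dataclass
class Attempt:
    """One render-and-review round (hypothetical record type)."""
    iteration: int
    image_path: str
    score: int
    feedback: str


def refine_until_accepted(
    prompt: str,
    render: Callable[[str, int], str],
    review: Callable[[str], Tuple[int, str]],
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Run render -> review rounds until a score reaches TARGET_SCORE.

    render(prompt, iteration) returns an image path; review(path) returns
    (score, feedback). Both are stand-ins supplied by the caller.
    """
    history: List[Attempt] = []
    for iteration in range(1, MAX_ITERATIONS + 1):
        image_path = render(prompt, iteration)
        score, feedback = review(image_path)
        attempt = Attempt(iteration, image_path, score, feedback)
        history.append(attempt)
        if score >= TARGET_SCORE:
            return attempt, history
        # Carry the reviewer's specific feedback into the next round.
        prompt = f"{prompt}\n\nREVIEWER FEEDBACK (round {iteration}):\n{feedback}"
    return None, history
```

Returning `None` when no round reaches the target leaves the caller to decide whether to surface the best attempt from `history` or report failure.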

CVPR/ICLR/NeurIPS Top-Tier Conference Style Guide

What "CVPR Style" Actually Means:

Visual Standards

  • Clean white background — No decorative patterns or gradients (unless subtle)
  • Sans-serif fonts — Arial, Helvetica, or Computer Modern; minimum 14pt
  • Subtle color palette — Not rainbow colors; use 3-5 coordinated colors
  • Print-friendly — Must be readable in grayscale (many reviewers print papers)
  • Professional borders — Thin (2-3px), solid colors, not flashy

Layout Standards

  • Horizontal flow — Left-to-right is the standard for pipelines
  • Clear grouping — Use subtle background boxes to group related modules
  • Consistent sizing — Similar components should have similar sizes
  • Balanced whitespace — Not cramped, not sparse

Arrow Standards (MOST CRITICAL)

  • Thick strokes — 4-6px minimum (thin arrows disappear when printed)
  • Clear arrowheads — Large, filled triangular heads
  • Dark colors — Black or dark gray (#333333); avoid colored arrows
  • Labeled — Every arrow should indicate what data flows through it
  • No crossings — Reorganize layout to avoid arrow crossings
  • CORRECT DIRECTION — Arrows must point to the RIGHT target!
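
The arrow rules above lend themselves to a mechanical pre-check before a figure specification is sent for rendering. The following is a minimal sketch, assuming a hypothetical `Arrow` record; the 4px threshold is the lower bound of the guidance above, and the set of accepted dark colors is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List

MIN_STROKE_PX = 4  # lower bound of the 4-6px guidance
DARK_COLORS = {"#000000", "#333333", "#1F2937"}  # assumed "black or dark gray" set


@dataclass
class Arrow:
    """One connection in a figure specification (hypothetical record type)."""
    source: str
    target: str
    label: str
    stroke_px: float
    color: str


def arrow_violations(arrow: Arrow) -> List[str]:
    """Return the arrow standards that this arrow breaks, in a fixed order."""
    problems = []
    if arrow.stroke_px < MIN_STROKE_PX:
        problems.append("stroke thinner than 4px")
    if arrow.color.upper() not in DARK_COLORS:
        problems.append("arrow color is not black or dark gray")
    if not arrow.label.strip():
        problems.append("arrow is unlabeled")
    if arrow.source == arrow.target:
        problems.append("arrow has no distinct target")
    return problems
```

A check like this covers only the specification. Whether the rendered image actually draws each arrow toward the right block still has to be verified visually in Step 5.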

Visual Appeal (Research Style: Professional Academic Style)

Goal: find the balance point, neither conservative nor flashy

✅ Visual elements to include:

  • Subtle gradient fills: soft gradients within one hue (light to dark), not flashy multi-color effects
  • Rounded corners: rounded rectangles (6-10px radius), modern but not exaggerated
  • Clear visual hierarchy: distinguish levels through size and color depth
  • Consistent color coding: a unified color scheme (3-4 main colors)
  • Internal structure: show sub-components inside large modules (e.g. the layer structure inside an Encoder)
  • Professional typography: clear labels with an appropriate hierarchy of font sizes

✅ Recommended palette (academic, professional):

  • Inputs: soft greens (#10B981 / #34D399)
  • Encoders: professional blues (#2563EB / #3B82F6)
  • Fusion: elegant purples (#7C3AED / #8B5CF6)
  • Outputs: warm oranges (#EA580C / #F97316)
  • Arrows: black or dark gray (#333333 / #1F2937)
  • Background: pure white (#FFFFFF), no patterns
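
Because the style guide asks for figures that stay readable in grayscale, it can help to estimate how the palette's primary colors look once color is removed. The sketch below is an approximation only: it uses the Rec. 601 luma weights as a stand-in for a real print conversion, and the `min_gap` threshold of 20 gray levels is an arbitrary assumption.

```python
PALETTE = {
    "inputs": "#10B981",
    "encoders": "#2563EB",
    "fusion": "#7C3AED",
    "outputs": "#EA580C",
}


def gray_level(hex_color: str) -> float:
    """Approximate 0-255 gray value using the Rec. 601 luma weights."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def close_pairs(palette: dict, min_gap: float = 20.0) -> list:
    """Return role pairs whose gray levels differ by less than min_gap."""
    roles = sorted(palette)
    pairs = []
    for i, first in enumerate(roles):
        for second in roles[i + 1:]:
            gap = abs(gray_level(palette[first]) - gray_level(palette[second]))
            if gap < min_gap:
                pairs.append((first, second))
    return pairs
```

Under this approximation the encoder blue and the fusion purple land only about two gray levels apart, which suggests that labels, borders, and grouping boxes, not fill color alone, should carry the distinction between module types in print.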

❌ Over-decoration to avoid:

  • ❌ Rainbow color schemes
  • ❌ Heavy drop shadows
  • ❌ 3D effects / perspective
  • ❌ Excessive multi-color gradients
  • ❌ Clip art / cartoon icons
  • ❌ Decorative patterns in background
  • ❌ Glowing effects
  • ❌ Too many small icons

✓ The ideal result:

  • Looks professional and clear at first glance
  • Moderately eye-catching without being distracting
  • Meets the aesthetic standards of CVPR/NeurIPS papers
  • Print-friendly (still clearly legible in grayscale)
  • A carefully designed academic figure, not a slide-deck template

What to AVOID (CRITICAL)

  • ❌ Rainbow color schemes (too many colors)
  • ❌ Thin, hairline arrows (arrows must be THICK)
  • ❌ Unlabeled connections
  • ❌ Plain boring rectangles (add some visual interest)
  • ❌ Over-decoration with shadows, glows, or icons (too flashy)
  • ❌ Small text that's unreadable when printed
  • ❌ WRONG arrow directions: this is UNACCEPTABLE!

Scope

| Figure Type | Quality | Examples |
|---|---|---|
| Architecture diagrams | Excellent | Model architecture, pipeline, encoder-decoder |
| Method illustrations | Excellent | Conceptual diagrams, algorithm flowcharts |
| Conceptual figures | Good | Comparison diagrams, taxonomy trees |

Not for: Statistical plots (use /paper-figure), photo-realistic images

Workflow: MUST EXECUTE ALL STEPS

Step 0: Pre-flight Check

```bash
# Check API key
if [ -z "$GEMINI_API_KEY" ]; then
  echo "ERROR: GEMINI_API_KEY not set"
  echo "Get your key from: https://aistudio.google.com/app/apikey"
  echo "Set it: export GEMINI_API_KEY='your-key'"
  exit 1
fi

# Create output directory
mkdir -p figures/ai_generated
```

Step 1: Claude Plans the Figure (YOU ARE HERE)

CRITICAL: Claude must first analyze the user's request and create a detailed prompt.
Parse the input: $ARGUMENTS
Claude's task:
  1. Understand what figure the user wants
  2. Identify all components, connections, data flow
  3. Create a detailed, structured prompt for Gemini
  4. Include style requirements AND visual appeal requirements
Prompt Template for Claude to generate:
Create a PROFESSIONAL, VISUALLY APPEALING publication-quality academic diagram following CVPR/ICLR/NeurIPS standards.

Visual Style: Academic Professional Style

Goal: balance, neither conservative nor flashy

DO:

  • Subtle gradients: soft same-hue gradients (e.g. #2563EB → #3B82F6), not multi-color effects
  • Rounded corners: rounded rectangles (6-10px) for a modern look
  • Clear visual hierarchy: distinguish levels through size and shade
  • Internal structure: show sub-component structure inside large modules
  • Consistent color coding: one unified 3-4 color scheme
  • Professional polish: refined but not exaggerated

DON'T:

  • ❌ Rainbow/multi-color gradients
  • ❌ Heavy drop shadows
  • ❌ 3D effects / perspective
  • ❌ Glowing effects
  • ❌ Excessive decorative icons
  • ❌ Plain boring rectangles

Ideal result:

Like a carefully designed architecture diagram in a top-conference paper: professional, clear, and moderately eye-catching

Figure Type

[Architecture Diagram / Pipeline / Comparison / etc.]

Components to Include (BE SPECIFIC ABOUT CONTENT)

  1. [Component 1]:
    • Label: "[exact text]"
    • Sub-label: "[smaller text below]"
    • Position: [left/center/right, top/middle/bottom]
    • Style: [border color, fill, internal structure]

Layout

  • Direction: [left-to-right / top-to-bottom]
  • Spacing: [tight / normal / loose]
  • Grouping: [how components should be grouped]

Connections (BE EXPLICIT ABOUT DIRECTION)

EXACT arrow specifications:
  1. [Component A] → [Component B]: Arrow goes FROM A TO B, label it "[data type]"
  2. [Component C] → [Component D]: Arrow goes FROM C TO D, label it "[data type]"
  ...
VERIFY: Each arrow must point to the CORRECT target!
Style Requirements (CVPR/ICLR/NeurIPS Standard)

Visual Style

  • Color palette: Professional academic colors
    • Inputs: Green (#10B981)
    • Encoders: Blue (#2563EB)
    • Fusion modules: Purple (#7C3AED)
    • Outputs: Orange (#EA580C)
  • Font: Sans-serif (Arial/Helvetica), minimum 14pt, bold for labels
  • Background: Clean white, no patterns
  • Blocks: Rounded rectangles (6-10px radius), subtle gradient fill, colored border (2-3px)
  • Subtle shadows for depth effect
  • Print-friendly (must work in grayscale)

CRITICAL: Arrow & Data Flow Requirements

  1. ALL arrows must be VERY THICK - minimum 5-6px stroke width
  2. ALL arrows must have CLEAR arrowheads - large, visible triangular heads
  3. ALL arrows must be BLACK or DARK GRAY - not colored
  4. Label EVERY arrow with what data flows through it
  5. VERIFY arrow direction - each arrow MUST point to the correct target
  6. No ambiguous connections - every arrow should have a clear source and destination

Logic Clarity Requirements

  1. Data flow must be immediately obvious - viewer should understand the pipeline in 5 seconds
  2. No crossing arrows - reorganize layout to avoid arrow crossings
  3. Consistent direction - maintain left-to-right or top-to-bottom flow throughout
  4. Group related components - use subtle background boxes or spacing to group modules
  5. Clear hierarchy - main components larger, sub-components smaller

Additional Requirements

[Any specific requirements from user]
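
Filling in the "Components to Include" and "Connections" parts of the template is mostly mechanical once the figure has been broken down into parts. A minimal sketch follows, assuming hypothetical `Component` and `Connection` records; the exact bullet formatting is illustrative and not prescribed by the template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    label: str
    sub_label: str
    position: str
    style: str


@dataclass
class Connection:
    source: str
    target: str
    data: str


def render_components(components: List[Component]) -> str:
    """Render the numbered component list of the prompt template."""
    lines = []
    for number, c in enumerate(components, start=1):
        lines += [
            f"{number}. {c.label}:",
            f'   - Label: "{c.label}"',
            f'   - Sub-label: "{c.sub_label}"',
            f"   - Position: {c.position}",
            f"   - Style: {c.style}",
        ]
    return "\n".join(lines)


def render_connections(connections: List[Connection]) -> str:
    """Render arrow specifications with the direction spelled out twice."""
    lines = ["EXACT arrow specifications:"]
    for number, c in enumerate(connections, start=1):
        lines.append(
            f"{number}. {c.source} → {c.target}: Arrow goes FROM {c.source} "
            f'TO {c.target}, label it "{c.data}"'
        )
    lines.append("VERIFY: Each arrow must point to the CORRECT target!")
    return "\n".join(lines)
```

Keeping connections as explicit (source, target, data) records also gives the Step 5 review a ground-truth list against which to check every arrow in the rendered image.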

Step 2: Gemini Layout Optimization (gemini-3-pro)

Claude sends the initial prompt to Gemini (gemini-3-pro) for layout optimization.

```bash
#!/bin/bash
# Step 2: Optimize layout using Gemini (gemini-3-pro-preview)
# This step refines component positioning and spacing

set -e
OUTPUT_DIR="figures/ai_generated"
mkdir -p "$OUTPUT_DIR"

# Endpoint for the reasoning model. URL is not defined anywhere in the script
# as published; the standard generateContent form is assumed here.
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=${GEMINI_API_KEY}"

# The initial prompt from Claude
INITIAL_PROMPT='[Claude fills in the detailed prompt here]'

# Layout optimization request
LAYOUT_REQUEST="You are an expert in academic figure layout design for CVPR/NeurIPS papers.
Analyze this figure request and provide an OPTIMIZED LAYOUT DESCRIPTION:
$INITIAL_PROMPT
Provide:
  1. Optimized Component Positions: Exact positions (left/center/right, top/middle/bottom) for each component
  2. Spacing Recommendations: Specific spacing between components
  3. Grouping Strategy: Which components should be visually grouped together
  4. Arrow Routing: Optimal paths for arrows to avoid crossings
  5. Visual Hierarchy: Size recommendations for main vs sub-components
Output a DETAILED layout specification that will be used for rendering."

# Build JSON payload. The prompt is passed through the environment so that
# quotes or backslashes in it cannot break the Python source.
LAYOUT_REQUEST="$LAYOUT_REQUEST" python3 << 'PYTHON'
import json
import os

payload = {"contents": [{"parts": [{"text": os.environ["LAYOUT_REQUEST"]}]}]}
with open("/tmp/gemini_layout_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("Layout request created")
PYTHON

# Call gemini-3-pro-preview for layout optimization (DIRECT connection, no proxy)
RESPONSE=$(curl -s --max-time 90 \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_layout_request.json)

# Extract layout description
LAYOUT_DESCRIPTION=$(echo "$RESPONSE" | python3 -c "
import sys, json
data = json.load(sys.stdin)
try:
    print(data['candidates'][0]['content']['parts'][0]['text'])
except (KeyError, IndexError, TypeError):
    print('Error extracting layout')
")

echo "=== Layout Optimization Complete ==="
echo "$LAYOUT_DESCRIPTION"
echo "$LAYOUT_DESCRIPTION" > "$OUTPUT_DIR/layout_description.txt"
```
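
The request/response handling in this step can also be written directly in Python, which avoids shell quoting altogether. This is a minimal sketch, not the skill's own code: the endpoint pattern and the `x-goog-api-key` header are assumptions about the public Gemini REST API and should be confirmed against current documentation, and `generate_text` is a hypothetical helper name.

```python
import json
import os
import urllib.request

# Assumed endpoint pattern for the Gemini REST API; confirm before relying on it.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_payload(prompt: str) -> dict:
    """Wrap a prompt in the same JSON shape the shell script writes to disk."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate; raise if there are none."""
    try:
        parts = response["candidates"][0]["content"]["parts"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {str(response)[:200]}") from exc
    texts = [part["text"] for part in parts if "text" in part]
    if not texts:
        raise ValueError("response contains no text parts")
    return "\n".join(texts)


def generate_text(prompt: str, model: str = "gemini-3-pro-preview", timeout: int = 90) -> str:
    """POST the prompt and return the model's text (performs a network call)."""
    request = urllib.request.Request(
        f"{API_ROOT}/{model}:generateContent",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_text(json.load(reply))
```

Raising on a malformed response, instead of printing an error string, keeps a failed call from being saved as if it were a layout description and passed on to Step 3.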

Step 3: Gemini Style Verification (gemini-3-pro)

Claude sends the optimized layout to Gemini for CVPR/NeurIPS style verification.

```bash
#!/bin/bash
# Step 3: Verify and enhance style compliance using Gemini (gemini-3-pro-preview)

# Endpoint for the reasoning model. URL is not defined anywhere in the script
# as published; the standard generateContent form is assumed here.
URL="https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=${GEMINI_API_KEY}"

# Read layout from previous step
LAYOUT=$(cat figures/ai_generated/layout_description.txt)

# Style verification request
STYLE_REQUEST="You are a CVPR/NeurIPS paper figure reviewer specializing in visual standards.
Review and ENHANCE this figure specification for top-tier conference compliance:
$LAYOUT
Ensure compliance with:
  1. Color Palette: Use professional academic colors (green for inputs, blue for encoders, purple for fusion, orange for outputs)
  2. Arrow Standards: Thick (5-6px), black/dark gray, clear arrowheads, all labeled
  3. Font Standards: Sans-serif, minimum 14pt, readable in print
  4. Visual Appeal (academic research style):
    • ✅ Subtle same-color gradients, rounded corners (6-10px), internal structure visible
    • ❌ NO heavy shadows, NO glowing effects, NO rainbow gradients
Output an ENHANCED figure specification with explicit style instructions for rendering."

# Build JSON payload. The prompt is passed through the environment so that
# quotes or backslashes in it cannot break the Python source.
STYLE_REQUEST="$STYLE_REQUEST" python3 << 'PYTHON'
import json
import os

payload = {"contents": [{"parts": [{"text": os.environ["STYLE_REQUEST"]}]}]}
with open("/tmp/gemini_style_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("Style request created")
PYTHON

# Call gemini-3-pro-preview for style verification (DIRECT connection, no proxy)
RESPONSE=$(curl -s --max-time 90 \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_style_request.json)

# Extract style-enhanced specification
STYLE_SPEC=$(echo "$RESPONSE" | python3 -c "
import sys, json
data = json.load(sys.stdin)
try:
    print(data['candidates'][0]['content']['parts'][0]['text'])
except (KeyError, IndexError, TypeError):
    print('Error extracting style spec')
")

echo "=== Style Verification Complete ==="
echo "$STYLE_SPEC"
echo "$STYLE_SPEC" > "figures/ai_generated/style_spec.txt"
```
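
Before the style-enhanced specification is passed on to rendering, a cheap local check can flag colors that drifted outside the recommended palette. The sketch below assumes the specification mentions colors as six-digit hex codes; `off_palette_colors` is a hypothetical helper, and it complements the model-based style review without replacing it.

```python
import re

ALLOWED_COLORS = {
    "#10B981", "#34D399",  # inputs (green)
    "#2563EB", "#3B82F6",  # encoders (blue)
    "#7C3AED", "#8B5CF6",  # fusion (purple)
    "#EA580C", "#F97316",  # outputs (orange)
    "#333333", "#1F2937",  # arrows
    "#FFFFFF",             # background
}

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}\b")


def off_palette_colors(spec_text: str) -> list:
    """Return the hex colors in spec_text that are not in the style guide palette."""
    found = {match.upper() for match in HEX_COLOR.findall(spec_text)}
    return sorted(found - ALLOWED_COLORS)
```

An empty result does not prove compliance, since a specification may name colors in words, but a non-empty one is a concrete item to feed back into the next refinement round.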

Step 4: Paperbanana Image Rendering (gemini-3-pro-image-preview)

步骤4:Paperbanana图像渲染(gemini-3-pro-image-preview)

Claude sends the optimized, style-verified specification to Paperbanana for rendering.
bash
#!/bin/bash
Claude将经过优化和风格验证的规范发送给Paperbanana进行渲染。
bash
#!/bin/bash

Step 4: Render image using Paperbanana (gemini-3-pro-image-preview)

Step 4: Render image using Paperbanana (gemini-3-pro-image-preview)

Internal codename: Nano Banana Pro

Internal codename: Nano Banana Pro

Use DIRECT connection (no proxy) - proxy causes SSL errors

Use DIRECT connection (no proxy) - proxy causes SSL errors

set -e
OUTPUT_DIR="figures/ai_generated" mkdir -p "$OUTPUT_DIR"
set -e
OUTPUT_DIR="figures/ai_generated" mkdir -p "$OUTPUT_DIR"

Read the style-enhanced specification from previous step

Read the style-enhanced specification from previous step

STYLE_SPEC=$(cat figures/ai_generated/style_spec.txt)
STYLE_SPEC=$(cat figures/ai_generated/style_spec.txt)

Add rendering instructions

Add rendering instructions

RENDER_PROMPT="Render a publication-quality academic diagram based on this specification:
$STYLE_SPEC
RENDERING REQUIREMENTS:
  • Output a clean, professional diagram suitable for CVPR/NeurIPS submission
  • Use vector-quality rendering with sharp edges and clear text
  • Ensure all elements are properly aligned and spaced
  • The diagram should be immediately understandable at a glance"
RENDER_PROMPT="Render a publication-quality academic diagram based on this specification:
$STYLE_SPEC
RENDERING REQUIREMENTS:
  • Output a clean, professional diagram suitable for CVPR/NeurIPS submission
  • Use vector-quality rendering with sharp edges and clear text
  • Ensure all elements are properly aligned and spaced
  • The diagram should be immediately understandable at a glance"

Build JSON payload using Python for proper escaping

Build JSON payload using Python for proper escaping

python3 << PYTHON
import json

payload = {
    "contents": [{"parts": [{"text": '''$RENDER_PROMPT'''}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
with open("/tmp/gemini_request.json", "w") as f:
    json.dump(payload, f, indent=2)
print("JSON payload created")
PYTHON

# Call the Paperbanana API WITHOUT a proxy (a direct connection works better)
RESPONSE=$(curl -s --max-time 180 \
  -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d @/tmp/gemini_request.json)

# Check for error
if echo "$RESPONSE" | grep -q '"error"'; then
    echo "API Error:"
    echo "$RESPONSE" | python3 -m json.tool 2>/dev/null || echo "$RESPONSE"
    exit 1
fi

# Extract and save image.
# The response goes through a file: python3 reads its program from the heredoc,
# so stdin is not available for piped data.
printf '%s' "$RESPONSE" > /tmp/gemini_response.json
python3 << 'PYTHON'
import json, base64
from pathlib import Path

output_dir = Path("figures/ai_generated")
output_dir.mkdir(parents=True, exist_ok=True)  # make sure the target directory exists
with open("/tmp/gemini_response.json") as f:
    data = json.load(f)

try:
    parts = data['candidates'][0]['content']['parts']
    iteration = 1  # Claude increments this each iteration
    for part in parts:
        if 'text' in part:
            print(f"\n[Paperbanana]: {part['text'][:200]}...")
        elif 'inlineData' in part:
            img_data = base64.b64decode(part['inlineData']['data'])
            img_path = output_dir / f"figure_v{iteration}.png"
            with open(img_path, "wb") as f:
                f.write(img_data)
            print(f"\n✅ Image saved: {img_path}")
            print(f"   Size: {len(img_data)/1024:.1f} KB")
except Exception as e:
    print(f"Parse error: {e}")
    print(f"Raw response: {str(data)[:500]}")
PYTHON
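The request building and response parsing above can also be kept entirely in Python, which avoids splicing `$RENDER_PROMPT` into Python source through the shell (a prompt containing `'''` would break that interpolation). The following is a minimal sketch, assuming the response keeps the `candidates[0].content.parts` shape used above. The helper names `build_payload` and `extract_images` are hypothetical and not part of any SDK.

```python
import base64
import json


def build_payload(render_prompt: str) -> str:
    """Serialize the render request; json.dumps escapes quotes and newlines."""
    payload = {
        "contents": [{"parts": [{"text": render_prompt}]}],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
    return json.dumps(payload, indent=2)


def extract_images(response: dict) -> list[bytes]:
    """Return the decoded bytes of every inlineData part in the first candidate."""
    parts = response["candidates"][0]["content"]["parts"]
    return [
        base64.b64decode(part["inlineData"]["data"])
        for part in parts
        if "inlineData" in part
    ]
```

The string returned by `build_payload` can be written to `/tmp/gemini_request.json` and sent with the same `curl` call. Each element returned by `extract_images` can be written to `figure_v{iteration}.png`.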

Step 5: Claude STRICT Visual Review & Scoring (MANDATORY)

Claude MUST read the generated image and perform a STRICT review:
  1. Visual Analysis: What does the image show in detail?
  2. Strengths: What's good about it?
  3. STRICT Verification: Check EVERY item below
  4. Score: Rate 1-10 (10 = perfect) — BE STRICT!
STRICT Review Template:

Claude's STRICT Review of Figure v{N}

What I See

[Describe the generated image in DETAIL - every block, every arrow]

Strengths

  • [Strength 1]
  • [Strength 2]

═══════════════════════════════════════════════════════════════

STRICT VERIFICATION CHECKLIST (ALL must pass for score ≥ 9)

═══════════════════════════════════════════════════════════════

A. Arrow Correctness Verification (CRITICAL - any failure = score ≤ 6)

Check EACH arrow:
  • Arrow 1: [Source] → [Target] — Does it point to the CORRECT target?
  • Arrow 2: [Source] → [Target] — Does it point to the CORRECT target?
  • Arrow 3: [Source] → [Target] — Does it point to the CORRECT target?
  • Arrow 4: [Source] → [Target] — Does it point to the CORRECT target?
  • Arrow 5: [Source] → [Target] — Does it point to the CORRECT target?
  • Arrow 6: [Source] → [Target] — Does it point to the CORRECT target?

B. Block Content Verification (any failure = score ≤ 7)

Check EACH block:
  • Block 1 "[Name]": Has correct label? Has sub-label? Content correct?
  • Block 2 "[Name]": Has correct label? Has sub-label? Content correct?
  • Block 3 "[Name]": Has correct label? Has sub-label? Content correct?
  • Block 4 "[Name]": Has correct label? Has sub-label? Content correct?
  • Block 5 "[Name]": Has correct label? Has sub-label? Content correct?
  • Block 6 "[Name]": Has correct label? Has sub-label? Content correct?
  • Block 7 "[Name]": Has correct label? Has sub-label? Content correct?

C. Arrow Visibility (any failure = score ≤ 7)

  • ALL arrows are THICK (≥5px visible stroke)
  • ALL arrows have CLEAR arrowheads (large triangular heads)
  • ALL arrows are BLACK or DARK GRAY (not light colors)
  • NO arrows are too thin or invisible

D. Arrow Labels (any failure = score ≤ 7)

  • EVERY arrow has a text label
  • Labels are readable (not too small)
  • Labels correctly describe the data flowing

E. Visual Appeal (Balanced Academic Style) (any failure = score ≤ 8)

  • Moderately appealing: subtle gradients or rounded corners, nothing exaggerated
  • Not plain boxes: shows some sense of design
  • Not over-decorated: no heavy shadows, glow effects, or rainbow colors
  • Professional academic style: looks like a figure in a CVPR paper, not a PPT template
  • Internal structure visible: large modules show their sub-components
  • Color palette of 3-4 coordinated colors: neither rainbow nor pure black-and-white

E2. Visual Appeal - RED FLAGS (immediate score ≤ 7 if found)

  • NO heavy drop shadows (too flashy)
  • NO glowing effects (too flashy)
  • NO rainbow gradients (unprofessional)
  • NO excessive decorative icons (distracting)

F. Layout & Flow (any failure = score ≤ 7)

  • Clean horizontal left-to-right flow
  • No arrow crossings
  • Data flow traceable in 5 seconds
  • Balanced spacing (not cramped, not sparse)

G. Style Compliance

  • CVPR/NeurIPS professional style
  • Color palette appropriate (not rainbow)
  • Font readable
  • Print-friendly (grayscale test)

═══════════════════════════════════════════════════════════════

Issues Found (BE SPECIFIC)

  1. [Issue 1]: [EXACTLY what is wrong] → [How to fix]
  2. [Issue 2]: [EXACTLY what is wrong] → [How to fix]
  3. [Issue 3]: [EXACTLY what is wrong] → [How to fix]

Score: X/10

STRICT Score Breakdown Guide:

  • 10: Perfect. No issues. Publication-ready masterpiece. Visual style is perfectly balanced.
  • 9: Excellent. Minor issues that don't affect understanding. Can be used as-is.
  • 8: Good but has noticeable issues. A figure that looks too plain or too flashy needs improvement.
  • 7: Usable but has clear problems with arrows or content.
  • 6: Has arrow direction errors OR missing major components.
  • 1-5: Major issues. Unacceptable.

Visual Style Scoring:

  • Too flashy: heavy shadows, glow effects, rainbow colors → score ≤ 7
  • Too plain: pure black-and-white boxes, no visual design at all → score ≤ 8
  • Balanced: moderate gradients, rounded corners, clear hierarchy → score 9-10

Verdict

[ ] ACCEPT (score ≥ 9 AND all critical checks pass)
[ ] REFINE (score < 9 OR any critical check fails)

If REFINE: List the EXACT issues that must be fixed
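The score caps stated in the checklist headings can be applied mechanically once Claude has decided which sections failed. The sketch below is one possible encoding, not the reviewer's actual logic. The cap of 8 for sections without a stated cap (such as G) is an assumption derived from "ALL must pass for score ≥ 9", and the names `SCORE_CAPS`, `final_score` and `verdict` are hypothetical.

```python
# Caps copied from the checklist headings (A-F and E2).
SCORE_CAPS = {"A": 6, "B": 7, "C": 7, "D": 7, "E": 8, "E2": 7, "F": 7}

# Assumption: a failed section with no stated cap (e.g. G) still blocks a 9,
# because the checklist says ALL items must pass for score >= 9.
DEFAULT_CAP = 8


def final_score(raw_score: int, failed_sections: set[str]) -> int:
    """Clamp the reviewer's raw score to the lowest cap among failed sections."""
    caps = [SCORE_CAPS.get(section, DEFAULT_CAP) for section in failed_sections]
    return min([raw_score, *caps])


def verdict(raw_score: int, failed_sections: set[str]) -> str:
    """Any failed section caps the score below 9, so one comparison suffices."""
    return "ACCEPT" if final_score(raw_score, failed_sections) >= 9 else "REFINE"
```

For example, a figure that Claude would otherwise rate 9 but that has one wrong arrow target (section A) ends up at 6 and is sent back for refinement.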

Step 6: Decision Point

IF score >= 9 AND all critical checks pass:
    → Accept figure, generate LaTeX snippet, DONE
ELSE IF iteration < MAX_ITERATIONS:
    → Generate SPECIFIC improvement prompt based on EXACT issues
    → Go to Step 2 (Gemini Layout) with refined prompt
ELSE:
    → Max iterations reached, show best version
    → Ask user if they want to continue or accept
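The three branches above map directly onto a small function. This is a sketch with hypothetical names (`next_action`, `best_version`). `max_iterations` stands in for `MAX_ITERATIONS`, whose value is not given in this section, and the tie-breaking rule for the best version is an assumption.

```python
def next_action(score: int, critical_ok: bool, iteration: int, max_iterations: int) -> str:
    """Mirror the three branches of the decision point."""
    if score >= 9 and critical_ok:
        return "accept"    # generate the LaTeX snippet and stop
    if iteration < max_iterations:
        return "refine"    # build an improvement prompt and return to Step 2
    return "ask_user"      # show the best version and let the user decide


def best_version(scores: dict[int, int]) -> int:
    """Return the iteration with the highest score (assumption: earliest wins ties)."""
    return max(sorted(scores), key=lambda iteration: scores[iteration])
```

When `next_action` returns `"ask_user"`, `best_version` picks which `figure_v{N}.png` to present.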

Step 7: Generate Improvement Prompt (for refinement)

Claude generates a TARGETED improvement prompt that lists the EXACT issues:
Refine this academic diagram. This is iteration {N}.

═══════════════════════════════════════════════════════════════

CRITICAL: Fix These EXACT Issues (from previous review)

═══════════════════════════════════════════════════════════════

Arrow Direction Errors (MUST FIX):

  1. EXACT issue: Arrow from [A] to [B] is pointing to wrong target. It should point to [C] instead.

Missing Arrow Labels (MUST FIX):

  1. Arrow from [A] to [B] is missing label "[data type]"
  2. ...

Block Content Issues (MUST FIX):

  1. Block "[Name]" has wrong label. Should be "[correct label]"
  2. ...

Visual Appeal Issues (SHOULD FIX):

  1. Blocks are too plain. Add [gradients/shadows/internal structure]
  2. ...

Keep These Good Elements:

  • [What to preserve from previous version]

Generate the improved figure with ALL issues fixed.
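The template above can be filled from a structured issue list so that empty categories are left out instead of appearing as bare headings. This is a sketch under assumptions: the function name `build_improvement_prompt` and its argument layout are hypothetical, and the decorative rule lines are omitted for brevity.

```python
def build_improvement_prompt(iteration: int, must_fix: dict[str, list[str]],
                             should_fix: list[str], keep: list[str]) -> str:
    """Assemble the refinement prompt from the issues found in the last review."""
    lines = [
        f"Refine this academic diagram. This is iteration {iteration}.",
        "",
        "CRITICAL: Fix These EXACT Issues (from previous review)",
    ]
    for heading, issues in must_fix.items():
        if not issues:
            continue  # skip categories with nothing to fix
        lines += ["", f"{heading} (MUST FIX):"]
        lines += [f"  {i}. {issue}" for i, issue in enumerate(issues, start=1)]
    if should_fix:
        lines += ["", "Visual Appeal Issues (SHOULD FIX):"]
        lines += [f"  {i}. {issue}" for i, issue in enumerate(should_fix, start=1)]
    if keep:
        lines += ["", "Keep These Good Elements:"]
        lines += [f"  - {item}" for item in keep]
    lines += ["", "Generate the improved figure with ALL issues fixed."]
    return "\n".join(lines)
```

The keys of `must_fix` would be the category names from the template, for example "Arrow Direction Errors", "Missing Arrow Labels" and "Block Content Issues".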

Step 8: Final Output

When the figure is accepted (score ≥ 9):
% === AI-Generated Figure ===
\begin{figure*}[t]
    \centering
    \includegraphics[width=0.95\textwidth]{figures/ai_generated/figure_final.png}
    \caption{[Caption based on user's original request].}
    \label{fig:[label]}
\end{figure*}
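If the snippet is produced by a script, the caption and label can be filled in with a small helper. This is a sketch; the function name `latex_figure` is hypothetical, and the default image path is the one used in the snippet above.

```python
def latex_figure(caption: str, label: str,
                 image_path: str = "figures/ai_generated/figure_final.png") -> str:
    """Fill the figure* template with a caption and a label."""
    return "\n".join([
        "% === AI-Generated Figure ===",
        r"\begin{figure*}[t]",
        r"    \centering",
        rf"    \includegraphics[width=0.95\textwidth]{{{image_path}}}",
        rf"    \caption{{{caption}}}",
        rf"    \label{{fig:{label}}}",
        r"\end{figure*}",
    ])
```

The returned string can be written to `latex_include.tex`. The caller is responsible for escaping LaTeX special characters in the caption.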

Key Rules (MUST FOLLOW - STRICT)

  1. NEVER skip the review step — Always read and STRICTLY score the image
  2. NEVER accept score < 9 — Keep refining until excellence
  3. VERIFY EVERY ARROW DIRECTION — Wrong direction = automatic fail (score ≤ 6)
  4. VERIFY EVERY BLOCK CONTENT — Wrong content = automatic fail (score ≤ 7)
  5. BE SPECIFIC in feedback — "Arrow from A to B points to wrong target C" not "arrow is wrong"
  6. SAVE all iterations — Keep version history for comparison
  7. Claude is the STRICT boss — Accept only excellence, not "good enough"
  8. ARROW CORRECTNESS IS NON-NEGOTIABLE — Any wrong arrow direction = reject
  9. VISUAL APPEAL MATTERS — Plain boring figures = score ≤ 8
  10. Target score is 9 — Not 8, not "good enough"
  11. USE MULTI-STAGE WORKFLOW — Claude → Gemini Layout → Gemini Style → Paperbanana → Claude Review
  12. USE CORRECT MODELS — gemini-3-pro for reasoning, gemini-3-pro-image-preview for rendering

Output Structure

figures/ai_generated/
├── layout_description.txt  # Step 2: Gemini layout optimization output
├── style_spec.txt          # Step 3: Gemini style verification output
├── figure_v1.png           # Iteration 1 (Paperbanana render)
├── figure_v2.png           # Iteration 2
├── figure_v3.png           # Iteration 3
├── figure_final.png        # Accepted version (copy of best, score ≥ 9)
├── latex_include.tex       # LaTeX snippet
└── review_log.json         # All review scores and STRICT feedback

Model Summary

| Stage | Model | Purpose |
|-------|-------|---------|
| Step 1 | Claude | Parse request, create initial prompt |
| Step 2 | gemini-3-pro | Layout optimization (positioning, spacing, grouping) |
| Step 3 | gemini-3-pro | CVPR/NeurIPS style verification |
| Step 4 | gemini-3-pro-image-preview (Paperbanana) | High-quality image rendering |
| Step 5 | Claude | STRICT visual review and scoring |
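For scripts that drive the workflow, the stage-to-model assignment can live in one place so a model name is never typed twice. This is a sketch only: the `Stage` tuple, the `PIPELINE` list and `model_for` are hypothetical names, and the model strings are copied from the summary above.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    step: int
    model: str
    purpose: str


# Stage assignments as listed in the model summary.
PIPELINE = [
    Stage(1, "Claude", "Parse request, create initial prompt"),
    Stage(2, "gemini-3-pro", "Layout optimization"),
    Stage(3, "gemini-3-pro", "CVPR/NeurIPS style verification"),
    Stage(4, "gemini-3-pro-image-preview", "High-quality image rendering"),
    Stage(5, "Claude", "STRICT visual review and scoring"),
]


def model_for(step: int) -> str:
    """Look up the model assigned to a step; raises KeyError for unknown steps."""
    by_step = {stage.step: stage.model for stage in PIPELINE}
    return by_step[step]
```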