gemini-image-gen


Gemini Image Generation Skill


Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.

When to Use This Skill


Use this skill when you need to:
  • Generate images from text descriptions
  • Edit existing images by adding/removing elements or changing styles
  • Combine multiple source images into new compositions
  • Iteratively refine images through conversational editing
  • Create visual content for documentation, design, or creative projects

Prerequisites


API Key Setup


The skill supports both Google AI Studio and Vertex AI endpoints.

Option 1: Google AI Studio (Default)

The skill automatically detects your `GEMINI_API_KEY` in this order:
  1. Process environment: `export GEMINI_API_KEY="your-key"`
  2. Project root: `.env`
  3. .claude directory: `.claude/.env`
  4. .claude/skills directory: `.claude/skills/.env`
  5. Skill directory: `.claude/skills/gemini-image-gen/.env`

Get your API key: Visit Google AI Studio

Create a `.env` file with:
```bash
GEMINI_API_KEY=your_api_key_here
```
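
The lookup order above can be sketched in plain Python. This is only an illustration of the precedence, not the skill's actual implementation: the names `parse_env_file` and `find_api_key` are hypothetical, and the `.env` parsing is deliberately simplistic (plain `KEY=value` lines only).

```python
import os
from pathlib import Path

# Candidate .env locations relative to the project root, in the documented order.
ENV_FILE_CANDIDATES = [
    ".env",
    ".claude/.env",
    ".claude/skills/.env",
    ".claude/skills/gemini-image-gen/.env",
]


def parse_env_file(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(project_root=".", environ=None):
    """Return the first GEMINI_API_KEY found, or None (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # 1. The process environment takes precedence over any file.
    if environ.get("GEMINI_API_KEY"):
        return environ["GEMINI_API_KEY"]
    # 2-5. Fall back to the .env files, project root first, skill directory last.
    for relative in ENV_FILE_CANDIDATES:
        candidate = Path(project_root) / relative
        if candidate.is_file():
            key = parse_env_file(candidate).get("GEMINI_API_KEY")
            if key:
                return key
    return None
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.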

Option 2: Vertex AI


To use Vertex AI instead:
```bash
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1  # Optional, defaults to us-central1
```

Or in `.env` file:
```bash
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
```
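
As a rough sketch of how these variables might be turned into client settings, the hypothetical helper below builds keyword arguments for `genai.Client`. The function name is an assumption, and so is treating only the literal value `true` as enabling Vertex AI; the skill's own script may behave differently.

```python
import os


def client_kwargs_from_env(environ=None):
    """Build keyword arguments for genai.Client from the variables above."""
    environ = os.environ if environ is None else environ
    # Assumption: only the value "true" (any letter case) enables Vertex AI.
    use_vertex = environ.get("GEMINI_USE_VERTEX", "").strip().lower() == "true"
    if use_vertex:
        project = environ.get("VERTEX_PROJECT_ID")
        if not project:
            raise ValueError("VERTEX_PROJECT_ID is required when GEMINI_USE_VERTEX=true")
        return {
            "vertexai": True,
            "project": project,
            # Documented default when VERTEX_LOCATION is unset.
            "location": environ.get("VERTEX_LOCATION", "us-central1"),
        }
    # Default path: Google AI Studio with an API key.
    return {"api_key": environ.get("GEMINI_API_KEY")}


# Usage (requires `pip install google-genai`):
#   from google import genai
#   client = genai.Client(**client_kwargs_from_env())
```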

Python Setup

Install required package:
```bash
pip install google-genai
```

Quick Start


Basic Text-to-Image Generation

```python
from google import genai
from google.genai import types
import os

# API key detection handled automatically by helper script
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        # In current google-genai releases the aspect ratio is passed via image_config
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save to ./docs/assets/ (create the directory first if it is missing)
os.makedirs('./docs/assets', exist_ok=True)
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```

Using the Helper Script

For convenience, use the provided helper script that handles API key detection and file saving:
```bash
# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1
```
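
For readers curious how a command line like the ones above could be parsed, here is a minimal `argparse` sketch. It is not the real `generate.py`: only the flag names come from the examples, while the defaults and the allowed choices are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the examples above."""
    parser = argparse.ArgumentParser(description="Generate an image from a text prompt.")
    parser.add_argument("prompt", help="Text description of the image")
    # Assumed default; the choices are the ratios listed in this document.
    parser.add_argument("--aspect-ratio", default="1:1",
                        choices=["1:1", "16:9", "9:16", "4:3", "3:4"])
    # nargs="+" lets `--response-modalities image text` collect both values.
    parser.add_argument("--response-modalities", nargs="+", default=["image"],
                        choices=["image", "text"])
    parser.add_argument("--output", default=None, help="Path for the saved image")
    return parser


args = build_parser().parse_args([
    "A futuristic city with flying cars",
    "--aspect-ratio", "16:9",
    "--output", "./docs/assets/city.png",
])
```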

Key Features


Aspect Ratios

| Ratio | Resolution | Use Case | Token Cost |
|-------|------------|----------|------------|
| 1:1 | 1024×1024 | Social media, avatars | 1290 |
| 16:9 | 1344×768 | Landscapes, banners | 1290 |
| 9:16 | 768×1344 | Mobile, portraits | 1290 |
| 4:3 | 1152×896 | Traditional media | 1290 |
| 3:4 | 896×1152 | Vertical posters | 1290 |
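
The table above translates naturally into a small lookup. The sketch below is illustrative only: the helper names are hypothetical, and the token estimate simply multiplies the per-image cost from the table by the number of images.

```python
# Output resolution (width, height) per aspect ratio, copied from the table above.
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "16:9": (1344, 768),
    "9:16": (768, 1344),
    "4:3": (1152, 896),
    "3:4": (896, 1152),
}
TOKENS_PER_IMAGE = 1290  # same cost for every ratio in the table


def resolution_for(aspect_ratio):
    """Return (width, height) for a listed ratio, or raise ValueError."""
    try:
        return RESOLUTIONS[aspect_ratio]
    except KeyError:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio!r}") from None


def closest_ratio(width, height):
    """Pick the listed ratio whose shape is nearest to width/height."""
    target = width / height
    return min(RESOLUTIONS, key=lambda r: abs(RESOLUTIONS[r][0] / RESOLUTIONS[r][1] - target))


def estimate_output_tokens(image_count):
    """Rough output-token estimate for a number of generated images."""
    return image_count * TOKENS_PER_IMAGE
```

For example, `closest_ratio(1920, 1080)` selects `"16:9"`, which is useful when matching an existing layout slot to one of the listed ratios.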

Response Modalities

  • `['image']`: Generate only images
  • `['text']`: Generate only text descriptions
  • `['image', 'text']`: Generate both images and descriptions
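
When both modalities are requested, the response parts can mix text and image data. The following sketch separates them; it uses stand-in objects shaped like the SDK's parts (a `text` attribute and an `inline_data.data` attribute) so that it runs offline, and the function name `split_parts` is hypothetical.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Separate text snippets from raw image bytes in a list of response parts."""
    texts, images = [], []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and getattr(inline, "data", None):
            images.append(inline.data)
        elif getattr(part, "text", None):
            texts.append(part.text)
    return texts, images


# Stand-ins for response.candidates[0].content.parts (not real SDK objects).
fake_parts = [
    SimpleNamespace(text="A modern glass pavilion.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG...")),
]
texts, images = split_parts(fake_parts)
```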

Image Editing

Provide an existing image plus text instructions describing the change:
```python
import PIL.Image

# Assumes `client` was created as in the Quick Start example
img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)
```

Multi-Image Composition

Combine up to 3 source images (recommended):
```python
import PIL.Image

# Assumes `client` was created as in the Quick Start example
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)
```

Prompt Engineering Tips

Structure effective prompts with three elements:
  1. Subject: What to generate ("a robot")
  2. Context: Environmental setting ("in a futuristic city")
  3. Style: Artistic treatment ("cyberpunk style, neon lighting")

Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"

Quality modifiers:
  • Add terms like "4K", "HDR", "high-quality", "professional photography"
  • Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"

Text in images:
  • Limit to 25 characters maximum
  • Use up to 3 distinct phrases
  • Specify font styles: "bold sans-serif title" or "handwritten script"

See `references/prompting-guide.md` for comprehensive prompt engineering strategies.
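
The subject/context/style structure and the text-in-image limits can be captured in two small helpers. This is a sketch with hypothetical function names; it also assumes the 25-character limit applies to each phrase individually, which the guidance above does not state explicitly.

```python
def build_prompt(subject, context="", style="", modifiers=()):
    """Join subject, context, style and quality modifiers into one prompt string."""
    head = " ".join(piece for piece in (subject, context) if piece)
    tail = [piece for piece in (style, *modifiers) if piece]
    return ", ".join([head, *tail])


def check_image_text(phrases, max_chars=25, max_phrases=3):
    """Return a list of problems with text meant to be rendered in the image."""
    problems = []
    if len(phrases) > max_phrases:
        problems.append(f"{len(phrases)} phrases given, at most {max_phrases} recommended")
    for phrase in phrases:
        # Assumption: the character limit is checked per phrase.
        if len(phrase) > max_chars:
            problems.append(f"{phrase!r} is longer than {max_chars} characters")
    return problems


prompt = build_prompt(
    "A robot",
    "in a futuristic city",
    "cyberpunk style",
    modifiers=["neon lighting", "4K"],
)
```

Here `prompt` becomes `"A robot in a futuristic city, cyberpunk style, neon lighting, 4K"`.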

Safety Settings

The model includes adjustable safety filters. Configure per-request:
```python
config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
```
See `references/safety-settings.md` for detailed configuration options.

Output Management

All generated images should be saved to the `./docs/assets/` directory:
```bash
# Create directory if needed
mkdir -p ./docs/assets
```

The helper script automatically saves to this location with timestamped filenames.
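
One way to produce timestamped filenames in Python is sketched below. The naming pattern (`generated-YYYYMMDD-HHMMSS.png`) and the function name are assumptions for illustration; the helper script's actual scheme may differ.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(output_dir="./docs/assets", prefix="generated", ext="png", now=None):
    """Return a path like <output_dir>/<prefix>-YYYYMMDD-HHMMSS.<ext>."""
    now = now or datetime.now()
    directory = Path(output_dir)
    # Same effect as `mkdir -p`: create parents, ignore an existing directory.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{prefix}-{now:%Y%m%d-%H%M%S}.{ext}"
```

Accepting `now` as a parameter makes the output deterministic in tests; in normal use it can be omitted.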

Model Specifications

Model: `gemini-2.5-flash-image`
  • Input tokens: Up to 65,536
  • Output tokens: Up to 32,768
  • Supported inputs: Text and images
  • Supported outputs: Text and images
  • Knowledge cutoff: June 2025
  • Features: Image generation, structured outputs, batch API, caching

Limitations


  • Maximum 3 input images recommended for best results
  • Text rendering works best when generated separately first
  • Does not support audio/video inputs
  • Regional restrictions on child image uploads (EEA, CH, UK)
  • Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi

Error Handling

Common issues and solutions:

**API key not found**:
```bash
# Check environment variables
echo $GEMINI_API_KEY

# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env
```

**Safety filter blocking**:
- Review `response.prompt_feedback.block_reason`
- Adjust safety settings if appropriate for your use case
- Modify prompt to avoid triggering filters

**Token limit exceeded**:
- Reduce prompt length
- Use fewer input images
- Simplify image editing instructions
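
A defensive check along these lines can distinguish a blocked prompt from a response that simply contains no image. The sketch uses stand-in objects so it runs without an API call; the helper names are hypothetical, and only the `prompt_feedback.block_reason`, `candidates`, `content.parts` and `inline_data` attribute names are taken from the examples in this document.

```python
from types import SimpleNamespace


def block_reason(response):
    """Return the prompt block reason as a string, or None if nothing was blocked."""
    feedback = getattr(response, "prompt_feedback", None)
    reason = getattr(feedback, "block_reason", None)
    return str(reason) if reason else None


def has_image(response):
    """True if any candidate contains a part carrying inline image data."""
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            if getattr(part, "inline_data", None):
                return True
    return False


# Stand-in for a blocked response (not a real SDK object).
blocked = SimpleNamespace(
    prompt_feedback=SimpleNamespace(block_reason="SAFETY"),
    candidates=None,
)
if block_reason(blocked):
    print(f"Prompt was blocked: {block_reason(blocked)}")
```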

Reference Documentation

For detailed information, see:
  • `references/api-reference.md` - Complete API specifications
  • `references/prompting-guide.md` - Advanced prompt engineering
  • `references/safety-settings.md` - Safety configuration details
  • `references/code-examples.md` - Additional implementation examples

Resources
