gemini-image-gen

Gemini Image Generation Skill

Generate high-quality images with Google's Gemini 2.5 Flash Image model. The skill covers text-to-image generation, image editing, and multi-image composition.

When to Use This Skill

Use this skill when you need to:

- Generate images from text descriptions
- Edit existing images by adding or removing elements or changing styles
- Combine multiple source images into new compositions
- Iteratively refine images through conversational editing
- Create visual content for documentation, design, or creative projects

Prerequisites

API Key Setup

The skill supports both Google AI Studio and Vertex AI endpoints.

Option 1: Google AI Studio (Default)

The skill automatically detects your `GEMINI_API_KEY` by checking these locations in order:

1. Process environment: `export GEMINI_API_KEY="your-key"`
2. Project root: `.env`
3. .claude directory: `.claude/.env`
4. .claude/skills directory: `.claude/skills/.env`
5. Skill directory: `.claude/skills/gemini-image-gen/.env`

Get your API key: Visit Google AI Studio

Create a `.env` file with:

```bash
GEMINI_API_KEY=your_api_key_here
```
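
The lookup order above could be sketched roughly as follows. This is an illustrative assumption, not the skill's actual helper code: the function names `read_env_file` and `find_api_key` are hypothetical, and the `.env` parsing is deliberately minimal (plain `KEY=value` lines only).

```python
import os
from pathlib import Path

# .env locations from the list above, relative to the project root, in priority order.
ENV_FILE_CANDIDATES = [
    ".env",
    ".claude/.env",
    ".claude/skills/.env",
    ".claude/skills/gemini-image-gen/.env",
]


def read_env_file(path, name):
    """Return the value of `name` from a KEY=value file, or None if absent."""
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("\"'")
    return None


def find_api_key(project_root=".", environ=None, name="GEMINI_API_KEY"):
    """Check the process environment first, then each .env file in order."""
    environ = os.environ if environ is None else environ
    if environ.get(name):
        return environ[name]
    for candidate in ENV_FILE_CANDIDATES:
        path = Path(project_root) / candidate
        if path.is_file():
            value = read_env_file(path, name)
            if value:
                return value
    return None
```

The first location that yields a non-empty value wins, so an exported variable always overrides any `.env` file.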

Option 2: Vertex AI

To use Vertex AI instead:

```bash
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1  # Optional, defaults to us-central1
```

Or in `.env` file:

```bash
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
```
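
As a rough sketch of how these variables might select the endpoint, the function below builds keyword arguments for `genai.Client`. The function name `client_kwargs` is hypothetical, and treating only the literal string `true` (case-insensitive) as enabled is an assumption; the skill's own helper may parse the flag differently.

```python
import os


def client_kwargs(environ=None):
    """Build keyword arguments for genai.Client from the variables above."""
    environ = os.environ if environ is None else environ
    # Assumption: only the string "true" (any case) switches to Vertex AI.
    use_vertex = environ.get("GEMINI_USE_VERTEX", "").strip().lower() == "true"
    if use_vertex:
        project = environ.get("VERTEX_PROJECT_ID")
        if not project:
            raise ValueError("VERTEX_PROJECT_ID is required when GEMINI_USE_VERTEX=true")
        return {
            "vertexai": True,
            "project": project,
            # Falls back to the documented default location.
            "location": environ.get("VERTEX_LOCATION", "us-central1"),
        }
    return {"api_key": environ.get("GEMINI_API_KEY")}
```

With the `google-genai` package installed, the result can be passed straight through as `genai.Client(**client_kwargs())`.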

Python Setup

Install the required package:

```bash
pip install google-genai
```

Quick Start

Basic Text-to-Image Generation

```python
from google import genai
from google.genai import types
import os

# The helper script detects the API key automatically; here it is read from the environment
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        # Recent google-genai releases take the aspect ratio via image_config
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save to ./docs/assets/
os.makedirs('./docs/assets', exist_ok=True)
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```

Using the Helper Script

For convenience, use the provided helper script, which handles API key detection and file saving:

```bash
# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1
```
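
The source of `generate.py` is not shown here, so the following is only a guess at how a command line with these flags could be parsed. The flag names come from the commands above; the defaults, the `choices` lists, and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Generate an image from a text prompt")
    parser.add_argument("prompt", help="Text description of the image")
    # Assumed default; the real script may choose a different one.
    parser.add_argument("--aspect-ratio", default="1:1",
                        choices=["1:1", "16:9", "9:16", "4:3", "3:4"])
    # Accepts one or more values, e.g. --response-modalities image text
    parser.add_argument("--response-modalities", nargs="+", default=["image"],
                        choices=["image", "text"])
    parser.add_argument("--output", default=None, help="Destination file path")
    return parser


args = build_parser().parse_args(
    ["Modern architecture design", "--response-modalities", "image", "text",
     "--aspect-ratio", "1:1"]
)
```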

Key Features

Aspect Ratios

| Ratio | Resolution | Use Case | Token Cost |
|-------|------------|----------|------------|
| 1:1 | 1024×1024 | Social media, avatars | 1290 |
| 16:9 | 1344×768 | Landscapes, banners | 1290 |
| 9:16 | 768×1344 | Mobile, portraits | 1290 |
| 4:3 | 1152×896 | Traditional media | 1290 |
| 3:4 | 896×1152 | Vertical posters | 1290 |
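
When the target size is known in pixels rather than as a ratio, a small lookup can pick the nearest supported option. This is a sketch only: the resolutions are copied from the table above, and the helper name `closest_aspect_ratio` is hypothetical.

```python
# Resolutions copied from the aspect ratio table above.
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "16:9": (1344, 768),
    "9:16": (768, 1344),
    "4:3": (1152, 896),
    "3:4": (896, 1152),
}


def closest_aspect_ratio(width, height):
    """Return the supported ratio whose width/height is nearest the target."""
    target = width / height
    return min(
        RESOLUTIONS,
        key=lambda ratio: abs(RESOLUTIONS[ratio][0] / RESOLUTIONS[ratio][1] - target),
    )
```

For example, a 1920×1080 banner maps to `16:9`, and a 1080×1920 phone screen maps to `9:16`.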

Response Modalities

- `['image']`: Generate only images
- `['text']`: Generate only text descriptions
- `['image', 'text']`: Generate both images and descriptions
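
When both modalities are requested, the response parts can mix text and image data. The sketch below separates them; it assumes each part exposes `text` and `inline_data.data` attributes as in the Quick Start loop, and it uses `SimpleNamespace` stand-ins instead of a real API response so it runs without network access. The name `split_parts` is hypothetical.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Separate response parts into text strings and raw image bytes."""
    texts, images = [], []
    for part in parts:
        if getattr(part, "text", None):
            texts.append(part.text)
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            images.append(inline.data)
    return texts, images


# Stand-in parts shaped like response.candidates[0].content.parts
demo_parts = [
    SimpleNamespace(text="A caption", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG")),
]
texts, images = split_parts(demo_parts)
```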

Image Editing

Provide an existing image together with text instructions describing the change:

```python
import PIL.Image

# `client` is the genai.Client created in Quick Start
img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)
```

Multi-Image Composition

Combine several source images in one request; up to 3 are recommended:

```python
import PIL.Image

# `client` is the genai.Client created in Quick Start
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)
```
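
One way to keep requests within the recommended image count is to assemble the `contents` list through a small helper that warns when the limit is exceeded. This is an illustrative sketch; `build_contents` is a hypothetical name, and warning rather than raising is a design assumption, since 3 is a recommendation and not a hard limit.

```python
import warnings


def build_contents(prompt, images, recommended_max=3):
    """Return [prompt, *images], warning if more images than recommended are given."""
    images = list(images)
    if len(images) > recommended_max:
        warnings.warn(
            f"{len(images)} images supplied; up to {recommended_max} are recommended",
            stacklevel=2,
        )
    return [prompt, *images]
```

The returned list can be passed as the `contents` argument of `client.models.generate_content`.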

Prompt Engineering Tips

Structure effective prompts with three elements:

- Subject: What to generate ("a robot")
- Context: Environmental setting ("in a futuristic city")
- Style: Artistic treatment ("cyberpunk style, neon lighting")

Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"

Quality modifiers:

- Add terms like "4K", "HDR", "high-quality", "professional photography"
- Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"

Text in images:

- Limit to 25 characters maximum
- Use up to 3 distinct phrases
- Specify font styles: "bold sans-serif title" or "handwritten script"

See `references/prompting-guide.md` for comprehensive prompt engineering strategies.
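
The subject, context, and style structure can be captured in a small prompt builder. The sketch below is hypothetical: `build_prompt` is not part of the skill, and applying the 25-character limit to each phrase separately is an assumption, since the tips do not say whether the limit is per phrase or overall.

```python
MAX_TEXT_CHARS = 25  # assumed to apply to each phrase individually
MAX_PHRASES = 3


def build_prompt(subject, context, style, modifiers=(), phrases=()):
    """Join subject, context, and style, then append modifiers and in-image text."""
    if len(phrases) > MAX_PHRASES:
        raise ValueError(f"use at most {MAX_PHRASES} distinct phrases")
    for phrase in phrases:
        if len(phrase) > MAX_TEXT_CHARS:
            raise ValueError(f"phrase longer than {MAX_TEXT_CHARS} characters: {phrase!r}")
    prompt = f"{subject} {context}, {style}"
    if modifiers:
        prompt += ", " + ", ".join(modifiers)
    for phrase in phrases:
        prompt += f', with the text "{phrase}"'
    return prompt


example = build_prompt(
    "A robot", "in a futuristic city", "cyberpunk style",
    modifiers=["neon lighting"],
)
```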

Safety Settings

The model includes adjustable safety filters. Configure them per request:

```python
config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
```

See `references/safety-settings.md` for detailed configuration options.

Output Management

All generated images should be saved to the `./docs/assets/` directory:

```bash
# Create directory if needed
mkdir -p ./docs/assets
```

The helper script automatically saves to this location with timestamped filenames.
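
The exact filename pattern the helper script uses is not documented here. As a sketch of what timestamped naming could look like, the function below builds a path under the output directory; the `generated-YYYYMMDD-HHMMSS.png` pattern and the name `timestamped_path` are assumptions.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(out_dir="./docs/assets", prefix="generated", now=None):
    """Return a timestamped .png path inside out_dir, creating the directory if needed."""
    now = now or datetime.now()
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    # Assumed pattern: prefix-YYYYMMDD-HHMMSS.png
    return directory / f"{prefix}-{now:%Y%m%d-%H%M%S}.png"
```

Passing `now` explicitly makes the name reproducible, which is convenient in tests.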

Model Specifications

Model: `gemini-2.5-flash-image`

- Input tokens: Up to 65,536
- Output tokens: Up to 32,768
- Supported inputs: Text and images
- Supported outputs: Text and images
- Knowledge cutoff: June 2025
- Features: Image generation, structured outputs, batch API, caching

Limitations

- A maximum of 3 input images is recommended for best results
- Text rendering works best when the text is generated first and the image is requested afterwards
- Audio and video inputs are not supported
- Regional restrictions apply to uploading images of children (EEA, CH, UK)
- Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi

Error Handling

Common issues and solutions:

**API key not found**:

```bash
# Check environment variables
echo $GEMINI_API_KEY

# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env
```

**Safety filter blocking**:
- Review `response.prompt_feedback.block_reason`
- Adjust safety settings if appropriate for your use case
- Modify prompt to avoid triggering filters

**Token limit exceeded**:
- Reduce prompt length
- Use fewer input images
- Simplify image editing instructions
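
A response can be checked for a safety block before any attempt to read image parts. The sketch below reads `prompt_feedback.block_reason` as mentioned above and uses `SimpleNamespace` stand-ins in place of real responses; the function name `describe_block` and the message wording are hypothetical.

```python
from types import SimpleNamespace


def describe_block(response):
    """Return a short message if the response has no usable output, else None."""
    feedback = getattr(response, "prompt_feedback", None)
    reason = getattr(feedback, "block_reason", None)
    if reason:
        return f"Prompt blocked: {reason}"
    if not getattr(response, "candidates", None):
        return "No candidates returned"
    return None


# Stand-ins shaped like API responses
blocked = SimpleNamespace(
    prompt_feedback=SimpleNamespace(block_reason="SAFETY"), candidates=None
)
ok = SimpleNamespace(prompt_feedback=None, candidates=[SimpleNamespace()])
```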

Reference Documentation

For detailed information, see:

- `references/api-reference.md`: Complete API specifications
- `references/prompting-guide.md`: Advanced prompt engineering
- `references/safety-settings.md`: Safety configuration details
- `references/code-examples.md`: Additional implementation examples
