gemini-image-gen
Gemini Image Generation Skill
Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.
When to Use This Skill
Use this skill when you need to:
- Generate images from text descriptions
- Edit existing images by adding/removing elements or changing styles
- Combine multiple source images into new compositions
- Iteratively refine images through conversational editing
- Create visual content for documentation, design, or creative projects
Prerequisites
API Key Setup
The skill supports both Google AI Studio and Vertex AI endpoints.
Option 1: Google AI Studio (Default)
The skill automatically detects your `GEMINI_API_KEY` in this order:
- Process environment: `export GEMINI_API_KEY="your-key"`
- Project root: `.env`
- .claude directory: `.claude/.env`
- .claude/skills directory: `.claude/skills/.env`
- Skill directory: `.claude/skills/gemini-image-gen/.env`
Get your API key: Visit Google AI Studio
Create a `.env` file with:
```bash
GEMINI_API_KEY=your_api_key_here
```
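The lookup order above can be sketched in a few lines of Python. This is a minimal illustration, not the helper script's actual code: `find_api_key`, `read_env_value`, and the simple `KEY=value` parsing are assumptions made for this example.

```python
import os
from pathlib import Path

# Candidate .env locations, relative to the project root, in priority order.
ENV_FILES = [
    ".env",
    ".claude/.env",
    ".claude/skills/.env",
    ".claude/skills/gemini-image-gen/.env",
]


def read_env_value(path, name):
    """Return the value of `name` from a KEY=value file, or None if absent."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("'\"")
    return None


def find_api_key(project_root=".", environ=None, name="GEMINI_API_KEY"):
    """Hypothetical lookup: process environment first, then each .env file."""
    environ = os.environ if environ is None else environ
    if environ.get(name):
        return environ[name]
    for relative in ENV_FILES:
        value = read_env_value(Path(project_root) / relative, name)
        if value:
            return value
    return None
```

Calling `find_api_key()` from the project root returns the first key found, or `None` when no location defines one.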
Option 2: Vertex AI
To use Vertex AI instead:
```bash
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1  # Optional, defaults to us-central1
```
Or in `.env` file:
```bash
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
```
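As a rough sketch of how these variables might be turned into client arguments: `genai.Client` accepts `vertexai`, `project`, and `location` for Vertex AI, and `api_key` for Google AI Studio. The `client_kwargs` helper below is hypothetical, and treating any capitalisation of `true` as enabling Vertex AI is an assumption; the skill's own script may interpret the variables differently.

```python
import os


def client_kwargs(environ=None):
    """Build keyword arguments for genai.Client from environment variables."""
    environ = os.environ if environ is None else environ
    # Assumption: any capitalisation of "true" enables Vertex AI.
    if environ.get("GEMINI_USE_VERTEX", "").strip().lower() == "true":
        return {
            "vertexai": True,
            "project": environ["VERTEX_PROJECT_ID"],
            # us-central1 is the documented default location.
            "location": environ.get("VERTEX_LOCATION", "us-central1"),
        }
    return {"api_key": environ.get("GEMINI_API_KEY")}


# Usage (requires `pip install google-genai`):
# from google import genai
# client = genai.Client(**client_kwargs())
```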
Python Setup
Install required package:
```bash
pip install google-genai
```
Quick Start
Basic Text-to-Image Generation
```python
import os

from google import genai
from google.genai import types

# The helper script detects the API key automatically; here it is read from the environment
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        # The google-genai SDK takes the aspect ratio through image_config
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save to ./docs/assets/
os.makedirs('./docs/assets', exist_ok=True)
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```
Using the Helper Script
For convenience, use the provided helper script that handles API key detection and file saving:
```bash
# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1
```
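The flags used above suggest a command-line interface along the following lines. This is a hypothetical reconstruction with `argparse`, based only on the flags shown in these examples; the defaults (`1:1`, `image`) are assumptions, and the real `generate.py` may define more options or behave differently.

```python
import argparse


def build_parser():
    """Parser covering only the flags that appear in the examples above."""
    parser = argparse.ArgumentParser(description="Generate images with Gemini")
    parser.add_argument("prompt", help="Text description of the image")
    parser.add_argument("--aspect-ratio", default="1:1",
                        choices=["1:1", "16:9", "9:16", "4:3", "3:4"])
    # nargs="+" lets the flag take several values: --response-modalities image text
    parser.add_argument("--response-modalities", nargs="+", default=["image"],
                        choices=["image", "text"])
    parser.add_argument("--output", default=None,
                        help="Destination file; default naming is left to the caller")
    return parser


args = build_parser().parse_args([
    "Modern architecture design",
    "--response-modalities", "image", "text",
    "--aspect-ratio", "1:1",
])
print(args.prompt, args.response_modalities, args.aspect_ratio)
```

Because `--response-modalities` consumes every value that follows it, the prompt has to come before that flag, as it does in the commands above.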
Key Features
Aspect Ratios
| Ratio | Resolution | Use Case | Token Cost |
|---|---|---|---|
| 1:1 | 1024×1024 | Social media, avatars | 1290 |
| 16:9 | 1344×768 | Landscapes, banners | 1290 |
| 9:16 | 768×1344 | Mobile, portraits | 1290 |
| 4:3 | 1152×896 | Traditional media | 1290 |
| 3:4 | 896×1152 | Vertical posters | 1290 |
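When the target dimensions are known but the ratio is not, the closest supported ratio can be picked programmatically. The sketch below copies the table above into a dictionary; `closest_aspect_ratio` is a hypothetical helper, not part of the SDK or the skill.

```python
# Resolutions copied from the aspect-ratio table in this document.
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "16:9": (1344, 768),
    "9:16": (768, 1344),
    "4:3": (1152, 896),
    "3:4": (896, 1152),
}


def closest_aspect_ratio(width, height):
    """Return the supported ratio whose width/height is nearest the target."""
    target = width / height
    return min(
        RESOLUTIONS,
        key=lambda ratio: abs(RESOLUTIONS[ratio][0] / RESOLUTIONS[ratio][1] - target),
    )


print(closest_aspect_ratio(1920, 1080))  # 16:9
print(closest_aspect_ratio(1080, 1350))  # 3:4
```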
Response Modalities
- `['image']`: Generate only images
- `['text']`: Generate only text descriptions
- `['image', 'text']`: Generate both images and descriptions
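With `['image', 'text']`, a response can mix text parts and image parts, so the caller has to tell them apart. The sketch below assumes each part exposes `text` and `inline_data.data`, as in the Quick Start loop; `split_parts` is a hypothetical helper, and the stand-in objects exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Separate response parts into text strings and raw image bytes."""
    texts, images = [], []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            images.append(inline.data)
        elif getattr(part, "text", None):
            texts.append(part.text)
    return texts, images


# Stand-ins for response.candidates[0].content.parts
fake_parts = [
    SimpleNamespace(text="A sketch of the building.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG...")),
]
texts, images = split_parts(fake_parts)
print(len(texts), len(images))
```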
Image Editing
Provide existing image + text instructions to modify:
```python
import PIL.Image  # Requires Pillow (pip install pillow)

img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)
```
Multi-Image Composition
Combine up to 3 source images (recommended):
```python
import PIL.Image  # Requires Pillow (pip install pillow)

img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)
```
Prompt Engineering Tips
Structure effective prompts with three elements:
- Subject: What to generate ("a robot")
- Context: Environmental setting ("in a futuristic city")
- Style: Artistic treatment ("cyberpunk style, neon lighting")
Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"
Quality modifiers:
- Add terms like "4K", "HDR", "high-quality", "professional photography"
- Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"
Text in images:
- Limit to 25 characters maximum
- Use up to 3 distinct phrases
- Specify font styles: "bold sans-serif title" or "handwritten script"
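The subject/context/style structure and the text-in-image limits can be captured in a small helper. This is an illustrative sketch: `build_prompt` and `check_image_text` are hypothetical names, and reading the 25-character limit as applying to each phrase is an assumption.

```python
def build_prompt(subject, context, style, modifiers=()):
    """Join subject, context, and style, then append optional quality modifiers."""
    prompt = f"{subject} {context}, {style}"
    if modifiers:
        prompt += ", " + ", ".join(modifiers)
    return prompt


def check_image_text(phrases, max_chars=25, max_phrases=3):
    """Return a list of problems with text that should appear in the image."""
    problems = []
    if len(phrases) > max_phrases:
        problems.append(f"{len(phrases)} phrases; use at most {max_phrases}")
    for phrase in phrases:
        if len(phrase) > max_chars:
            problems.append(f"{phrase!r} is longer than {max_chars} characters")
    return problems


print(build_prompt("A robot", "in a futuristic city",
                   "cyberpunk style, neon lighting", ["4K", "HDR"]))
print(check_image_text(["Grand Opening", "Saturday 10am"]))
```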
See `references/prompting-guide.md` for comprehensive prompt engineering strategies.
Safety Settings
The model includes adjustable safety filters. Configure per-request:
```python
from google.genai import types

config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
```
See `references/safety-settings.md` for detailed configuration options.
Output Management
All generated images should be saved to the `./docs/assets/` directory:
```bash
# Create directory if needed
mkdir -p ./docs/assets
```
The helper script automatically saves to this location with timestamped filenames.
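A timestamped output path could be built as follows. The exact naming scheme of the helper script is not documented here, so the `generated-YYYYMMDD-HHMMSS.png` pattern and the `timestamped_path` function are assumptions for illustration.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(directory="./docs/assets", prefix="generated", now=None):
    """Create the directory if needed and return a timestamped .png path."""
    now = now or datetime.now()
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    return out_dir / f"{prefix}-{now:%Y%m%d-%H%M%S}.png"
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it can be omitted.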
Model Specifications
Model: `gemini-2.5-flash-image`
- Input tokens: Up to 65,536
- Output tokens: Up to 32,768
- Supported inputs: Text and images
- Supported outputs: Text and images
- Knowledge cutoff: June 2025
- Features: Image generation, structured outputs, batch API, caching
Limitations
- Maximum 3 input images recommended for best results
- Text rendering works best when the text is generated first and the image containing it is requested afterwards
- Does not support audio/video inputs
- Regional restrictions on child image uploads (EEA, CH, UK)
- Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi
Error Handling
Common issues and solutions:
**API key not found**:
```bash
# Check environment variables
echo $GEMINI_API_KEY

# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env
```
**Safety filter blocking**:
- Review `response.prompt_feedback.block_reason`
- Adjust safety settings if appropriate for your use case
- Modify prompt to avoid triggering filters
**Token limit exceeded**:
- Reduce prompt length
- Use fewer input images
- Simplify image editing instructions
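A blocked request typically comes back without usable image data, so it is worth inspecting the response before saving anything. The sketch below reads `response.prompt_feedback.block_reason` as mentioned above and falls back to checking for image parts; `describe_failure` is a hypothetical helper, and the stand-in objects only mimic the attributes it reads.

```python
from types import SimpleNamespace


def describe_failure(response):
    """Return a short reason if the response has no image, else None."""
    feedback = getattr(response, "prompt_feedback", None)
    reason = getattr(feedback, "block_reason", None)
    if reason:
        return f"prompt blocked: {reason}"
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return "no candidates returned"
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    if not any(getattr(p, "inline_data", None) for p in parts):
        return "response contains no image data"
    return None


blocked = SimpleNamespace(
    prompt_feedback=SimpleNamespace(block_reason="SAFETY"), candidates=None
)
print(describe_failure(blocked))
```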
Reference Documentation
For detailed information, see:
- `references/api-reference.md` - Complete API specifications
- `references/prompting-guide.md` - Advanced prompt engineering
- `references/safety-settings.md` - Safety configuration details
- `references/code-examples.md` - Additional implementation examples
Resources
- Official Documentation
- API Reference
- Get API Key
- Google AI Studio - Interactive testing