# Stable Diffusion Image Generation

Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.
## When to use Stable Diffusion
Use Stable Diffusion when:
- Generating images from text descriptions
- Performing image-to-image translation (style transfer, enhancement)
- Inpainting (filling in masked regions)
- Outpainting (extending images beyond boundaries)
- Creating variations of existing images
- Building custom image generation workflows
Key features:
- Text-to-Image: Generate images from natural language prompts
- Image-to-Image: Transform existing images with text guidance
- Inpainting: Fill masked regions with context-aware content
- ControlNet: Add spatial conditioning (edges, poses, depth)
- LoRA Support: Efficient fine-tuning and style adaptation
- Multiple Models: SD 1.5, SDXL, SD 3.0, Flux support
Use alternatives instead:
- DALL-E 3: For API-based generation without GPU
- Midjourney: For artistic, stylized outputs
- Imagen: For Google Cloud integration
- Leonardo.ai: For web-based creative workflows
## Quick start
Installation
安装
```bash
pip install diffusers transformers accelerate torch
pip install xformers  # Optional: memory-efficient attention
```

### Basic text-to-image
```python
from diffusers import DiffusionPipeline
import torch

# Load pipeline (auto-detects model type)
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Generate image
image = pipe(
    "A serene mountain landscape at sunset, highly detailed",
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]
image.save("output.png")
```

### Using SDXL (higher quality)
```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

# Enable memory optimization
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A futuristic city with flying cars, cinematic lighting",
    height=1024,
    width=1024,
    num_inference_steps=30
).images[0]
```

## Architecture overview
### Three-pillar design
Diffusers is built around three core components:

```
Pipeline (orchestration)
├── Model (neural networks)
│   ├── UNet / Transformer (noise prediction)
│   ├── VAE (latent encoding/decoding)
│   └── Text Encoder (CLIP/T5)
└── Scheduler (denoising algorithm)
```

### Pipeline inference flow
```
Text Prompt → Text Encoder → Text Embeddings
                                    ↓
Random Noise → [Denoising Loop] ← Scheduler
                      ↓
               Predicted Noise
                      ↓
        VAE Decoder → Final Image
```
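The denoising loop above runs in the VAE's latent space, not pixel space. For SD 1.x, the VAE downsamples by a factor of 8 into 4 latent channels, which is what makes the diffusion loop tractable; a quick arithmetic check:

```python
# SD 1.x latent-space arithmetic: the VAE downsamples by 8x into 4 channels
height, width = 512, 512
latent_channels, downsample = 4, 8

latent_shape = (latent_channels, height // downsample, width // downsample)
pixels = 3 * height * width
latents = latent_channels * (height // downsample) * (width // downsample)

print(latent_shape)       # (4, 64, 64)
print(pixels // latents)  # the loop works on ~48x fewer values than the RGB image
```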
## Core concepts
### Pipelines
Pipelines orchestrate complete workflows:

| Pipeline | Purpose |
|---|---|
| `StableDiffusionPipeline` | Text-to-image (SD 1.x/2.x) |
| `StableDiffusionXLPipeline` | Text-to-image (SDXL) |
| `StableDiffusion3Pipeline` | Text-to-image (SD 3.0) |
| `FluxPipeline` | Text-to-image (Flux models) |
| `AutoPipelineForImage2Image` | Image-to-image |
| `AutoPipelineForInpainting` | Inpainting |
### Schedulers
Schedulers control the denoising process; commonly used options include:

| Scheduler | Steps | Quality | Use Case |
|---|---|---|---|
| `EulerDiscreteScheduler` | 20-50 | Good | Default choice |
| `EulerAncestralDiscreteScheduler` | 20-50 | Good | More variation |
| `DPMSolverMultistepScheduler` | 15-25 | Excellent | Fast, high quality |
| `DDIMScheduler` | 50-100 | Good | Deterministic |
| `LCMScheduler` | 4-8 | Good | Very fast |
| `UniPCMultistepScheduler` | 15-25 | Excellent | Fast convergence |
### Swapping schedulers
```python
from diffusers import DPMSolverMultistepScheduler

# Swap for faster generation
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Now generate with fewer steps
image = pipe(prompt, num_inference_steps=20).images[0]
```

## Generation parameters
### Key parameters
| Parameter | Default | Description |
|---|---|---|
| `prompt` | Required | Text description of desired image |
| `negative_prompt` | None | What to avoid in the image |
| `num_inference_steps` | 50 | Denoising steps (more = better quality, slower) |
| `guidance_scale` | 7.5 | Prompt adherence (7-12 typical) |
| `height` / `width` | 512/1024 | Output dimensions (multiples of 8) |
| `generator` | None | Torch generator for reproducibility |
| `num_images_per_prompt` | 1 | Batch size |
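Because the VAE downsamples by a factor of 8, `height` and `width` must be multiples of 8. A tiny helper can snap arbitrary dimensions to valid values; `snap_to_multiple_of_8` is an illustrative name, not a Diffusers API:

```python
def snap_to_multiple_of_8(x: int) -> int:
    """Round a requested dimension down to the nearest valid multiple of 8."""
    return max(8, (x // 8) * 8)

print(snap_to_multiple_of_8(777))  # 776
print(snap_to_multiple_of_8(512))  # 512
```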
### Reproducible generation
```python
import torch

generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt="A cat wearing a top hat",
    generator=generator,
    num_inference_steps=50
).images[0]
```
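For batches, `generator` also accepts a list with one seeded generator per image, so each image stays individually reproducible (the seed values below are arbitrary):

```python
import torch

seeds = [42, 43, 44, 45]
generators = [torch.Generator(device="cpu").manual_seed(s) for s in seeds]

# Pass the list alongside num_images_per_prompt (requires a loaded pipeline):
# images = pipe(prompt, num_images_per_prompt=4, generator=generators).images

# The same seed always reproduces the same noise tensor
a = torch.randn(4, generator=torch.Generator().manual_seed(42))
b = torch.randn(4, generator=torch.Generator().manual_seed(42))
print(torch.equal(a, b))  # True
```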
### Negative prompts
```python
image = pipe(
    prompt="Professional photo of a dog in a garden",
    negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
    guidance_scale=7.5
).images[0]
```

## Image-to-image
Transform existing images with text guidance:

```python
from diffusers import AutoPipelineForImage2Image
from PIL import Image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.jpg").resize((512, 512))

image = pipe(
    prompt="A watercolor painting of the scene",
    image=init_image,
    strength=0.75,  # How much to transform (0-1)
    num_inference_steps=50
).images[0]
```

## Inpainting
Fill masked regions:

```python
from diffusers import AutoPipelineForInpainting
from PIL import Image
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg")
mask = Image.open("mask.png")  # White = inpaint region

result = pipe(
    prompt="A red car parked on the street",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]
```
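The mask is a single-channel image where white marks the region to repaint. One way to build one with PIL (the rectangle coordinates here are arbitrary):

```python
from PIL import Image, ImageDraw

# Black canvas = keep, white rectangle = region to inpaint
mask = Image.new("L", (512, 512), 0)
draw = ImageDraw.Draw(mask)
draw.rectangle([128, 256, 384, 448], fill=255)
mask.save("mask.png")
```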
## ControlNet
Add spatial conditioning for precise control:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

# Load ControlNet for edge conditioning
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Use a Canny edge image as control (get_canny_image is a user-defined helper)
control_image = get_canny_image(input_image)

image = pipe(
    prompt="A beautiful house in the style of Van Gogh",
    image=control_image,
    num_inference_steps=30
).images[0]
```
### Available ControlNets
| ControlNet | Input Type | Use Case |
|---|---|---|
| Canny | Edge maps | Preserve structure |
| OpenPose | Pose skeletons | Human poses |
| Depth | Depth maps | 3D-aware generation |
| Normal | Normal maps | Surface details |
| MLSD | Line segments | Architectural lines |
| Scribble | Rough sketches | Sketch-to-image |
## LoRA adapters
Load fine-tuned style adapters:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")

# Generate with LoRA style
image = pipe("A portrait in the trained style").images[0]

# Adjust LoRA strength
pipe.fuse_lora(lora_scale=0.8)

# Unload LoRA
pipe.unload_lora_weights()
```

### Multiple LoRAs
```python
# Load multiple LoRAs
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="character")

# Set weights for each
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe("A portrait").images[0]
```

## Memory optimization
### Enable CPU offloading
```python
# Model CPU offload - moves models to CPU when not in use
pipe.enable_model_cpu_offload()

# Sequential CPU offload - more aggressive, slower
pipe.enable_sequential_cpu_offload()
```

### Attention slicing
```python
# Reduce memory by computing attention in chunks
pipe.enable_attention_slicing()

# Or a specific chunk size
pipe.enable_attention_slicing("max")
```

### xFormers memory-efficient attention
```python
# Requires the xformers package
pipe.enable_xformers_memory_efficient_attention()
```

### VAE slicing for large images
```python
# Decode latents in slices/tiles for large images
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
```

## Model variants
### Loading different precisions
```python
# FP16 (recommended for GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.float16,
    variant="fp16"
)

# BF16 (better numerical range, requires Ampere+ GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.bfloat16
)
```
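The saving from half precision is roughly proportional to bytes per parameter. A back-of-envelope estimate for weight memory, assuming the commonly cited figure of roughly 860M parameters for the SD 1.5 UNet:

```python
# Approximate weight memory for the SD 1.5 UNet (~860M parameters)
params = 860_000_000
for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: {gib:.2f} GiB")  # fp32 is roughly double fp16
```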
### Loading specific components
```python
from diffusers import UNet2DConditionModel, AutoencoderKL

# Load custom VAE
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Use with pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    vae=vae,
    torch_dtype=torch.float16
)
```

## Batch generation
Generate multiple images efficiently:

```python
# Multiple prompts
prompts = [
    "A cat playing piano",
    "A dog reading a book",
    "A bird painting a picture"
]
images = pipe(prompts, num_inference_steps=30).images

# Multiple images per prompt
images = pipe(
    "A beautiful sunset",
    num_images_per_prompt=4,
    num_inference_steps=30
).images
```
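A small helper to tile a batch into one contact sheet for review; `make_grid` is an illustrative helper, not a Diffusers API:

```python
from PIL import Image

def make_grid(images, cols=2):
    """Paste equally sized PIL images into a single grid image."""
    rows = (len(images) + cols - 1) // cols
    w, h = images[0].size
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, ((i % cols) * w, (i // cols) * h))
    return grid

# grid = make_grid(images, cols=2)
# grid.save("grid.png")
```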
## Common workflows
### Workflow 1: High-quality generation
```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import torch

# 1. Load SDXL with optimizations
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# 2. Generate with quality settings
image = pipe(
    prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
    negative_prompt="blurry, low quality, cartoon, anime, sketch",
    num_inference_steps=30,
    guidance_scale=7.5,
    height=1024,
    width=1024
).images[0]
```

### Workflow 2: Fast prototyping
```python
from diffusers import AutoPipelineForText2Image, LCMScheduler
import torch

# Use LCM for 4-8 step generation
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# Load LCM LoRA for fast generation
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.fuse_lora()

# Generate in ~1 second
image = pipe(
    "A beautiful landscape",
    num_inference_steps=4,
    guidance_scale=1.0
).images[0]
```

## Common issues
**CUDA out of memory:**

```python
# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Or use lower precision
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
```

**Black/noise images:**

```python
# Check VAE configuration and ensure dtype consistency across components
pipe = pipe.to(dtype=torch.float16)

# Bypass the safety checker if it is blanking outputs
pipe.safety_checker = None
```

**Slow generation:**

```python
# Use a faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]
```

## References
- Advanced Usage - Custom pipelines, fine-tuning, deployment
- Troubleshooting - Common issues and solutions
## Resources
- Documentation: https://huggingface.co/docs/diffusers
- Repository: https://github.com/huggingface/diffusers
- Model Hub: https://huggingface.co/models?library=diffusers
- Discord: https://discord.gg/diffusers