
Stable Diffusion Image Generation

Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.

When to use Stable Diffusion

Use Stable Diffusion when:
  • Generating images from text descriptions
  • Performing image-to-image translation (style transfer, enhancement)
  • Inpainting (filling in masked regions)
  • Outpainting (extending images beyond boundaries)
  • Creating variations of existing images
  • Building custom image generation workflows
Key features:
  • Text-to-Image: Generate images from natural language prompts
  • Image-to-Image: Transform existing images with text guidance
  • Inpainting: Fill masked regions with context-aware content
  • ControlNet: Add spatial conditioning (edges, poses, depth)
  • LoRA Support: Efficient fine-tuning and style adaptation
  • Multiple Models: SD 1.5, SDXL, SD 3.0, Flux support
Consider alternatives instead:
  • DALL-E 3: For API-based generation without GPU
  • Midjourney: For artistic, stylized outputs
  • Imagen: For Google Cloud integration
  • Leonardo.ai: For web-based creative workflows

Quick start

Installation

```bash
pip install diffusers transformers accelerate torch
pip install xformers  # Optional: memory-efficient attention
```

Basic text-to-image

```python
from diffusers import DiffusionPipeline
import torch

# Load pipeline (auto-detects model type)
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Generate image
image = pipe(
    "A serene mountain landscape at sunset, highly detailed",
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]
image.save("output.png")
```

Using SDXL (higher quality)

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

# Enable memory optimization
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A futuristic city with flying cars, cinematic lighting",
    height=1024,
    width=1024,
    num_inference_steps=30
).images[0]
```

Architecture overview

Three-pillar design

Diffusers is built around three core components:

```
Pipeline (orchestration)
├── Model (neural networks)
│   ├── UNet / Transformer (noise prediction)
│   ├── VAE (latent encoding/decoding)
│   └── Text Encoder (CLIP/T5)
└── Scheduler (denoising algorithm)
```

Pipeline inference flow

```
Text Prompt → Text Encoder → Text Embeddings
                                   ↓
Random Noise → [Denoising Loop] ← Scheduler
                     ↓
              Predicted Noise
                     ↓
              VAE Decoder → Final Image
```

Core concepts

Pipelines

Pipelines orchestrate complete workflows:

| Pipeline | Purpose |
|---|---|
| StableDiffusionPipeline | Text-to-image (SD 1.x/2.x) |
| StableDiffusionXLPipeline | Text-to-image (SDXL) |
| StableDiffusion3Pipeline | Text-to-image (SD 3.0) |
| FluxPipeline | Text-to-image (Flux models) |
| StableDiffusionImg2ImgPipeline | Image-to-image |
| StableDiffusionInpaintPipeline | Inpainting |

Schedulers

Schedulers control the denoising process:

| Scheduler | Steps | Quality | Use Case |
|---|---|---|---|
| EulerDiscreteScheduler | 20-50 | Good | Default choice |
| EulerAncestralDiscreteScheduler | 20-50 | Good | More variation |
| DPMSolverMultistepScheduler | 15-25 | Excellent | Fast, high quality |
| DDIMScheduler | 50-100 | Good | Deterministic |
| LCMScheduler | 4-8 | Good | Very fast |
| UniPCMultistepScheduler | 15-25 | Excellent | Fast convergence |
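If generation is driven from configuration, the step guidance in the table can be captured in a plain lookup. This is a sketch: the `SUGGESTED_STEPS` dict and `default_steps` helper are our own names (values copied from the table, not a Diffusers API), and the right counts still depend on the model.

```python
# Suggested denoising-step ranges per scheduler, taken from the table above.
SUGGESTED_STEPS = {
    "EulerDiscreteScheduler": (20, 50),
    "EulerAncestralDiscreteScheduler": (20, 50),
    "DPMSolverMultistepScheduler": (15, 25),
    "DDIMScheduler": (50, 100),
    "LCMScheduler": (4, 8),
    "UniPCMultistepScheduler": (15, 25),
}

def default_steps(scheduler) -> int:
    """Midpoint of the suggested range; fall back to 50 for unknown schedulers."""
    lo, hi = SUGGESTED_STEPS.get(type(scheduler).__name__, (50, 50))
    return (lo + hi) // 2
```

Used as `pipe(prompt, num_inference_steps=default_steps(pipe.scheduler))`, this keeps step counts in sync when you swap schedulers.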

Swapping schedulers

```python
from diffusers import DPMSolverMultistepScheduler

# Swap for faster generation
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Now generate with fewer steps
image = pipe(prompt, num_inference_steps=20).images[0]
```

Generation parameters

Key parameters

| Parameter | Default | Description |
|---|---|---|
| prompt | Required | Text description of desired image |
| negative_prompt | None | What to avoid in the image |
| num_inference_steps | 50 | Denoising steps (more = better quality) |
| guidance_scale | 7.5 | Prompt adherence (7-12 typical) |
| height, width | 512 / 1024 | Output dimensions (multiples of 8) |
| generator | None | Torch generator for reproducibility |
| num_images_per_prompt | 1 | Batch size |
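Since `height` and `width` must be multiples of 8, a small guard helps when dimensions come from user input. A minimal sketch; the helper name is illustrative:

```python
def snap_to_multiple_of_8(value: int) -> int:
    """Round a requested dimension down to the nearest multiple of 8 (minimum 8)."""
    return max(8, (value // 8) * 8)

# e.g. pipe(prompt, height=snap_to_multiple_of_8(1000), width=snap_to_multiple_of_8(1000))
```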

Reproducible generation

```python
import torch

generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt="A cat wearing a top hat",
    generator=generator,
    num_inference_steps=50
).images[0]
```
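For batches, Diffusers also accepts a list of generators, one per image, so any single image in a batch can be reproduced on its own. A small seed-derivation sketch (`seeds_for_batch` is our own name):

```python
def seeds_for_batch(base_seed: int, n: int) -> list:
    """One deterministic seed per image in a batch."""
    return [base_seed + i for i in range(n)]

# Usage (assumes a loaded `pipe` on CUDA):
# generators = [torch.Generator("cuda").manual_seed(s) for s in seeds_for_batch(42, 4)]
# images = pipe(prompt, num_images_per_prompt=4, generator=generators).images
```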

Negative prompts

```python
image = pipe(
    prompt="Professional photo of a dog in a garden",
    negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
    guidance_scale=7.5
).images[0]
```

Image-to-image

Transform existing images with text guidance:

```python
from diffusers import AutoPipelineForImage2Image
from PIL import Image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.jpg").resize((512, 512))

image = pipe(
    prompt="A watercolor painting of the scene",
    image=init_image,
    strength=0.75,  # How much to transform (0-1)
    num_inference_steps=50
).images[0]
```

Inpainting

Fill masked regions:

```python
from diffusers import AutoPipelineForInpainting
from PIL import Image
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg")
mask = Image.open("mask.png")  # White = inpaint region

result = pipe(
    prompt="A red car parked on the street",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]
```

ControlNet

Add spatial conditioning for precise control:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

# Load ControlNet for edge conditioning
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Use Canny edge image as control
control_image = get_canny_image(input_image)

image = pipe(
    prompt="A beautiful house in the style of Van Gogh",
    image=control_image,
    num_inference_steps=30
).images[0]
```
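`get_canny_image` above is left undefined. ControlNet preprocessing typically uses OpenCV's Canny detector; as a dependency-light stand-in, PIL's `FIND_EDGES` filter gives a rough edge map (an approximation, not true Canny — the function body here is our own sketch):

```python
from PIL import Image, ImageFilter

def get_canny_image(image: Image.Image, threshold: int = 100) -> Image.Image:
    """Rough binary edge map as a stand-in for cv2.Canny preprocessing."""
    edges = image.convert("L").filter(ImageFilter.FIND_EDGES)
    binary = edges.point(lambda p: 255 if p > threshold else 0)
    return binary.convert("RGB")  # ControlNet conditioning expects a 3-channel image
```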

Available ControlNets

| ControlNet | Input Type | Use Case |
|---|---|---|
| canny | Edge maps | Preserve structure |
| openpose | Pose skeletons | Human poses |
| depth | Depth maps | 3D-aware generation |
| normal | Normal maps | Surface details |
| mlsd | Line segments | Architectural lines |
| scribble | Rough sketches | Sketch-to-image |

LoRA adapters

Load fine-tuned style adapters:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")

# Generate with LoRA style
image = pipe("A portrait in the trained style").images[0]

# Adjust LoRA strength
pipe.fuse_lora(lora_scale=0.8)

# Unload LoRA
pipe.unload_lora_weights()
```

Multiple LoRAs

```python
# Load multiple LoRAs
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="character")

# Set weights for each
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe("A portrait").images[0]
```

Memory optimization

Enable CPU offloading

```python
# Model CPU offload - moves models to CPU when not in use
pipe.enable_model_cpu_offload()

# Sequential CPU offload - more aggressive, slower
pipe.enable_sequential_cpu_offload()
```

Attention slicing

```python
# Reduce memory by computing attention in chunks
pipe.enable_attention_slicing()

# Or pass "max" for the most aggressive slicing
pipe.enable_attention_slicing("max")
```

xFormers memory-efficient attention

```python
# Requires the xformers package
pipe.enable_xformers_memory_efficient_attention()
```

VAE slicing for large images

```python
# Decode latents in slices/tiles for large images
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
```
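The memory savers above can be bundled into one convenience function that only calls what the loaded pipeline actually exposes. A sketch under the assumption that not every pipeline class has every method; `apply_low_vram_settings` is our own helper, not a Diffusers API:

```python
def apply_low_vram_settings(pipe) -> list:
    """Enable every memory optimization this pipeline supports; return what was applied."""
    applied = []
    for name in ("enable_attention_slicing", "enable_vae_slicing",
                 "enable_vae_tiling", "enable_model_cpu_offload"):
        method = getattr(pipe, name, None)
        if callable(method):
            method()
            applied.append(name)
    return applied
```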

Model variants

Loading different precisions

```python
from diffusers import DiffusionPipeline
import torch

# FP16 (recommended for GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.float16,
    variant="fp16"
)

# BF16 (better precision, requires Ampere+ GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.bfloat16
)
```
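Choosing between the two at runtime can key off the GPU's compute capability, since bf16 needs Ampere (compute capability 8.x) or newer. A minimal sketch; `pick_dtype_name` is our own helper and returns a dtype name so the function stays framework-agnostic:

```python
def pick_dtype_name(cc_major: int) -> str:
    """bfloat16 on Ampere (compute capability 8.x) and newer, else float16."""
    return "bfloat16" if cc_major >= 8 else "float16"

# Usage (assumes a CUDA device is present):
# major, _ = torch.cuda.get_device_capability()
# dtype = getattr(torch, pick_dtype_name(major))
```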

Loading specific components

```python
from diffusers import DiffusionPipeline, AutoencoderKL
import torch

# Load custom VAE
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Use with pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    vae=vae,
    torch_dtype=torch.float16
)
```

Batch generation

Generate multiple images efficiently:

```python
# Multiple prompts
prompts = [
    "A cat playing piano",
    "A dog reading a book",
    "A bird painting a picture"
]
images = pipe(prompts, num_inference_steps=30).images

# Multiple images per prompt
images = pipe(
    "A beautiful sunset",
    num_images_per_prompt=4,
    num_inference_steps=30
).images
```
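When the prompt list is larger than what fits in VRAM in one call, chunk it and run the pipeline per chunk. The `batched` helper below is our own convenience (Diffusers itself just takes a list per call):

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks from a list of prompts."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# all_images = []
# for chunk in batched(prompts, 2):
#     all_images.extend(pipe(chunk, num_inference_steps=30).images)
```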

Common workflows

Workflow 1: High-quality generation

```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import torch

# 1. Load SDXL with optimizations
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# enable_model_cpu_offload() manages device placement itself,
# so don't call pipe.to("cuda") first
pipe.enable_model_cpu_offload()

# 2. Generate with quality settings
image = pipe(
    prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
    negative_prompt="blurry, low quality, cartoon, anime, sketch",
    num_inference_steps=30,
    guidance_scale=7.5,
    height=1024,
    width=1024
).images[0]
```

Workflow 2: Fast prototyping

```python
from diffusers import AutoPipelineForText2Image, LCMScheduler
import torch

# Use LCM for 4-8 step generation
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# Load LCM LoRA for fast generation
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.fuse_lora()

# Generate in ~1 second
image = pipe(
    "A beautiful landscape",
    num_inference_steps=4,
    guidance_scale=1.0
).images[0]
```

Common issues

**CUDA out of memory:**

```python
# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Or use lower precision
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
```

**Black/noise images:**

```python
# Check the VAE configuration, and bypass the safety checker if needed
pipe.safety_checker = None

# Ensure dtype consistency across components
pipe = pipe.to(dtype=torch.float16)
```

**Slow generation:**

```python
# Use a faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]
```

References

  • Advanced Usage - Custom pipelines, fine-tuning, deployment
  • Troubleshooting - Common issues and solutions

Resources
