
Stable Diffusion Image Generation

Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.

When to use Stable Diffusion

Use Stable Diffusion when:
  • Generating images from text descriptions
  • Performing image-to-image translation (style transfer, enhancement)
  • Inpainting (filling in masked regions)
  • Outpainting (extending images beyond boundaries)
  • Creating variations of existing images
  • Building custom image generation workflows
Key features:
  • Text-to-Image: Generate images from natural language prompts
  • Image-to-Image: Transform existing images with text guidance
  • Inpainting: Fill masked regions with context-aware content
  • ControlNet: Add spatial conditioning (edges, poses, depth)
  • LoRA Support: Efficient fine-tuning and style adaptation
  • Multiple Models: SD 1.5, SDXL, SD 3.0, Flux support
Consider alternatives instead:
  • DALL-E 3: For API-based generation without GPU
  • Midjourney: For artistic, stylized outputs
  • Imagen: For Google Cloud integration
  • Leonardo.ai: For web-based creative workflows

Quick start

Installation

```bash
pip install diffusers transformers accelerate torch
pip install xformers  # Optional: memory-efficient attention
```

Basic text-to-image

```python
from diffusers import DiffusionPipeline
import torch

# Load pipeline (auto-detects model type)
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Generate image
image = pipe(
    "A serene mountain landscape at sunset, highly detailed",
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]
image.save("output.png")
```

Using SDXL (higher quality)

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

# Enable memory optimization
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A futuristic city with flying cars, cinematic lighting",
    height=1024,
    width=1024,
    num_inference_steps=30
).images[0]
```

Architecture overview

Three-pillar design

Diffusers is built around three core components:

```
Pipeline (orchestration)
├── Model (neural networks)
│   ├── UNet / Transformer (noise prediction)
│   ├── VAE (latent encoding/decoding)
│   └── Text Encoder (CLIP/T5)
└── Scheduler (denoising algorithm)
```

Pipeline inference flow

```
Text Prompt → Text Encoder → Text Embeddings
                                   ↓
Random Noise → [Denoising Loop] ← Scheduler
                     ↓
              Predicted Noise
                     ↓
              VAE Decoder → Final Image
```

Core concepts

Pipelines

Pipelines orchestrate complete workflows:

| Pipeline | Purpose |
|---|---|
| StableDiffusionPipeline | Text-to-image (SD 1.x/2.x) |
| StableDiffusionXLPipeline | Text-to-image (SDXL) |
| StableDiffusion3Pipeline | Text-to-image (SD 3.0) |
| FluxPipeline | Text-to-image (Flux models) |
| StableDiffusionImg2ImgPipeline | Image-to-image |
| StableDiffusionInpaintPipeline | Inpainting |

Schedulers

Schedulers control the denoising process:

| Scheduler | Steps | Quality | Use Case |
|---|---|---|---|
| EulerDiscreteScheduler | 20-50 | Good | Default choice |
| EulerAncestralDiscreteScheduler | 20-50 | Good | More variation |
| DPMSolverMultistepScheduler | 15-25 | Excellent | Fast, high quality |
| DDIMScheduler | 50-100 | Good | Deterministic |
| LCMScheduler | 4-8 | Good | Very fast |
| UniPCMultistepScheduler | 15-25 | Excellent | Fast convergence |
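If generation is driven from configuration, the step guidance in the table can be captured in a plain lookup. This is a sketch: the `SUGGESTED_STEPS` dict and `default_steps` helper are our own names (values copied from the table, not a Diffusers API), and the right counts still depend on the model.

```python
# Suggested denoising-step ranges per scheduler, taken from the table above.
SUGGESTED_STEPS = {
    "EulerDiscreteScheduler": (20, 50),
    "EulerAncestralDiscreteScheduler": (20, 50),
    "DPMSolverMultistepScheduler": (15, 25),
    "DDIMScheduler": (50, 100),
    "LCMScheduler": (4, 8),
    "UniPCMultistepScheduler": (15, 25),
}

def default_steps(scheduler) -> int:
    """Midpoint of the suggested range; fall back to 50 for unknown schedulers."""
    lo, hi = SUGGESTED_STEPS.get(type(scheduler).__name__, (50, 50))
    return (lo + hi) // 2
```

Used as `pipe(prompt, num_inference_steps=default_steps(pipe.scheduler))`, this keeps step counts in sync when you swap schedulers.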

Swapping schedulers

```python
from diffusers import DPMSolverMultistepScheduler

# Swap for faster generation
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Now generate with fewer steps
image = pipe(prompt, num_inference_steps=20).images[0]
```

Generation parameters

Key parameters

| Parameter | Default | Description |
|---|---|---|
| prompt | Required | Text description of desired image |
| negative_prompt | None | What to avoid in the image |
| num_inference_steps | 50 | Denoising steps (more = better quality) |
| guidance_scale | 7.5 | Prompt adherence (7-12 typical) |
| height, width | 512 / 1024 | Output dimensions (multiples of 8) |
| generator | None | Torch generator for reproducibility |
| num_images_per_prompt | 1 | Batch size |
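Since `height` and `width` must be multiples of 8, a small guard helps when dimensions come from user input. A minimal sketch; the helper name is illustrative:

```python
def snap_to_multiple_of_8(value: int) -> int:
    """Round a requested dimension down to the nearest multiple of 8 (minimum 8)."""
    return max(8, (value // 8) * 8)

# e.g. pipe(prompt, height=snap_to_multiple_of_8(1000), width=snap_to_multiple_of_8(1000))
```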

Reproducible generation

```python
import torch

generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt="A cat wearing a top hat",
    generator=generator,
    num_inference_steps=50
).images[0]
```
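For batches, Diffusers also accepts a list of generators, one per image, so any single image in a batch can be reproduced on its own. A small seed-derivation sketch (`seeds_for_batch` is our own name):

```python
def seeds_for_batch(base_seed: int, n: int) -> list:
    """One deterministic seed per image in a batch."""
    return [base_seed + i for i in range(n)]

# Usage (assumes a loaded `pipe` on CUDA):
# generators = [torch.Generator("cuda").manual_seed(s) for s in seeds_for_batch(42, 4)]
# images = pipe(prompt, num_images_per_prompt=4, generator=generators).images
```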

Negative prompts

```python
image = pipe(
    prompt="Professional photo of a dog in a garden",
    negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
    guidance_scale=7.5
).images[0]
```

Image-to-image

Transform existing images with text guidance:

```python
from diffusers import AutoPipelineForImage2Image
from PIL import Image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.jpg").resize((512, 512))

image = pipe(
    prompt="A watercolor painting of the scene",
    image=init_image,
    strength=0.75,  # How much to transform (0-1)
    num_inference_steps=50
).images[0]
```

Inpainting

Fill masked regions:

```python
from diffusers import AutoPipelineForInpainting
from PIL import Image
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg")
mask = Image.open("mask.png")  # White = inpaint region

result = pipe(
    prompt="A red car parked on the street",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]
```

ControlNet

Add spatial conditioning for precise control:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

# Load ControlNet for edge conditioning
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Use Canny edge image as control
control_image = get_canny_image(input_image)

image = pipe(
    prompt="A beautiful house in the style of Van Gogh",
    image=control_image,
    num_inference_steps=30
).images[0]
```
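`get_canny_image` above is left undefined. ControlNet preprocessing typically uses OpenCV's Canny detector; as a dependency-light stand-in, PIL's `FIND_EDGES` filter gives a rough edge map (an approximation, not true Canny — the function body here is our own sketch):

```python
from PIL import Image, ImageFilter

def get_canny_image(image: Image.Image, threshold: int = 100) -> Image.Image:
    """Rough binary edge map as a stand-in for cv2.Canny preprocessing."""
    edges = image.convert("L").filter(ImageFilter.FIND_EDGES)
    binary = edges.point(lambda p: 255 if p > threshold else 0)
    return binary.convert("RGB")  # ControlNet conditioning expects a 3-channel image
```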

Available ControlNets

| ControlNet | Input Type | Use Case |
|---|---|---|
| canny | Edge maps | Preserve structure |
| openpose | Pose skeletons | Human poses |
| depth | Depth maps | 3D-aware generation |
| normal | Normal maps | Surface details |
| mlsd | Line segments | Architectural lines |
| scribble | Rough sketches | Sketch-to-image |

LoRA adapters

Load fine-tuned style adapters:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")

# Generate with LoRA style
image = pipe("A portrait in the trained style").images[0]

# Adjust LoRA strength
pipe.fuse_lora(lora_scale=0.8)

# Unload LoRA
pipe.unload_lora_weights()
```

Multiple LoRAs

```python
# Load multiple LoRAs
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="character")

# Set weights for each
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe("A portrait").images[0]
```

Memory optimization

Enable CPU offloading

```python
# Model CPU offload - moves models to CPU when not in use
pipe.enable_model_cpu_offload()

# Sequential CPU offload - more aggressive, slower
pipe.enable_sequential_cpu_offload()
```

Attention slicing

```python
# Reduce memory by computing attention in chunks
pipe.enable_attention_slicing()

# Or pass "max" for the most aggressive slicing
pipe.enable_attention_slicing("max")
```

xFormers memory-efficient attention

```python
# Requires the xformers package
pipe.enable_xformers_memory_efficient_attention()
```

VAE slicing for large images

```python
# Decode latents in slices/tiles for large images
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
```
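The memory savers above can be bundled into one convenience function that only calls what the loaded pipeline actually exposes. A sketch under the assumption that not every pipeline class has every method; `apply_low_vram_settings` is our own helper, not a Diffusers API:

```python
def apply_low_vram_settings(pipe) -> list:
    """Enable every memory optimization this pipeline supports; return what was applied."""
    applied = []
    for name in ("enable_attention_slicing", "enable_vae_slicing",
                 "enable_vae_tiling", "enable_model_cpu_offload"):
        method = getattr(pipe, name, None)
        if callable(method):
            method()
            applied.append(name)
    return applied
```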

Model variants

Loading different precisions

```python
from diffusers import DiffusionPipeline
import torch

# FP16 (recommended for GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.float16,
    variant="fp16"
)

# BF16 (better precision, requires Ampere+ GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.bfloat16
)
```
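Choosing between the two at runtime can key off the GPU's compute capability, since bf16 needs Ampere (compute capability 8.x) or newer. A minimal sketch; `pick_dtype_name` is our own helper and returns a dtype name so the function stays framework-agnostic:

```python
def pick_dtype_name(cc_major: int) -> str:
    """bfloat16 on Ampere (compute capability 8.x) and newer, else float16."""
    return "bfloat16" if cc_major >= 8 else "float16"

# Usage (assumes a CUDA device is present):
# major, _ = torch.cuda.get_device_capability()
# dtype = getattr(torch, pick_dtype_name(major))
```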

Loading specific components

```python
from diffusers import DiffusionPipeline, AutoencoderKL
import torch

# Load custom VAE
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Use with pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    vae=vae,
    torch_dtype=torch.float16
)
```

Batch generation

Generate multiple images efficiently:

```python
# Multiple prompts
prompts = [
    "A cat playing piano",
    "A dog reading a book",
    "A bird painting a picture"
]
images = pipe(prompts, num_inference_steps=30).images

# Multiple images per prompt
images = pipe(
    "A beautiful sunset",
    num_images_per_prompt=4,
    num_inference_steps=30
).images
```
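When the prompt list is larger than what fits in VRAM in one call, chunk it and run the pipeline per chunk. The `batched` helper below is our own convenience (Diffusers itself just takes a list per call):

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks from a list of prompts."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# all_images = []
# for chunk in batched(prompts, 2):
#     all_images.extend(pipe(chunk, num_inference_steps=30).images)
```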

Common workflows

Workflow 1: High-quality generation

```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import torch

# 1. Load SDXL with optimizations
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# enable_model_cpu_offload() manages device placement itself,
# so don't call pipe.to("cuda") first
pipe.enable_model_cpu_offload()

# 2. Generate with quality settings
image = pipe(
    prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
    negative_prompt="blurry, low quality, cartoon, anime, sketch",
    num_inference_steps=30,
    guidance_scale=7.5,
    height=1024,
    width=1024
).images[0]
```

Workflow 2: Fast prototyping

```python
from diffusers import AutoPipelineForText2Image, LCMScheduler
import torch

# Use LCM for 4-8 step generation
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# Load LCM LoRA for fast generation
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.fuse_lora()

# Generate in ~1 second
image = pipe(
    "A beautiful landscape",
    num_inference_steps=4,
    guidance_scale=1.0
).images[0]
```

Common issues

**CUDA out of memory:**

```python
# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Or use lower precision
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
```

**Black/noise images:**

```python
# Check the VAE configuration, and bypass the safety checker if needed
pipe.safety_checker = None

# Ensure dtype consistency across components
pipe = pipe.to(dtype=torch.float16)
```

**Slow generation:**

```python
# Use a faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]
```

References

  • Advanced Usage - Custom pipelines, fine-tuning, deployment
  • Troubleshooting - Common issues and solutions

Resources
