Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google, DashScope (Alibaba Tongyi Wanxiang), Replicate, and xheai (relay service) providers.
Script Directory
Agent Execution:
- `SKILL_DIR` = this SKILL.md file's directory
- Script path = `${SKILL_DIR}/scripts/main.ts`
Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
```bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
```
| Result | Action |
|---|---|
| Found | Load, parse, and apply settings. If the default model is null, ask for the model only (Flow 2) |
| Not found | ⛔ Run first-time setup (references/config/first-time-setup.md) → Save EXTEND.md → Then continue |
CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |
EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models
Schema: references/config/preferences-schema.md
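The project → user lookup order described in this step can be sketched as a small pure function. This is a minimal illustration, not the code in main.ts: `findExtendFile`, `ExtendSource`, and the injected `exists` callback are hypothetical names chosen so the logic runs without touching the file system.

```typescript
// Hypothetical helper: names and signature are illustrative, not taken from main.ts.
type ExtendSource = { scope: "project" | "user"; path: string };

const EXTEND_REL = ".baoyu-skills/baoyu-image-gen/EXTEND.md";

function findExtendFile(
  exists: (path: string) => boolean, // injected file-existence check
  homeDir: string,
): ExtendSource | null {
  // The project-level file is checked first, then the user-level one.
  const candidates: ExtendSource[] = [
    { scope: "project", path: EXTEND_REL },
    { scope: "user", path: `${homeDir}/${EXTEND_REL}` },
  ];
  for (const candidate of candidates) {
    if (exists(candidate.path)) return candidate;
  }
  // null means first-time setup has to run before any generation.
  return null;
}
```

A caller would treat a `null` result as the "Not found" row of the table and block generation until EXTEND.md has been saved.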
Usage
```bash
# ⚠️ For long prompts, use --promptfiles or wrap the prompt in double quotes to avoid newline parsing errors
# Recommended: npx -y bun "${SKILL_DIR}/scripts/main.ts" -p "long prompt" --image out.png
# Or: npx -y bun "${SKILL_DIR}/scripts/main.ts" --promptfiles prompt.txt --image out.png
# Basic
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image cat.png
# With aspect ratio
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A landscape" --image out.png --ar 16:9
# High quality
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --quality 2k
# From prompt files
npx -y bun "${SKILL_DIR}/scripts/main.ts" --promptfiles system.md content.md --image out.png
# With reference images (Google multimodal or OpenAI edits)
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "Make blue" --image out.png --ref source.png
# With reference images (explicit provider/model)
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png
# Specific provider
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider openai
# DashScope (Alibaba Tongyi Wanxiang); Chinese prompts are accepted
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "一只可爱的猫" --image out.png --provider dashscope
# Replicate (google/nano-banana-pro)
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider replicate
# Replicate with specific model
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
# xheai (relay service, OpenAI-compatible format)
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider xheai
# xheai with nano-banana-2
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider xheai --model nano-banana-2
```
Options
| Option | Description |
|---|---|
| `--prompt`, `-p` | Prompt text |
| `--promptfiles` | Read prompt from files (concatenated) |
| `--image` | Output image path (required) |
| `--provider google\|openai\|dashscope\|replicate\|xheai` | Force provider (default: auto-detect) |
| `--model` | Model ID (Google: gemini-3-pro-image-preview, gemini-3.1-flash-image-preview; xheai: gemini-3.1-flash-image-preview) |
| `--ar` | Aspect ratio (e.g., 16:9) |
| | Size |
| `--quality` | Quality preset (default: 2k) |
| | Image size for Google (default: from quality) |
| `--ref` | Reference images. Supported by Google multimodal (gemini-3-pro-image-preview, gemini-3.1-flash-image-preview) and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
| | Number of images |
| | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| | OpenAI API key (also used for xheai when `OPENAI_BASE_URL` points to xheai.cc) |
| | Google API key |
| | DashScope API key (Alibaba Cloud) |
| | Replicate API token |
| `OPENAI_IMAGE_MODEL` | OpenAI/xheai model override |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: z-image-turbo) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: google/nano-banana-pro) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint (set to https://api.xheai.cc for xheai) |
| | Custom Google endpoint |
| | Custom DashScope endpoint |
| | Custom Replicate endpoint |
| `DEBUG_ENV` | Set to `1` to enable debug output for env loading and provider detection |
Load Priority: CLI args > EXTEND.md > env vars > .env files
Cross-Platform Paths: the user-level `.env` file automatically resolves to:
- Windows: `C:\Users\<username>\.baoyu-skills\.env`
- macOS: `/Users/<username>/.baoyu-skills/.env`
- Linux: `/home/<username>/.baoyu-skills/.env`
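The three resolved paths above follow one rule: the home directory joined with `.baoyu-skills` and `.env` using the platform's separator. A minimal sketch of that rule follows. `userEnvPath` is a hypothetical name, and main.ts may well rely on Node's `os.homedir()` and `path.join` instead.

```typescript
// Hypothetical helper; the real script may resolve this path differently.
function userEnvPath(
  homeDir: string,
  platform: "win32" | "darwin" | "linux",
): string {
  const sep = platform === "win32" ? "\\" : "/";
  // Strip one trailing separator so the join never doubles it.
  const base = homeDir.endsWith(sep) ? homeDir.slice(0, -1) : homeDir;
  return [base, ".baoyu-skills", ".env"].join(sep);
}
```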
Debug Mode: Set `DEBUG_ENV=1` to see which .env files are loaded and which API keys are detected:
```bash
DEBUG_ENV=1 npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image cat.png
```
Model Resolution
Model priority (highest → lowest), applies to all providers:
- CLI flag: `--model <id>`
- EXTEND.md: `default_model.[provider]`
- Env var: `<PROVIDER>_IMAGE_MODEL` (e.g., `GOOGLE_IMAGE_MODEL`)
- Built-in default

EXTEND.md overrides env vars. If both EXTEND.md `default_model.google: "gemini-3-pro-image-preview"` and env var `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview` exist, EXTEND.md wins.
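The priority chain can be sketched as a single fall-through expression. This is an illustration under assumptions rather than the logic in main.ts: `resolveModel` and `ModelSources` are hypothetical names, and mapping xheai onto the OpenAI env var is inferred from the "OpenAI/xheai model override" entry in the environment variable table.

```typescript
type Provider = "google" | "openai" | "dashscope" | "replicate" | "xheai";

// Hypothetical input shape; field names are illustrative.
interface ModelSources {
  cliModel?: string; // --model <id>
  extendDefaults?: Partial<Record<Provider, string | null>>; // EXTEND.md default_model.[provider]
  env?: Record<string, string | undefined>; // e.g. GOOGLE_IMAGE_MODEL
  builtinDefault: string; // provider's built-in default
}

function resolveModel(provider: Provider, src: ModelSources): string {
  // Assumption: xheai shares the OpenAI override variable.
  const prefix = provider === "xheai" ? "OPENAI" : provider.toUpperCase();
  const envKey = `${prefix}_IMAGE_MODEL`;
  // Highest priority first; empty or null values fall through to the next source.
  return (
    src.cliModel ||
    src.extendDefaults?.[provider] ||
    src.env?.[envKey] ||
    src.builtinDefault
  );
}
```

With `default_model.google` set in EXTEND.md and `GOOGLE_IMAGE_MODEL` set in the environment, the EXTEND.md value is returned, matching the rule stated above.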
Agent MUST display model info before each generation:
- Show: `Using [provider] / [model]`
- Show switch hint: `Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL`
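One way to produce the two required lines is a small formatter. `modelBanner` is a hypothetical name used only for illustration; the wording of the lines follows the templates above.

```typescript
// Hypothetical formatter for the pre-generation model notice.
function modelBanner(provider: string, model: string): string[] {
  const envVar = `${provider.toUpperCase()}_IMAGE_MODEL`;
  return [
    `Using ${provider} / ${model}`,
    `Switch model: --model <id> | EXTEND.md default_model.${provider} | env ${envVar}`,
  ];
}
```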
Replicate Models
Supported model formats:
- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models by version), e.g. `stability-ai/sdxl:<version>`
Examples:
```bash
# Use Replicate default model
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider replicate
# Override model explicitly
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```
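The two model formats can be told apart by an optional `:version` suffix. The parser below is a sketch of that distinction only. `parseReplicateModel` and `ReplicateModelRef` are hypothetical names, and the validation in main.ts may be stricter or looser.

```typescript
// Hypothetical parsed form of a Replicate model identifier.
interface ReplicateModelRef {
  owner: string;
  name: string;
  version?: string; // present only for the versioned format
}

function parseReplicateModel(id: string): ReplicateModelRef {
  // owner/name with an optional :version suffix, no whitespace or extra slashes.
  const match = /^([^/:\s]+)\/([^/:\s]+)(?::([^/:\s]+))?$/.exec(id);
  if (!match) throw new Error(`Invalid Replicate model id: ${id}`);
  const [, owner, name, version] = match;
  return version ? { owner, name, version } : { owner, name };
}
```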
Provider Selection
- `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then Replicate
- `--provider` specified → use it (if `--ref` is given, it must be `google`, `openai`, or `replicate`)
- Only one API key available → use that provider
- Multiple available → default to Google
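The selection rules above can be sketched as one function. This is an illustration, not the code in main.ts: `selectProvider` and `SelectionInput` are hypothetical names, and the fallback used when several keys are present but Google is not among them is an assumption, since the rules do not cover that case.

```typescript
type Provider = "google" | "openai" | "dashscope" | "replicate" | "xheai";

// Hypothetical input shape; field names are illustrative.
interface SelectionInput {
  provider?: Provider; // explicit provider, if any
  hasRef: boolean; // true when reference images were passed
  available: Provider[]; // providers whose API key was detected
}

// Order used when reference images are given without an explicit provider.
const REF_ORDER: Provider[] = ["google", "openai", "replicate"];

function selectProvider(input: SelectionInput): Provider {
  const { provider, hasRef, available } = input;
  if (provider) {
    if (hasRef && !REF_ORDER.includes(provider)) {
      throw new Error(`Reference images are not supported by provider "${provider}"`);
    }
    return provider;
  }
  if (hasRef) {
    const pick = REF_ORDER.find((p) => available.includes(p));
    if (!pick) throw new Error("No configured provider supports reference images");
    return pick;
  }
  if (available.length === 0) throw new Error("No API key found");
  if (available.length === 1) return available[0];
  // Multiple keys: default to Google. Using the first detected provider when
  // Google is absent is an assumption, not documented behavior.
  return available.includes("google") ? "google" : available[0];
}
```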
Quality Presets
| Preset | Google imageSize | OpenAI Size | Use Case |
|---|---|---|---|
| | 1K | 1024px | Quick previews |
| `2k` (default) | 2K | 2048px | Covers, illustrations, infographics |
Google imageSize: Can be overridden from the command line (see Options).
Aspect Ratios
- Google multimodal: passes the ratio in the image configuration
- Google Imagen: passes the ratio as a request parameter
- OpenAI: maps to closest supported size
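"Closest supported size" can be made concrete by comparing aspect ratios on a log scale, so that 2:1 and 1:2 are equally far from 1:1. The sketch below assumes the three sizes accepted by OpenAI's GPT Image models at the time of writing. Both the size list and the name `closestOpenAISize` are assumptions, and the mapping in main.ts may differ.

```typescript
// Assumed size list; adjust to whatever the target OpenAI model accepts.
const OPENAI_SIZES = ["1024x1024", "1536x1024", "1024x1536"] as const;

function closestOpenAISize(ar: string): string {
  const [w, h] = ar.split(":").map(Number);
  if (!(w > 0 && h > 0)) throw new Error(`Invalid aspect ratio: ${ar}`);
  const target = Math.log(w / h);
  let best: string = OPENAI_SIZES[0];
  let bestDist = Infinity;
  for (const size of OPENAI_SIZES) {
    const [sw, sh] = size.split("x").map(Number);
    // Distance between ratios in log space is symmetric for wide and tall shapes.
    const dist = Math.abs(Math.log(sw / sh) - target);
    if (dist < bestDist) {
      best = size;
      bestDist = dist;
    }
  }
  return best;
}
```

Under these assumptions a 16:9 request lands on the landscape size and a 9:16 request on the portrait size.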
Generation Mode
Default: Sequential generation (one image at a time). This ensures stable output and easier debugging.
Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |
Parallel Settings (when requested):
| Setting | Value |
|---|---|
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |
Agent Implementation (parallel mode only):
# Launch multiple generations in parallel using Task tool
# Each Task runs as background subagent with run_in_background=true
# Collect results via TaskOutput when all complete
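The concurrency settings above can be applied by splitting a batch into waves before launching subagents. The sketch below only plans the waves. It does not call the Task tool, and `planBatches` is a hypothetical name. Wave-based batching is a simplification: a sliding-window pool would keep all slots busy, at the cost of more bookkeeping.

```typescript
const RECOMMENDED_CONCURRENCY = 4;
const MAX_CONCURRENCY = 8;

// Split jobs into waves no wider than the allowed concurrency.
function planBatches<T>(
  jobs: T[],
  concurrency: number = RECOMMENDED_CONCURRENCY,
): T[][] {
  const requested = Number.isFinite(concurrency)
    ? Math.floor(concurrency)
    : RECOMMENDED_CONCURRENCY;
  // Clamp to the range 1..MAX_CONCURRENCY.
  const width = Math.min(MAX_CONCURRENCY, Math.max(1, requested));
  const batches: T[][] = [];
  for (let i = 0; i < jobs.length; i += width) {
    batches.push(jobs.slice(i, i + width));
  }
  return batches;
}
```

Each inner array would then be launched together, with the next wave starting once all results of the current one have been collected.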
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal: `gemini-3-pro-image-preview`, `gemini-3.1-flash-image-preview`; or OpenAI GPT Image edits)
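The "warning, proceed with default" rule for invalid aspect ratios can be sketched as a function that never throws and reports the problem alongside the value it falls back to. `normalizeAspectRatio` is a hypothetical name, the accepted `W:H` syntax is an assumption, and the default ratio is left as a parameter because the source does not state it.

```typescript
interface AspectRatioResult {
  ratio: string;
  warning?: string; // set only when the input was rejected
}

function normalizeAspectRatio(
  input: string | undefined,
  fallback: string,
): AspectRatioResult {
  if (input === undefined) return { ratio: fallback };
  // Assumed syntax: two positive integers separated by a colon.
  const match = /^(\d+):(\d+)$/.exec(input.trim());
  if (match && Number(match[1]) > 0 && Number(match[2]) > 0) {
    return { ratio: `${Number(match[1])}:${Number(match[2])}` };
  }
  return {
    ratio: fallback,
    warning: `Invalid aspect ratio "${input}", using ${fallback}`,
  };
}
```

The caller can print `warning` when it is set and continue with `ratio` either way, so a typo in the ratio does not abort the generation.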
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.