Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate providers.
Script Directory
Agent Execution:
- {baseDir} = this SKILL.md file's directory
- Script path = {baseDir}/scripts/main.ts
- Resolve runtime: if installed → ; if available → ; else suggest installing bun
Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
```bash
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-imagine/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-imagine/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md" && echo "user"
```
```powershell
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-imagine/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-imagine/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md") { "user" }
```
| Result | Action |
|---|---|
| Found | Load, parse, apply settings. If is null → ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup (references/config/first-time-setup.md) → Save EXTEND.md → Then continue |
CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| .baoyu-skills/baoyu-imagine/EXTEND.md | Project directory |
| $HOME/.baoyu-skills/baoyu-imagine/EXTEND.md | User home |
Legacy compatibility: if .baoyu-skills/baoyu-image-gen/EXTEND.md exists and the new path does not, the runtime renames the legacy file to the new path. If both files exist, the runtime leaves them unchanged and uses the new path.
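The lookup order probed by the shell checks above can be sketched as a small pure function. This is only an illustration: the names `extendCandidates` and `findExtend` are hypothetical, and it assumes the first existing file wins, with the XDG location sitting between project and user as the check order suggests.

```typescript
// Hypothetical helper names; not taken from scripts/main.ts.
type Env = { HOME: string; XDG_CONFIG_HOME?: string };
type Candidate = { scope: string; path: string };

// Candidate EXTEND.md locations, in the order the shell checks probe them.
function extendCandidates(projectDir: string, env: Env): Candidate[] {
  const rel = "baoyu-skills/baoyu-imagine/EXTEND.md";
  // Mirrors ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty falls back.
  const xdgBase = env.XDG_CONFIG_HOME || `${env.HOME}/.config`;
  return [
    { scope: "project", path: `${projectDir}/.${rel}` },
    { scope: "xdg", path: `${xdgBase}/${rel}` },
    { scope: "user", path: `${env.HOME}/.${rel}` },
  ];
}

// Assumption: the first candidate that exists is the one that gets loaded.
// `exists` is injected so the sketch stays free of file-system calls.
function findExtend(
  projectDir: string,
  env: Env,
  exists: (path: string) => boolean
): Candidate | null {
  const candidates = extendCandidates(projectDir, env);
  for (const candidate of candidates) {
    if (exists(candidate.path)) return candidate;
  }
  return null; // Not found: first-time setup must run before generation.
}
```

A `null` result corresponds to the "Not found" row above, where setup blocks generation until EXTEND.md is saved.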
EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models | Batch worker cap | Provider-specific batch limits
Schema: references/config/preferences-schema.md
Usage
```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9
# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google, OpenAI, Azure OpenAI, OpenRouter, Replicate, MiniMax, or Seedream 4.0/4.5/5.0)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png
# Azure OpenAI (model means deployment name)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider azure --model gpt-image-1.5
# OpenRouter (recommended default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter
# OpenRouter with reference images
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png
# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai
# DashScope (阿里通义万象)
${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope
# DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
${BUN_X} {baseDir}/scripts/main.ts --prompt "为咖啡品牌设计一张 21:9 横幅海报,包含清晰中文标题" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872
# DashScope legacy Qwen fixed-size model
${BUN_X} {baseDir}/scripts/main.ts --prompt "一张电影感海报" --image out.png --provider dashscope --model qwen-image-max --size 1664x928
# MiniMax
${BUN_X} {baseDir}/scripts/main.ts --prompt "A fashion editorial portrait by a bright studio window" --image out.jpg --provider minimax
# MiniMax with subject reference (best for character/portrait consistency)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A girl stands by the library window, cinematic lighting" --image out.jpg --provider minimax --model image-01 --ref portrait.png --ar 16:9
# MiniMax with custom size (documented for image-01)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cinematic poster" --image out.jpg --provider minimax --model image-01 --size 1536x1024
# Replicate (google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Replicate with specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
# Batch mode with saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json
# Batch mode with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json
```
Batch File Format
```json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}
```
Path fields in each task are resolved relative to the batch file's directory. The top-level worker count is optional and is overridden by the corresponding CLI option. A top-level array of tasks (without the wrapper object) is also accepted.
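As a rough sketch of the rules above, the following normalizes both accepted shapes and resolves paths against the batch file's directory. It assumes the path-bearing fields are the ones shown in the example (`promptFiles`, `image`, `ref`). The names `normalizeBatch` and `resolveFrom` are hypothetical, and the hand-rolled path join only handles POSIX-style paths.

```typescript
// Hypothetical types and helpers for illustration; not the script's actual code.
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image: string;
  ref?: string[];
  [key: string]: unknown; // provider, model, ar, quality, ... pass through untouched
}

interface BatchPlan {
  jobs?: number;
  tasks: BatchTask[];
}

// Join a relative path onto a base directory; absolute POSIX paths are kept.
function resolveFrom(baseDir: string, p: string): string {
  return p.charAt(0) === "/" ? p : baseDir.replace(/\/+$/, "") + "/" + p;
}

function normalizeBatch(raw: unknown, batchDir: string, cliJobs?: number): BatchPlan {
  // Accept both { jobs, tasks: [...] } and a bare top-level array of tasks.
  const wrapper: BatchPlan = Array.isArray(raw)
    ? { tasks: raw as BatchTask[] }
    : (raw as BatchPlan);

  const tasks = wrapper.tasks.map((task) => {
    const out: BatchTask = { ...task, image: resolveFrom(batchDir, task.image) };
    if (task.promptFiles) {
      out.promptFiles = task.promptFiles.map((p) => resolveFrom(batchDir, p));
    }
    if (task.ref) {
      out.ref = task.ref.map((p) => resolveFrom(batchDir, p));
    }
    return out;
  });

  // The CLI worker count takes precedence over the value stored in the file.
  const jobs = cliJobs ?? wrapper.jobs;
  return jobs === undefined ? { tasks } : { jobs, tasks };
}
```

Keeping the resolution relative to the batch file, rather than the current working directory, means a batch file can be run from anywhere without editing its paths.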
Options
| Option | Description |
|---|---|
| , | Prompt text |
| | Read prompt from files (concatenated) |
| | Output image path (required in single-image mode) |
| | JSON batch file for multi-image generation |
| | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| --provider google\|openai\|azure\|openrouter\|dashscope\|minimax\|jimeng\|seedream\|replicate | Force provider (default: auto-detect) |
| , | Model ID (Google: gemini-3-pro-image-preview; OpenAI: ; Azure: deployment name such as or ; OpenRouter: google/gemini-3.1-flash-image-preview; DashScope: ; MiniMax: ) |
| | Aspect ratio (e.g., , , ) |
| | Size (e.g., ) |
| | Quality preset (default: ) |
| | Image size for Google/OpenRouter (default: from quality) |
| | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate, MiniMax subject-reference, and Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, or removed SeedEdit 3.0 |
| | Number of images |
| | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| | OpenAI API key |
| | Azure OpenAI API key |
| | OpenRouter API key |
| | Google API key |
| | DashScope API key (阿里云) |
| | MiniMax API key |
| | Replicate API token |
| | Jimeng (即梦) Volcengine access key |
| | Jimeng (即梦) Volcengine secret key |
| | Seedream (豆包) Volcengine ARK API key |
| | OpenAI model override |
| | Azure default deployment name |
| | Backward-compatible alias for Azure default deployment/model name |
| | OpenRouter model override (default: google/gemini-3.1-flash-image-preview) |
| | Google model override |
| | DashScope model override (default: ) |
| | MiniMax model override (default: ) |
| | Replicate model override (default: google/nano-banana-pro) |
| | Jimeng model override (default: jimeng_t2i_v40) |
| | Seedream model override (default: doubao-seedream-5-0-260128) |
| | Custom OpenAI endpoint |
| | Azure resource endpoint or deployment endpoint |
| | Azure image API version (default: ) |
| | Custom OpenRouter endpoint (default: https://openrouter.ai/api/v1) |
| | Optional app/site URL for OpenRouter attribution |
| | Optional app name for OpenRouter attribution |
| | Custom Google endpoint |
| | Custom DashScope endpoint |
| | Custom MiniMax endpoint (default: ) |
| | Custom Replicate endpoint |
| | Custom Jimeng endpoint (default: https://visual.volcengineapi.com) |
| | Jimeng region (default: ) |
| | Custom Seedream endpoint (default: https://ark.cn-beijing.volces.com/api/v3) |
| BAOYU_IMAGE_GEN_MAX_WORKERS | Override batch worker cap |
| BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY | Override provider concurrency, e.g. BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY |
| BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS | Override provider start gap, e.g. BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS |
Load Priority: CLI args > EXTEND.md > env vars >
>
Model Resolution
Model priority (highest → lowest), applies to all providers:
- CLI flag:
- EXTEND.md:
- Env var: (e.g., )
- Built-in default
For Azure,
/
should be the Azure deployment name.
is the preferred env var, and
remains as a backward-compatible alias.
EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.
Agent MUST display model info before each generation:
- Show: Using [provider] / [model]
- Show switch hint: Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL
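The priority chain described in this section can be illustrated with a short function. This is a minimal sketch, not the script's actual code: `resolveModel` and `ModelSources` are hypothetical names, and the caller is assumed to have already read the CLI flag, the EXTEND.md entry and the environment variable.

```typescript
// Hypothetical shape: each field holds the model ID from one source, if any.
interface ModelSources {
  cli?: string;     // --model <id>
  extend?: string;  // EXTEND.md default_model.[provider]
  env?: string;     // <PROVIDER>_IMAGE_MODEL
  builtin: string;  // built-in default for the provider
}

interface ResolvedModel {
  model: string;
  source: "cli" | "extend" | "env" | "builtin";
  banner: string;
}

// Highest priority first: CLI flag, then EXTEND.md, then env var, then built-in.
function resolveModel(provider: string, sources: ModelSources): ResolvedModel {
  const model = sources.cli || sources.extend || sources.env || sources.builtin;
  const source = sources.cli
    ? "cli"
    : sources.extend
    ? "extend"
    : sources.env
    ? "env"
    : "builtin";
  // The banner follows the "Using [provider] / [model]" format required above.
  return { model, source, banner: `Using ${provider} / ${model}` };
}
```

With the values from the example above (EXTEND.md set to gemini-3-pro-image-preview and the env var set to gemini-3.1-flash-image-preview), this sketch picks the EXTEND.md value.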
DashScope Models
Use --model qwen-image-2.0-pro or set / when the user wants official Qwen-Image behavior.
Official DashScope model families:
- , qwen-image-2.0-pro-2026-03-03, , qwen-image-2.0-2026-03-03
  - Free-form in format
  - Total pixels must stay between and
  - Default size is approximately
  - Best choice for custom ratios such as and text-heavy Chinese/English layouts
- , qwen-image-max-2025-12-30, , qwen-image-plus-2026-01-09,
  - Fixed sizes only: , , , ,
  - Default size is
  - currently has the same capability as
- Legacy DashScope models such as , ,
  - Keep using them only when the user explicitly asks for legacy behavior or compatibility
When translating CLI args into DashScope behavior:
- wins over
- For , prefer explicit ; otherwise infer from and use the official recommended resolutions below
- For qwen-image-max/plus/image, only use the five official fixed sizes; if the requested ratio is not covered, switch to
- is a baoyu-imagine compatibility preset, not a native DashScope API field. Mapping / onto the table below is an implementation inference, not an official API guarantee
Recommended sizes for common aspect ratios:
DashScope official APIs also expose , , and , but does not expose them as dedicated CLI flags today.
MiniMax Models
Use or set / when the user wants MiniMax image generation.
Official MiniMax image model options currently documented in the API reference:
- (recommended default)
  - Supports text-to-image and subject-reference image generation
  - Supports official values: , , , , , , ,
  - Supports documented custom / output sizes when using
  - and must both be between and , and both must be divisible by
-
  - Lower-latency variant
  - Use for sizing; MiniMax documents custom / as only effective for
MiniMax subject reference notes:
- files are sent as MiniMax
- MiniMax docs currently describe as
- Official docs say supports public URLs or Base64 Data URLs; sends local refs as Data URLs
- Official docs recommend front-facing portrait references in JPG/JPEG/PNG under 10MB
OpenRouter Models
Use full OpenRouter model IDs, e.g.:
- google/gemini-3.1-flash-image-preview (recommended, supports image output and reference-image workflows)
- google/gemini-2.5-flash-image-preview
- black-forest-labs/flux.2-pro
- Other OpenRouter image-capable model IDs
Notes:
- OpenRouter image generation uses , not the OpenAI endpoints
- If is used, choose a multimodal model that supports image input and image output
- maps to OpenRouter imageGenerationOptions.size; is converted to the nearest OpenRouter size and inferred aspect ratio when possible
Replicate Models
Supported model formats:
- (recommended for official models), e.g.
- (community models by version), e.g. stability-ai/sdxl:<version>
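The two formats can be told apart with a small parser. The sketch below assumes the formats are `owner/name` and `owner/name:version`, as inferred from the examples in this document (google/nano-banana-pro and stability-ai/sdxl:<version>). The function name `parseReplicateModel` is hypothetical.

```typescript
// Hypothetical parsed form of a Replicate model reference.
interface ReplicateModelRef {
  owner: string;
  name: string;
  version?: string; // present only for the owner/name:version format
}

// Returns null when the ID matches neither assumed format.
function parseReplicateModel(id: string): ReplicateModelRef | null {
  const match = /^([^\/:\s]+)\/([^\/:\s]+)(?::([^\/:\s]+))?$/.exec(id);
  if (!match) return null;
  const owner = match[1];
  const name = match[2];
  const version = match[3];
  return version ? { owner, name, version } : { owner, name };
}
```

Rejecting malformed IDs early gives a clearer error than letting the provider API fail on them later.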
Examples:
```bash
# Use Replicate default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Override model explicitly
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```
Provider Selection
- provided + no → auto-select Google first, then OpenAI, then Azure, then OpenRouter, then Replicate, then Seedream, then MiniMax (MiniMax subject reference is more specialized toward character/portrait consistency)
- specified → use it (if , must be , , , , , , or )
- Only one API key available → use that provider
- Multiple available → default to Google
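The selection rules above can be sketched as a single function. Several details are assumptions rather than documented behavior: the set of providers allowed with reference images is taken to be the same as the auto-select order, and when several keys are available but Google is not among them, the sketch simply takes the first available one. `selectProvider` is a hypothetical name.

```typescript
type Provider =
  | "google" | "openai" | "azure" | "openrouter" | "dashscope"
  | "minimax" | "jimeng" | "seedream" | "replicate";

// Auto-select order when reference images are given, as listed above.
const REF_ORDER: Provider[] = [
  "google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax",
];

function selectProvider(opts: {
  explicit?: Provider;
  hasRef: boolean;
  available: Provider[]; // providers whose API keys are configured
}): Provider {
  const { explicit, hasRef, available } = opts;

  if (explicit) {
    // Assumption: only providers in REF_ORDER accept reference images.
    if (hasRef && REF_ORDER.indexOf(explicit) === -1) {
      throw new Error(`${explicit} does not accept reference images`);
    }
    return explicit;
  }

  if (hasRef) {
    for (const candidate of REF_ORDER) {
      if (available.indexOf(candidate) !== -1) return candidate;
    }
    throw new Error("No configured provider supports reference images");
  }

  if (available.length === 0) throw new Error("No API key configured");
  if (available.length === 1) return available[0];

  // Multiple keys: Google is the documented default.
  // Falling back to the first available provider is an assumption.
  return available.indexOf("google") !== -1 ? "google" : available[0];
}
```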
Quality Presets
| Preset | Google imageSize | OpenAI Size | OpenRouter size | Replicate resolution | Use Case |
|---|---|---|---|---|---|
| | 1K | 1024px | 1K | 1K | Quick previews |
| (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |
Google/OpenRouter imageSize: Can be overridden with
Aspect Ratios
- Google multimodal: uses
- OpenAI: maps to closest supported size
- OpenRouter: sends imageGenerationOptions.aspect_ratio; if only is given, aspect ratio is inferred automatically
- Replicate: passes to model; when is provided without , defaults to
- MiniMax: sends official values directly; if is given without , / are sent for
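"Maps to closest supported size" can be read as a nearest-ratio search. The sketch below is one plausible interpretation, not the script's confirmed logic: it compares ratios on a log scale so that wide and tall mismatches are treated symmetrically. The function names are hypothetical, and the size list is supplied by the caller.

```typescript
// Parse "W:H" into a numeric ratio; returns null for invalid input so the
// caller can warn and fall back to a default, as described under Error Handling.
function parseAspectRatio(ar: string): number | null {
  const match = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ar.trim());
  if (!match) return null;
  const width = Number(match[1]);
  const height = Number(match[2]);
  return width > 0 && height > 0 ? width / height : null;
}

// Pick the "WxH" entry whose ratio is nearest to the requested aspect ratio.
function closestSize(ar: string, supported: string[]): string | null {
  const target = parseAspectRatio(ar);
  if (target === null) return null;
  let best: string | null = null;
  let bestDistance = Infinity;
  for (const size of supported) {
    const parts = size.split("x");
    const ratio = Number(parts[0]) / Number(parts[1]);
    // Log distance treats 2:1 and 1:2 as equally far from 1:1.
    const distance = Math.abs(Math.log(ratio / target));
    if (distance < bestDistance) {
      best = size;
      bestDistance = distance;
    }
  }
  return best;
}

// Example size list, supplied here only for illustration; check the
// provider's documentation for the sizes a given model accepts.
const EXAMPLE_SIZES = ["1024x1024", "1536x1024", "1024x1536"];
```

With the example list, a 16:9 request maps to 1536x1024 and a 9:16 request maps to 1024x1536.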
Generation Mode
Default: Sequential generation.
Batch Parallel Generation: When the batch file contains 2 or more pending tasks, the script automatically enables parallel generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |
Execution choice:
| Situation | Preferred approach | Why |
|---|---|---|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch () | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |
| Output comes from baoyu-article-illustrator with + | Batch ( -> ) | That workflow already produces prompt files, so direct batch execution is the intended path |
Rule of thumb:
- Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
- Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration
Parallel behavior:
- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- You can override worker count with
- Each image is retried automatically, for up to 3 attempts in total
- Final output includes success count, failure count, and per-image failure reasons
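The worker-count rules can be approximated as follows. This is a sketch under stated assumptions: "automatic" is modeled as one worker per pending task, an explicit worker count is still clamped by the configured cap, and the built-in cap is 10 as stated above. `planWorkers` and its option names are hypothetical.

```typescript
const BUILTIN_MAX_WORKERS = 10; // built-in default cap stated above

interface WorkerOptions {
  cliJobs?: number;   // worker count passed on the command line
  fileJobs?: number;  // worker count stored in the batch file
  configCap?: number; // cap from EXTEND.md or BAOYU_IMAGE_GEN_MAX_WORKERS
}

function planWorkers(pendingTasks: number, opts: WorkerOptions = {}) {
  const cap = opts.configCap ?? BUILTIN_MAX_WORKERS;
  // CLI beats the batch file; "auto" is approximated as one worker per task.
  const requested = opts.cliJobs ?? opts.fileJobs ?? pendingTasks;
  // Never exceed the cap or the number of tasks, and always keep one worker.
  const workers = Math.max(1, Math.min(requested, cap, pendingTasks));
  // Parallel generation only kicks in with 2 or more pending tasks.
  const mode = pendingTasks >= 2 ? "parallel" : "sequential";
  return { mode, workers };
}
```

For example, under these assumptions 25 pending tasks with no overrides yield 10 workers, while the same batch with a configured cap of 4 yields 4.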
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.