image-gen

When to Use

  • User wants to generate an AI image from a text description
  • User says "generate image", "draw", "create picture", "配图"
  • User says "生成图片", "画一张", "AI图"
  • User needs a cover image, illustration, or concept art

When NOT to Use

  • User wants to create audio content (use `/podcast`, `/speech`)
  • User wants to create a video (use `/explainer`)
  • User wants to edit an existing image (not supported)
  • User wants to extract content from a URL (use `/content-parser`)

Purpose

Generate AI images using the Labnana API. Supports text prompts with optional reference images, multiple resolutions, and aspect ratios. Images are saved as local files.

Hard Constraints

  • No shell scripts. Construct curl commands from the API reference files listed under API Reference
  • Always read `shared/authentication.md` for the API key and headers
  • Follow `shared/common-patterns.md` for error handling
  • Image generation uses a different base URL: `https://api.labnana.com/openapi/v1`
  • Always read config following `shared/config-pattern.md` before any interaction
  • Save output to `.listenhub/image-gen/YYYY-MM-DD-{jobId}/`, never to `~/Downloads/`
<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call the image generation API until the user has explicitly confirmed. </HARD-GATE>

Step -1: API Key Check

Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately.
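
A minimal sketch of the check, assuming the key is exposed as the `LISTENHUB_API_KEY` environment variable (the name the Example's curl command uses). `require_api_key` is a hypothetical helper; `shared/config-pattern.md` remains the authority on where the key actually comes from.

```bash
# Sketch: fail fast when the API key is missing.
# Assumption: the key lives in the LISTENHUB_API_KEY environment variable.
require_api_key() {
  if [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is not set; see shared/authentication.md" >&2
    return 1
  fi
}
```

Running `require_api_key || exit 1` before any other step keeps the flow from continuing without a key.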

Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0.

**If file doesn't exist**, ask for the location, then create it immediately:

```bash
mkdir -p ".listenhub/image-gen"
echo '{"outputDir":".listenhub","outputMode":"inline"}' > ".listenhub/image-gen/config.json"
CONFIG_PATH=".listenhub/image-gen/config.json"
# Load the new file so the Setup Flow can update $CONFIG
CONFIG=$(cat "$CONFIG_PATH")
```

(or `$HOME/.listenhub/image-gen/config.json` for a global config)
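
The create-if-missing logic can be sketched as one helper that resolves the project or global path and never overwrites an existing file. The scope names `project` and `global` and the function `init_config` are illustrative assumptions; the default JSON mirrors the Step 0 snippet.

```bash
# Sketch: create the config only if it is missing, then load it.
# "project" and "global" are hypothetical scope names, not from config-pattern.md.
init_config() {
  local scope="$1" base
  case "$scope" in
    project) base=".listenhub" ;;
    global)  base="$HOME/.listenhub" ;;
    *) echo "unknown scope: $scope" >&2; return 1 ;;
  esac
  CONFIG_PATH="$base/image-gen/config.json"
  if [ ! -f "$CONFIG_PATH" ]; then
    mkdir -p "$base/image-gen"
    # Same defaults as the Step 0 snippet.
    echo '{"outputDir":".listenhub","outputMode":"inline"}' > "$CONFIG_PATH"
  fi
  CONFIG=$(cat "$CONFIG_PATH")
}
```

After `init_config project`, both `CONFIG_PATH` and `CONFIG` are set, and an existing file is read rather than replaced.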

Then run **Setup Flow** below.

**If file exists**, read the config, display a summary, and confirm:
Current config (image-gen): Output mode: {inline / download / both}
Ask: "Use saved config?" → **Confirm and continue** / **Reconfigure**

Setup Flow (first run or reconfigure)

  1. outputMode: Follow `shared/output-mode.md` § Setup Flow Question.
Save immediately:

```bash
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```
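
The config summary only knows three output modes, so it may be worth rejecting anything else before writing the file. A minimal sketch; `validate_output_mode` is a hypothetical helper, not something defined in `shared/output-mode.md`.

```bash
# Sketch: accept only the modes this skill displays (inline, download, both).
# validate_output_mode is a hypothetical helper name.
validate_output_mode() {
  case "$1" in
    inline|download|both) return 0 ;;
    *) echo "invalid outputMode: $1 (expected inline, download, or both)" >&2
       return 1 ;;
  esac
}
```

For example, `validate_output_mode "$OUTPUT_MODE" || exit 1` can run right before the save commands.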

Interaction Flow

Step 1: Image Description

Free text input. Ask the user:
Describe the image you want to generate.
If the prompt is very short (< 10 words) and the user hasn't asked for verbatim generation, offer to help enrich the prompt. Otherwise, use as-is.
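
The short-prompt rule can be expressed as a small predicate. This sketch counts whitespace-separated words, which is an assumption: text without spaces (such as a Chinese prompt) counts as a single word, so treat the result as a hint rather than a rule. `should_offer_enrichment` is a hypothetical name.

```bash
# Sketch: should Step 1 offer to enrich the prompt?
# Assumption: "words" means whitespace-separated tokens; unspaced text such as
# a Chinese prompt counts as one word, so the answer is only a hint.
should_offer_enrichment() {
  local prompt="$1" verbatim="${2:-no}" words
  if [ "$verbatim" = "yes" ]; then
    return 1   # user asked for the prompt to be used exactly
  fi
  words=$(printf '%s\n' "$prompt" | wc -w | tr -d ' ')
  [ "$words" -lt 10 ]
}
```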

Step 2: Model

Ask:
Question: "Which model?"
Options:
  - "pro (recommended)" — gemini-3-pro-image-preview, higher quality
  - "flash" — gemini-3.1-flash-image-preview, faster and cheaper, unlocks extreme aspect ratios (1:4, 4:1, 1:8, 8:1)
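
Mapping the answer to the request's `model` field is a simple lookup. A sketch with `model_id_for` as a hypothetical helper; the IDs are the ones listed above.

```bash
# Sketch: translate the Step 2 choice into the model ID sent to the API.
# model_id_for is hypothetical; the IDs come from the options above.
model_id_for() {
  case "$1" in
    pro)   echo "gemini-3-pro-image-preview" ;;
    flash) echo "gemini-3.1-flash-image-preview" ;;
    *) echo "unknown model choice: $1" >&2; return 1 ;;
  esac
}
```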

Step 3: Resolution and Aspect Ratio

Ask both together (independent parameters):
Question: "What resolution?"
Options:
  - "1K" — Standard quality
  - "2K (recommended)" — High quality, good balance
  - "4K" — Ultra high quality, slower generation
Question: "What aspect ratio?"
Options (all models):
  - "16:9" — Landscape, widescreen
  - "1:1" — Square
  - "9:16" — Portrait, phone screen
  - "Other" — 2:3, 3:2, 3:4, 4:3, 21:9
If the flash model was selected, also offer `1:4` (narrow portrait), `4:1` (wide landscape), `1:8` (extreme portrait), and `8:1` (panoramic).
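
A sketch of how the Step 3 answers could be checked against the chosen model before the summary, assuming the option lists above are exhaustive (the API itself may accept more). `valid_image_config` is a hypothetical helper.

```bash
# Sketch: validate resolution and aspect ratio for a model ID.
# Assumption: only the options listed in Step 3 are allowed; extreme ratios
# (1:4, 4:1, 1:8, 8:1) are accepted only for the flash model.
valid_image_config() {
  local model="$1" size="$2" ratio="$3"
  case "$size" in
    1K|2K|4K) ;;
    *) echo "unsupported resolution: $size" >&2; return 1 ;;
  esac
  case "$ratio" in
    16:9|1:1|9:16|2:3|3:2|3:4|4:3|21:9) return 0 ;;
    1:4|4:1|1:8|8:1)
      if [ "$model" = "gemini-3.1-flash-image-preview" ]; then return 0; fi
      echo "aspect ratio $ratio requires the flash model" >&2
      return 1 ;;
    *) echo "unsupported aspect ratio: $ratio" >&2; return 1 ;;
  esac
}
```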

Step 4: Reference Images (optional)

Question: "Any reference images for style guidance?"
Options:
  - "Yes, I have URL(s)" — Provide reference image URLs
  - "No references" — Generate from prompt only
If yes, collect URLs (comma-separated, max 14). For each URL, infer `mimeType` from its suffix and build:

```json
{ "fileData": { "fileUri": "<url>", "mimeType": "<inferred>" } }
```

Suffix mapping: `.jpg` / `.jpeg` → `image/jpeg`, `.png` → `image/png`, `.webp` → `image/webp`, `.gif` → `image/gif`
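
A sketch of the URL-to-`fileData` conversion. Assumptions not taken from the API reference: suffix matching ignores case and any `?query` or `#fragment`, and URLs contain no characters that need JSON escaping (they are inserted verbatim). `mime_for_url` and `build_reference_images` are hypothetical helpers.

```bash
# Sketch: build the referenceImages array from a comma-separated URL list.
# Assumptions: case-insensitive suffixes, query/fragment ignored, and URLs
# need no JSON escaping (they are inserted as-is).
mime_for_url() {
  local path="${1%%[?#]*}"          # drop any query string or fragment
  path=$(printf '%s' "$path" | tr '[:upper:]' '[:lower:]')
  case "$path" in
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.png)        echo "image/png" ;;
    *.webp)       echo "image/webp" ;;
    *.gif)        echo "image/gif" ;;
    *)            return 1 ;;
  esac
}

build_reference_images() {
  local url mime out="" count=0
  local -a urls
  IFS=',' read -r -a urls <<< "$1"
  for url in "${urls[@]}"; do
    url=$(printf '%s' "$url" | tr -d '[:space:]')   # trim stray spaces
    if [ -z "$url" ]; then continue; fi
    count=$((count + 1))
    if [ "$count" -gt 14 ]; then
      echo "at most 14 reference images are allowed" >&2
      return 1
    fi
    if ! mime=$(mime_for_url "$url"); then
      echo "cannot infer mimeType for: $url" >&2
      return 1
    fi
    out="$out${out:+,}{\"fileData\":{\"fileUri\":\"$url\",\"mimeType\":\"$mime\"}}"
  done
  printf '[%s]\n' "$out"
}
```

An unknown suffix stops the flow instead of guessing a type, which keeps the request from carrying a wrong `mimeType`.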

Step 5: Confirm & Generate

Summarize all choices:
Ready to generate image:

  Prompt: {prompt text}
  Model: {pro / flash}
  Resolution: {1K / 2K / 4K}
  Aspect ratio: {ratio}
  References: {yes (N URLs) / no}

  Proceed?
Wait for explicit confirmation before calling the API.

Workflow

  1. Build request: Construct JSON with `provider`, `model`, `prompt`, `imageConfig`, and optional `referenceImages`
  2. Submit: `POST https://api.labnana.com/openapi/v1/images/generation` with a timeout of 600 s
  3. Extract image: Parse the base64 data from the response
  4. Decode and present the result
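
Steps 1 and 3 can be sketched with jq, which JSON-escapes the prompt safely (quotes or newlines in user text would otherwise break a hand-written body). Assumptions: `provider` is always `"google"` as in the Example, `referenceImages` is omitted when empty, and the image sits where the Example's jq path looks. Both function names are hypothetical.

```bash
# Sketch: build the JSON body with jq so user text is escaped correctly.
# Assumptions: provider is always "google" (as in the Example) and
# referenceImages is sent only when the array is non-empty.
build_request_body() {
  local model="$1" prompt="$2" size="$3" ratio="$4" refs="${5:-[]}"
  jq -n \
    --arg model "$model" --arg prompt "$prompt" \
    --arg size "$size" --arg ratio "$ratio" \
    --argjson refs "$refs" \
    '{provider: "google", model: $model, prompt: $prompt,
      imageConfig: {imageSize: $size, aspectRatio: $ratio}}
     + (if ($refs | length) > 0 then {referenceImages: $refs} else {} end)'
}

# Sketch: print the base64 image from a response, or fail if there is none.
# Same jq path as the Example, plus "// empty" so a missing field yields
# nothing instead of the string "null".
extract_image_data() {
  local data
  data=$(printf '%s' "$1" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data // empty') || return 1
  [ -n "$data" ] || return 1
  printf '%s\n' "$data"
}
```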
Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.

`inline` or `both`: Decode the base64 data to a temp file, then use the Read tool.

```bash
JOB_ID=$(date +%s)
# --decode is accepted by both GNU coreutils and macOS base64
echo "$BASE64_DATA" | base64 --decode > "/tmp/image-gen-${JOB_ID}.jpg"
```

Then use the Read tool on `/tmp/image-gen-{jobId}.jpg`. The image displays inline in the conversation.
Present:
Image generated!

`download` or `both`: Save to the artifact directory.

```bash
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 --decode > "${JOB_DIR}/${JOB_ID}.jpg"
```

Present:
Image generated!

Saved to .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/:
  {jobId}.jpg
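
The two branches can be combined into one helper keyed on the mode. A sketch under this skill's conventions (`.jpg` extension, `/tmp` for inline previews); `save_image` is a hypothetical name, and it relies on `base64 --decode`, which both GNU coreutils and macOS accept.

```bash
# Sketch: decode base64 image data and place it according to the output mode.
# save_image is hypothetical; .jpg follows this skill's convention even if the
# API returns another format.
save_image() {
  local mode="$1" data="$2" job_id day job_dir
  case "$mode" in
    inline|download|both) ;;
    *) echo "unknown outputMode: $mode" >&2; return 1 ;;
  esac
  job_id=$(date +%s)
  day=$(date +%Y-%m-%d)
  if [ "$mode" != "download" ]; then
    # inline or both: temp file for the Read tool
    printf '%s' "$data" | base64 --decode > "/tmp/image-gen-${job_id}.jpg" || return 1
    echo "inline: /tmp/image-gen-${job_id}.jpg"
  fi
  if [ "$mode" != "inline" ]; then
    # download or both: dated artifact directory
    job_dir=".listenhub/image-gen/${day}-${job_id}"
    mkdir -p "$job_dir" || return 1
    printf '%s' "$data" | base64 --decode > "${job_dir}/${job_id}.jpg" || return 1
    echo "saved: ${job_dir}/${job_id}.jpg"
  fi
}
```

Computing the job ID once also keeps the inline and download copies under the same ID when the mode is `both`.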

Base64 decoding (cross-platform):

```bash
# Linux
echo "$BASE64_DATA" | base64 -d > output.jpg

# macOS
echo "$BASE64_DATA" | base64 -D > output.jpg

# or (either platform)
echo "$BASE64_DATA" | base64 --decode > output.jpg
```

**Retry logic**: On 429 (rate limit), wait 15 seconds and retry, up to 3 retries.
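
The retry rule can be sketched as a wrapper around any command that prints an HTTP status code. `retry_on_429`, `submit_request`, `RETRY_DELAY`, and `REQUEST_BODY` are hypothetical names; `RETRY_DELAY` defaults to the 15 seconds above and exists only so the loop can be tested quickly. Writing the body to `response.json` is an arbitrary choice.

```bash
# Sketch of the retry rule: on HTTP 429, wait and retry, at most 3 retries.
# The wrapped command must print an HTTP status code on stdout.
# RETRY_DELAY is a hypothetical override (default 15 seconds).
retry_on_429() {
  local retries=0 max_retries=3 status
  while :; do
    status=$("$@") || return 1          # network or curl failure
    if [ "$status" != "429" ]; then
      printf '%s\n' "$status"
      return 0
    fi
    if [ "$retries" -ge "$max_retries" ]; then
      echo "still rate-limited after $max_retries retries" >&2
      return 1
    fi
    retries=$((retries + 1))
    sleep "${RETRY_DELAY:-15}"
  done
}

# Hypothetical request function: saves the body to response.json and prints
# the status code. Assumes REQUEST_BODY and LISTENHUB_API_KEY are set.
submit_request() {
  curl -sS -X POST "https://api.labnana.com/openapi/v1/images/generation" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" \
    -H "Content-Type: application/json" \
    --max-time 600 \
    -o response.json -w '%{http_code}' \
    --data-binary "$REQUEST_BODY"
}
```

Usage would be `STATUS=$(retry_on_429 submit_request)`. Statuses other than 429 are returned as-is, so other errors still go through `shared/common-patterns.md`.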

Prompt Handling

Default: Pass the user's prompt directly without modification.
When to offer optimization:
  • Prompt is very short (a few words) AND user hasn't requested verbatim
  • Ask: "Would you like help enriching the prompt with style/lighting/composition details?"
When to never modify:
  • Long, detailed, or structured prompts — treat the user as experienced
  • User says "use this prompt exactly"
Optimization techniques (if user agrees):
  • Style: "cyberpunk" → add "neon lights, futuristic, dystopian"
  • Scene: time of day, lighting, weather
  • Quality: "highly detailed", "8K quality", "cinematic composition"
  • Always use English keywords (models trained on English)
  • Show optimized prompt before submitting

API Reference

  • Image generation: `shared/api-image.md`
  • Error handling: `shared/common-patterns.md` § Error Handling

Composability

  • Invokes: nothing (direct API call)
  • Invoked by: platform skills for cover images (Phase 2)

Example

User: "Generate an image: cyberpunk city at night"
Agent workflow:
  1. Prompt is short → offer enrichment → user declines
  2. Ask model → "pro"
  3. Ask resolution → "2K"
  4. Ask ratio → "16:9"
  5. No references

```bash
RESPONSE=$(curl -sS -X POST "https://api.labnana.com/openapi/v1/images/generation" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 600 \
  -d '{
    "provider": "google",
    "model": "gemini-3-pro-image-preview",
    "prompt": "cyberpunk city at night",
    "imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"}
  }')

# "// empty" avoids decoding the literal string "null" when no image is present
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data // empty')
[ -n "$BASE64_DATA" ] || { echo "No image data in response" >&2; exit 1; }
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 --decode > "${JOB_DIR}/${JOB_ID}.jpg"
```

The snippet above saves the image the `download` way; in practice, decode the base64 data according to `outputMode` (see `shared/output-mode.md`).