creator


When to Use

适用场景

  • User wants a full content package for a specific platform (WeChat article, Xiaohongshu post, narration script)
  • User says "帮我写篇公众号", "小红书图文", "口播稿", "create content"
  • User provides a URL/text/topic and wants it turned into platform-ready content with images
  • 用户需要为特定平台生成完整内容包(公众号文章、小红书笔记、口播脚本)
  • 用户发送“帮我写篇公众号”“小红书图文”“口播稿”“create content”等指令
  • 用户提供链接/文本/主题,希望将其转换为带配图的平台适配内容

When NOT to Use

不适用场景

  • User wants a single image without a content workflow → use image-gen directly
  • User wants a single TTS audio → use tts directly
  • User wants to transcribe audio → use asr directly
  • User wants a podcast episode → use podcast directly
  • User wants to extract content from a URL without further processing → use content-parser directly
Creator is for multi-step content production that combines writing + media generation into a platform-ready package.
  • 用户仅需生成单张图片且无需内容工作流→直接使用image-gen
  • 用户仅需生成单条TTS音频→直接使用tts
  • 用户仅需转录音频→直接使用asr
  • 用户仅需生成播客节目→直接使用podcast
  • 用户仅需从链接提取内容且无需后续处理→直接使用content-parser
Creator 用于多步骤内容生产,将写作与媒体生成整合为可直接发布的平台适配内容包。

Purpose

目标

Generate platform-specific content packages by orchestrating existing skills. Input: topic, URL, text, or audio/video file. Output: a folder with article/script, images, and metadata — ready to publish.
通过编排现有技能生成平台专属内容包。输入:主题、链接、文本或音视频文件。输出:包含文章/脚本、图片和元数据的文件夹——可直接发布。

Hard Constraints

硬性约束

  • No shell scripts. Construct curl commands from the API reference files in
    shared/
  • Always read config following
    shared/config-pattern.md
    before any interaction
  • Follow
    shared/common-patterns.md
    for polling, errors, and interaction patterns
  • Never save files to
    ~/Downloads/
    or
    .listenhub/
    — save content packages to the current working directory
  • JSON parsing: use
    jq
    only (no python3, awk)
<HARD-GATE> Language Adaptation: All UI text follows the user's input language. Chinese input → Chinese output. English input → English output. Mixed → follow dominant language. </HARD-GATE> <HARD-GATE> Use AskUserQuestion for every multiple-choice step. One question at a time. Wait for the answer. After template is selected and input is understood, show a confirmation summary and wait for explicit approval before executing the pipeline. </HARD-GATE> <HARD-GATE> API Key Check at Confirmation Gate: If the pipeline includes any remote API call (image-gen, content-parser, tts), check `LISTENHUB_API_KEY` before proceeding. If missing, run interactive setup from `shared/authentication.md`. Pure text-only pipelines (e.g., topic → narration script without TTS) can proceed without an API key. </HARD-GATE>
  • 禁止使用shell脚本。从
    shared/
    目录下的API参考文件构建curl命令
  • 在任何交互前,务必遵循
    shared/config-pattern.md
    读取配置
  • 轮询、错误处理和交互模式遵循
    shared/common-patterns.md
  • 禁止将文件保存到
    ~/Downloads/
    .listenhub/
    ——将内容包保存到当前工作目录
  • JSON解析:仅使用
    jq
    (禁止使用python3、awk)
<HARD-GATE> 语言适配:所有UI文本遵循用户输入语言。中文输入→中文输出;英文输入→英文输出;混合输入→遵循占主导的语言。 </HARD-GATE> <HARD-GATE> 每一个多选步骤都需使用AskUserQuestion。一次只提一个问题,等待用户回答。在选择模板并明确输入内容后,显示确认摘要,等待用户明确批准后再执行工作流。 </HARD-GATE> <HARD-GATE> 确认环节的API密钥检查:如果工作流包含任何远程API调用(image-gen、content-parser、tts),在执行前需检查`LISTENHUB_API_KEY`。如果缺失,运行`shared/authentication.md`中的交互式设置。纯文本工作流(例如:仅从主题生成口播脚本而不使用TTS)无需API密钥即可执行。 </HARD-GATE>

Step -1: API Key Check

步骤-1:API密钥检查

Deferred. API key is checked at the confirmation gate (Step 4) only when the pipeline requires remote API calls. See Hard Constraints above.
延迟执行。仅当工作流需要远程API调用时,才在确认环节(步骤4)检查API密钥。详见上述硬性约束。

Step 0: Config Setup

步骤0:配置设置

Follow
shared/config-pattern.md
Step 0 (Zero-Question Boot).
If file doesn't exist — silently create with defaults and proceed:
```bash
mkdir -p ".listenhub/creator" ".listenhub/creator/styles"
cat > ".listenhub/creator/config.json" << 'EOF'
{"outputMode":"download","language":null,"preferences":{"wechat":{"history":[]},"xiaohongshu":{"mode":"both","history":[]},"narration":{"defaultSpeaker":null,"history":[]}}}
EOF
CONFIG_PATH=".listenhub/creator/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
User style preferences are stored as markdown files in
.listenhub/creator/styles/
:
  • .listenhub/creator/styles/wechat.md
  • .listenhub/creator/styles/xiaohongshu.md
  • .listenhub/creator/styles/narration.md
These files are plain markdown — one directive per line. If the file does not exist, no custom style is applied. Users can edit these files directly.
Note:
outputMode
defaults to
"download"
(not the usual
"inline"
) because creator always produces multi-file output folders that must be saved to disk.
If file exists — read config silently and proceed:
```bash
CONFIG_PATH=".listenhub/creator/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/creator/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
遵循
shared/config-pattern.md
中的步骤0(零提问启动)。
如果配置文件不存在——自动创建默认配置并继续:
```bash
mkdir -p ".listenhub/creator" ".listenhub/creator/styles"
cat > ".listenhub/creator/config.json" << 'EOF'
{"outputMode":"download","language":null,"preferences":{"wechat":{"history":[]},"xiaohongshu":{"mode":"both","history":[]},"narration":{"defaultSpeaker":null,"history":[]}}}
EOF
CONFIG_PATH=".listenhub/creator/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
用户风格偏好存储为
.listenhub/creator/styles/
目录下的markdown文件:
  • .listenhub/creator/styles/wechat.md
  • .listenhub/creator/styles/xiaohongshu.md
  • .listenhub/creator/styles/narration.md
这些文件为纯markdown格式——每行一个指令。如果文件不存在,则不应用自定义风格。用户可直接编辑这些文件。
注意:
outputMode
默认值为
"download"
(而非通常的
"inline"
),因为Creator始终生成需保存到磁盘的多文件输出文件夹。
如果配置文件存在——静默读取配置并继续:
```bash
CONFIG_PATH=".listenhub/creator/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/creator/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```

Setup Flow (user-initiated reconfigure only)

设置流程(仅当用户主动发起重新配置时)

Only when user explicitly asks to reconfigure. Display current settings:
当前配置 (creator):
  输出方式:{outputMode}
  小红书模式:{both / cards / long-text}
Ask:
  1. outputMode: Follow
    shared/output-mode.md
    § Setup Flow Question.
  2. xiaohongshu.mode: "小红书默认模式?"
    • "图文 + 长文(both)"
    • "仅图文卡片(cards)"
    • "仅长文(long-text)"
仅当用户明确要求重新配置时执行。显示当前设置:
当前配置 (creator):
  输出方式:{outputMode}
  小红书模式:{both / cards / long-text}
询问:
  1. outputMode:遵循
    shared/output-mode.md
    中的设置流程问题。
  2. xiaohongshu.mode:"小红书默认模式?"
    • "图文 + 长文(both)"
    • "仅图文卡片(cards)"
    • "仅长文(long-text)"

Interaction Flow

交互流程

Step 1: Understand Input

步骤1:理解输入

The user provides input along with their request. Classify the input:
| Input Type | Detection | Auto Action |
| --- | --- | --- |
| URL (web/article) | `http(s)://` prefix, not an audio/video URL | Will call content-parser (requires API key) |
| URL (audio/video) | Extension `.mp3/.mp4/.wav/.m4a/.webm` or domain is youtube.com/bilibili.com/douyin.com | Will download + call `coli asr` to transcribe |
| Local audio/video file | File path exists, extension is audio/video | Will call `coli asr` directly |
| Local text file | File path exists, extension is `.txt/.md/.json` | Read file content |
| Raw text | Multi-line or >50 chars, not a URL/path | Use directly as material |
| Topic/keywords | Short text (<50 chars), no URL/path pattern | AI writes from scratch |
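The detection rules above can be sketched as a single dispatch function. Note the ordering: media extensions and domains must be tested before the generic `http(s)://` rule. The function name and labels are illustrative, and the table's file-existence check is simplified away here:

```shell
# Sketch of the input-classification table; labels are illustrative.
classify_input() {
  local input="$1"
  case "$input" in
    *.mp3|*.mp4|*.wav|*.m4a|*.webm|*youtube.com/*|*bilibili.com/*|*douyin.com/*)
      echo "audio-video" ;;                 # URL or local media file
    http://*|https://*)
      echo "web-article" ;;                 # -> content-parser
    *.txt|*.md|*.json)
      echo "text-file" ;;                   # -> read file content
    *)
      # multi-line or >50 chars -> raw material; otherwise a topic/keywords
      if [ "${#input}" -gt 50 ] || [ "$input" != "${input%$'\n'*}" ]; then
        echo "raw-text"
      else
        echo "topic"
      fi ;;
  esac
}

classify_input "https://youtube.com/watch?v=abc"   # audio-video
classify_input "https://example.com/some-post"     # web-article
classify_input "AI agents"                         # topic
```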
Style reference detection: If the user's prompt contains keywords like "参考", "风格", "照着…写", "style", "reference", the associated input (file path / URL / pasted text) should be classified as a style reference rather than content material. A single request may contain both material and a style reference — classify them separately. If only a style reference is provided with no material or topic, this is a standalone style learning request (see Step 2.5).
For URL (audio/video) inputs:
  1. Download to
    /tmp/creator-{slug}.{ext}
    using
    curl -L -o
  2. Check
    coli
    is available:
    which coli 2>/dev/null && echo yes || echo no
  3. If
    coli
    missing: inform user to install (
    npm install -g @marswave/coli
    ), ask them to paste text instead
  4. Transcribe:
    coli asr -j --model sensevoice "/tmp/creator-{slug}.{ext}"
  5. Extract text from JSON result
  6. Cleanup:
    rm "/tmp/creator-{slug}.{ext}"
For URL (web/article) inputs: Content-parser will be called during pipeline execution (after confirmation).
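The six numbered steps above can be sketched as one helper. The function name and slug handling are illustrative; `coli asr -j --model sensevoice` is the invocation given in step 4:

```shell
# Sketch of the download-and-transcribe flow for audio/video URLs.
transcribe_remote_media() {
  local url="$1" slug="$2"
  local ext="${url##*.}"
  local tmp="/tmp/creator-${slug}.${ext}"
  curl -sSL -o "$tmp" "$url" || return 1                 # 1. download
  if ! which coli >/dev/null 2>&1; then                  # 2. is coli available?
    # 3. ask the user to install it, or paste the text instead
    echo "coli missing: npm install -g @marswave/coli" >&2
    rm -f "$tmp"; return 1
  fi
  coli asr -j --model sensevoice "$tmp"                  # 4-5. transcribe; JSON on stdout
  local rc=$?
  rm -f "$tmp"                                           # 6. cleanup
  return "$rc"
}

# transcribe_remote_media "https://example.com/talk.mp4" "talk"   (illustrative)
```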
用户在请求时提供输入。对输入进行分类:
| 输入类型 | 检测方式 | 自动操作 |
| --- | --- | --- |
| 网页/文章链接 | 带有 `http(s)://` 前缀,且不是音视频链接 | 将调用content-parser(需要API密钥) |
| 音视频链接 | 扩展名是 `.mp3/.mp4/.wav/.m4a/.webm`,或域名是youtube.com/bilibili.com/douyin.com | 将下载文件并调用 `coli asr` 进行转录 |
| 本地音视频文件 | 文件路径存在,且扩展名为音视频格式 | 直接调用 `coli asr` |
| 本地文本文件 | 文件路径存在,且扩展名为 `.txt/.md/.json` | 读取文件内容 |
| 原始文本 | 多行内容或超过50个字符,且不是链接/路径 | 直接作为素材使用 |
| 主题/关键词 | 短文本(少于50字符),且无链接/路径格式 | AI从零开始创作 |
风格参考检测:如果用户的提示包含“参考”“风格”“照着…写”“style”“reference”等关键词,相关的输入(文件路径/链接/粘贴的文本)应被归类为风格参考而非内容素材。单个请求可能同时包含素材和风格参考——需分别分类。如果仅提供风格参考而无素材或主题,这属于独立风格学习请求(详见步骤2.5)。
对于音视频链接输入
  1. 使用
    curl -L -o
    下载到
    /tmp/creator-{slug}.{ext}
  2. 检查
    coli
    是否可用:
    which coli 2>/dev/null && echo yes || echo no
  3. 如果
    coli
    缺失:告知用户安装(
    npm install -g @marswave/coli
    ),并请用户直接粘贴文本内容
  4. 转录:
    coli asr -j --model sensevoice "/tmp/creator-{slug}.{ext}"
  5. 从JSON结果中提取文本
  6. 清理:
    rm "/tmp/creator-{slug}.{ext}"
对于网页/文章链接输入: 将在工作流执行期间(确认后)调用content-parser。

Step 2: Template Matching

步骤2:模板匹配

If the user specified a platform in their prompt, match directly:
  • "公众号", "wechat", "微信" → wechat
  • "小红书", "xiaohongshu", "xhs" → xiaohongshu
  • "口播", "narration", "脚本" → narration
If no platform was specified, ask via AskUserQuestion:
Question: "Which content template?" / "用哪个创作模板?" Options (adapt language to user's input):
  • "WeChat article (公众号长文)" — Long-form article with AI illustrations
  • "Xiaohongshu (小红书)" — Image cards + long text post
  • "Narration script (口播稿)" — Spoken script with optional audio
如果用户在提示中指定了平台,直接匹配:
  • "公众号"、"wechat"、"微信" → wechat模板
  • "小红书"、"xiaohongshu"、"xhs" → xiaohongshu模板
  • "口播"、"narration"、"脚本" → narration模板
如果用户未指定平台,通过AskUserQuestion询问:
问题:"选择哪种内容模板?" / "用哪个创作模板?" 选项(根据用户输入语言适配):
  • "WeChat article (公众号长文)" — 带AI插图的长篇文章
  • "Xiaohongshu (小红书)" — 图文卡片+长文笔记
  • "Narration script (口播稿)" — 可搭配音频的口播脚本

Step 3: Style Extraction (if style reference provided)

步骤3:风格提取(如果提供了风格参考)

This step runs only when the user provided a style reference in Step 1. If no style reference was detected, skip to Step 3b.
Read the reference content:
  • Local file → Read tool
  • URL → content-parser API (requires API key)
  • Pasted text → use directly
Analyze and extract style directives:
AI reads the reference content and extracts 3-5 concrete style directives. Focus on observable patterns:
  • Sentence length and paragraph structure
  • Tone and register (formal/casual, first/third person)
  • Use of rhetorical devices (questions, lists, bold, quotes)
  • Vocabulary level and domain jargon
  • Formatting habits (heading style, emoji usage, whitespace)
Present to user for confirmation:
从参考文章中提炼了以下风格特征:

  1. {directive 1}
  2. {directive 2}
  3. {directive 3}
  ...

你可以修改或删除其中的条目。确认后本次生成会应用这些规则。
Wait for user confirmation. The confirmed directives become
sessionStyle
— applied to this generation only.
After user confirms the style directives, proactively ask whether to persist:
要将这些风格规则保存吗?(保存后每次生成{platform}内容都会应用)
If yes → write to
.listenhub/creator/styles/{platform}.md
. If no → only apply to this generation.
Standalone style learning: If the user only provided a style reference without material/topic (e.g., "学习一下这篇文章的风格"), run the extraction above, then persist directly to
.listenhub/creator/styles/{platform}.md
without asking — the user's intent to save is already explicit. Confirm with a brief message: "已保存到 styles/{platform}.md". Do not proceed to content generation.
仅当步骤1中检测到风格参考时执行此步骤。如果未检测到风格参考,跳至步骤3b。
读取参考内容
  • 本地文件 → 使用读取工具
  • 链接 → 调用content-parser API(需要API密钥)
  • 粘贴的文本 → 直接使用
分析并提取风格指令
AI读取参考内容,提取3-5条具体的风格指令。重点关注可观察的模式:
  • 句子长度和段落结构
  • 语气和语体(正式/非正式,第一/第三人称)
  • 修辞手法的使用(提问、列表、加粗、引用)
  • 词汇水平和领域术语
  • 格式习惯(标题风格、表情符号使用、空格)
呈现给用户确认
从参考文章中提炼了以下风格特征:

  1. {指令1}
  2. {指令2}
  3. {指令3}
  ...

你可以修改或删除其中的条目。确认后本次生成会应用这些规则。
等待用户确认。确认后的指令将成为
sessionStyle
——仅应用于本次内容生成。
用户确认风格指令后,主动询问是否需要保存:
要将这些风格规则保存吗?(保存后每次生成{platform}内容都会应用)
如果用户同意 → 将指令写入
.listenhub/creator/styles/{platform}.md
。如果不同意 → 仅应用于本次生成。
独立风格学习:如果用户仅提供风格参考而无素材或主题(例如:“学习一下这篇文章的风格”),执行上述提取步骤,然后直接保存
.listenhub/creator/styles/{platform}.md
无需询问——用户的保存意图已明确。用简短消息确认:“已保存到 styles/{platform}.md”。不继续执行内容生成。

Step 3b: Preset Selection (if applicable)

步骤3b:预设选择(如适用)

If the selected template uses illustration or card presets and the mode requires images, the preset MUST be chosen before the confirmation gate so it can be displayed in the summary.
Skip this step entirely for:
  • Narration template (no visual presets)
  • Xiaohongshu with
    preferences.xiaohongshu.mode
    =
    "long-text"
    (no cards or images generated)
Otherwise:
  1. Read the template's preset section to get available presets and the topic-matching table.
  2. If the user already specified a preset in their prompt (e.g., "用水彩风格"): use that preset directly.
  3. If not specified: ask the user via AskUserQuestion. Output a one-line hint first: "配图风格可以随时换,先选一个开始吧". List all available presets with their Chinese labels (from frontmatter
    label
    field). Use the topic-matching table to put the most relevant option first (marked "Recommended"), but always let the user choose.
如果所选模板使用插图或卡片预设,且当前模式需要生成图片,必须在确认环节前选择预设,以便在摘要中显示。
完全跳过此步骤的情况
  • 口播模板(无视觉预设)
  • 小红书模板且
    preferences.xiaohongshu.mode
    =
    "long-text"
    (无需生成卡片或图片)
其他情况:
  1. 读取模板的预设部分,获取可用预设和主题匹配表。
  2. 如果用户已在提示中指定预设(例如:“用水彩风格”):直接使用该预设。
  3. 如果未指定:通过AskUserQuestion询问用户。先显示一行提示:“配图风格可以随时换,先选一个开始吧”。列出所有可用预设及其中文标签(来自前置元数据的
    label
    字段)。根据主题匹配表将最相关的选项放在最前面(标注“推荐”),但始终让用户自主选择。

Step 4: Confirmation Gate

步骤4:确认环节

Check API key if the pipeline needs remote APIs:
  • WeChat template always needs image-gen → requires API key
  • Xiaohongshu cards mode needs image-gen → requires API key
  • Xiaohongshu long-text only → no API key needed
  • Narration without TTS → no API key needed
  • Web/article URL input → needs content-parser → requires API key (audio/video URLs use local
    coli asr
    , no API key needed)
If API key required and missing: run
shared/authentication.md
interactive setup.
Show confirmation summary:
准备生成内容:

  模板:{WeChat article / Xiaohongshu / Narration}
  输入:{topic description / URL / text excerpt...}
  输出目录:{slug}-{platform}/
  需要 API 调用:{content-parser, image-gen, ...}
  风格偏好:{styles/{platform}.md 已配置 / 使用默认风格}
  配图/卡片预设:{preset label / 不适用}
  本次风格参考:{M条来自参考文章 / 无}

确认开始?
Wait for explicit "yes" / confirmation before proceeding.
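The key-requirement rules above, condensed into one illustrative check. `TEMPLATE`, `XHS_MODE`, `WANT_TTS`, and `INPUT_TYPE` are hypothetical variable names, not defined by the skill:

```shell
# Returns 0 (true) when the planned pipeline needs a remote API call.
needs_api_key() {
  case "$TEMPLATE" in
    wechat) return 0 ;;                                        # always uses image-gen
    xiaohongshu) [ "$XHS_MODE" != "long-text" ] && return 0 ;; # cards need image-gen
    narration) [ "$WANT_TTS" = "yes" ] && return 0 ;;          # audio needs tts
  esac
  [ "$INPUT_TYPE" = "web-article" ] && return 0                # content-parser for web URLs
  return 1
}

TEMPLATE="narration" WANT_TTS="no" INPUT_TYPE="topic"
if needs_api_key && [ -z "$LISTENHUB_API_KEY" ]; then
  echo "run interactive setup from shared/authentication.md"
fi
```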
检查API密钥(如果工作流需要远程API):
  • 公众号模板始终需要image-gen → 需要API密钥
  • 小红书卡片模式需要image-gen → 需要API密钥
  • 仅小红书长文模式 → 无需API密钥
  • 不使用TTS的口播模板 → 无需API密钥
  • 网页/文章链接输入 → 需要content-parser → 需要API密钥(音视频链接使用本地
    coli asr
    ,无需API密钥)
如果需要API密钥但缺失:运行
shared/authentication.md
中的交互式设置。
显示确认摘要
准备生成内容:

  模板:{公众号文章 / 小红书 / 口播}
  输入:{主题描述 / 链接 / 文本摘录...}
  输出目录:{slug}-{platform}/
  需要API调用:{content-parser, image-gen, ...}
  风格偏好:{已配置styles/{platform}.md / 使用默认风格}
  配图/卡片预设:{预设标签 / 不适用}
  本次风格参考:{M条来自参考文章 / 无}

确认开始?
等待用户明确回复“是”或确认后再继续。

Step 5: Execute Pipeline

步骤5:执行工作流

Read the selected template file and execute:

```bash
# The template file path
TEMPLATE="creator/templates/$PLATFORM/template.md"
STYLE="creator/templates/$PLATFORM/style.md"
```

**For URL inputs — extract content first:**

读取所选模板文件并执行:

```bash
# 模板文件路径
TEMPLATE="creator/templates/$PLATFORM/template.md"
STYLE="creator/templates/$PLATFORM/style.md"
```

**对于链接输入——先提取内容**:

```bash
# Submit content extraction
RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d "{\"source\":{\"type\":\"url\",\"uri\":\"$INPUT_URL\"}}")
TASK_ID=$(echo "$RESPONSE" | jq -r '.data.taskId')
```

Then poll in background. Run this as a **separate Bash call** with `run_in_background: true` and `timeout: 600000` (per `shared/common-patterns.md`). The polling loop itself runs up to 300s (60 polls × 5s); `timeout: 600000` is set higher at the tool level to give the Bash process headroom beyond the poll budget:

```bash
# 提交内容提取请求
RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d "{\"source\":{\"type\":\"url\",\"uri\":\"$INPUT_URL\"}}")
TASK_ID=$(echo "$RESPONSE" | jq -r '.data.taskId')
```

然后在后台轮询。按照`shared/common-patterns.md`的要求,作为**独立Bash调用**执行,设置`run_in_background: true`和`timeout: 600000`。轮询循环最多运行300秒(60次轮询×5秒);工具层面设置`timeout: 600000`是为了给Bash进程留出超出轮询预算的空间:

```bash
# Run with: run_in_background: true, timeout: 600000
# 执行方式:run_in_background: true, timeout: 600000
TASK_ID="<id>"
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" \
    -H "X-Source: skills" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
```

Extract content: `MATERIAL=$(echo "$RESULT" | jq -r '.data.data.content')`

If extraction fails: tell user "URL 解析失败,你可以直接粘贴文字内容给我" and stop.

**Then follow the platform template** — read `template.md` and execute each step. The template specifies the exact writing instructions and API calls. See `creator/templates/{platform}/template.md` for template contents.

**Style application:** When writing content, apply style directives in this priority order (higher overrides lower):
1. `sessionStyle` — directives from the current style reference (Step 3), if any
2. `.listenhub/creator/styles/{platform}.md` — persisted user style directives (if file exists)
3. `templates/{platform}/style.md` — baseline platform style
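One way to materialize this priority list is to concatenate the three sources with the highest-priority source first. A minimal sketch, assuming the file paths from the list above; the variable names and merged-file path are illustrative:

```shell
# Merge style directives, highest priority first (sessionStyle > user > baseline).
PLATFORM="wechat"                 # illustrative
SESSION_STYLE="短句为主"           # e.g. confirmed directives from Step 3
{
  [ -n "$SESSION_STYLE" ] && printf '%s\n' "$SESSION_STYLE"
  [ -f ".listenhub/creator/styles/${PLATFORM}.md" ] && cat ".listenhub/creator/styles/${PLATFORM}.md"
  cat "creator/templates/${PLATFORM}/style.md" 2>/dev/null || true
} > /tmp/creator-style-merged.md
```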

**For image generation** (called by wechat and xiaohongshu templates):

```bash
RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  --max-time 600 \
  -d '{
    "provider": "google",
    "model": "gemini-3-pro-image-preview",
    "prompt": "<generated prompt>",
    "imageConfig": {"imageSize": "2K", "aspectRatio": "<ratio>"}
  }')

BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
```

```bash
# macOS uses -D, Linux uses -d (detect platform)
if [[ "$(uname)" == "Darwin" ]]; then
  echo "$BASE64_DATA" | base64 -D > "{output-path}/(unknown).jpg"
else
  echo "$BASE64_DATA" | base64 -d > "{output-path}/(unknown).jpg"
fi
```

On 429: exponential backoff (wait 15s → 30s → 60s), retry up to 3 times. On failure after retries: skip this image, annotate in output summary.

Generate images **sequentially** (not parallel) to respect rate limits.

**For TTS** (called by narration template when user wants audio):

Use `@file` pattern per `shared/common-patterns.md` to handle special chars in script text:

```bash
# Write TTS request to temp file (handles quotes, newlines safely)
cat > /tmp/creator-tts-request.json << ENDJSON
{"input": $(echo "$SCRIPT_TEXT" | jq -Rs .), "voice": "$SPEAKER_ID"}
ENDJSON
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d @/tmp/creator-tts-request.json \
  --output "{slug}-narration/audio.mp3"
rm /tmp/creator-tts-request.json
```

```bash
# 执行方式:run_in_background: true, timeout: 600000
TASK_ID="<id>"
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" \
    -H "X-Source: skills" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
```

提取内容:`MATERIAL=$(echo "$RESULT" | jq -r '.data.data.content')`

如果提取失败:告知用户“URL解析失败,你可以直接粘贴文字内容给我”并停止执行。

**然后遵循平台模板**——读取`template.md`并执行每一步。模板指定了具体的写作指令和API调用。详见`creator/templates/{platform}/template.md`中的模板内容。

**风格应用优先级**:生成内容时,按以下优先级应用风格指令(优先级高的覆盖优先级低的):
1. `sessionStyle`——当前风格参考的指令(步骤3)(如有)
2. `.listenhub/creator/styles/{platform}.md`——用户保存的风格指令(如果文件存在)
3. `templates/{platform}/style.md`——平台基础风格

**图片生成**(由公众号和小红书模板调用):

```bash
RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  --max-time 600 \
  -d '{
    "provider": "google",
    "model": "gemini-3-pro-image-preview",
    "prompt": "<generated prompt>",
    "imageConfig": {"imageSize": "2K", "aspectRatio": "<ratio>"}
  }')

BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
```

```bash
# macOS使用-D,Linux使用-d(自动检测平台)
if [[ "$(uname)" == "Darwin" ]]; then
  echo "$BASE64_DATA" | base64 -D > "{output-path}/(unknown).jpg"
else
  echo "$BASE64_DATA" | base64 -d > "{output-path}/(unknown).jpg"
fi
```

遇到429错误:指数退避等待(15秒→30秒→60秒),最多重试3次。如果重试后仍失败:跳过该图片,在输出摘要中注明。

**按顺序生成图片**(而非并行),以遵守速率限制。

**TTS语音合成**(当用户需要音频时由口播模板调用):

按照`shared/common-patterns.md`中的`@file`模式处理脚本文本中的特殊字符:

```bash
# 将TTS请求写入临时文件(安全处理引号、换行符)
cat > /tmp/creator-tts-request.json << ENDJSON
{"input": $(echo "$SCRIPT_TEXT" | jq -Rs .), "voice": "$SPEAKER_ID"}
ENDJSON
curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d @/tmp/creator-tts-request.json \
  --output "{slug}-narration/audio.mp3"
rm /tmp/creator-tts-request.json
```
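The 429 backoff policy above can be sketched as a small retry wrapper. `with_backoff` is an illustrative helper, not part of the shared patterns; the command it wraps (the image-gen curl call) is supplied by the caller:

```shell
# Retry a command up to 3 times with exponential backoff: 15s -> 30s -> 60s.
with_backoff() {
  local delay=15 attempt
  for attempt in 1 2 3; do
    "$@" && return 0
    if [ "$attempt" -lt 3 ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
  done
  return 1   # caller skips this image and notes it in the output summary
}
```

Usage would look like `with_backoff generate_one_image "$PROMPT" "$OUT_PATH"`, where `generate_one_image` is a hypothetical wrapper around the image-gen call above.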

Step 6: Assemble Output

步骤6:组装输出

Create the output folder and write all files:
```bash
SLUG="{topic-slug}"
OUTPUT_DIR="${SLUG}-{platform}"

# Dedup folder name
i=2; while [ -d "$OUTPUT_DIR" ]; do OUTPUT_DIR="${SLUG}-{platform}-${i}"; i=$((i+1)); done
mkdir -p "$OUTPUT_DIR"
```

Write content files per template spec. Then write `meta.json`:

```json
{
  "title": "...",
  "slug": "...",
  "platform": "wechat|xiaohongshu|narration",
  "date": "YYYY-MM-DD",
  "tags": ["...", "..."],
  "summary": "..."
}
```

创建输出文件夹并写入所有文件:

```bash
SLUG="{topic-slug}"
OUTPUT_DIR="${SLUG}-{platform}"

# 文件夹名称去重
i=2; while [ -d "$OUTPUT_DIR" ]; do OUTPUT_DIR="${SLUG}-{platform}-${i}"; i=$((i+1)); done
mkdir -p "$OUTPUT_DIR"
```

按照模板规范写入内容文件。然后写入`meta.json`:

```json
{
  "title": "...",
  "slug": "...",
  "platform": "wechat|xiaohongshu|narration",
  "date": "YYYY-MM-DD",
  "tags": ["...", "..."],
  "summary": "..."
}
```

Step 7: Present Result

步骤7:呈现结果

✅ 内容已生成!保存在 {OUTPUT_DIR}/

📄 {main files list}
🖼️ images/ — N 张配图(如有)
📋 meta.json — 标题、标签、摘要
(Adapt language to user's input language per Hard Constraints.)
✅ 内容已生成!保存在 {OUTPUT_DIR}/

📄 {主要文件列表}
🖼️ images/ — N 张配图(如有)
📋 meta.json — 标题、标签、摘要
(根据硬性约束,按照用户输入语言适配表述。)

Step 8: Update Preferences

步骤8:更新偏好设置

Record this generation in history:
```bash
NEW_CONFIG=$(echo "$CONFIG" | jq \
  --arg platform "$PLATFORM" \
  --arg date "$(date +%Y-%m-%d)" \
  --arg topic "$TOPIC" \
  '.preferences[$platform].history = (.preferences[$platform].history + [{"date": $date, "topic": $topic}])[-5:]')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
```
Keep only the last 5 history entries per platform.
Note:
cardStyle
from the spec is deferred — not implemented in V1 config. Can be added later when card style customization is needed.
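The `[-5:]` slice is what enforces the five-entry cap. A toy run with an inline config (not the real config file) shows the oldest entry dropping off:

```shell
# Six entries after append; the slice keeps only the newest five.
CONFIG='{"preferences":{"wechat":{"history":[{"t":1},{"t":2},{"t":3},{"t":4},{"t":5}]}}}'
NEW_CONFIG=$(echo "$CONFIG" | jq --arg platform "wechat" \
  '.preferences[$platform].history = (.preferences[$platform].history + [{"t":6}])[-5:]')
echo "$NEW_CONFIG" | jq '.preferences.wechat.history | length'   # 5
echo "$NEW_CONFIG" | jq '.preferences.wechat.history[0].t'       # 2  (entry {"t":1} trimmed)
```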
将本次生成记录到历史中:
```bash
NEW_CONFIG=$(echo "$CONFIG" | jq \
  --arg platform "$PLATFORM" \
  --arg date "$(date +%Y-%m-%d)" \
  --arg topic "$TOPIC" \
  '.preferences[$platform].history = (.preferences[$platform].history + [{"date": $date, "topic": $topic}])[-5:]')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
```
每个平台仅保留最近5条历史记录。
注意:规范中的
cardStyle
已延迟实现——V1版本配置中未包含。后续需要卡片风格自定义时可添加。

Manual Style Tuning

手动风格调整

Adding style directives:
If the user says "记住:{style directive}" or "remember: {style directive}":
  1. Detect which platform it applies to (from context or ask)
  2. Append the directive as a new line to
    .listenhub/creator/styles/{platform}.md
    (create the file if it doesn't exist)
This also applies after Step 3 (Style Extraction): if the user says "记住这个风格" after reviewing extracted directives, write all confirmed directives to
.listenhub/creator/styles/{platform}.md
.
Resetting style:
If the user says "重置风格偏好" or "reset style":
  1. Ask which platform (or all)
  2. Delete
    .listenhub/creator/styles/{platform}.md
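The append flow above as a sketch; the platform and directive values are illustrative:

```shell
# Append a "remember: ..." directive to the platform's style file.
PLATFORM="wechat"                              # detected from context (or asked)
DIRECTIVE="段落保持在三行以内"                    # the directive after "记住:"
STYLE_FILE=".listenhub/creator/styles/${PLATFORM}.md"
mkdir -p "$(dirname "$STYLE_FILE")"
printf '%s\n' "$DIRECTIVE" >> "$STYLE_FILE"    # one directive per line; file created if absent
```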
添加风格指令
如果用户说“记住:{风格指令}”或“remember: {风格指令}”:
  1. 检测指令适用的平台(从上下文判断或询问用户)
  2. 将指令作为新行追加到
    .listenhub/creator/styles/{platform}.md
    (如果文件不存在则创建)
此规则也适用于步骤3(风格提取)后:如果用户在查看提取的指令后说“记住这个风格”,将所有确认的指令写入
.listenhub/creator/styles/{platform}.md
重置风格
如果用户说“重置风格偏好”或“reset style”:
  1. 询问用户要重置哪个平台(或全部)
  2. 删除
    .listenhub/creator/styles/{platform}.md

API Reference

API参考

  • Authentication & headers:
    shared/authentication.md
  • Image generation:
    shared/api-image.md
  • Content extraction:
    shared/api-content-extract.md
  • TTS (text-to-speech):
    shared/api-tts.md
  • Speaker selection:
    shared/speaker-selection.md
  • Config pattern:
    shared/config-pattern.md
  • Common patterns (polling, errors):
    shared/common-patterns.md
  • Output mode:
    shared/output-mode.md
  • 认证与请求头:
    shared/authentication.md
  • 图片生成:
    shared/api-image.md
  • 内容提取:
    shared/api-content-extract.md
  • 语音合成(TTS):
    shared/api-tts.md
  • 发音人选择:
    shared/speaker-selection.md
  • 配置模式:
    shared/config-pattern.md
  • 通用模式(轮询、错误处理):
    shared/common-patterns.md
  • 输出模式:
    shared/output-mode.md

Composability

可组合性

  • Invokes: content-parser (URL extraction), image-gen (illustrations/cards), tts (narration audio), asr (audio/video transcription via
    coli
    )
  • Invoked by: standalone — user triggers directly
  • Templates:
    creator/templates/{wechat,xiaohongshu,narration}/template.md
    define per-platform pipelines
  • Style guides:
    creator/templates/{wechat,xiaohongshu,narration}/style.md
    define per-platform writing tone
  • 调用的技能:content-parser(链接提取)、image-gen(插图/卡片生成)、tts(口播音频)、asr(通过
    coli
    进行音视频转录)
  • 被调用方式:独立调用——用户直接触发
  • 模板
    creator/templates/{wechat,xiaohongshu,narration}/template.md
    定义了各平台的工作流
  • 风格指南
    creator/templates/{wechat,xiaohongshu,narration}/style.md
    定义了各平台的写作语气