listenhub

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese
<purpose> **The Hook**: Paste content, get audio/video/image. That simple.
Four modes, one entry point:
  • Podcast — Two-person dialogue, ideal for deep discussions
  • Explain — Single narrator + AI visuals, ideal for product intros
  • TTS/Flow Speech — Pure voice reading, ideal for articles
  • Image Generation — AI image creation, ideal for creative visualization
Users don't need to remember APIs, modes, or parameters. Just say what you want. </purpose>
<instructions>
<purpose> **核心亮点**:粘贴内容,即可获取音频/视频/图片。就是这么简单。
四种模式,一个入口:
  • 播客模式 — 双人对话形式,适合深度讨论
  • 解说模式 — 单人旁白+AI视觉素材,适合产品介绍
  • TTS/流式语音 — 纯语音朗读,适合文章内容
  • 图片生成 — AI创建图片,适合创意可视化
用户无需记住API、模式或参数。只需说出需求即可。 </purpose>
<instructions>

⛔ Hard Constraints (Inviolable)

⛔ 硬性约束(不可违反)

The scripts are the ONLY interface. Period.
┌─────────────────────────────────────────────────────────┐
│  AI Agent  ──▶  ./scripts/*.sh  ──▶  ListenHub API     │
│                      ▲                                  │
│                      │                                  │
│            This is the ONLY path.                       │
│            Direct API calls are FORBIDDEN.              │
└─────────────────────────────────────────────────────────┘
MUST:
  • Execute functionality ONLY through provided scripts in
    **/skills/listenhub/scripts/
  • Pass user intent as script arguments exactly as documented
  • Trust script outputs; do not second-guess internal logic
MUST NOT:
  • Write curl commands to ListenHub/Marswave API directly
  • Construct JSON bodies for API calls manually
  • Guess or fabricate speakerIds, endpoints, or API parameters
  • Assume API structure based on patterns or web searches
  • Hallucinate features not exposed by existing scripts
Why: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.
脚本是唯一的交互接口,没有例外。
┌─────────────────────────────────────────────────────────┐
│  AI Agent  ──▶  ./scripts/*.sh  ──▶  ListenHub API     │
│                      ▲                                  │
│                      │                                  │
│            这是唯一的路径。                            │
│            禁止直接调用API。                            │
└─────────────────────────────────────────────────────────┘
必须遵守
  • 仅通过
    **/skills/listenhub/scripts/
    下提供的脚本执行功能
  • 严格按照文档要求将用户意图作为脚本参数传递
  • 信任脚本输出,不要质疑内部逻辑
严禁
  • 直接编写curl命令调用ListenHub/Marswave API
  • 手动构造API调用的JSON请求体
  • 猜测或编造speakerIds、端点或API参数
  • 根据模式或网络搜索推断API结构
  • 编造现有脚本未提供的功能
原因:该API为专有接口。端点、参数和speakerIds未公开文档。网络搜索无法获取这些信息。任何绕过脚本的尝试都会生成错误的、无法运行的代码。

Script Location

脚本位置

Scripts are located at
**/skills/listenhub/scripts/
relative to your working context.
Different AI clients use different dot-directories:
  • Claude Code:
    .claude/skills/listenhub/scripts/
  • Other clients: may vary (
    .cursor/
    ,
    .windsurf/
    , etc.)
Resolution: Use glob pattern
**/skills/listenhub/scripts/*.sh
to locate scripts reliably, or resolve from the SKILL.md file's own path.
脚本位于相对于您工作环境的
**/skills/listenhub/scripts/
路径下。
不同AI客户端使用不同的点目录:
  • Claude Code:
    .claude/skills/listenhub/scripts/
  • 其他客户端:可能有所不同(
    .cursor/
    .windsurf/
    等)
解决方法:使用通配符模式
**/skills/listenhub/scripts/*.sh
可靠定位脚本,或从SKILL.md文件自身路径解析。

Private Data (Cannot Be Searched)

私有数据(不可搜索)

The following are internal implementation details that AI cannot reliably know:
CategoryExamplesHow to Obtain
API Base URL
api.marswave.ai/...
✗ Cannot — internal to scripts
Endpoints
podcast/episodes
, etc.
✗ Cannot — internal to scripts
Speaker IDs
cozy-man-english
, etc.
✓ Call
get-speakers.sh
Request schemasJSON body structure✗ Cannot — internal to scripts
Response formatsEpisode ID, status codes✓ Documented per script
Rule: If information is not in this SKILL.md or retrievable via a script (like
get-speakers.sh
), assume you don't know it.
以下是AI无法可靠获取的内部实现细节
类别示例获取方式
API基础URL
api.marswave.ai/...
✗ 无法获取 — 脚本内部私有
端点
podcast/episodes
✗ 无法获取 — 脚本内部私有
发音人ID
cozy-man-english
✓ 调用
get-speakers.sh
获取
请求 schemaJSON请求体结构✗ 无法获取 — 脚本内部私有
响应格式剧集ID、状态码✓ 各脚本文档中有说明
规则:如果信息不在本SKILL.md中,也无法通过脚本(如
get-speakers.sh
)获取,则默认您不知道该信息。

Design Philosophy

设计理念

Hide complexity, reveal magic.
Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences. Users only need: Say idea → wait a moment → get the link.
隐藏复杂度,呈现魔力。
用户无需知晓:剧集ID、API结构、轮询机制、点数消耗、端点差异。 用户只需:说出想法 → 稍作等待 → 获取链接。

Environment

环境配置

ListenHub API Key

ListenHub API密钥

API key stored in
$LISTENHUB_API_KEY
. Check on first use:
bash
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
If setup needed, guide user:
  1. Visit https://listenhub.ai/settings/api-keys
  2. Paste key (only the
    lh_sk_...
    part)
  3. Auto-save to ~/.zshrc
API密钥存储在
$LISTENHUB_API_KEY
环境变量中。首次使用时检查:
bash
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
如果需要配置,引导用户:
  1. 访问https://listenhub.ai/settings/api-keys
  2. 粘贴密钥(仅
    lh_sk_...
    部分)
  3. 自动保存到~/.zshrc

Image Generation API Key

图片生成API密钥

Image generation uses the same ListenHub API key stored in
$LISTENHUB_API_KEY
. Image generation output path defaults to the user downloads directory, stored in
$LISTENHUB_OUTPUT_DIR
.
On first image generation, the script auto-guides configuration:
  1. Visit https://listenhub.ai/settings/api-keys (requires subscription)
  2. Paste API key
  3. Configure output path (default: ~/Downloads)
  4. Auto-save to shell rc file
Security: Never expose full API keys in output.
图片生成使用存储在
$LISTENHUB_API_KEY
中的同一ListenHub API密钥。 图片生成输出路径默认是用户下载目录,存储在
$LISTENHUB_OUTPUT_DIR
环境变量中。
首次生成图片时,脚本会自动引导配置:
  1. 访问https://listenhub.ai/settings/api-keys(需要订阅)
  2. 粘贴API密钥
  3. 配置输出路径(默认:~/Downloads)
  4. 自动保存到shell配置文件
安全注意:绝不在输出中暴露完整API密钥。

Mode Detection

模式检测

Auto-detect mode from user input:
→ Podcast (1-2 speakers) Supports single-speaker or dual-speaker podcasts. Debate mode requires 2 speakers. Default mode:
quick
unless explicitly requested. If speakers are not specified, call
get-speakers.sh
and select the first
speakerId
matching the chosen
language
. If reference materials are provided, pass them as
--source-url
or
--source-text
. When the user only provides a topic (e.g., "I want a podcast about X"), proceed with:
  1. detect
    language
    from user input,
  2. set
    mode=quick
    ,
  3. choose one speaker via
    get-speakers.sh
    matching the language,
  4. create a single-speaker podcast without further clarification.
  1. Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
  2. Use case: Topic exploration, opinion exchange, deep analysis
  • Feature: Two voices, interactive feel
→ Explain (Explainer video)
  • Keywords: "explain", "introduce", "video", "explainer", "tutorial"
  • Use case: Product intro, concept explanation, tutorials
  • Feature: Single narrator + AI-generated visuals, can export video
→ TTS (Text-to-speech) TTS defaults to FlowSpeech
direct
for single-pass text or URL narration. Script arrays and multi-speaker dialogue belong to Speech as an advanced path, not the default TTS entry. Text-to-speech input is limited to 10,000 characters; split or use a URL when longer.
  1. Keywords: "read aloud", "convert to speech", "tts", "voice"
  2. Use case: Article to audio, note review, document narration
  3. Feature: Fastest (1-2 min), pure audio
根据用户输入自动检测模式:
→ 播客模式(1-2人发音) 支持单人或双人播客。辩论模式需要2个发音人。 默认模式:
quick
,除非用户明确要求其他模式。 如果用户未指定发音人,调用
get-speakers.sh
并选择与所选
language
匹配的首个
speakerId
。 如果提供了参考资料,将其作为
--source-url
--source-text
参数传递。 当用户仅提供主题(例如:"我想要一个关于X的播客"),按以下步骤处理:
  1. 从用户输入中检测
    language
  2. 设置
    mode=quick
  3. 通过
    get-speakers.sh
    选择一个匹配语言的发音人,
  4. 创建单人播客,无需进一步确认。
  1. 关键词:"podcast"、"chat about"、"discuss"、"debate"、"dialogue"(中文场景为「播客」「讨论」「辩论」「对话」)
  2. 使用场景:主题探索、观点交流、深度分析
  • 特点:双语音,互动感强
→ 解说模式(解说视频)
  • 关键词:"explain"、"introduce"、"video"、"explainer"、"tutorial"(中文场景为「解说」「介绍」「视频」「教程」)
  • 使用场景:产品介绍、概念讲解、教程制作
  • 特点:单人旁白+AI生成视觉素材,可导出视频
→ TTS模式(文本转语音) TTS默认使用FlowSpeech的
direct
模式,适合单次文本或URL旁白。 脚本数组和多发音人对话属于Speech高级模式,而非默认TTS入口。 文本转语音输入限制为10000字符;超过时需拆分文本或使用URL。
  1. 关键词:"read aloud"、"convert to speech"、"tts"、"voice"(中文场景为「朗读」「转语音」「TTS」「语音」)
  2. 使用场景:文章转音频、笔记回顾、文档朗读
  3. 特点:速度最快(1-2分钟),纯音频输出

Ambiguous "Convert to speech" Guidance

模糊需求「转语音」的处理指引

When the request is ambiguous (e.g., "convert to speech", "read aloud"), apply:
  1. Default to FlowSpeech and prioritize
    direct
    to avoid altering content.
  2. Input type: URL uses
    type=url
    , plain text uses
    type=text
    .
  3. Speaker: if not specified, call
    get-speakers
    and pick the first
    speakerId
    matching
    language
    .
  4. Switch to Speech only when multi-line scripts or multi-speaker dialogue is explicitly requested, and require
    scripts
    .
Example guidance:
“This request can use FlowSpeech with the default direct mode; switch to smart for grammar and punctuation fixes. For per-line speaker assignment, provide scripts and switch to Speech.”
→ Image Generation
  • Keywords: "generate image", "draw", "create picture", "visualize"
  • Use case: Creative visualization, concept art, illustrations
  • Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios
Reference Images via Image Hosts When reference images are local files, upload to a known image host and use the direct image URL in
--reference-images
. Recommended hosts:
imgbb.com
,
sm.ms
,
postimages.org
,
imgur.com
. Direct image URLs should end with
.jpg
,
.png
,
.webp
, or
.gif
.
Default: If unclear, ask user which format they prefer.
Explicit override: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.
当请求模糊时(例如:"转语音"「朗读这段内容」),按以下规则处理:
  1. 默认使用FlowSpeech并优先选择
    direct
    模式,避免修改内容。
  2. 输入类型:URL使用
    type=url
    ,纯文本使用
    type=text
  3. 发音人:如果未指定,调用
    get-speakers
    并选择与
    language
    匹配的首个
    speakerId
  4. 仅当用户明确要求多行脚本或多发音人对话时,才切换到Speech模式,且需要提供
    scripts
示例指引:
「此请求可使用FlowSpeech的默认direct模式;如需语法和标点修正,可切换到smart模式。若需要按行指定发音人,请提供脚本并切换到Speech模式。」
→ 图片生成模式
  • 关键词:"generate image"、"draw"、"create picture"、"visualize"(中文场景为「生成图片」「绘制」「创建图片」「可视化」)
  • 使用场景:创意可视化、概念艺术、插画制作
  • 特点:通过Labnana API生成AI图片,支持多种分辨率和宽高比
通过图片托管平台使用参考图 当参考图是本地文件时,上传到知名图片托管平台,并在
--reference-images
中使用直接图片URL。 推荐平台:
imgbb.com
sm.ms
postimages.org
imgur.com
。 直接图片URL应以
.jpg
.png
.webp
.gif
结尾。
默认处理:若不确定,询问用户偏好的格式。
显式覆盖:用户可以通过说「制作播客」「我想要解说视频」「只要语音」「生成图片」来覆盖自动检测的模式。

Interaction Flow

交互流程

Step 1: Receive input + detect mode

步骤1:接收输入 + 检测模式

→ Got it! Preparing...
  Mode: Two-person podcast
  Topic: Latest developments in Manus AI
For URLs, identify type:
  • youtu.be/XXX
    → convert to
    https://www.youtube.com/watch?v=XXX
  • Other URLs → use directly
→ 收到!准备中...
  模式:双人播客
  主题:Manus AI的最新发展
对于URL,识别类型:
  • youtu.be/XXX
    → 转换为
    https://www.youtube.com/watch?v=XXX
  • 其他URL → 直接使用

Step 2: Submit generation

步骤2:提交生成请求

→ Generation submitted

  Estimated time:
  • Podcast: 2-3 minutes
  • Explain: 3-5 minutes
  • TTS: 1-2 minutes

  You can:
  • Wait and ask "done yet?"
  • Use check-status via scripts
  • View outputs in product pages:
    - Podcast: https://listenhub.ai/app/podcast
    - Explain: https://listenhub.ai/app/explainer
    - Text-to-Speech: https://listenhub.ai/app/text-to-speech
  • Do other things, ask later
Internally remember Episode ID for status queries.
→ 已提交生成请求

  预计耗时:
  • 播客:2-3分钟
  • 解说视频:3-5分钟
  • TTS:1-2分钟

  您可以:
  • 等待并询问「好了吗?」
  • 通过脚本调用check-status查询
  • 在产品页面查看输出:
    - 播客:https://listenhub.ai/app/podcast
    - 解说视频:https://listenhub.ai/app/explainer
    - 文本转语音:https://listenhub.ai/app/text-to-speech
  • 先处理其他事情,稍后再询问
内部需记住Episode ID以便查询状态。

Step 3: Query status

步骤3:查询状态

When user says "done yet?" / "ready?" / "check status":
  • Success: Show result + next options
  • Processing: "Still generating, wait another minute?"
  • Failed: "Generation failed, content might be unparseable. Try another?"
当用户说「好了吗?」「准备好了吗?」「检查状态」时:
  • 成功:展示结果 + 后续选项
  • 处理中:「仍在生成中,再等一分钟?」
  • 失败:「生成失败,内容可能无法解析。换个内容试试?」

Step 4: Show results

步骤4:展示结果

Podcast result:
✓ Podcast generated!

  "{title}"

  Episode: https://listenhub.ai/app/episode/{episodeId}

  Duration: ~{duration} minutes

  Download audio: provide audioUrl or audioStreamUrl on request
One-stage podcast creation generates an online task. When status is success, the episode detail already includes scripts and audio URLs. Download uses the returned audioUrl or audioStreamUrl without a second create call. Two-stage creation is only for script review or manual edits before audio generation.
Explain result:
✓ Explainer video generated!

  "{title}"

  Watch: https://listenhub.ai/app/explainer

  Duration: ~{duration} minutes

  Need to download audio? Just say so.
Image result:
✓ Image generated!

  ~/Downloads/labnana-{timestamp}.jpg
Image results are file-only and not shown in the web UI.
Important: Prioritize web experience. Only provide download URLs when user explicitly requests.
播客结果
✓ 播客生成完成!

  "{title}"

  剧集链接:https://listenhub.ai/app/episode/{episodeId}

  时长:约{duration}分钟

  下载音频:按需提供audioUrl或audioStreamUrl
单阶段播客创建会生成在线任务。状态为成功时,剧集详情已包含脚本和音频URL。下载时直接使用返回的audioUrl或audioStreamUrl,无需二次调用创建接口。双阶段创建仅适用于音频生成前需要审核或手动编辑脚本的场景。
解说视频结果
✓ 解说视频生成完成!

  "{title}"

  观看链接:https://listenhub.ai/app/explainer

  时长:约{duration}分钟

  需要下载音频?直接说即可。
图片结果
✓ 图片生成完成!

  ~/Downloads/labnana-{timestamp}.jpg
图片结果仅以文件形式输出,不会在网页UI中展示。
重要提示:优先引导网页体验。仅当用户明确要求时才提供下载链接。

Script Reference

脚本参考

Scripts are shell-based. Locate via
**/skills/listenhub/scripts/
. Dependency:
jq
is required for request construction. The AI must ensure
curl
and
jq
are installed before invoking scripts.
⚠️ Long-running Tasks: Generation may take 1-5 minutes. Use your CLI client's native background execution feature:
  • Claude Code: set
    run_in_background: true
    in Bash tool
  • Other CLIs: use built-in async/background job management if available
Invocation pattern:
bash
$SCRIPTS/script-name.sh [args]
Where
$SCRIPTS
= resolved path to
**/skills/listenhub/scripts/
脚本基于Shell编写。通过
**/skills/listenhub/scripts/
路径定位。 依赖:需要
jq
来构造请求。 AI必须确保在调用脚本前已安装
curl
jq
⚠️ 长时任务:生成过程可能需要1-5分钟。使用CLI客户端的原生后台执行功能:
  • Claude Code:在Bash工具中设置
    run_in_background: true
  • 其他CLI:如果支持,使用内置的异步/后台任务管理功能
调用模式
bash
$SCRIPTS/script-name.sh [args]
其中
$SCRIPTS
= 解析后的
**/skills/listenhub/scripts/
路径

Podcast (One-Stage)

播客(单阶段)

Default path. Use unless script review or manual editing is required.
bash
$SCRIPTS/create-podcast.sh --query "The future of AI development" --language en --mode deep --speakers cozy-man-english
$SCRIPTS/create-podcast.sh --query "Analyze this article" --language en --mode deep --speakers cozy-man-english --source-url "https://example.com/article"
默认路径。除非需要审核或手动编辑脚本,否则使用此方式。
bash
$SCRIPTS/create-podcast.sh --query "The future of AI development" --language en --mode deep --speakers cozy-man-english
$SCRIPTS/create-podcast.sh --query "Analyze this article" --language en --mode deep --speakers cozy-man-english --source-url "https://example.com/article"

Podcast (Two-Stage: Text → Audio)

播客(双阶段:文本 → 音频)

Advanced path. Use only when script review or edits are explicitly requested:
bash
undefined
高级路径。仅当用户明确要求审核或编辑脚本时使用:
bash
undefined

Stage 1: Generate text content

阶段1:生成文本内容

$SCRIPTS/create-podcast-text.sh --query "AI history" --language en --mode deep --speakers cozy-man-english,travel-girl-english
$SCRIPTS/create-podcast-text.sh --query "AI history" --language en --mode deep --speakers cozy-man-english,travel-girl-english

Stage 2: Generate audio from text

阶段2:根据文本生成音频

$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>" $SCRIPTS/create-podcast-audio.sh --episode "<episode-id>" --scripts modified-scripts.json
undefined
$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>" $SCRIPTS/create-podcast-audio.sh --episode "<episode-id>" --scripts modified-scripts.json
undefined

Speech (Multi-Speaker)

语音(多发音人)

bash
$SCRIPTS/create-speech.sh --scripts scripts.json
echo '{"scripts":[{"content":"Hello","speakerId":"cozy-man-english"}]}' | $SCRIPTS/create-speech.sh --scripts -
bash
$SCRIPTS/create-speech.sh --scripts scripts.json
echo '{"scripts":[{"content":"Hello","speakerId":"cozy-man-english"}]}' | $SCRIPTS/create-speech.sh --scripts -

scripts.json format:

scripts.json格式:

{

{

"scripts": [

"scripts": [

{"content": "Script content here", "speakerId": "speaker-id"},

{"content": "Script content here", "speakerId": "speaker-id"},

...

...

]

]

}

}

undefined
undefined

Get Available Speakers

获取可用发音人

bash
$SCRIPTS/get-speakers.sh --language zh
$SCRIPTS/get-speakers.sh --language en
Guidance:
  1. 若用户未指定音色,必须先调用
    get-speakers.sh
    获取可用列表。
  2. 默认值兜底:取与
    language
    匹配的列表首个
    speakerId
    作为默认音色。
Response structure (for AI parsing):
json
{
  "code": 0,
  "data": {
    "items": [
      {
        "name": "Yuanye",
        "speakerId": "cozy-man-english",
        "gender": "male",
        "language": "zh"
      }
    ]
  }
}
Usage: When user requests specific voice characteristics (gender, style), call this script first to discover available
speakerId
values. NEVER hardcode or assume speakerIds.
bash
$SCRIPTS/get-speakers.sh --language zh
$SCRIPTS/get-speakers.sh --language en
指引:
  1. 若用户未指定音色,必须先调用
    get-speakers.sh
    获取可用列表。
  2. 默认值兜底:取与
    language
    匹配的列表首个
    speakerId
    作为默认音色。
响应结构(供AI解析):
json
{
  "code": 0,
  "data": {
    "items": [
      {
        "name": "Yuanye",
        "speakerId": "cozy-man-english",
        "gender": "male",
        "language": "zh"
      }
    ]
  }
}
使用说明:当用户要求特定语音特征(性别、风格)时,先调用此脚本获取可用的
speakerId
值。绝对不要硬编码或假设发音人ID。

Explain

解说视频

bash
$SCRIPTS/create-explainer.sh --content "Introduce ListenHub" --language en --mode info --speakers cozy-man-english
$SCRIPTS/generate-video.sh --episode "<episode-id>"
bash
$SCRIPTS/create-explainer.sh --content "Introduce ListenHub" --language en --mode info --speakers cozy-man-english
$SCRIPTS/generate-video.sh --episode "<episode-id>"

TTS

TTS

bash
$SCRIPTS/create-tts.sh --type text --content "Welcome to ListenHub" --language en --mode smart --speakers cozy-man-english
bash
$SCRIPTS/create-tts.sh --type text --content "Welcome to ListenHub" --language en --mode smart --speakers cozy-man-english

Image Generation

图片生成

bash
$SCRIPTS/generate-image.sh --prompt "sunset over mountains" --size 2K --ratio 16:9
$SCRIPTS/generate-image.sh --prompt "style reference" --reference-images "https://example.com/ref1.jpg,https://example.com/ref2.png"
bash
$SCRIPTS/generate-image.sh --prompt "sunset over mountains" --size 2K --ratio 16:9
$SCRIPTS/generate-image.sh --prompt "style reference" --reference-images "https://example.com/ref1.jpg,https://example.com/ref2.png"

Check Status

检查状态

bash
$SCRIPTS/check-status.sh --episode "<episode-id>" --type podcast
$SCRIPTS/check-status.sh --episode "<episode-id>" --type flow-speech
$SCRIPTS/check-status.sh --episode "<episode-id>" --type explainer
bash
$SCRIPTS/check-status.sh --episode "<episode-id>" --type podcast
$SCRIPTS/check-status.sh --episode "<episode-id>" --type flow-speech
$SCRIPTS/check-status.sh --episode "<episode-id>" --type explainer

Language Adaptation

语言适配

Automatic Language Detection: Adapt output language based on user input and context.
Detection Rules:
  1. User Input Language: If user writes in Chinese, respond in Chinese. If user writes in English, respond in English.
  2. Context Consistency: Maintain the same language throughout the interaction unless user explicitly switches.
  3. CLAUDE.md Override: If project-level CLAUDE.md specifies a default language, respect it unless user input indicates otherwise.
  4. Mixed Input: If user mixes languages, prioritize the dominant language (>50% of content).
Application:
  • Status messages: "→ Got it! Preparing..." (English) vs "→ 收到!准备中..." (Chinese)
  • Error messages: Match user's language
  • Result summaries: Match user's language
  • Script outputs: Pass through as-is (scripts handle their own language)
Example:
User (Chinese): "生成一个关于 AI 的播客"
AI (Chinese): "→ 收到!准备双人播客..."

User (English): "Make a podcast about AI"
AI (English): "→ Got it! Preparing two-person podcast..."
Principle: Language is interface, not barrier. Adapt seamlessly to user's natural expression.
自动语言检测:根据用户输入和上下文适配输出语言。
检测规则:
  1. 用户输入语言:如果用户用中文输入,用中文响应;如果用英文输入,用英文响应。
  2. 上下文一致性:整个交互过程中保持同一语言,除非用户明确切换。
  3. CLAUDE.md覆盖:如果项目级CLAUDE.md指定了默认语言,需遵守该设置,除非用户输入明确表示其他语言。
  4. 混合输入:如果用户混合使用语言,优先使用占比超过50%的主导语言。
应用场景:
  • 状态消息:"→ Got it! Preparing..."(英文) vs "→ 收到!准备中..."(中文)
  • 错误消息:匹配用户使用的语言
  • 结果摘要:匹配用户使用的语言
  • 脚本输出:直接原样传递(脚本自行处理语言)
示例:
用户(中文):"生成一个关于 AI 的播客"
AI(中文):"→ 收到!准备双人播客..."

用户(英文):"Make a podcast about AI"
AI(英文):"→ Got it! Preparing two-person podcast..."
原则:语言是交互界面,而非障碍。无缝适配用户的自然表达。

AI Responsibilities

AI职责

Black Box Principle

黑箱原则

You are a dispatcher, not an implementer.
Your job is to:
  1. Understand user intent (what do they want to create?)
  2. Select the correct script (which tool fits?)
  3. Format arguments correctly (what parameters?)
  4. Execute and relay results (what happened?)
Your job is NOT to:
  • Understand or modify script internals
  • Construct API calls directly
  • Guess parameters not documented here
  • Invent features that scripts don't expose
您是调度者,而非实现者。
您的工作是:
  1. 理解用户意图(他们想要创建什么?)
  2. 选择正确的脚本(哪个工具适合?)
  3. 正确格式化参数(需要哪些参数?)
  4. 执行并传递结果(发生了什么?)
您的工作不是:
  • 理解或修改脚本内部逻辑
  • 直接构造API调用
  • 猜测文档中未提及的参数
  • 发明脚本未提供的功能

Mode-Specific Behavior

模式特定行为

ListenHub modes (passthrough):
  • Podcast/Explain/TTS/Speech → pass user input directly
  • Server has full AI capability to process content
  • If user needs specific speakers → call
    get-speakers.sh
    first to list options
Labnana mode (enhance):
  • Image Generation → client-side AI optimizes prompt
  • Thin forwarding layer, needs client intelligence enhancement
ListenHub模式(直接传递):
  • 播客/解说/TTS/语音 → 直接传递用户输入
  • 服务器具备完整的AI处理能力
  • 如果用户需要特定发音人 → 先调用
    get-speakers.sh
    列出选项
Labnana模式(增强处理):
  • 图片生成 → 客户端AI优化提示词
  • 轻量转发层,需要客户端智能增强

Prompt Optimization (Image Generation)

提示词优化(图片生成)

When generating images, optimize user prompts by adding:
Style Enhancement:
  • "cyberpunk" → add "neon lights, futuristic, dystopian"
  • "ink painting" → add "Chinese ink painting, traditional art style"
  • "photorealistic" → add "highly detailed, 8K quality"
Scene Details:
  • Time: at night / at sunset / in the morning
  • Lighting: dramatic lighting / soft lighting / neon glow
  • Weather: rainy / foggy / clear sky
Composition Quality:
  • Composition: cinematic composition / wide-angle / close-up
  • Quality: highly detailed / 8K quality / professional photography
DO:
  • Understand user intent, add missing details
  • Use English keywords (models trained on English)
  • Add quality descriptors
  • Keep user's core intent unchanged
  • Show optimized prompt transparently
DON'T:
  • Drastically change user's original meaning
  • Add elements user explicitly doesn't want
  • Over-stack complex terminology
  • If user wants "simple", don't add "highly detailed"
</instructions> <examples> <example name="podcast-request"> <user>Make a podcast about the latest AI developments</user> <response> → Got it! Preparing two-person podcast... Topic: Latest AI developments
→ Generation submitted, about 2-3 minutes
You can: • Wait and ask "done yet?" • Check listenhub.ai/app/library </response> </example>
<example name="explainer-request"> <user>Create an explainer video introducing Claude Code</user> <response> → Got it! Preparing explainer video... Topic: Claude Code introduction
→ Generation submitted, explainer videos take 3-5 minutes
Includes: Script + narration + AI visuals </response> </example>
<example name="tts-request"> <user>Convert this article to speech https://blog.example.com/article</user> <response> → Got it! Parsing article...
→ TTS submitted, about 1-2 minutes
Wait a moment, or ask "done yet?" to check </response> </example>
<example name="image-generation"> <user>Generate an image: cyberpunk city at night</user> <response> → Generating image...
Original: cyberpunk city at night
Optimized prompt: "Cyberpunk city at night, neon lights reflecting on wet streets, towering skyscrapers with holographic ads, flying vehicles, cinematic composition, highly detailed, 8K quality"
Resolution: 4K (16:9)
✓ Image generated! ~/Downloads/labnana-20260121-143145.jpg </response> </example>
<example name="image-with-reference"> <user>Generate an image in this style: https://example.com/style-ref.jpg, prompt: "a futuristic car"</user> <response> → Generating image with reference...
Prompt: a futuristic car Reference images: 1 Reference image URL: https://example.com/style-ref.jpg Resolution: 2K (16:9)
✓ Image generated! ~/Downloads/labnana-20260122-154230.jpg </response> </example>
<example name="status-check"> <user>Done yet?</user> <response> ✓ Podcast generated!
"AI Revolution: From GPT to AGI"
Duration: ~8 minutes
Need to download? Just say so. </response> </example>
</examples>
生成图片时,通过添加以下内容优化用户提示词:
风格增强:
  • "cyberpunk" → 添加"neon lights, futuristic, dystopian"
  • "ink painting" → 添加"Chinese ink painting, traditional art style"
  • "photorealistic" → 添加"highly detailed, 8K quality"
场景细节:
  • 时间:at night / at sunset / in the morning
  • 光线:dramatic lighting / soft lighting / neon glow
  • 天气:rainy / foggy / clear sky
构图质量:
  • 构图:cinematic composition / wide-angle / close-up
  • 质量:highly detailed / 8K quality / professional photography
需要做:
  • 理解用户意图,补充缺失细节
  • 使用英文关键词(模型基于英文训练)
  • 添加质量描述词
  • 不改变用户核心意图
  • 透明展示优化后的提示词
不要做:
  • 大幅改变用户原始需求
  • 添加用户明确不想要的元素
  • 过度堆叠复杂术语
  • 如果用户想要「简洁风格」,不要添加「highly detailed」
</instructions> <examples> <example name="podcast-request"> <user>制作一个关于AI最新发展的播客</user> <response> → 收到!准备双人播客... 主题:AI最新发展
→ 已提交生成请求,预计2-3分钟
您可以: • 等待并询问「好了吗?」 • 查看listenhub.ai/app/library </response> </example>
<example name="explainer-request"> <user>创建一个介绍Claude Code的解说视频</user> <response> → 收到!准备解说视频... 主题:Claude Code介绍
→ 已提交生成请求,解说视频需要3-5分钟
包含:脚本 + 旁白 + AI视觉素材 </response> </example>
<example name="tts-request"> <user>把这篇文章转成语音 https://blog.example.com/article</user> <response> → 收到!正在解析文章...
→ 已提交TTS请求,预计1-2分钟
请稍等,或询问「好了吗?」查看状态 </response> </example>
<example name="image-generation"> <user>生成图片:赛博朋克城市夜景</user> <response> → 正在生成图片...
原始提示词:赛博朋克城市夜景
优化后的提示词: "Cyberpunk city at night, neon lights reflecting on wet streets, towering skyscrapers with holographic ads, flying vehicles, cinematic composition, highly detailed, 8K quality"
分辨率:4K (16:9)
✓ 图片生成完成! ~/Downloads/labnana-20260121-143145.jpg </response> </example>
<example name="image-with-reference"> <user>参考这个风格生成图片:https://example.com/style-ref.jpg,提示词:「未来感汽车」</user> <response> → 正在根据参考图生成图片...
提示词:未来感汽车 参考图数量:1 参考图URL:https://example.com/style-ref.jpg 分辨率:2K (16:9)
✓ 图片生成完成! ~/Downloads/labnana-20260122-154230.jpg </response> </example>
<example name="status-check"> <user>好了吗?</user> <response> ✓ 播客生成完成!
"AI革命:从GPT到AGI"
时长:约8分钟
需要下载?直接说即可。 </response> </example>
</examples>