listenhub
<purpose>
**The Hook**: Paste content, get audio/video/image. That simple.
Four modes, one entry point:
- Podcast — Two-person dialogue, ideal for deep discussions
- Explain — Single narrator + AI visuals, ideal for product intros
- TTS/Flow Speech — Pure voice reading, ideal for articles
- Image Generation — AI image creation, ideal for creative visualization
Users don't need to remember APIs, modes, or parameters. Just say what you want.
</purpose>
<instructions>
⛔ Hard Constraints (Inviolable)
The scripts are the ONLY interface. Period.
┌─────────────────────────────────────────────────────────┐
│ AI Agent ──▶ ./scripts/*.sh ──▶ ListenHub API │
│ ▲ │
│ │ │
│ This is the ONLY path. │
│ Direct API calls are FORBIDDEN. │
└─────────────────────────────────────────────────────────┘
MUST:
- Execute functionality ONLY through provided scripts in **/skills/listenhub/scripts/
- Pass user intent as script arguments exactly as documented
- Trust script outputs; do not second-guess internal logic
MUST NOT:
- Write curl commands to ListenHub/Marswave API directly
- Construct JSON bodies for API calls manually
- Guess or fabricate speakerIds, endpoints, or API parameters
- Assume API structure based on patterns or web searches
- Hallucinate features not exposed by existing scripts
Why: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.
Script Location
Scripts are located at **/skills/listenhub/scripts/ relative to your working context.
Different AI clients use different dot-directories:
- Claude Code: .claude/skills/listenhub/scripts/
- Other clients: may vary (.cursor/, .windsurf/, etc.)
Resolution: Use the glob pattern **/skills/listenhub/scripts/*.sh to locate scripts reliably, or resolve from the SKILL.md file's own path.
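The resolution step above can be sketched as a small shell lookup (a minimal sketch; the `find`-based search is one way to realize the glob, and any glob-capable mechanism works):

```bash
# Resolve $SCRIPTS by searching for the skill's scripts directory (first match wins).
SCRIPTS=$(find . -type d -path '*/skills/listenhub/scripts' 2>/dev/null | head -n 1)
if [ -n "$SCRIPTS" ]; then echo "found: $SCRIPTS"; else echo "scripts not found"; fi
```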
Private Data (Cannot Be Searched)
The following are internal implementation details that AI cannot reliably know:
| Category | Examples | How to Obtain |
|---|---|---|
| API Base URL | | ✗ Cannot — internal to scripts |
| Endpoints | | ✗ Cannot — internal to scripts |
| Speaker IDs | | ✓ Call get-speakers.sh |
| Request schemas | JSON body structure | ✗ Cannot — internal to scripts |
| Response formats | Episode ID, status codes | ✓ Documented per script |
Rule: If information is not in this SKILL.md or retrievable via a script (like get-speakers.sh), assume you don't know it.
Design Philosophy
Hide complexity, reveal magic.
Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences.
Users only need: Say idea → wait a moment → get the link.
Environment
ListenHub API Key
API key is stored in $LISTENHUB_API_KEY. Check on first use:
```bash
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
```
If setup is needed, guide the user:
- Visit https://listenhub.ai/settings/api-keys
- Paste the key (only the lh_sk_... part)
- Auto-save to ~/.zshrc
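The auto-save step can be sketched as an idempotent append (a minimal sketch; the placeholder key and the `ZDOTDIR` fallback are assumptions):

```bash
# Append the export to ~/.zshrc only if it is not already present.
KEY="lh_sk_placeholder"                      # placeholder, never a real key
RC="${ZDOTDIR:-$HOME}/.zshrc"
grep -q 'LISTENHUB_API_KEY' "$RC" 2>/dev/null || \
  printf 'export LISTENHUB_API_KEY="%s"\n' "$KEY" >> "$RC"
```

The grep guard keeps repeated setup runs from stacking duplicate export lines.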
Image Generation API Key
Image generation uses the same ListenHub API key stored in $LISTENHUB_API_KEY.
The image generation output path defaults to the user's downloads directory, stored in $LISTENHUB_OUTPUT_DIR.
On first image generation, the script auto-guides configuration:
- Visit https://listenhub.ai/settings/api-keys (requires subscription)
- Paste API key
- Configure output path (default: ~/Downloads)
- Auto-save to shell rc file
Security: Never expose full API keys in output.
Mode Detection
Auto-detect mode from user input:
→ Podcast (1-2 speakers)
Supports single-speaker or dual-speaker podcasts. Debate mode requires 2 speakers.
Default mode: quick unless another mode is explicitly requested.
If speakers are not specified, call get-speakers.sh and select the first speakerId matching the chosen language.
If reference materials are provided, pass them as --source-url or --source-text.
When the user only provides a topic (e.g., "I want a podcast about X"), proceed with:
- detect language from user input,
- set mode=quick,
- choose one speaker via get-speakers.sh matching the language,
- create a single-speaker podcast without further clarification.
- Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
- Use case: Topic exploration, opinion exchange, deep analysis
- Feature: Two voices, interactive feel
→ Explain (Explainer video)
- Keywords: "explain", "introduce", "video", "explainer", "tutorial"
- Use case: Product intro, concept explanation, tutorials
- Feature: Single narrator + AI-generated visuals, can export video
→ TTS (Text-to-speech)
TTS defaults to FlowSpeech direct mode for single-pass text or URL narration.
Script arrays and multi-speaker dialogue belong to Speech as an advanced path, not the default TTS entry.
Text-to-speech input is limited to 10,000 characters; split the text or use a URL when longer.
- Keywords: "read aloud", "convert to speech", "tts", "voice"
- Use case: Article to audio, note review, document narration
- Feature: Fastest (1-2 min), pure audio
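The 10,000-character limit can be enforced with a quick preflight (a sketch; `CONTENT` is a stand-in for the text to narrate):

```bash
# Count characters (not bytes) and warn when over the TTS input limit.
CONTENT="example input text"   # stand-in for the user's text
len=$(printf '%s' "$CONTENT" | wc -m)
if [ "$len" -gt 10000 ]; then
  echo "too long ($len chars): split the text or pass a URL instead"
else
  echo "ok"
fi
```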
Ambiguous "Convert to speech" Guidance
When the request is ambiguous (e.g., "convert to speech", "read aloud"), apply:
- Default to FlowSpeech and prioritize direct mode to avoid altering content.
- Input type: URL uses type=url, plain text uses type=text.
- Speaker: if not specified, call get-speakers.sh and pick the first speakerId matching language.
- Switch to Speech only when multi-line scripts or multi-speaker dialogue is explicitly requested, and require scripts.
Example guidance:
"This request can use FlowSpeech with the default direct mode; switch to smart for grammar and punctuation fixes. For per-line speaker assignment, provide scripts and switch to Speech."
→ Image Generation
- Keywords: "generate image", "draw", "create picture", "visualize"
- Use case: Creative visualization, concept art, illustrations
- Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios
Reference Images via Image Hosts
When reference images are local files, upload to a known image host and use the direct image URL in --reference-images.
Recommended hosts: imgbb.com, sm.ms, postimages.org, imgur.com.
Direct image URLs should end with .jpg, .png, .webp, or .gif.
Default: If unclear, ask the user which format they prefer.
Explicit override: The user can say "make it a podcast" / "I want an explainer video" / "just voice" / "generate image" to override auto-detection.
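The keyword routing above can be sketched as a hypothetical dispatcher (`detect_mode` and its keyword coverage are illustrative; in practice detection is the AI's judgment, not a fixed pattern list):

```bash
# Map a request to one of the four modes by keyword; fall back to asking.
detect_mode() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *podcast*|*discuss*|*debate*|*dialogue*) echo podcast ;;
    *explain*|*video*|*tutorial*)            echo explain ;;
    *"read aloud"*|*speech*|*tts*|*voice*)   echo tts ;;
    *image*|*draw*|*picture*|*visualize*)    echo image ;;
    *)                                       echo ask_user ;;
  esac
}

detect_mode "Make a podcast about AI"   # podcast
```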
Interaction Flow
Step 1: Receive input + detect mode
→ Got it! Preparing...
Mode: Two-person podcast
Topic: Latest developments in Manus AI
For URLs, identify type:
- youtu.be/XXX → convert to https://www.youtube.com/watch?v=XXX
- Other URLs → use directly
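The youtu.be conversion can be sketched as follows (a minimal sketch; query strings after the video ID are not handled):

```bash
# Expand a youtu.be short link to the canonical watch URL; pass others through.
normalize_url() {
  case "$1" in
    *youtu.be/*) echo "https://www.youtube.com/watch?v=${1##*youtu.be/}" ;;
    *)           echo "$1" ;;
  esac
}
```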
Step 2: Submit generation
→ Generation submitted
Estimated time:
• Podcast: 2-3 minutes
• Explain: 3-5 minutes
• TTS: 1-2 minutes
You can:
• Wait and ask "done yet?"
• Use check-status via scripts
• View outputs in product pages:
- Podcast: https://listenhub.ai/app/podcast
- Explain: https://listenhub.ai/app/explainer
- Text-to-Speech: https://listenhub.ai/app/text-to-speech
• Do other things, ask later
Internally remember the Episode ID for status queries.
Step 3: Query status
When user says "done yet?" / "ready?" / "check status":
- Success: Show result + next options
- Processing: "Still generating, wait another minute?"
- Failed: "Generation failed, content might be unparseable. Try another?"
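The "done yet?" flow can be backed by a hypothetical polling wrapper around check-status.sh (a sketch; the `success`/`failed` markers are assumptions about the script's output wording):

```bash
# Poll check-status.sh every 10s, up to 30 attempts (~5 minutes).
poll_status() {
  episode=$1; type=$2
  for _ in $(seq 1 30); do
    out=$("$SCRIPTS/check-status.sh" --episode "$episode" --type "$type")
    case "$out" in
      *success*) echo "$out"; return 0 ;;
      *failed*)  echo "$out"; return 1 ;;
    esac
    sleep 10
  done
  echo "timed out"; return 2
}
```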
Step 4: Show results
Podcast result:
✓ Podcast generated!
"{title}"
Episode: https://listenhub.ai/app/episode/{episodeId}
Duration: ~{duration} minutes
Download audio: provide audioUrl or audioStreamUrl on request
One-stage podcast creation generates an online task. When status is success,
the episode detail already includes scripts and audio URLs. Download uses the
returned audioUrl or audioStreamUrl without a second create call. Two-stage
creation is only for script review or manual edits before audio generation.
Explain result:
✓ Explainer video generated!
"{title}"
Watch: https://listenhub.ai/app/explainer
Duration: ~{duration} minutes
Need to download audio? Just say so.
Image result:
✓ Image generated!
~/Downloads/labnana-{timestamp}.jpg
Image results are file-only and not shown in the web UI.
Important: Prioritize web experience. Only provide download URLs when user explicitly requests.
Script Reference
Scripts are shell-based. Locate via **/skills/listenhub/scripts/.
Dependency: jq is required for request construction.
The AI must ensure curl and jq are installed before invoking scripts.
⚠️ Long-running Tasks: Generation may take 1-5 minutes. Use your CLI client's native background execution feature:
- Claude Code: set run_in_background: true in the Bash tool
- Other CLIs: use built-in async/background job management if available
Invocation pattern:
```bash
$SCRIPTS/script-name.sh [args]
```
Where $SCRIPTS = resolved path to **/skills/listenhub/scripts/
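The dependency requirement above can be checked with a short preflight (a minimal sketch):

```bash
# Preflight: report whether curl and jq are on PATH before invoking any script.
for dep in curl jq; do
  command -v "$dep" >/dev/null 2>&1 && echo "$dep ok" || echo "missing dependency: $dep"
done
```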
Podcast (One-Stage)
Default path. Use this unless script review or manual editing is required.
```bash
$SCRIPTS/create-podcast.sh --query "The future of AI development" --language en --mode deep --speakers cozy-man-english
$SCRIPTS/create-podcast.sh --query "Analyze this article" --language en --mode deep --speakers cozy-man-english --source-url "https://example.com/article"
```
Podcast (Two-Stage: Text → Audio)
Advanced path. Use only when script review or edits are explicitly requested:
Stage 1: Generate text content
```bash
$SCRIPTS/create-podcast-text.sh --query "AI history" --language en --mode deep --speakers cozy-man-english,travel-girl-english
```
Stage 2: Generate audio from text
```bash
$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>"
$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>" --scripts modified-scripts.json
```
Speech (Multi-Speaker)
```bash
$SCRIPTS/create-speech.sh --scripts scripts.json
echo '{"scripts":[{"content":"Hello","speakerId":"cozy-man-english"}]}' | $SCRIPTS/create-speech.sh --scripts -
```
scripts.json format:
```json
{
  "scripts": [
    {"content": "Script content here", "speakerId": "speaker-id"},
    ...
  ]
}
```
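A scripts.json payload can also be assembled with jq rather than written by hand (a sketch; the speaker IDs are illustrative values from the examples above):

```bash
# Build a two-line dialogue payload for create-speech.sh.
jq -n '{scripts: [
  {content: "Hello",        speakerId: "cozy-man-english"},
  {content: "Hi, welcome!", speakerId: "travel-girl-english"}
]}' > scripts.json
```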
Get Available Speakers
```bash
$SCRIPTS/get-speakers.sh --language zh
$SCRIPTS/get-speakers.sh --language en
```
Guidance:
- If the user does not specify a voice, you must call get-speakers.sh first to get the list of available options.
- Default fallback: use the first speakerId in the list matching language as the default voice.
Response structure (for AI parsing):
```json
{
  "code": 0,
  "data": {
    "items": [
      {
        "name": "Yuanye",
        "speakerId": "cozy-man-english",
        "gender": "male",
        "language": "zh"
      }
    ]
  }
}
```
Usage: When the user requests specific voice characteristics (gender, style), call this script first to discover available speakerId values. NEVER hardcode or assume speakerIds.
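Selecting the default fallback speaker from that response can be sketched with jq (the sample payload mirrors the response structure documented above):

```bash
# Pick the first speakerId matching the requested language.
RESP='{"code":0,"data":{"items":[{"name":"Yuanye","speakerId":"cozy-man-english","gender":"male","language":"zh"}]}}'
printf '%s' "$RESP" | jq -r '.data.items[] | select(.language=="zh") | .speakerId' | head -n 1
```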
Explain
```bash
$SCRIPTS/create-explainer.sh --content "Introduce ListenHub" --language en --mode info --speakers cozy-man-english
$SCRIPTS/generate-video.sh --episode "<episode-id>"
```
TTS
```bash
$SCRIPTS/create-tts.sh --type text --content "Welcome to ListenHub" --language en --mode smart --speakers cozy-man-english
```
Image Generation
```bash
$SCRIPTS/generate-image.sh --prompt "sunset over mountains" --size 2K --ratio 16:9
$SCRIPTS/generate-image.sh --prompt "style reference" --reference-images "https://example.com/ref1.jpg,https://example.com/ref2.png"
```
Check Status
```bash
$SCRIPTS/check-status.sh --episode "<episode-id>" --type podcast
$SCRIPTS/check-status.sh --episode "<episode-id>" --type flow-speech
$SCRIPTS/check-status.sh --episode "<episode-id>" --type explainer
```
Language Adaptation
Automatic Language Detection: Adapt output language based on user input and context.
Detection Rules:
- User Input Language: If user writes in Chinese, respond in Chinese. If user writes in English, respond in English.
- Context Consistency: Maintain the same language throughout the interaction unless user explicitly switches.
- CLAUDE.md Override: If project-level CLAUDE.md specifies a default language, respect it unless user input indicates otherwise.
- Mixed Input: If user mixes languages, prioritize the dominant language (>50% of content).
Application:
- Status messages: "→ Got it! Preparing..." (English) vs "→ 收到!准备中..." (Chinese)
- Error messages: Match user's language
- Result summaries: Match user's language
- Script outputs: Pass through as-is (scripts handle their own language)
Example:
User (Chinese): "生成一个关于 AI 的播客"
AI (Chinese): "→ 收到!准备双人播客..."
User (English): "Make a podcast about AI"
AI (English): "→ Got it! Preparing two-person podcast..."
Principle: Language is interface, not barrier. Adapt seamlessly to the user's natural expression.
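The dominant-language rule (>50%) can be approximated with a byte-count heuristic (a rough sketch and an assumption; real detection should weigh words and context, not bytes):

```bash
# Compare non-ASCII bytes (CJK text) against ASCII letters to pick a reply language.
dominant_lang() {
  non_ascii=$(printf '%s' "$1" | LC_ALL=C tr -d '\0-\177' | wc -c)
  ascii=$(printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z' | wc -c)
  if [ "$non_ascii" -gt "$ascii" ]; then echo zh; else echo en; fi
}
```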
AI Responsibilities
Black Box Principle
You are a dispatcher, not an implementer.
Your job is to:
- Understand user intent (what do they want to create?)
- Select the correct script (which tool fits?)
- Format arguments correctly (what parameters?)
- Execute and relay results (what happened?)
Your job is NOT to:
- Understand or modify script internals
- Construct API calls directly
- Guess parameters not documented here
- Invent features that scripts don't expose
Mode-Specific Behavior
ListenHub modes (passthrough):
- Podcast/Explain/TTS/Speech → pass user input directly
- Server has full AI capability to process content
- If user needs specific speakers → call get-speakers.sh first to list options
Labnana mode (enhance):
- Image Generation → client-side AI optimizes prompt
- Thin forwarding layer, needs client intelligence enhancement
Prompt Optimization (Image Generation)
When generating images, optimize user prompts by adding:
Style Enhancement:
- "cyberpunk" → add "neon lights, futuristic, dystopian"
- "ink painting" → add "Chinese ink painting, traditional art style"
- "photorealistic" → add "highly detailed, 8K quality"
Scene Details:
- Time: at night / at sunset / in the morning
- Lighting: dramatic lighting / soft lighting / neon glow
- Weather: rainy / foggy / clear sky
Composition Quality:
- Composition: cinematic composition / wide-angle / close-up
- Quality: highly detailed / 8K quality / professional photography
DO:
- Understand user intent, add missing details
- Use English keywords (models trained on English)
- Add quality descriptors
- Keep user's core intent unchanged
- Show optimized prompt transparently
DON'T:
- Drastically change user's original meaning
- Add elements user explicitly doesn't want
- Over-stack complex terminology
- If user wants "simple", don't add "highly detailed"
</instructions>
<examples>
<example name="podcast-request">
<user>Make a podcast about AI</user>
<response>
→ Generation submitted, about 2-3 minutes
You can:
• Wait and ask "done yet?"
• Check listenhub.ai/app/library
</response>
</example>
<example name="explainer-request">
<user>Create an explainer video introducing Claude Code</user>
<response>
→ Got it! Preparing explainer video...
Topic: Claude Code introduction
→ Generation submitted, explainer videos take 3-5 minutes
Includes: Script + narration + AI visuals
</response>
</example>
<example name="tts-request">
<user>Convert this article to speech https://blog.example.com/article</user>
<response>
→ Got it! Parsing article...
→ TTS submitted, about 1-2 minutes
Wait a moment, or ask "done yet?" to check
</response>
</example>
<example name="image-generation">
<user>Generate an image: cyberpunk city at night</user>
<response>
→ Generating image...
Original: cyberpunk city at night
Optimized prompt:
"Cyberpunk city at night, neon lights reflecting on wet streets,
towering skyscrapers with holographic ads, flying vehicles,
cinematic composition, highly detailed, 8K quality"
Resolution: 4K (16:9)
✓ Image generated!
~/Downloads/labnana-20260121-143145.jpg
</response>
</example>
<example name="image-with-reference">
<user>Generate an image in this style: https://example.com/style-ref.jpg, prompt: "a futuristic car"</user>
<response>
→ Generating image with reference...
Prompt: a futuristic car
Reference images: 1
Reference image URL: https://example.com/style-ref.jpg
Resolution: 2K (16:9)
✓ Image generated!
~/Downloads/labnana-20260122-154230.jpg
</response>
</example>
<example name="status-check">
<user>Done yet?</user>
<response>
✓ Podcast generated!
"AI Revolution: From GPT to AGI"
Listen: https://listenhub.ai/app/podcast
Duration: ~8 minutes
Need to download? Just say so.
</response>
</example>
</examples>