muapi-youtube-shorts
YouTube Shorts Generator
End-to-end pipeline: Long Video → Transcript → Ranked Highlights → Vertical Clips.
Turns one long video into N viral-ready vertical mp4s. Each clip ships with a viral score (0–100), an opening hook line, and a one-sentence reason it should perform.
Reference implementation: https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator
Underlying API: https://muapi.ai/playground/ai-clipping
Agent Execution Protocol
Step 1 — Collect Inputs
Ask once, then proceed:
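| Input | Default | Notes |
|---|---|---|
| `--source` | — | YouTube URL, hosted mp4 URL, or local file path |
| `--num-clips` | | How many shorts to render |
| `--aspect-ratio` | `9:16` | For TikTok/Reels/Shorts |
| `--whisper-model` | `base` | Options: `tiny`, `base`, `small`, `medium`, `large` |
| `--language` | auto | Whisper language code |
| `--output-json` | — | Optional path; if set, dump full result there |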
If the user gave only a URL, use defaults and don't block on questions.
Step 2 — Verify Prerequisites
- `muapi-cli` installed and authed (`muapi auth configure`)
- `ffmpeg` on PATH (Whisper needs it for audio decoding)
- Python 3.10+ with `openai-whisper` installed (only if running the local transcribe stage)

If `MUAPI_API_KEY` is missing, stop and ask the user. Never invent a key.
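The checks above could be automated before the pipeline starts. The following is a minimal sketch, not part of the skill itself: it assumes the CLI binary is named `muapi` (as in `muapi auth configure`) and that the key is exposed through the `MUAPI_API_KEY` environment variable. The helper name `missing_prerequisites` is hypothetical.

```python
import importlib.util
import os
import shutil
import sys


def missing_prerequisites(need_whisper: bool = True) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # Assumption: the muapi-cli package installs a binary called "muapi".
    if shutil.which("muapi") is None:
        problems.append("muapi CLI not found on PATH")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Assumption: the key is read from the environment, never generated here.
    if not os.environ.get("MUAPI_API_KEY"):
        problems.append("MUAPI_API_KEY is not set")
    if need_whisper:
        if sys.version_info < (3, 10):
            problems.append("Python 3.10+ is required for local transcription")
        # openai-whisper is imported as the "whisper" module.
        if importlib.util.find_spec("whisper") is None:
            problems.append("openai-whisper is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(f"STOP: {problem}")
```

If the returned list is non-empty, the agent would stop and relay the problems to the user rather than trying to work around them.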
Step 3 — Run the Pipeline
The standard path is the orchestrator script — it handles all eight stages in order:

```bash
bash library/social/youtube-shorts/scripts/run-youtube-shorts.sh \
  --source "<YOUTUBE_URL>" \
  --num-clips 5 \
  --aspect-ratio 9:16 \
  --whisper-model base \
  --output-json result.json \
  --view
```

The eight stages:
- Download — pull the source video at the requested resolution (`360`/`480`/`720`/`1080`, default `720`). For local files, skip.
- Transcribe — local Whisper produces timestamped segments. Audio stays on the machine.
- Classify content type — LLM tags the video (podcast / interview / tutorial / vlog / lecture / monologue) and density. Tunes the highlight prompt per type.
- Chunk if long — videos > `LONG_VIDEO_THRESHOLD` (1800s default) are split into `CHUNK_SIZE_SECONDS` (1200s default) windows with `CHUNK_OVERLAP_SECONDS` (60s default) overlap so cross-boundary highlights aren't missed.
- Rank highlights — LLM scans each chunk through `VIRALITY_CRITERIA`:
  - Hook moments — strong opening line that stops the scroll
  - Emotional peaks — laughter, anger, vulnerability, awe
  - Opinion bombs — spicy, contrarian, debate-bait takes
  - Revelation moments — "wait, what?" reframes
  - Conflict — disagreement, tension, callouts
  - Quotable lines — tight, screenshot-worthy phrasing
  - Story peaks — climax of a narrative arc
  - Practical value — actionable insight a viewer will save

  Each candidate gets `start_time`, `end_time`, `score` (0–100), `title`, `hook_sentence`, and `virality_reason`. Aim for 30–75s clips unless content dictates otherwise.
- Dedupe — collapse overlaps. Rule: if two candidates overlap > 50%, keep the higher score, drop the other.
- Top-N selection — sort surviving candidates by score, take the top `num_clips`.
- Vertical auto-crop — render each highlight at `aspect_ratio` via `muapi edit clipping`. Auto-handles face tracking and screen recordings.
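To make the chunking stage concrete, here is a small sketch of how overlapping windows might be computed from the three defaults quoted above. It is an illustration under stated assumptions, not the orchestrator's actual code: the function name `chunk_windows` is hypothetical, and stepping by `size - overlap` is one plausible reading of "windows with overlap".

```python
def chunk_windows(duration, threshold=1800.0, size=1200.0, overlap=60.0):
    """Return (start, end) windows in seconds that cover [0, duration].

    Videos at or below `threshold` stay in one window. Longer videos are cut
    into `size`-second windows, each starting `size - overlap` seconds after
    the previous one so neighbouring windows share `overlap` seconds.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if duration <= threshold:
        return [(0.0, float(duration))]
    windows = []
    start = 0.0
    step = size - overlap
    while start < duration:
        end = min(start + size, float(duration))
        windows.append((start, end))
        if end >= duration:
            break  # the last window already reaches the end of the video
        start += step
    return windows


print(chunk_windows(3000))
# [(0.0, 1200.0), (1140.0, 2340.0), (2280.0, 3000.0)]
```

Because neighbouring windows share 60 seconds, a highlight that straddles a boundary appears whole in at least one window, and any duplicate candidates this creates are left for the dedupe stage to collapse.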
Quick Invocation Patterns
Single video, defaults:

```bash
bash scripts/run-youtube-shorts.sh --source "https://youtube.com/watch?v=VIDEO_ID"
```

Tuned for high-density podcast (more clips, larger Whisper model):

```bash
bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --num-clips 8 --whisper-model medium --view
```

Square clips for Instagram feed:

```bash
bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --aspect-ratio 1:1 --num-clips 3
```

Batch — `urls.txt` with one URL per line:

```bash
xargs -a urls.txt -I{} bash scripts/run-youtube-shorts.sh --source "{}"
```

Async submit (returns `request_id`, poll later):

```bash
REQUEST_ID=$(bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --async --output-json - --jq '.request_id' | tr -d '"')
muapi predict wait "$REQUEST_ID" --download ./outputs
```

Platform Specs
| Platform | Aspect | Sweet-spot duration | Notes |
|---|---|---|---|
| YouTube Shorts | 9:16 | 30–60s | Hook in first 1s, max quality |
| TikTok | 9:16 | 30–75s | High energy; longer is fine if hook lands |
| Instagram Reels | 9:16 | 30–60s | Hook in first 1s |
| Instagram Feed | 1:1 | 15–45s | Static-feel works well |
| LinkedIn | 16:9 or 1:1 | 30–60s | Professional tone |
| Twitter/X | 16:9 | 15–60s | Punchy, direct |
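The table above can be encoded as a lookup so that the aspect ratio follows the target platform. This is a sketch only: `PLATFORM_SPECS` and `spec_for` are hypothetical names, LinkedIn is given `16:9` (the first of its two listed ratios), and the fallback of `9:16` with a 30–75s range is an assumption based on the defaults described elsewhere in this document.

```python
# (aspect ratio, (min seconds, max seconds)) per platform, from the table above.
PLATFORM_SPECS = {
    "youtube shorts": ("9:16", (30, 60)),
    "tiktok": ("9:16", (30, 75)),
    "instagram reels": ("9:16", (30, 60)),
    "instagram feed": ("1:1", (15, 45)),
    "linkedin": ("16:9", (30, 60)),  # assumption: prefer 16:9 over 1:1
    "twitter/x": ("16:9", (15, 60)),
}

# Assumed fallback when the platform is unspecified or unknown.
DEFAULT_SPEC = ("9:16", (30, 75))


def spec_for(platform=None):
    """Return (aspect_ratio, (min_s, max_s)) for a platform name."""
    if not platform:
        return DEFAULT_SPEC
    return PLATFORM_SPECS.get(platform.strip().lower(), DEFAULT_SPEC)


aspect, (min_s, max_s) = spec_for("Instagram Feed")
print(aspect, min_s, max_s)  # 1:1 15 45
```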
Output Schema
```json
{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ /* every candidate, before top-N cut */ ],
  "shorts": [
    {
      "title": "The one mistake that cost me $50K",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "Nobody talks about this, but it killed my first startup...",
      "virality_reason": "Opens with a number + regret, peaks on a contrarian lesson",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
```

When reporting back to the user, surface for each clip: rank, score, time range, title, hook, and clip URL. Skip the raw transcript unless asked.
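As an illustration of that reporting step, the sketch below turns a parsed result into a ranked text summary. It relies only on the field names in the schema above. The function name `format_report` and the exact line layout are assumptions, and sorting by `score` is a precaution in case `shorts` is not already ordered.

```python
import json


def format_report(result):
    """Build a ranked, human-readable summary of result["shorts"]."""
    shorts = sorted(result.get("shorts", []), key=lambda c: c["score"], reverse=True)
    lines = []
    for rank, clip in enumerate(shorts, start=1):
        lines.append(
            f"{rank}. [{clip['score']}] "
            f"{clip['start_time']:.1f}s-{clip['end_time']:.1f}s {clip['title']}"
        )
        lines.append(f"   Hook: {clip['hook_sentence']}")
        lines.append(f"   URL:  {clip['clip_url']}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumes the pipeline was run with --output-json result.json.
    with open("result.json", encoding="utf-8") as fh:
        print(format_report(json.load(fh)))
```

The transcript and the full `highlights` list are deliberately left out of the summary.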
Tunable Knobs
Edit defaults inside the orchestrator or pass via flags:
| Knob | Default | Purpose |
|---|---|---|
| `CHUNK_SIZE_SECONDS` | `1200` | Chunk length for long videos |
| `LONG_VIDEO_THRESHOLD` | `1800` | Videos longer than this get chunked |
| `CHUNK_OVERLAP_SECONDS` | `60` | Overlap between chunks |
| | | Seconds between job-status polls |
| `MUAPI_POLL_TIMEOUT` | | Give up after this long |
| | | Min IoU to collapse overlapping candidates |
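The IoU knob in the last row governs the dedupe stage. A minimal sketch of IoU-based dedupe followed by top-N selection is shown below. It is a greedy approach chosen for illustration and may differ from the orchestrator's own logic. The names `iou` and `dedupe_and_select` are hypothetical, and the `0.5` threshold is an assumption taken from the "overlap > 50%" rule.

```python
def iou(a, b):
    """Intersection-over-union of two time ranges, in [0, 1]."""
    inter = max(0.0, min(a["end_time"], b["end_time"]) - max(a["start_time"], b["start_time"]))
    union = (a["end_time"] - a["start_time"]) + (b["end_time"] - b["start_time"]) - inter
    return inter / union if union > 0 else 0.0


def dedupe_and_select(candidates, num_clips, min_iou=0.5):
    """Keep the best-scoring candidate of each overlapping group, then take the top N.

    Candidates are visited from highest to lowest score; one is dropped when its
    IoU with an already-kept candidate exceeds `min_iou`.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if all(iou(cand, k) <= min_iou for k in kept):
            kept.append(cand)
    # May return fewer than num_clips; no low-score padding is added.
    return kept[:num_clips]


candidates = [
    {"start_time": 100.0, "end_time": 160.0, "score": 90},
    {"start_time": 110.0, "end_time": 165.0, "score": 80},  # overlaps the first heavily
    {"start_time": 300.0, "end_time": 350.0, "score": 70},
]
print([c["score"] for c in dedupe_and_select(candidates, num_clips=5)])  # [90, 70]
```

In the usage example only two clips survive even though five were requested, because the function never pads the result with dropped duplicates.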
Whisper Model Selection
- `tiny` / `base` — fast, English-leaning, fine for clean studio audio
- `small` / `medium` — better for accents and music beds
- `large` — highest accuracy, much slower; only worth it on a GPU

Pick `base` unless transcript quality is poor, then bump to `medium`.
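The "start small, bump if needed" rule can be sketched as a fallback loop. This is an illustration under assumptions: "poor quality" is approximated here by "no segments returned", the helper names are hypothetical, and the `transcribe` parameter exists only so the logic can be exercised without loading a real model. The default path uses `whisper.load_model` and `model.transcribe` from the `openai-whisper` package.

```python
def _whisper_transcribe(path, model_name, language):
    """Transcribe with openai-whisper and return its list of segments."""
    import whisper  # imported lazily so the module loads without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language=language)
    return result["segments"]


def transcribe_with_fallback(path, models=("base", "medium"), language=None,
                             transcribe=_whisper_transcribe):
    """Try each model in order and return (model_name, segments) for the first
    one that yields any segments, or (None, []) if none does."""
    for name in models:
        segments = transcribe(path, name, language)
        if segments:
            return name, segments
    return None, []
```

A call such as `transcribe_with_fallback("talk.mp4", language="en")` would try `base` first and load `medium` only if `base` produced nothing. The file name is a placeholder.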
Common Mistakes to Avoid
- Skipping the dedupe step — without it, you ship near-duplicate clips that all came from the same hot moment.
- Generic virality prompt — the highlight ranker must score against the eight signals above, not "interestingness."
- Wrong aspect ratio for the platform — YouTube Shorts and TikTok are `9:16`; LinkedIn often `16:9`. Default to `9:16` only if the platform isn't specified.
- Crop without face tracking — vertical crops on talking-head content must follow the speaker's face; static center-crop loses the subject.
- Padding to hit `num_clips` — if dedupe leaves fewer survivors than requested, return what you have. Don't ship low-score filler.
- Re-running the full pipeline on a 404'd clip URL — re-run only the crop stage for that highlight.
Failure Modes
- `ffmpeg not found on PATH` — stop and tell the user to install it (`brew install ffmpeg` / `apt install ffmpeg`).
- Whisper produced no segments — likely no detectable speech or a hard language. Retry with `--whisper-model medium --language <code>` before declaring failure.
- API key missing or rejected — surface the exact error; don't fabricate a key.
- Job timed out — bump `MUAPI_POLL_TIMEOUT` and retry; don't silently truncate.
- Highlight ranker returned < `num_clips` — return what survived dedupe with a note.
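For the timeout case, it may help to see how a poll interval and a timeout interact. The sketch below is generic and does not describe the muapi API: `get_status` is a hypothetical callable supplied by the caller, the status strings `"completed"` and `"failed"` are assumptions, and the numeric defaults are placeholders rather than the skill's real values. In practice `muapi predict wait` presumably handles this waiting.

```python
import time


def wait_for_job(get_status, poll_interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports a terminal state or the timeout passes.

    Raises TimeoutError instead of returning a partial result, so a timed-out
    job is never mistaken for a finished one.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):  # assumed terminal states
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

Raising on timeout, rather than returning whatever finished, matches the guidance above to retry with a larger timeout instead of silently truncating.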
Done Criteria
The skill is done when:
- `result.shorts` has up to `num_clips` entries, each with a working `clip_url`.
- The user has been shown the ranked list (score, time range, title, hook, URL).
- If `--output-json` was set, the file exists and parses.
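Two of these criteria are mechanical enough to check in code. The sketch below is one possible checker, with a hypothetical name (`done_problems`). It verifies only that each short carries a non-empty `clip_url`, not that the URL actually resolves, and it cannot confirm that the user was shown the list.

```python
import json


def done_problems(result, num_clips, output_json=None):
    """Return a list of unmet done-criteria; an empty list means all checks passed."""
    problems = []
    shorts = result.get("shorts", [])
    if len(shorts) > num_clips:
        problems.append(f"expected at most {num_clips} shorts, got {len(shorts)}")
    for index, clip in enumerate(shorts, start=1):
        if not clip.get("clip_url"):
            problems.append(f"short {index} has no clip_url")
    if output_json is not None:
        try:
            with open(output_json, encoding="utf-8") as fh:
                json.load(fh)  # the file must exist and contain valid JSON
        except (OSError, json.JSONDecodeError) as exc:
            problems.append(f"output JSON unusable: {exc}")
    return problems
```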