muapi-youtube-shorts


YouTube Shorts Generator


End-to-end pipeline: Long Video → Transcript → Ranked Highlights → Vertical Clips.
Turns one long video into N viral-ready vertical mp4s. Each clip ships with a viral score (0–100), an opening hook line, and a one-sentence reason it should perform.


Agent Execution Protocol


Step 1 — Collect Inputs


Ask once, then proceed:

| Input | Default | Notes |
|---|---|---|
| `source` | (required) | YouTube URL, hosted mp4 URL, or local file path |
| `num_clips` | `3` | How many shorts to render |
| `aspect_ratio` | `9:16` | `9:16` for TikTok/Reels/Shorts, `1:1` square, `4:5` portrait |
| `whisper_model` | `base` | `tiny` / `base` / `small` / `medium` / `large` |
| `language` | `auto` | Whisper language code (e.g. `en`) |
| `output_json` | (none) | Optional path; if set, dump the full result there |

If the user gave only a URL, use defaults and don't block on questions.
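The defaults in the table above can be applied in one normalization step. A minimal sketch in Python; the function and field names simply mirror the table and are not part of any real tool:

```python
# Defaults from the input table; `source` is the only required field.
DEFAULTS = {
    "num_clips": 3,
    "aspect_ratio": "9:16",
    "whisper_model": "base",
    "language": "auto",
    "output_json": None,
}

def collect_inputs(user_args: dict) -> dict:
    """Merge user-supplied arguments over the defaults.

    If the user gave only a URL, the defaults fill the rest so the
    pipeline can proceed without further questions.
    """
    if "source" not in user_args:
        raise ValueError("source is required: YouTube URL, mp4 URL, or local path")
    return {**DEFAULTS, **user_args}
```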


Step 2 — Verify Prerequisites


- `muapi-cli` installed and authed (`muapi auth configure`)
- `ffmpeg` on PATH (Whisper needs it for audio decoding)
- Python 3.10+ with `openai-whisper` installed (only if running the local transcribe stage)

If `MUAPI_API_KEY` is missing, stop and ask the user. Never invent a key.
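These checks can be scripted; a hedged sketch using `shutil.which` for the binaries and the environment for the key (the function name and message strings are illustrative, not part of muapi-cli):

```python
import os
import shutil

def check_prerequisites(env=os.environ) -> list[str]:
    """Return a list of blocking problems; an empty list means good to go."""
    problems = []
    if shutil.which("muapi") is None:
        problems.append("muapi-cli not installed/authed (muapi auth configure)")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not on PATH (Whisper needs it for audio decoding)")
    if not env.get("MUAPI_API_KEY"):
        # Never invent a key: stop and ask the user instead.
        problems.append("MUAPI_API_KEY missing; ask the user, never fabricate one")
    return problems
```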


Step 3 — Run the Pipeline


The standard path is the orchestrator script — it handles all eight stages in order:

```bash
bash library/social/youtube-shorts/scripts/run-youtube-shorts.sh \
  --source "<YOUTUBE_URL>" \
  --num-clips 5 \
  --aspect-ratio 9:16 \
  --whisper-model base \
  --output-json result.json \
  --view
```

The eight stages:

1. Download — pull the source video at the requested resolution (`360` / `480` / `720` / `1080`, default `720`). For local files, skip.
2. Transcribe — local Whisper produces timestamped segments. Audio stays on the machine.
3. Classify content type — the LLM tags the video (podcast / interview / tutorial / vlog / lecture / monologue) and its density, then tunes the highlight prompt per type.
4. Chunk if long — videos longer than `LONG_VIDEO_THRESHOLD` (default 1800s) are split into `CHUNK_SIZE_SECONDS` (default 1200s) windows with `CHUNK_OVERLAP_SECONDS` (default 60s) of overlap so cross-boundary highlights aren't missed.
5. Rank highlights — the LLM scans each chunk against `VIRALITY_CRITERIA`:
   - Hook moments — a strong opening line that stops the scroll
   - Emotional peaks — laughter, anger, vulnerability, awe
   - Opinion bombs — spicy, contrarian, debate-bait takes
   - Revelation moments — "wait, what?" reframes
   - Conflict — disagreement, tension, callouts
   - Quotable lines — tight, screenshot-worthy phrasing
   - Story peaks — the climax of a narrative arc
   - Practical value — actionable insight a viewer will save

   Each candidate gets `start_time`, `end_time`, a `score` from 0–100, a `title`, a `hook_sentence`, and a `virality_reason`. Aim for 30–75s clips unless the content dictates otherwise.
6. Dedupe — collapse overlaps. Rule: if two candidates overlap by more than 50%, keep the higher score and drop the other.
7. Top-N selection — sort surviving candidates by score and take the first `num_clips`.
8. Vertical auto-crop — render each highlight at `aspect_ratio` via `muapi edit clipping`, which auto-handles face tracking and screen recordings.
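The chunking stage is simple arithmetic over the three knobs. A sketch under the stated defaults (1800s threshold, 1200s chunks, 60s overlap); the function name is illustrative:

```python
def chunk_windows(duration: float,
                  threshold: float = 1800.0,
                  chunk_size: float = 1200.0,
                  overlap: float = 60.0) -> list[tuple[float, float]]:
    """Split a long video into overlapping (start, end) windows.

    Videos at or under the threshold stay as a single window; otherwise
    each new window steps back by `overlap` seconds so a highlight that
    straddles a boundary appears in both neighboring chunks.
    """
    if duration <= threshold:
        return [(0.0, duration)]
    windows, start = [], 0.0
    while start < duration:
        end = min(start + chunk_size, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back to cover the boundary
    return windows
```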


Quick Invocation Patterns


Single video, defaults:

```bash
bash scripts/run-youtube-shorts.sh --source "https://youtube.com/watch?v=VIDEO_ID"
```

Tuned for a high-density podcast (more clips, larger Whisper model):

```bash
bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --num-clips 8 --whisper-model medium --view
```

Square clips for the Instagram feed:

```bash
bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --aspect-ratio 1:1 --num-clips 3
```

Batch — `urls.txt` with one URL per line:

```bash
xargs -a urls.txt -I{} bash scripts/run-youtube-shorts.sh --source "{}"
```

Async submit (returns a request_id, poll later):

```bash
REQUEST_ID=$(bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --async --output-json - --jq '.request_id' | tr -d '"')
muapi predict wait "$REQUEST_ID" --download ./outputs
```
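The batch pattern can equally be driven from Python. A sketch that only builds the command lines (the script path and flags mirror the examples above; actually spawning them, e.g. via `subprocess.run`, is left to the caller):

```python
def batch_commands(urls: list[str],
                   script: str = "scripts/run-youtube-shorts.sh") -> list[list[str]]:
    """One orchestrator invocation per non-blank URL, defaults for everything else."""
    return [["bash", script, "--source", url] for url in urls if url.strip()]
```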


Platform Specs


| Platform | Aspect | Sweet-spot duration | Notes |
|---|---|---|---|
| YouTube Shorts | 9:16 | 30–60s | Hook in the first 1s, max quality |
| TikTok | 9:16 | 30–75s | High energy; longer is fine if the hook lands |
| Instagram Reels | 9:16 | 30–60s | Hook in the first 1s |
| Instagram Feed | 1:1 | 15–45s | Static-feel works well |
| LinkedIn | 16:9 or 1:1 | 30–60s | Professional tone |
| Twitter/X | 16:9 | 15–60s | Punchy, direct |


Output Schema


```json
{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ /* every candidate, before top-N cut */ ],
  "shorts": [
    {
      "title": "The one mistake that cost me $50K",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "Nobody talks about this, but it killed my first startup...",
      "virality_reason": "Opens with a number + regret, peaks on a contrarian lesson",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
```
When reporting back to the user, surface for each clip: rank, score, time range, title, hook, and clip URL. Skip the raw transcript unless asked.
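The per-clip report can be derived straight from that schema. A sketch (the line format is illustrative; only the listed fields come from the schema):

```python
def format_report(result: dict) -> list[str]:
    """One line per short: rank, score, time range, title, hook, URL.

    The raw transcript is deliberately left out of the report.
    """
    lines = []
    for rank, s in enumerate(result["shorts"], start=1):
        lines.append(
            f"#{rank} [{s['score']}] {s['start_time']:.1f}-{s['end_time']:.1f}s "
            f"{s['title']!r} | hook: {s['hook_sentence']} | {s['clip_url']}"
        )
    return lines
```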


Tunable Knobs


Edit defaults inside the orchestrator or pass them via flags:

| Knob | Default | Purpose |
|---|---|---|
| `CHUNK_SIZE_SECONDS` | `1200` | Chunk length for long videos |
| `LONG_VIDEO_THRESHOLD` | `1800` | Videos longer than this get chunked |
| `CHUNK_OVERLAP_SECONDS` | `60` | Overlap between chunks |
| `MUAPI_POLL_INTERVAL` | `5` | Seconds between job-status polls |
| `MUAPI_POLL_TIMEOUT` | `1800` | Give up after this many seconds |
| `OVERLAP_DEDUPE_THRESHOLD` | `0.5` | Min IoU to collapse overlapping candidates |
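`OVERLAP_DEDUPE_THRESHOLD` is an IoU (intersection over union) on time ranges. A sketch of the dedupe rule it feeds, assuming a greedy pass in descending score order (the candidate dicts reuse the `start_time` / `end_time` / `score` fields from the output schema):

```python
def interval_iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """IoU of two (start, end) time ranges."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def dedupe(candidates: list[dict], threshold: float = 0.5) -> list[dict]:
    """Keep the higher-scored clip whenever two candidates overlap too much.

    Greedy pass in descending score order: a candidate survives only if its
    IoU with every already-kept clip stays at or below the threshold.
    """
    kept = []
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        span = (c["start_time"], c["end_time"])
        if all(interval_iou(span, (k["start_time"], k["end_time"])) <= threshold
               for k in kept):
            kept.append(c)
    return kept
```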


Whisper Model Selection


- `tiny` / `base` — fast, English-leaning, fine for clean studio audio
- `small` / `medium` — better for accents and music beds
- `large` — highest accuracy, much slower; only worth it on a GPU

Pick `base` unless transcript quality is poor, then bump to `medium`.
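The pick-then-bump rule is a one-liner; a minimal sketch (the function name is illustrative, not part of any real tool):

```python
def pick_whisper_model(transcript_poor: bool = False) -> str:
    """Default to `base`; bump to `medium` only when transcript quality is poor.

    `large` stays a deliberate manual choice, since it's only worth it on a GPU.
    """
    return "medium" if transcript_poor else "base"
```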


Common Mistakes to Avoid


1. Skipping the dedupe step — without it, you ship near-duplicate clips that all came from the same hot moment.
2. Generic virality prompt — the highlight ranker must score against the eight signals above, not vague "interestingness."
3. Wrong aspect ratio for the platform — YouTube Shorts and TikTok are `9:16`; LinkedIn is often `16:9`. Default to `9:16` only if the platform isn't specified.
4. Cropping without face tracking — vertical crops on talking-head content must follow the speaker's face; a static center-crop loses the subject.
5. Padding to hit `num_clips` — if dedupe leaves fewer survivors than requested, return what you have. Don't ship low-score filler.
6. Re-running the full pipeline on a 404'd clip URL — re-run only the crop stage for that highlight.
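Mistake 5 is avoided by letting the top-N step return fewer clips than requested. A sketch of that selection rule (candidate dicts carry the `score` field from the output schema):

```python
def select_top_n(survivors: list[dict], num_clips: int) -> list[dict]:
    """Sort by score, descending, and take at most num_clips.

    If dedupe left fewer survivors than requested, return them all;
    never pad the result with low-score filler.
    """
    return sorted(survivors, key=lambda c: c["score"], reverse=True)[:num_clips]
```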


Failure Modes


- `ffmpeg` not found on PATH — stop and tell the user to install it (`brew install ffmpeg` / `apt install ffmpeg`).
- Whisper produced no segments — likely no detectable speech or a hard language. Retry with `--whisper-model medium --language <code>` before declaring failure.
- API key missing or rejected — surface the exact error; don't fabricate a key.
- Job timed out — bump `MUAPI_POLL_TIMEOUT` and retry; don't silently truncate.
- Highlight ranker returned fewer than `num_clips` — return what survived dedupe with a note.
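The timeout behavior can be sketched as a poll loop driven by `MUAPI_POLL_INTERVAL` and `MUAPI_POLL_TIMEOUT`. The `fetch_status` callable, the status strings, and the injected clock are assumptions made so the logic stays testable; the real CLI equivalent is `muapi predict wait`:

```python
import time

def wait_for_job(fetch_status, interval: float = 5.0, timeout: float = 1800.0,
                 clock=time.monotonic, sleep=time.sleep) -> str:
    """Poll fetch_status() until a terminal state or until timeout elapses.

    On timeout, raise loudly instead of silently truncating the result.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("job still pending; bump MUAPI_POLL_TIMEOUT and retry")
        sleep(interval)
```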


Done Criteria


The skill is done when:

1. `result.shorts` has up to `num_clips` entries, each with a working `clip_url`.
2. The user has been shown the ranked list (score, time range, title, hook, URL).
3. If `--output-json` was set, the file exists and parses.