muapi-youtube-shorts


YouTube Shorts Generator


End-to-end pipeline: Long Video → Transcript → Ranked Highlights → Vertical Clips.
Turns one long video into N viral-ready vertical mp4s. Each clip ships with a viral score (0–100), an opening hook line, and a one-sentence reason it should perform.


Agent Execution Protocol


Step 1 — Collect Inputs


Ask once, then proceed:

| Input | Default | Notes |
|---|---|---|
| `source` | (required) | YouTube URL, hosted mp4 URL, or local file path |
| `num_clips` | `3` | How many shorts to render |
| `aspect_ratio` | `9:16` | `9:16` for TikTok/Reels/Shorts, `1:1` square, `4:5` portrait |
| `whisper_model` | `base` | `tiny` / `base` / `small` / `medium` / `large` |
| `language` | `auto` | Whisper language code (e.g. `en`) |
| `output_json` | (none) | Optional path; if set, dump the full result there |

If the user gave only a URL, use defaults and don't block on questions.
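The defaults in the table above can be applied in one normalization step. A minimal sketch in Python; the function and field names simply mirror the table and are not part of any real tool:

```python
# Defaults from the input table; `source` is the only required field.
DEFAULTS = {
    "num_clips": 3,
    "aspect_ratio": "9:16",
    "whisper_model": "base",
    "language": "auto",
    "output_json": None,
}

def collect_inputs(user_args: dict) -> dict:
    """Merge user-supplied arguments over the defaults.

    If the user gave only a URL, the defaults fill the rest so the
    pipeline can proceed without further questions.
    """
    if "source" not in user_args:
        raise ValueError("source is required: YouTube URL, mp4 URL, or local path")
    return {**DEFAULTS, **user_args}
```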


Step 2 — Verify Prerequisites


- `muapi-cli` installed and authed (`muapi auth configure`)
- `ffmpeg` on PATH (Whisper needs it for audio decoding)
- Python 3.10+ with `openai-whisper` installed (only if running the local transcribe stage)

If `MUAPI_API_KEY` is missing, stop and ask the user. Never invent a key.
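These checks can be scripted; a hedged sketch using `shutil.which` for the binaries and the environment for the key (the function name and message strings are illustrative, not part of muapi-cli):

```python
import os
import shutil

def check_prerequisites(env=os.environ) -> list[str]:
    """Return a list of blocking problems; an empty list means good to go."""
    problems = []
    if shutil.which("muapi") is None:
        problems.append("muapi-cli not installed/authed (muapi auth configure)")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not on PATH (Whisper needs it for audio decoding)")
    if not env.get("MUAPI_API_KEY"):
        # Never invent a key: stop and ask the user instead.
        problems.append("MUAPI_API_KEY missing; ask the user, never fabricate one")
    return problems
```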


Step 3 — Run the Pipeline


The standard path is the orchestrator script — it handles all eight stages in order:

```bash
bash library/social/youtube-shorts/scripts/run-youtube-shorts.sh \
  --source "<YOUTUBE_URL>" \
  --num-clips 5 \
  --aspect-ratio 9:16 \
  --whisper-model base \
  --output-json result.json \
  --view
```

The eight stages:

1. Download — pull the source video at the requested resolution (`360` / `480` / `720` / `1080`, default `720`). For local files, skip.
2. Transcribe — local Whisper produces timestamped segments. Audio stays on the machine.
3. Classify content type — the LLM tags the video (podcast / interview / tutorial / vlog / lecture / monologue) and its density, then tunes the highlight prompt per type.
4. Chunk if long — videos longer than `LONG_VIDEO_THRESHOLD` (default 1800s) are split into `CHUNK_SIZE_SECONDS` (default 1200s) windows with `CHUNK_OVERLAP_SECONDS` (default 60s) of overlap so cross-boundary highlights aren't missed.
5. Rank highlights — the LLM scans each chunk against `VIRALITY_CRITERIA`:
   - Hook moments — a strong opening line that stops the scroll
   - Emotional peaks — laughter, anger, vulnerability, awe
   - Opinion bombs — spicy, contrarian, debate-bait takes
   - Revelation moments — "wait, what?" reframes
   - Conflict — disagreement, tension, callouts
   - Quotable lines — tight, screenshot-worthy phrasing
   - Story peaks — the climax of a narrative arc
   - Practical value — actionable insight a viewer will save

   Each candidate gets `start_time`, `end_time`, a `score` from 0–100, a `title`, a `hook_sentence`, and a `virality_reason`. Aim for 30–75s clips unless the content dictates otherwise.
6. Dedupe — collapse overlaps. Rule: if two candidates overlap by more than 50%, keep the higher score and drop the other.
7. Top-N selection — sort surviving candidates by score and take the first `num_clips`.
8. Vertical auto-crop — render each highlight at `aspect_ratio` via `muapi edit clipping`, which auto-handles face tracking and screen recordings.
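The chunking stage is simple arithmetic over the three knobs. A sketch under the stated defaults (1800s threshold, 1200s chunks, 60s overlap); the function name is illustrative:

```python
def chunk_windows(duration: float,
                  threshold: float = 1800.0,
                  chunk_size: float = 1200.0,
                  overlap: float = 60.0) -> list[tuple[float, float]]:
    """Split a long video into overlapping (start, end) windows.

    Videos at or under the threshold stay as a single window; otherwise
    each new window steps back by `overlap` seconds so a highlight that
    straddles a boundary appears in both neighboring chunks.
    """
    if duration <= threshold:
        return [(0.0, duration)]
    windows, start = [], 0.0
    while start < duration:
        end = min(start + chunk_size, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back to cover the boundary
    return windows
```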


Quick Invocation Patterns


Single video, defaults:

```bash
bash scripts/run-youtube-shorts.sh --source "https://youtube.com/watch?v=VIDEO_ID"
```

Tuned for a high-density podcast (more clips, larger Whisper model):

```bash
bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --num-clips 8 --whisper-model medium --view
```

Square clips for the Instagram feed:

```bash
bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --aspect-ratio 1:1 --num-clips 3
```

Batch — `urls.txt` with one URL per line:

```bash
xargs -a urls.txt -I{} bash scripts/run-youtube-shorts.sh --source "{}"
```

Async submit (returns a request_id, poll later):

```bash
REQUEST_ID=$(bash scripts/run-youtube-shorts.sh \
  --source "<URL>" --async --output-json - --jq '.request_id' | tr -d '"')
muapi predict wait "$REQUEST_ID" --download ./outputs
```
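The batch pattern can equally be driven from Python. A sketch that only builds the command lines (the script path and flags mirror the examples above; actually spawning them, e.g. via `subprocess.run`, is left to the caller):

```python
def batch_commands(urls: list[str],
                   script: str = "scripts/run-youtube-shorts.sh") -> list[list[str]]:
    """One orchestrator invocation per non-blank URL, defaults for everything else."""
    return [["bash", script, "--source", url] for url in urls if url.strip()]
```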


Platform Specs


| Platform | Aspect | Sweet-spot duration | Notes |
|---|---|---|---|
| YouTube Shorts | 9:16 | 30–60s | Hook in the first 1s, max quality |
| TikTok | 9:16 | 30–75s | High energy; longer is fine if the hook lands |
| Instagram Reels | 9:16 | 30–60s | Hook in the first 1s |
| Instagram Feed | 1:1 | 15–45s | Static-feel works well |
| LinkedIn | 16:9 or 1:1 | 30–60s | Professional tone |
| Twitter/X | 16:9 | 15–60s | Punchy, direct |


Output Schema


```json
{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ /* every candidate, before top-N cut */ ],
  "shorts": [
    {
      "title": "The one mistake that cost me $50K",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "Nobody talks about this, but it killed my first startup...",
      "virality_reason": "Opens with a number + regret, peaks on a contrarian lesson",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
```
When reporting back to the user, surface for each clip: rank, score, time range, title, hook, and clip URL. Skip the raw transcript unless asked.
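The per-clip report can be derived straight from that schema. A sketch (the line format is illustrative; only the listed fields come from the schema):

```python
def format_report(result: dict) -> list[str]:
    """One line per short: rank, score, time range, title, hook, URL.

    The raw transcript is deliberately left out of the report.
    """
    lines = []
    for rank, s in enumerate(result["shorts"], start=1):
        lines.append(
            f"#{rank} [{s['score']}] {s['start_time']:.1f}-{s['end_time']:.1f}s "
            f"{s['title']!r} | hook: {s['hook_sentence']} | {s['clip_url']}"
        )
    return lines
```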


Tunable Knobs


Edit defaults inside the orchestrator or pass them via flags:

| Knob | Default | Purpose |
|---|---|---|
| `CHUNK_SIZE_SECONDS` | `1200` | Chunk length for long videos |
| `LONG_VIDEO_THRESHOLD` | `1800` | Videos longer than this get chunked |
| `CHUNK_OVERLAP_SECONDS` | `60` | Overlap between chunks |
| `MUAPI_POLL_INTERVAL` | `5` | Seconds between job-status polls |
| `MUAPI_POLL_TIMEOUT` | `1800` | Give up after this many seconds |
| `OVERLAP_DEDUPE_THRESHOLD` | `0.5` | Min IoU to collapse overlapping candidates |
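`OVERLAP_DEDUPE_THRESHOLD` is an IoU (intersection over union) on time ranges. A sketch of the dedupe rule it feeds, assuming a greedy pass in descending score order (the candidate dicts reuse the `start_time` / `end_time` / `score` fields from the output schema):

```python
def interval_iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """IoU of two (start, end) time ranges."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def dedupe(candidates: list[dict], threshold: float = 0.5) -> list[dict]:
    """Keep the higher-scored clip whenever two candidates overlap too much.

    Greedy pass in descending score order: a candidate survives only if its
    IoU with every already-kept clip stays at or below the threshold.
    """
    kept = []
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        span = (c["start_time"], c["end_time"])
        if all(interval_iou(span, (k["start_time"], k["end_time"])) <= threshold
               for k in kept):
            kept.append(c)
    return kept
```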


Whisper Model Selection


- `tiny` / `base` — fast, English-leaning, fine for clean studio audio
- `small` / `medium` — better for accents and music beds
- `large` — highest accuracy, much slower; only worth it on a GPU

Pick `base` unless transcript quality is poor, then bump to `medium`.
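The pick-then-bump rule is a one-liner; a minimal sketch (the function name is illustrative, not part of any real tool):

```python
def pick_whisper_model(transcript_poor: bool = False) -> str:
    """Default to `base`; bump to `medium` only when transcript quality is poor.

    `large` stays a deliberate manual choice, since it's only worth it on a GPU.
    """
    return "medium" if transcript_poor else "base"
```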


Common Mistakes to Avoid


1. Skipping the dedupe step — without it, you ship near-duplicate clips that all came from the same hot moment.
2. Generic virality prompt — the highlight ranker must score against the eight signals above, not vague "interestingness."
3. Wrong aspect ratio for the platform — YouTube Shorts and TikTok are `9:16`; LinkedIn is often `16:9`. Default to `9:16` only if the platform isn't specified.
4. Cropping without face tracking — vertical crops on talking-head content must follow the speaker's face; a static center-crop loses the subject.
5. Padding to hit `num_clips` — if dedupe leaves fewer survivors than requested, return what you have. Don't ship low-score filler.
6. Re-running the full pipeline on a 404'd clip URL — re-run only the crop stage for that highlight.
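Mistake 5 is avoided by letting the top-N step return fewer clips than requested. A sketch of that selection rule (candidate dicts carry the `score` field from the output schema):

```python
def select_top_n(survivors: list[dict], num_clips: int) -> list[dict]:
    """Sort by score, descending, and take at most num_clips.

    If dedupe left fewer survivors than requested, return them all;
    never pad the result with low-score filler.
    """
    return sorted(survivors, key=lambda c: c["score"], reverse=True)[:num_clips]
```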


Failure Modes


- `ffmpeg` not found on PATH — stop and tell the user to install it (`brew install ffmpeg` / `apt install ffmpeg`).
- Whisper produced no segments — likely no detectable speech or a hard language. Retry with `--whisper-model medium --language <code>` before declaring failure.
- API key missing or rejected — surface the exact error; don't fabricate a key.
- Job timed out — bump `MUAPI_POLL_TIMEOUT` and retry; don't silently truncate.
- Highlight ranker returned fewer than `num_clips` — return what survived dedupe with a note.
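The timeout behavior can be sketched as a poll loop driven by `MUAPI_POLL_INTERVAL` and `MUAPI_POLL_TIMEOUT`. The `fetch_status` callable, the status strings, and the injected clock are assumptions made so the logic stays testable; the real CLI equivalent is `muapi predict wait`:

```python
import time

def wait_for_job(fetch_status, interval: float = 5.0, timeout: float = 1800.0,
                 clock=time.monotonic, sleep=time.sleep) -> str:
    """Poll fetch_status() until a terminal state or until timeout elapses.

    On timeout, raise loudly instead of silently truncating the result.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("job still pending; bump MUAPI_POLL_TIMEOUT and retry")
        sleep(interval)
```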


Done Criteria


The skill is done when:

1. `result.shorts` has up to `num_clips` entries, each with a working `clip_url`.
2. The user has been shown the ranked list (score, time range, title, hook, URL).
3. If `--output-json` was set, the file exists and parses.