video-editing
Video Editing
AI-assisted editing for real footage. Not generation from prompts. Editing existing video fast.
When to Activate
- User wants to edit, cut, or structure video footage
- Turning long recordings into short-form content
- Building vlogs, tutorials, or demo videos from raw capture
- Adding overlays, subtitles, music, or voiceover to existing video
- Reframing video for different platforms (YouTube, TikTok, Instagram)
- User says "edit video", "cut this footage", "make a vlog", or "video workflow"
Core Thesis
AI video editing is useful when you stop asking it to create the whole video and start using it to compress, structure, and augment real footage. The value is not generation. The value is compression.
The Pipeline
```
Screen Studio / raw footage
  → Claude / Codex
  → FFmpeg
  → Remotion
  → ElevenLabs / fal.ai
  → Descript or CapCut
```

Each layer has a specific job. Do not skip layers. Do not try to make one tool do everything.
Layer 1: Capture (Screen Studio / Raw Footage)
Collect the source material:
- Screen Studio: polished screen recordings for app demos, coding sessions, browser workflows
- Raw camera footage: vlog footage, interviews, event recordings
- Desktop capture via VideoDB: session recording with real-time context (see the `videodb` skill)
Output: raw files ready for organization.
Layer 2: Organization (Claude / Codex)
Use Claude Code or Codex to:
- Transcribe and label: generate transcript, identify topics and themes
- Plan structure: decide what stays, what gets cut, what order works
- Identify dead sections: find pauses, tangents, repeated takes
- Generate edit decision list: timestamps for cuts, segments to keep
- Scaffold FFmpeg and Remotion code: generate the commands and compositions
Example prompt:

```
"Here's the transcript of a 4-hour recording. Identify the 8 strongest segments
for a 24-minute vlog. Give me FFmpeg cut commands for each segment."
```

This layer is about structure, not final creative taste.
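If Claude returns the edit decision list as structured data, a short script can write the `cuts.txt` that the FFmpeg batch-cut step consumes. A sketch; the field names and timestamps here are illustrative, matching whatever format you asked Claude to emit:

```python
# Hypothetical edit decision list, as Claude might return it
edl = [
    {"start": "00:12:30", "end": "00:15:45", "label": "segment_01"},
    {"start": "00:41:02", "end": "00:44:10", "label": "segment_02"},
]

# Write cuts.txt in the start,end,label format the batch-cut script expects
lines = [f'{c["start"]},{c["end"]},{c["label"]}' for c in edl]
with open("cuts.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```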
Layer 3: Deterministic Cuts (FFmpeg)
FFmpeg handles the boring but critical work: splitting, trimming, concatenating, and preprocessing.
Extract segment by timestamp

```bash
ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4
```

Batch cut from edit decision list

```bash
#!/bin/bash
# cuts.txt: start,end,label
while IFS=, read -r start end label; do
  ffmpeg -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt
```
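Note that `-c copy` snaps each cut to the nearest keyframe, so segment edges can be off by a second or two. When a cut must land exactly, re-encode instead; slower, and the encoder settings below are an assumed reasonable default rather than part of the original workflow:

```shell
# Frame-accurate cut: re-encode instead of stream-copying
ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c:v libx264 -crf 18 -preset fast -c:a aac segment_01_exact.mp4
```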
Concatenate segments

```bash
# Create file list
for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4
```
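The concat demuxer with `-c copy` only joins cleanly when every segment shares the same codec, resolution, and timebase. If the segments came from different sources, a re-encoding concat is the safer fallback (a sketch with assumed encoder settings):

```shell
# Re-encode while concatenating so mismatched segments still join cleanly
ffmpeg -f concat -safe 0 -i concat.txt -c:v libx264 -crf 18 -preset medium -c:a aac assembled.mp4
```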
Create proxy for faster editing

```bash
ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4
```

Extract audio for transcription

```bash
ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 audio.wav
```

Normalize audio levels

```bash
ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4
```

Layer 4: Programmable Composition (Remotion)
Remotion turns editing problems into composable code. Use it for things that traditional editors make painful:
When to use Remotion
- Overlays: text, images, branding, lower thirds
- Data visualizations: charts, stats, animated numbers
- Motion graphics: transitions, explainer animations
- Composable scenes: reusable templates across videos
- Product demos: annotated screenshots, UI highlights
Basic Remotion composition
```tsx
import { AbsoluteFill, Sequence, Video, useCurrentFrame } from "remotion";

export const VlogComposition: React.FC = () => {
  const frame = useCurrentFrame();
  return (
    <AbsoluteFill>
      {/* Main footage */}
      <Sequence from={0} durationInFrames={300}>
        <Video src="/segments/intro.mp4" />
      </Sequence>
      {/* Title overlay */}
      <Sequence from={30} durationInFrames={90}>
        <AbsoluteFill style={{
          justifyContent: "center",
          alignItems: "center",
        }}>
          <h1 style={{
            fontSize: 72,
            color: "white",
            textShadow: "2px 2px 8px rgba(0,0,0,0.8)",
          }}>
            The AI Editing Stack
          </h1>
        </AbsoluteFill>
      </Sequence>
      {/* Next segment */}
      <Sequence from={300} durationInFrames={450}>
        <Video src="/segments/demo.mp4" />
      </Sequence>
    </AbsoluteFill>
  );
};
```

Render output
```bash
npx remotion render src/index.ts VlogComposition output.mp4
```

See the Remotion docs for detailed patterns and API reference.
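Compositions become reusable templates once the title and segment paths arrive as input props rather than hard-coded strings; the Remotion CLI can then inject them per render. A sketch, assuming a props-aware variant of the composition above (the prop names are hypothetical):

```shell
npx remotion render src/index.ts VlogComposition output.mp4 \
  --props='{"title": "The AI Editing Stack", "introSrc": "/segments/intro.mp4"}'
```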
Layer 5: Generated Assets (ElevenLabs / fal.ai)
Generate only what you need. Do not generate the whole video.
Voiceover with ElevenLabs
```python
import os
import requests

voice_id = "YOUR_VOICE_ID"  # placeholder: pick a voice from your ElevenLabs library

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json"
    },
    json={
        "text": "Your narration text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
    }
)
resp.raise_for_status()
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)
```

Music and SFX with fal.ai
Use the `fal-ai-media` skill for:
- Background music generation
- Sound effects (ThinkSound model for video-to-audio)
- Transition sounds
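Once voiceover and music exist as separate files, FFmpeg can mix them under the assembled video in one pass. A sketch; the filenames and the 0.2 music level are assumptions to tune by ear:

```shell
# Duck the music under the voiceover, keep the video stream untouched
ffmpeg -i assembled.mp4 -i voiceover.mp3 -i music.mp3 \
  -filter_complex "[2:a]volume=0.2[bg];[1:a][bg]amix=inputs=2:duration=first[mix]" \
  -map 0:v -map "[mix]" -c:v copy -c:a aac mixed.mp4
```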
Generated visuals with fal.ai
Use for insert shots, thumbnails, or b-roll that doesn't exist:

```
generate(app_id: "fal-ai/nano-banana-pro", input_data: {
  "prompt": "professional thumbnail for tech vlog, dark background, code on screen",
  "image_size": "landscape_16_9"
})
```

VideoDB generative audio
If VideoDB is configured:

```python
voiceover = coll.generate_voice(text="Narration here", voice="alloy")
music = coll.generate_music(prompt="lo-fi background for coding vlog", duration=120)
sfx = coll.generate_sound_effect(prompt="subtle whoosh transition")
```

Layer 6: Final Polish (Descript / CapCut)
The last layer is human. Use a traditional editor for:
- Pacing: adjust cuts that feel too fast or slow
- Captions: auto-generated, then manually cleaned
- Color grading: basic correction and mood
- Final audio mix: balance voice, music, and SFX levels
- Export: platform-specific formats and quality settings
This is where taste lives. AI clears the repetitive work. You make the final calls.
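One step at this layer is still scriptable: burning the cleaned captions into a platform export. A sketch, assuming an FFmpeg build with libass and a `captions.srt` exported from your caption tool:

```shell
# Hard-burn subtitles into the final export; audio is passed through untouched
ffmpeg -i final.mp4 -vf "subtitles=captions.srt" -c:a copy final_captioned.mp4
```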
Social Media Reframing
Different platforms need different aspect ratios:
| Platform | Aspect Ratio | Resolution |
|---|---|---|
| YouTube | 16:9 | 1920x1080 |
| TikTok / Reels | 9:16 | 1080x1920 |
| Instagram Feed | 1:1 | 1080x1080 |
| X / Twitter | 16:9 or 1:1 | 1280x720 or 720x720 |
Reframe with FFmpeg
```bash
# 16:9 to 9:16 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4

# 16:9 to 1:1 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4
```
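When a center crop would lose the subject, a common alternative is to letterbox the full frame onto a blurred, filled copy of itself. A sketch; the blur strength is a matter of taste:

```shell
# 16:9 to 9:16 without cropping: blurred background pad
ffmpeg -i input.mp4 -filter_complex \
  "[0:v]scale=1080:1920:force_original_aspect_ratio=increase,crop=1080:1920,boxblur=20[bg];\
[0:v]scale=1080:-2[fg];[bg][fg]overlay=(W-w)/2:(H-h)/2" \
  vertical_blur.mp4
```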
Reframe with VideoDB

```python
from videodb import ReframeMode

# Smart reframe (AI-guided subject tracking)
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
```
Scene Detection and Auto-Cut
FFmpeg scene detection
```bash
# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo
```
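The `showinfo` lines are meant to be machine-read: pull the `pts_time` of each detected scene change so Claude, or a script, can align cuts to them. A sketch over sample log lines; the timestamps here are illustrative:

```python
import re

# Two abbreviated showinfo lines in the shape ffmpeg prints them
log = """
[Parsed_showinfo_1 @ 0x7f] n:   0 pts:  31031 pts_time:1.03437 fmt:yuv420p
[Parsed_showinfo_1 @ 0x7f] n:   1 pts: 187687 pts_time:6.25625 fmt:yuv420p
"""

# Each match is the timestamp (seconds) of a detected scene change
scene_times = [float(t) for t in re.findall(r"pts_time:([\d.]+)", log)]
print(scene_times)  # [1.03437, 6.25625]
```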
undefinedSilence detection for auto-cut
```bash
# Find silent segments (useful for cutting dead air)
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=2 -f null - 2>&1 | grep silence
```
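The silencedetect log inverts naturally into a keep-list: everything between a `silence_end` and the next `silence_start` is audible footage, ready to feed the batch-cut step. A sketch over sample log output; the durations are illustrative:

```python
import re

# Sample silencedetect log lines in the shape ffmpeg prints them
log = """
[silencedetect @ 0x7f] silence_start: 12.5
[silencedetect @ 0x7f] silence_end: 15.1 | silence_duration: 2.6
[silencedetect @ 0x7f] silence_start: 40.0
[silencedetect @ 0x7f] silence_end: 43.2 | silence_duration: 3.2
"""

starts = [float(x) for x in re.findall(r"silence_start: ([\d.]+)", log)]
ends = [float(x) for x in re.findall(r"silence_end: ([\d.]+)", log)]
duration = 60.0  # total clip length in seconds (assumed)

# Invert the silent spans into the audible spans worth keeping
keep, cursor = [], 0.0
for s, e in zip(starts, ends):
    if s > cursor:
        keep.append((cursor, s))
    cursor = e
if cursor < duration:
    keep.append((cursor, duration))
print(keep)  # [(0.0, 12.5), (15.1, 40.0), (43.2, 60.0)]
```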
Highlight extraction
Use Claude to analyze transcript + scene timestamps:

```
"Given this transcript with timestamps and these scene change points,
identify the 5 most engaging 30-second clips for social media."
```

What Each Tool Does Best
| Tool | Strength | Weakness |
|---|---|---|
| Claude / Codex | Organization, planning, code generation | Not the creative taste layer |
| FFmpeg | Deterministic cuts, batch processing, format conversion | No visual editing UI |
| Remotion | Programmable overlays, composable scenes, reusable templates | Learning curve for non-devs |
| Screen Studio | Polished screen recordings immediately | Only screen capture |
| ElevenLabs | Voice, narration, music, SFX | Not the center of the workflow |
| Descript / CapCut | Final pacing, captions, polish | Manual, not automatable |
Key Principles
- Edit, don't generate. This workflow is for cutting real footage, not creating from prompts.
- Structure before style. Get the story right in Layer 2 before touching anything visual.
- FFmpeg is the backbone. Boring but critical. Where long footage becomes manageable.
- Remotion for repeatability. If you'll do it more than once, make it a Remotion component.
- Generate selectively. Only use AI generation for assets that don't exist, not for everything.
- Taste is the last layer. AI clears repetitive work. You make the final creative calls.
Related Skills
- `fal-ai-media` — AI image, video, and audio generation
- `videodb` — Server-side video processing, indexing, and streaming
- `content-engine` — Platform-native content distribution