video-editing

Video Editing

AI-assisted editing for real footage. Not generation from prompts. Editing existing video fast.

When to Activate

  • User wants to edit, cut, or structure video footage
  • Turning long recordings into short-form content
  • Building vlogs, tutorials, or demo videos from raw capture
  • Adding overlays, subtitles, music, or voiceover to existing video
  • Reframing video for different platforms (YouTube, TikTok, Instagram)
  • User says "edit video", "cut this footage", "make a vlog", or "video workflow"

Core Thesis

AI video editing is useful when you stop asking it to create the whole video and start using it to compress, structure, and augment real footage. The value is not generation. The value is compression.

The Pipeline

Screen Studio / raw footage
  → Claude / Codex
  → FFmpeg
  → Remotion
  → ElevenLabs / fal.ai
  → Descript or CapCut
Each layer has a specific job. Do not skip layers. Do not try to make one tool do everything.

Layer 1: Capture (Screen Studio / Raw Footage)

Collect the source material:
  • Screen Studio: polished screen recordings for app demos, coding sessions, browser workflows
  • Raw camera footage: vlog footage, interviews, event recordings
  • Desktop capture via VideoDB: session recording with real-time context (see the videodb skill)
Output: raw files ready for organization.

Layer 2: Organization (Claude / Codex)

Use Claude Code or Codex to:
  • Transcribe and label: generate transcript, identify topics and themes
  • Plan structure: decide what stays, what gets cut, what order works
  • Identify dead sections: find pauses, tangents, repeated takes
  • Generate edit decision list: timestamps for cuts, segments to keep
  • Scaffold FFmpeg and Remotion code: generate the commands and compositions
Example prompt:
"Here's the transcript of a 4-hour recording. Identify the 8 strongest segments
for a 24-minute vlog. Give me FFmpeg cut commands for each segment."
This layer is about structure, not final creative taste.
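The edit decision list itself can be a plain CSV. A minimal sketch (segment timestamps and labels are hypothetical) that serializes the model's picks into the `start,end,label` format consumed by the FFmpeg batch loop in Layer 3:

```python
# Hypothetical segment picks (start, end, label) from the LLM planning pass
picks = [
    ("00:12:30", "00:15:45", "intro_hook"),
    ("01:02:10", "01:05:00", "demo_walkthrough"),
]

def to_cuts_csv(picks):
    """Serialize picks as start,end,label lines (the cuts.txt format)."""
    return "".join(f"{s},{e},{label}\n" for s, e, label in picks)

# Write the EDL that the Layer 3 batch loop reads
with open("cuts.txt", "w") as f:
    f.write(to_cuts_csv(picks))
```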

Layer 3: Deterministic Cuts (FFmpeg)

FFmpeg handles the boring but critical work: splitting, trimming, concatenating, and preprocessing.

Extract segment by timestamp

```bash
ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4
```

Batch cut from edit decision list

```bash
#!/bin/bash
# cuts.txt format: start,end,label
# -nostdin keeps ffmpeg from consuming the loop's stdin (cuts.txt)
while IFS=, read -r start end label; do
  ffmpeg -nostdin -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt
```

Concatenate segments

```bash
# Create the file list, then concatenate without re-encoding
# (stream copy requires all segments to share codec and parameters)
for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4
```

Create proxy for faster editing

```bash
ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4
```

Extract audio for transcription

```bash
ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 audio.wav
```

Normalize audio levels

```bash
ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4
```

Layer 4: Programmable Composition (Remotion)

Remotion turns editing problems into composable code. Use it for things that traditional editors make painful:

When to use Remotion

  • Overlays: text, images, branding, lower thirds
  • Data visualizations: charts, stats, animated numbers
  • Motion graphics: transitions, explainer animations
  • Composable scenes: reusable templates across videos
  • Product demos: annotated screenshots, UI highlights

Basic Remotion composition

```tsx
import { AbsoluteFill, Sequence, Video } from "remotion";

export const VlogComposition: React.FC = () => {
  return (
    <AbsoluteFill>
      {/* Main footage */}
      <Sequence from={0} durationInFrames={300}>
        <Video src="/segments/intro.mp4" />
      </Sequence>

      {/* Title overlay */}
      <Sequence from={30} durationInFrames={90}>
        <AbsoluteFill style={{
          justifyContent: "center",
          alignItems: "center",
        }}>
          <h1 style={{
            fontSize: 72,
            color: "white",
            textShadow: "2px 2px 8px rgba(0,0,0,0.8)",
          }}>
            The AI Editing Stack
          </h1>
        </AbsoluteFill>
      </Sequence>

      {/* Next segment */}
      <Sequence from={300} durationInFrames={450}>
        <Video src="/segments/demo.mp4" />
      </Sequence>
    </AbsoluteFill>
  );
};
```
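Sequence timing is expressed in frames, not seconds, so durations depend on the fps registered on your Remotion `<Composition>`. A quick sketch of the conversion, assuming a 30 fps composition:

```python
FPS = 30  # assumed: the fps registered on the Remotion <Composition>

def to_frames(seconds, fps=FPS):
    """Convert a duration in seconds to a Remotion frame count."""
    return round(seconds * fps)

# durationInFrames={300} above is a 10-second intro at 30 fps
print(to_frames(10))
```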

Render output

```bash
npx remotion render src/index.ts VlogComposition output.mp4
```
See the Remotion docs for detailed patterns and API reference.

Layer 5: Generated Assets (ElevenLabs / fal.ai)


Generate only what you need. Do not generate the whole video.

Voiceover with ElevenLabs

```python
import os
import requests

voice_id = "YOUR_VOICE_ID"  # substitute a real ElevenLabs voice ID

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your narration text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
resp.raise_for_status()
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)
```

Music and SFX with fal.ai

Use the fal-ai-media skill for:
  • Background music generation
  • Sound effects (ThinkSound model for video-to-audio)
  • Transition sounds

Generated visuals with fal.ai

Use for insert shots, thumbnails, or b-roll that doesn't exist:
generate(app_id: "fal-ai/nano-banana-pro", input_data: {
  "prompt": "professional thumbnail for tech vlog, dark background, code on screen",
  "image_size": "landscape_16_9"
})

VideoDB generative audio

If VideoDB is configured:
```python
voiceover = coll.generate_voice(text="Narration here", voice="alloy")
music = coll.generate_music(prompt="lo-fi background for coding vlog", duration=120)
sfx = coll.generate_sound_effect(prompt="subtle whoosh transition")
```

Layer 6: Final Polish (Descript / CapCut)

The last layer is human. Use a traditional editor for:
  • Pacing: adjust cuts that feel too fast or slow
  • Captions: auto-generated, then manually cleaned
  • Color grading: basic correction and mood
  • Final audio mix: balance voice, music, and SFX levels
  • Export: platform-specific formats and quality settings
This is where taste lives. AI clears the repetitive work. You make the final calls.

Social Media Reframing

Different platforms need different aspect ratios:
| Platform | Aspect Ratio | Resolution |
| --- | --- | --- |
| YouTube | 16:9 | 1920x1080 |
| TikTok / Reels | 9:16 | 1080x1920 |
| Instagram Feed | 1:1 | 1080x1080 |
| X / Twitter | 16:9 or 1:1 | 1280x720 or 720x720 |
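To keep these targets consistent across renders, the mapping can live in code. A sketch (the platform keys and the YouTube entry are illustrative; the crop chains mirror the FFmpeg commands in the next subsection):

```python
# Illustrative platform → FFmpeg -vf mapping for a 16:9 landscape source
VF_BY_PLATFORM = {
    "youtube": "scale=1920:1080",
    "tiktok": "crop=ih*9/16:ih,scale=1080:1920",
    "instagram_feed": "crop=ih:ih,scale=1080:1080",
}

def vf_for(platform):
    """Look up the filter chain for a target platform."""
    return VF_BY_PLATFORM[platform]

print(vf_for("tiktok"))
```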

Reframe with FFmpeg


16:9 to 9:16 (center crop)

```bash
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4
```

16:9 to 1:1 (center crop)

16:9 转 1:1(居中裁剪)

```bash
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4
```

Reframe with VideoDB

```python
from videodb import ReframeMode

# Smart reframe (AI-guided subject tracking)
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
```

Scene Detection and Auto-Cut

FFmpeg scene detection

```bash
# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo
```

Silence detection for auto-cut

```bash
# Find silent segments (useful for cutting dead air)
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=2 -f null - 2>&1 | grep silence
```
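silencedetect reports to stderr; inverting those reports yields the segments worth keeping. A sketch with made-up log lines (the `keep_ranges` helper is hypothetical glue, not part of FFmpeg):

```python
import re

# Sample stderr lines from ffmpeg silencedetect (values are illustrative)
log = """
[silencedetect @ 0x1] silence_start: 12.5
[silencedetect @ 0x1] silence_end: 14.8 | silence_duration: 2.3
[silencedetect @ 0x1] silence_start: 40.0
[silencedetect @ 0x1] silence_end: 43.1 | silence_duration: 3.1
"""

def keep_ranges(log, total_duration):
    """Invert detected silences into (start, end) ranges worth keeping."""
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", log)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", log)]
    ranges, cursor = [], 0.0
    for s, e in zip(starts, ends):
        if s > cursor:
            ranges.append((cursor, s))
        cursor = e
    if cursor < total_duration:
        ranges.append((cursor, total_duration))
    return ranges

print(keep_ranges(log, 60.0))
# keep_ranges(log, 60.0) -> [(0.0, 12.5), (14.8, 40.0), (43.1, 60.0)]
```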

Highlight extraction

Use Claude to analyze transcript + scene timestamps:
"Given this transcript with timestamps and these scene change points,
identify the 5 most engaging 30-second clips for social media."
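A sketch of the glue between the two signals: snap each transcript-suggested start time to the nearest detected scene change so cuts land on natural boundaries (all timestamps here are illustrative):

```python
def snap_to_scene(t, scene_times):
    """Snap a timestamp (seconds) to the nearest scene-change point."""
    return min(scene_times, key=lambda s: abs(s - t))

scenes = [0.0, 12.4, 33.9, 61.2, 95.0]   # from FFmpeg scene detection
picks = [13.0, 60.0]                     # rough starts from the transcript pass
print([snap_to_scene(t, scenes) for t in picks])
```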

What Each Tool Does Best

| Tool | Strength | Weakness |
| --- | --- | --- |
| Claude / Codex | Organization, planning, code generation | Not the creative taste layer |
| FFmpeg | Deterministic cuts, batch processing, format conversion | No visual editing UI |
| Remotion | Programmable overlays, composable scenes, reusable templates | Learning curve for non-devs |
| Screen Studio | Polished screen recordings immediately | Only screen capture |
| ElevenLabs | Voice, narration, music, SFX | Not the center of the workflow |
| Descript / CapCut | Final pacing, captions, polish | Manual, not automatable |

Key Principles

  1. Edit, don't generate. This workflow is for cutting real footage, not creating from prompts.
  2. Structure before style. Get the story right in Layer 2 before touching anything visual.
  3. FFmpeg is the backbone. Boring but critical. Where long footage becomes manageable.
  4. Remotion for repeatability. If you'll do it more than once, make it a Remotion component.
  5. Generate selectively. Only use AI generation for assets that don't exist, not for everything.
  6. Taste is the last layer. AI clears repetitive work. You make the final creative calls.

Related Skills

  • fal-ai-media — AI image, video, and audio generation
  • videodb — Server-side video processing, indexing, and streaming
  • content-engine — Platform-native content distribution