video-editing

Video Editing

AI-assisted editing for real footage. Not generation from prompts. Editing existing video fast.

When to Activate

  • User wants to edit, cut, or structure video footage
  • Turning long recordings into short-form content
  • Building vlogs, tutorials, or demo videos from raw capture
  • Adding overlays, subtitles, music, or voiceover to existing video
  • Reframing video for different platforms (YouTube, TikTok, Instagram)
  • User says "edit video", "cut this footage", "make a vlog", or "video workflow"

Core Thesis

AI video editing is useful when you stop asking it to create the whole video and start using it to compress, structure, and augment real footage. The value is not generation. The value is compression.

The Pipeline

Screen Studio / raw footage
  → Claude / Codex
  → FFmpeg
  → Remotion
  → ElevenLabs / fal.ai
  → Descript or CapCut
Each layer has a specific job. Do not skip layers. Do not try to make one tool do everything.

Layer 1: Capture (Screen Studio / Raw Footage)

Collect the source material:
  • Screen Studio: polished screen recordings for app demos, coding sessions, browser workflows
  • Raw camera footage: vlog footage, interviews, event recordings
  • Desktop capture via VideoDB: session recording with real-time context (see the videodb skill)
Output: raw files ready for organization.

Layer 2: Organization (Claude / Codex)

Use Claude Code or Codex to:
  • Transcribe and label: generate transcript, identify topics and themes
  • Plan structure: decide what stays, what gets cut, what order works
  • Identify dead sections: find pauses, tangents, repeated takes
  • Generate edit decision list: timestamps for cuts, segments to keep
  • Scaffold FFmpeg and Remotion code: generate the commands and compositions
Example prompt:
"Here's the transcript of a 4-hour recording. Identify the 8 strongest segments
for a 24-minute vlog. Give me FFmpeg cut commands for each segment."
This layer is about structure, not final creative taste.
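The edit decision list itself can be a plain CSV. A minimal sketch (segment timestamps and labels are hypothetical) that serializes the model's picks into the `start,end,label` format consumed by the FFmpeg batch loop in Layer 3:

```python
# Hypothetical segment picks (start, end, label) from the LLM planning pass
picks = [
    ("00:12:30", "00:15:45", "intro_hook"),
    ("01:02:10", "01:05:00", "demo_walkthrough"),
]

def to_cuts_csv(picks):
    """Serialize picks as start,end,label lines (the cuts.txt format)."""
    return "".join(f"{s},{e},{label}\n" for s, e, label in picks)

# Write the EDL that the Layer 3 batch loop reads
with open("cuts.txt", "w") as f:
    f.write(to_cuts_csv(picks))
```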

Layer 3: Deterministic Cuts (FFmpeg)

FFmpeg handles the boring but critical work: splitting, trimming, concatenating, and preprocessing.

Extract segment by timestamp

```bash
ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4
```

Batch cut from edit decision list

```bash
#!/bin/bash
# cuts.txt format: start,end,label
# -nostdin keeps ffmpeg from consuming the loop's stdin (cuts.txt)
while IFS=, read -r start end label; do
  ffmpeg -nostdin -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt
```

Concatenate segments

```bash
# Create the file list, then concatenate without re-encoding
# (stream copy requires all segments to share codec and parameters)
for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4
```

Create proxy for faster editing

```bash
ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4
```

Extract audio for transcription

```bash
ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 audio.wav
```

Normalize audio levels

```bash
ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4
```

Layer 4: Programmable Composition (Remotion)

Remotion turns editing problems into composable code. Use it for things that traditional editors make painful:

When to use Remotion

  • Overlays: text, images, branding, lower thirds
  • Data visualizations: charts, stats, animated numbers
  • Motion graphics: transitions, explainer animations
  • Composable scenes: reusable templates across videos
  • Product demos: annotated screenshots, UI highlights

Basic Remotion composition

```tsx
import { AbsoluteFill, Sequence, Video } from "remotion";

export const VlogComposition: React.FC = () => {
  return (
    <AbsoluteFill>
      {/* Main footage */}
      <Sequence from={0} durationInFrames={300}>
        <Video src="/segments/intro.mp4" />
      </Sequence>

      {/* Title overlay */}
      <Sequence from={30} durationInFrames={90}>
        <AbsoluteFill style={{
          justifyContent: "center",
          alignItems: "center",
        }}>
          <h1 style={{
            fontSize: 72,
            color: "white",
            textShadow: "2px 2px 8px rgba(0,0,0,0.8)",
          }}>
            The AI Editing Stack
          </h1>
        </AbsoluteFill>
      </Sequence>

      {/* Next segment */}
      <Sequence from={300} durationInFrames={450}>
        <Video src="/segments/demo.mp4" />
      </Sequence>
    </AbsoluteFill>
  );
};
```
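Sequence timing is expressed in frames, not seconds, so durations depend on the fps registered on your Remotion `<Composition>`. A quick sketch of the conversion, assuming a 30 fps composition:

```python
FPS = 30  # assumed: the fps registered on the Remotion <Composition>

def to_frames(seconds, fps=FPS):
    """Convert a duration in seconds to a Remotion frame count."""
    return round(seconds * fps)

# durationInFrames={300} above is a 10-second intro at 30 fps
print(to_frames(10))
```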

Render output

```bash
npx remotion render src/index.ts VlogComposition output.mp4
```
See the Remotion docs for detailed patterns and API reference.

Layer 5: Generated Assets (ElevenLabs / fal.ai)


Generate only what you need. Do not generate the whole video.

Voiceover with ElevenLabs

```python
import os
import requests

voice_id = "YOUR_VOICE_ID"  # substitute a real ElevenLabs voice ID

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your narration text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
resp.raise_for_status()
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)
```

Music and SFX with fal.ai

Use the fal-ai-media skill for:
  • Background music generation
  • Sound effects (ThinkSound model for video-to-audio)
  • Transition sounds

Generated visuals with fal.ai

Use for insert shots, thumbnails, or b-roll that doesn't exist:
generate(app_id: "fal-ai/nano-banana-pro", input_data: {
  "prompt": "professional thumbnail for tech vlog, dark background, code on screen",
  "image_size": "landscape_16_9"
})

VideoDB generative audio

If VideoDB is configured:
```python
voiceover = coll.generate_voice(text="Narration here", voice="alloy")
music = coll.generate_music(prompt="lo-fi background for coding vlog", duration=120)
sfx = coll.generate_sound_effect(prompt="subtle whoosh transition")
```

Layer 6: Final Polish (Descript / CapCut)

The last layer is human. Use a traditional editor for:
  • Pacing: adjust cuts that feel too fast or slow
  • Captions: auto-generated, then manually cleaned
  • Color grading: basic correction and mood
  • Final audio mix: balance voice, music, and SFX levels
  • Export: platform-specific formats and quality settings
This is where taste lives. AI clears the repetitive work. You make the final calls.

Social Media Reframing

Different platforms need different aspect ratios:
| Platform | Aspect Ratio | Resolution |
| --- | --- | --- |
| YouTube | 16:9 | 1920x1080 |
| TikTok / Reels | 9:16 | 1080x1920 |
| Instagram Feed | 1:1 | 1080x1080 |
| X / Twitter | 16:9 or 1:1 | 1280x720 or 720x720 |
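To keep these targets consistent across renders, the mapping can live in code. A sketch (the platform keys and the YouTube entry are illustrative; the crop chains mirror the FFmpeg commands in the next subsection):

```python
# Illustrative platform → FFmpeg -vf mapping for a 16:9 landscape source
VF_BY_PLATFORM = {
    "youtube": "scale=1920:1080",
    "tiktok": "crop=ih*9/16:ih,scale=1080:1920",
    "instagram_feed": "crop=ih:ih,scale=1080:1080",
}

def vf_for(platform):
    """Look up the filter chain for a target platform."""
    return VF_BY_PLATFORM[platform]

print(vf_for("tiktok"))
```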

Reframe with FFmpeg


16:9 to 9:16 (center crop)

```bash
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4
```

16:9 to 1:1 (center crop)

16:9 转 1:1(居中裁剪)

```bash
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4
```

Reframe with VideoDB

```python
from videodb import ReframeMode

# Smart reframe (AI-guided subject tracking)
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
```

Scene Detection and Auto-Cut

FFmpeg scene detection

```bash
# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo
```

Silence detection for auto-cut

```bash
# Find silent segments (useful for cutting dead air)
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=2 -f null - 2>&1 | grep silence
```
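silencedetect reports to stderr; inverting those reports yields the segments worth keeping. A sketch with made-up log lines (the `keep_ranges` helper is hypothetical glue, not part of FFmpeg):

```python
import re

# Sample stderr lines from ffmpeg silencedetect (values are illustrative)
log = """
[silencedetect @ 0x1] silence_start: 12.5
[silencedetect @ 0x1] silence_end: 14.8 | silence_duration: 2.3
[silencedetect @ 0x1] silence_start: 40.0
[silencedetect @ 0x1] silence_end: 43.1 | silence_duration: 3.1
"""

def keep_ranges(log, total_duration):
    """Invert detected silences into (start, end) ranges worth keeping."""
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", log)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", log)]
    ranges, cursor = [], 0.0
    for s, e in zip(starts, ends):
        if s > cursor:
            ranges.append((cursor, s))
        cursor = e
    if cursor < total_duration:
        ranges.append((cursor, total_duration))
    return ranges

print(keep_ranges(log, 60.0))
# keep_ranges(log, 60.0) -> [(0.0, 12.5), (14.8, 40.0), (43.1, 60.0)]
```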

Highlight extraction

Use Claude to analyze transcript + scene timestamps:
"Given this transcript with timestamps and these scene change points,
identify the 5 most engaging 30-second clips for social media."
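A sketch of the glue between the two signals: snap each transcript-suggested start time to the nearest detected scene change so cuts land on natural boundaries (all timestamps here are illustrative):

```python
def snap_to_scene(t, scene_times):
    """Snap a timestamp (seconds) to the nearest scene-change point."""
    return min(scene_times, key=lambda s: abs(s - t))

scenes = [0.0, 12.4, 33.9, 61.2, 95.0]   # from FFmpeg scene detection
picks = [13.0, 60.0]                     # rough starts from the transcript pass
print([snap_to_scene(t, scenes) for t in picks])
```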

What Each Tool Does Best

| Tool | Strength | Weakness |
| --- | --- | --- |
| Claude / Codex | Organization, planning, code generation | Not the creative taste layer |
| FFmpeg | Deterministic cuts, batch processing, format conversion | No visual editing UI |
| Remotion | Programmable overlays, composable scenes, reusable templates | Learning curve for non-devs |
| Screen Studio | Polished screen recordings immediately | Only screen capture |
| ElevenLabs | Voice, narration, music, SFX | Not the center of the workflow |
| Descript / CapCut | Final pacing, captions, polish | Manual, not automatable |

Key Principles

  1. Edit, don't generate. This workflow is for cutting real footage, not creating from prompts.
  2. Structure before style. Get the story right in Layer 2 before touching anything visual.
  3. FFmpeg is the backbone. Boring but critical. Where long footage becomes manageable.
  4. Remotion for repeatability. If you'll do it more than once, make it a Remotion component.
  5. Generate selectively. Only use AI generation for assets that don't exist, not for everything.
  6. Taste is the last layer. AI clears repetitive work. You make the final creative calls.

Related Skills

  • fal-ai-media — AI image, video, and audio generation
  • videodb — Server-side video processing, indexing, and streaming
  • content-engine — Platform-native content distribution