video-translation

Video Translation

Translate a video's speech into another language, using TTS to generate the dubbed audio and replacing the original audio track.

Triggers

  • translate this video
  • dub this video to English
  • 把视频从 X 语译成 Y 语
  • 视频翻译

Use Cases

  • The user wants to watch a foreign-language YouTube video but prefers to hear it in their native language.
  • The user provides a video link and explicitly requests changing the audio language.

Workflow

When the user asks to translate a video:
  1. Download Video & Subtitles: Use the youtube-downloader skill to download the video and its subtitles as SRT. Make sure you specify the source language to fetch the correct subtitle.
    bash
    python path/to/youtube-downloader/scripts/download_video.py "VIDEO_URL" --subtitles --sub-lang <source_lang_code> -o /tmp/video-translation
  2. Translate Subtitles: Read the downloaded .srt file. Translate its contents sentence by sentence into the target language using the following fixed prompt. Keep the SRT index and timestamp of every entry exactly as they appear in the source file.
    Translation Prompt:
    Translate the following subtitle text from <Source Language> to <Target Language>. Provide ONLY the translated text. Do not explain, do not add notes, do not add index numbers. The translation must be colloquial, natural-sounding, and suitable for video dubbing.
    Save the translated text, placed under the original indexes and timestamps, into a new file named translated.srt.
  3. Generate Dubbed Audio: Use the tts skill to render the timeline-accurate audio from the translated SRT. The Noiz backend automatically aligns the duration of each sentence to the original video's subtitle timestamps.
    To help the cloned voice match the original speaker's tone and emotion in each sentence, pass the original video file to --ref-audio-track. The TTS engine will automatically slice the original audio at each subtitle's exact timestamp and use it as the reference for that specific segment.
    Create a basic voice_map.json:
    json
    {
      "default": {
        "target_lang": "<target_lang_code>"
      }
    }
    Render the timeline-accurate audio:
    bash
    bash skills/tts/scripts/tts.sh render --srt translated.srt --voice-map voice_map.json --backend noiz --auto-emotion --ref-audio-track original_video.mp4 -o dubbed.wav
  4. Replace Audio in Video: Use the replace_audio.sh script to merge the original video with the new dubbed audio. To keep the original video's non-speech background audio outside the translated segments, also pass the translated SRT file via --srt.
    bash
    bash skills/video-translation/scripts/replace_audio.sh --video original_video.mp4 --audio dubbed.wav --output final_video.mp4 --srt translated.srt
  5. Present the Result: Return the final_video.mp4 file path to the user.
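
Step 2 requires the translated file to keep the source timing, so a quick check before running TTS may catch a translation that dropped or altered an entry. The following is a minimal sketch, not part of the skill: `srt_timings_match` is a hypothetical helper name, and it assumes that comparing only the timestamp lines (those containing `-->`) is sufficient.

```bash
# Hypothetical helper (not shipped with the skill): succeed only if both SRT
# files contain the same sequence of timestamp lines, for example
# "00:00:01,000 --> 00:00:03,500".
# Usage: srt_timings_match original.srt translated.srt
srt_timings_match() (
  src="$1"
  dst="$2"
  tmp_src=$(mktemp) || exit 1
  tmp_dst=$(mktemp) || exit 1
  # Keep only the timing lines; strip carriage returns so that CRLF and LF
  # files compare equal. "|| true" tolerates files without any timing line.
  tr -d '\r' < "$src" | grep -e '-->' > "$tmp_src" || true
  tr -d '\r' < "$dst" | grep -e '-->' > "$tmp_dst" || true
  if cmp -s "$tmp_src" "$tmp_dst"; then
    status=0
  else
    status=1
  fi
  rm -f "$tmp_src" "$tmp_dst"
  exit "$status"
)
```

Calling `srt_timings_match <downloaded .srt> translated.srt` before the render command returns a non-zero status when the two timelines differ, which is a reasonable point to stop and redo the translation.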

Inputs

  • Required inputs:
    • VIDEO_URL: The URL of the video to translate.
    • target_language: The language to translate the audio to.
  • Optional inputs:
    • source_language: The language of the original video (if not auto-detected or specified).
    • reference_audio: Specific audio file/URL to use for voice cloning instead of the dynamic original video track.
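
As an illustration of how the required inputs feed the workflow, the sketch below validates them and writes the voice_map.json used in step 3. `write_voice_map` is a hypothetical helper, and it assumes the target language is already available as a short code made of letters, digits, hyphens, or underscores; the skill itself does not state which codes the Noiz backend accepts.

```bash
# Hypothetical helper (not shipped with the skill): check the required inputs
# and write a voice_map.json with the structure shown in the workflow.
# Usage: write_voice_map VIDEO_URL TARGET_LANG_CODE [OUTPUT_PATH]
write_voice_map() (
  video_url="$1"
  target_lang="$2"
  out="${3:-voice_map.json}"
  if [ -z "$video_url" ] || [ -z "$target_lang" ]; then
    echo "error: VIDEO_URL and target_language are required" >&2
    exit 2
  fi
  # Assumption: language codes contain only letters, digits, '-' or '_'.
  # Rejecting anything else also keeps the generated JSON well-formed.
  case "$target_lang" in
    *[!A-Za-z0-9_-]*)
      echo "error: unexpected characters in target language code" >&2
      exit 2
      ;;
  esac
  printf '{\n  "default": {\n    "target_lang": "%s"\n  }\n}\n' "$target_lang" > "$out"
)
```

For example, `write_voice_map "VIDEO_URL" en` would produce a voice_map.json whose default entry has `"target_lang": "en"`.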

Outputs

  • Success: Path to the final video file with replaced audio.
  • Failure: Clear error message specifying whether download, TTS, or audio replacement failed.
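
One way to produce a failure message that names the failing stage is to wrap each workflow command in a small function. This is a sketch under assumptions: `run_stage` is a hypothetical name, and it relies on the wrapped scripts returning a non-zero exit code on failure, which the skill does not explicitly document.

```bash
# Hypothetical wrapper (not shipped with the skill): run one pipeline stage
# and report which stage failed.
# Usage: run_stage STAGE_NAME COMMAND [ARGS...]
run_stage() {
  stage="$1"
  shift
  if "$@"; then
    return 0
  else
    # In the else branch, $? still holds the exit code of the command above.
    status=$?
    echo "error: ${stage} failed (exit code ${status})" >&2
    return "$status"
  fi
}
```

Prefixing the download, TTS, and audio replacement commands from the workflow with `run_stage download`, `run_stage TTS`, and `run_stage "audio replacement"` would then print, for instance, `error: TTS failed (exit code 1)` on standard error.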

Requirements

  • Dependencies (other skills)
    • youtube-downloader (crazynomad/skills; see its SKILL.md)
      Install: clone or copy the skills/youtube-downloader directory from crazynomad/skills into your skills/ folder so that skills/youtube-downloader/scripts/download_video.py is available.
    • tts (NoizAI/skills; see its SKILL.md)
      If not already in this repo: clone or copy the skills/tts directory from NoizAI/skills into your skills/ folder. Ensure skills/tts/scripts/tts.sh and related scripts are present.
  • NOIZ_API_KEY configured for the Noiz backend. If it is not set, first guide the user to get an API key from https://developers.noiz.ai/api-keys. After the user provides the key, ask whether they want to persist it; if they agree, either write/update NOIZ_API_KEY=... in the project's .env file or run bash skills/tts/scripts/tts.sh config --set-api-key YOUR_KEY to store it.
  • ffmpeg installed.
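
The requirements above can be checked before starting a translation. The sketch below is one possible preflight check, with `check_requirements` as a hypothetical name. It only inspects the NOIZ_API_KEY environment variable, so a key stored in .env or through the tts.sh config command would still be reported as missing unless it has been loaded into the environment.

```bash
# Hypothetical preflight check (not shipped with the skill).
# Prints one line per missing requirement; returns non-zero if any is missing.
# Usage: check_requirements [SKILLS_DIR]   (SKILLS_DIR defaults to "skills")
check_requirements() (
  skills_dir="${1:-skills}"
  missing=0
  command -v ffmpeg >/dev/null 2>&1 || { echo "missing: ffmpeg"; missing=1; }
  # Only the environment variable is inspected here (see the note above).
  [ -n "${NOIZ_API_KEY:-}" ] || { echo "missing: NOIZ_API_KEY"; missing=1; }
  for script in youtube-downloader/scripts/download_video.py tts/scripts/tts.sh; do
    [ -f "$skills_dir/$script" ] || { echo "missing: $skills_dir/$script"; missing=1; }
  done
  exit "$missing"
)
```

Running `check_requirements` from the project root lists whatever is absent, which gives a natural place to start the API key guidance described above.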

Limitations

  • The source video must have subtitles (or auto-generated subtitles) available on the platform for the source language.
  • Very long videos may take a significant amount of time to translate and dub.
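
Because the workflow depends on subtitles being available, it may help to confirm that a subtitle file was actually downloaded before translating. The sketch below assumes, without confirmation from the youtube-downloader skill, that subtitles are written as .srt files into the output directory passed with -o; `find_subtitles` is a hypothetical helper name.

```bash
# Hypothetical helper (not shipped with the skill): print the first non-empty
# .srt file in a directory, or fail with a message that points at the
# subtitle limitation.
# Usage: find_subtitles DOWNLOAD_DIR
find_subtitles() (
  dir="$1"
  for f in "$dir"/*.srt; do
    # -s is true only for an existing file larger than zero bytes, so it
    # also rejects the unexpanded pattern when nothing matches.
    if [ -s "$f" ]; then
      printf '%s\n' "$f"
      exit 0
    fi
  done
  echo "error: no subtitles found in $dir; the source video may lack subtitles for the requested language" >&2
  exit 1
)
```

For example, `find_subtitles /tmp/video-translation` after the download step either prints the path to translate or fails early with a message the user can act on.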