omnicaptions-transcribe

Gemini Transcription

Transcribe audio/video using Google Gemini API with structured markdown output.

YouTube Video Workflow

Important: Check for existing captions before transcribing:

1. Check captions: yt-dlp --list-subs "URL"
2. Has caption → Use /omnicaptions:download to get existing captions (better quality)
3. No caption → Transcribe directly with URL (don't download first!)

Confirm with user: Before transcribing, ask if they want to check for existing captions first.
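The decision in the steps above can be sketched in Python. This is a minimal illustration, not part of omnicaptions: the helper names are hypothetical, and it assumes that `yt-dlp --list-subs` prints a line containing "Available subtitles for" when uploaded subtitles exist. Treating automatic captions as "no caption" is also an assumption.

```python
import subprocess
from typing import List


def list_subs_command(url: str) -> List[str]:
    """Build the yt-dlp invocation from step 1."""
    return ["yt-dlp", "--list-subs", url]


def has_existing_captions(listing: str) -> bool:
    """Return True if the listing mentions uploaded (non-automatic) subtitles.

    Assumption: yt-dlp reports them on a line containing
    "Available subtitles for"; adjust the marker if your version differs.
    """
    return any("Available subtitles for" in line for line in listing.splitlines())


def next_step(url: str, listing: str) -> str:
    """Map the listing to the skill or command the workflow recommends."""
    if has_existing_captions(listing):
        return "/omnicaptions:download"
    # No captions: transcribe straight from the URL, without downloading.
    return f'omnicaptions transcribe "{url}"'


def fetch_listing(url: str) -> str:
    """Run yt-dlp (must be installed) and return its combined output."""
    result = subprocess.run(
        list_subs_command(url), capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

Calling `next_step(url, fetch_listing(url))` returns either the download skill name or the transcribe command to run.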

URL & Local File Support

Gemini natively supports YouTube URLs - no need to download, just pass the URL directly:

    # YouTube URL (recommended, no download needed)
    omnicaptions transcribe "https://www.youtube.com/watch?v=VIDEO_ID"

    # Local files
    omnicaptions transcribe video.mp4

**Note**: Output defaults to current directory unless user specifies `-o`.

When to Use

- Video URLs - YouTube, direct video links (Gemini native support)
- Transcribing podcasts, interviews, lectures
- Need verbatim transcript with timestamps and speaker labels
- Want auto-generated chapters from content
- Mixed-language audio (code-switching preserved)

When NOT to Use

- Video has existing captions - Use `/omnicaptions:download` to get existing captions first
- Need real-time streaming transcription (use Whisper)
- Audio >2 hours (Gemini upload limit)
- Want translation instead of transcription

Quick Reference

| Method | Description |
|--------|-------------|
| `transcribe(path)` | Transcribe file or URL (sync) |
| `translate(in, out, lang)` | Translate captions |
| `write(text, path)` | Save text to file |

Setup

    pip install omni-captions-skills

API Key

Priority: `GEMINI_API_KEY` env → `.env` file → `~/.config/omnicaptions/config.json`

If not set, ask user:

    Please enter your Gemini API key (get from https://aistudio.google.com/apikey):

Then run with `-k <key>`. Key will be saved to config file automatically.
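The lookup order above can be illustrated with a short Python sketch. It is not the tool's actual implementation: the function names are hypothetical, the `.env` file is assumed to use plain `KEY=VALUE` lines, and the JSON field name `gemini_api_key` is a guess, since the layout of `config.json` is not documented here.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_PATH = Path.home() / ".config" / "omnicaptions" / "config.json"


def read_dotenv_key(dotenv_path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE .env file, if any."""
    if not dotenv_path.is_file():
        return None
    for raw in dotenv_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'") or None
    return None


def read_config_key(config_path: Path, field: str) -> Optional[str]:
    """Return `field` from a JSON config file, or None if missing or invalid."""
    if not config_path.is_file():
        return None
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
    value = data.get(field) if isinstance(data, dict) else None
    return value or None


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    dotenv_path: Path = Path(".env"),
    config_path: Path = CONFIG_PATH,
) -> Optional[str]:
    """Apply the documented priority: environment, then .env, then config file."""
    env = os.environ if env is None else env
    return (
        env.get("GEMINI_API_KEY")
        or read_dotenv_key(dotenv_path, "GEMINI_API_KEY")
        or read_config_key(config_path, "gemini_api_key")  # hypothetical field name
    )
```

If `resolve_api_key()` returns `None`, that is the point at which the user would be asked for a key and the command rerun with `-k <key>`.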

CLI Usage

IMPORTANT: CLI requires subcommand (`transcribe`, `translate`, `convert`)

    # Transcribe (auto-output to same directory)
    omnicaptions transcribe video.mp4                # → ./video_GeminiUnd.md
    omnicaptions transcribe "https://youtu.be/abc"   # → ./abc_GeminiUnd.md

    # Specify output file or directory
    omnicaptions transcribe video.mp4 -o output/     # → output/video_GeminiUnd.md
    omnicaptions transcribe video.mp4 -o my.md       # → my.md

    # Options
    omnicaptions transcribe -m gemini-3-pro-preview video.mp4
    omnicaptions transcribe -l zh video.mp4          # Force Chinese


| Option | Description |
|--------|-------------|
| `-k, --api-key` | Gemini API key (auto-prompted if missing) |
| `-o, --output` | Output file or directory (default: auto) |
| `-m, --model` | Model (default: gemini-3-flash-preview) |
| `-l, --language` | Force language (zh, en, ja) |
| `-t, --translate LANG` | Translate to language (one-step) |
| `--bilingual` | Bilingual output (with -t) |
| `-v, --verbose` | Verbose output |
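The output names shown in the CLI examples follow a visible pattern: the source stem plus `_GeminiUnd.md`, placed according to `-o`. The Python sketch below reproduces that pattern for illustration only. The function names are hypothetical, and how omnicaptions really derives the stem from a URL is an assumption inferred from the `https://youtu.be/abc` example.

```python
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import parse_qs, urlparse

SUFFIX = "_GeminiUnd.md"


def source_stem(source: str) -> str:
    """Derive the base name from a local path or a video URL."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Assumption: use the `v` query parameter when present,
        # otherwise the last path segment (as in https://youtu.be/abc).
        video_id = parse_qs(parsed.query).get("v", [""])[0]
        return video_id or PurePosixPath(parsed.path).stem
    return Path(source).stem


def output_path(source: str, output: Optional[str] = None) -> Path:
    """Mimic the documented -o behaviour: default, directory, or explicit file."""
    name = source_stem(source) + SUFFIX
    if output is None:
        return Path(name)  # current directory
    if output.endswith(("/", "\\")) or Path(output).is_dir():
        return Path(output) / name  # -o names a directory
    return Path(output)  # -o names the file itself
```

For example, `output_path("video.mp4", "output/")` yields `output/video_GeminiUnd.md`, matching the second CLI example.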

Bilingual Captions (Optional)

If user requests bilingual output, add `-t <lang> --bilingual`:

    omnicaptions transcribe video.mp4 -t zh --bilingual

For precise timing, use separate workflow: transcribe → LaiCut → translate (see Related Skills).

Output Format

    ## [00:00:00] Introduction

    **Host:** Welcome to the show. [00:00:01]

    **Guest:** Thanks for having me. [00:00:05]

    [Applause] [00:00:08]

Key features:
- `## [HH:MM:SS] Title` chapter headers
- `**Speaker:**` labels (auto-detected)
- `[HH:MM:SS]` timestamp at paragraph end
- `[Event]` for non-speech (laughter, music)
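Because the format is regular, a transcript can be read back into structured entries. The following Python sketch is one possible reader, assuming each chapter header, speaker paragraph, and event sits on a single line exactly as listed above; the `Entry` class and function names are hypothetical, and multi-line paragraphs are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TS = r"\[(\d{2}:\d{2}:\d{2})\]"
CHAPTER_RE = re.compile(rf"^## {TS} (.+)$")
SPEAKER_RE = re.compile(rf"^\*\*(.+?):\*\* (.*) {TS}$")
EVENT_RE = re.compile(rf"^\[([^\]]+)\] {TS}$")


@dataclass
class Entry:
    kind: str  # "chapter", "speech", or "event"
    timestamp: str  # HH:MM:SS as written in the transcript
    text: str
    speaker: Optional[str] = None


def to_seconds(timestamp: str) -> int:
    """Convert HH:MM:SS to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_transcript(markdown: str) -> List[Entry]:
    """Turn transcript lines into entries; unrecognized lines are skipped."""
    entries: List[Entry] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if match := CHAPTER_RE.match(line):
            entries.append(Entry("chapter", match.group(1), match.group(2)))
        elif match := SPEAKER_RE.match(line):
            entries.append(
                Entry("speech", match.group(3), match.group(2), match.group(1))
            )
        elif match := EVENT_RE.match(line):
            entries.append(Entry("event", match.group(2), match.group(1)))
    return entries
```

Pairing `parse_transcript` with `to_seconds` gives numeric offsets that downstream tooling can sort or filter on.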

Common Mistakes

| Mistake | Fix |
|---------|-----|
| No API key error | Use `-k YOUR_KEY` or follow the prompt |
| Empty response | Check file format (mp3/mp4/wav/m4a supported) |
| Upload timeout | File too large (>2GB); split first |
| Wrong language | Use `-l en` to force language |
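Two of these mistakes can be caught before any upload. The Python sketch below is a hypothetical pre-flight check, not something omnicaptions provides: it assumes the four listed extensions are the full supported set and that the 2GB limit means 2 × 1024³ bytes.

```python
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".m4a"}
MAX_BYTES = 2 * 1024**3  # assumption: "2GB" interpreted as 2 GiB


def preflight(path: str, size_bytes: int) -> List[str]:
    """Return human-readable problems for a file; an empty list means none found."""
    problems: List[str] = []
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 2GB; split it first")
    return problems
```

In practice the size would come from `Path(path).stat().st_size`; it is passed in here so the check can be exercised without a real file.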

Related Skills

| Skill | Use When |
|-------|----------|
| `/omnicaptions:convert` | Convert output to SRT/VTT/ASS |
| `/omnicaptions:translate` | Translate (Gemini API or Claude native) |
| `/omnicaptions:download` | Download video/audio first |

Workflow Examples

    # Basic transcription
    omnicaptions transcribe video.mp4
    # → video_GeminiUnd.md

    # Precise timing needed: transcribe → LaiCut align → convert
    omnicaptions transcribe video.mp4
    omnicaptions LaiCut video.mp4 video_GeminiUnd.md
    # → video_GeminiUnd_LaiCut.json
    omnicaptions convert video_GeminiUnd_LaiCut.json -o video_GeminiUnd_LaiCut.srt

> **Note**: For translation, use `/omnicaptions:translate` (default: Claude, optional: Gemini API)
