gemini-translate

Gemini Translate

Batch-translate content files (markdown, JSON, YAML frontmatter) using Gemini CLI as a translation subagent. Claude orchestrates the pipeline: identifies gaps, builds prompts with glossary context, dispatches to Gemini in a single CLI call, validates output structure, and writes files.

Why Gemini CLI

Uses your Google AI Ultra plan via OAuth (no API key needed)
1M token context fits entire glossaries + dozens of source files in one call
Single startup cost (~13s) instead of per-file overhead
Claude stays in control of orchestration, validation, and file writes

Prerequisites

Gemini CLI installed and authenticated (`gemini` on PATH, OAuth configured)
Source content files in a consistent structure (markdown with frontmatter, JSON, etc.)

Pipeline Overview

Claude: find translation gaps (missing .es.* files, parity tests)
  |
Claude: read source files + glossary + existing translations for tone
  |
Claude: build batch prompt and call gemini-translate.sh
  |
Gemini: translate all files in one shot, return JSON
  |
Claude: parse response, validate structure, write .es.* files
  |
Claude: run project tests (i18n symmetry, coverage)

Usage

Step 1: Identify gaps

Find content files missing their locale counterpart:

    # Generic pattern -- adjust paths and extensions for your project
    shopt -s globstar  # bash needs this for ** to recurse into subdirectories
    for f in content/**/*.md; do
      base=$(basename "$f")
      [[ "$base" == *.es.* ]] && continue   # skip files that are already translations
      name="${base%.*}"
      dir=$(dirname "$f")
      [ ! -f "$dir/${name}.es.md" ] && echo "MISSING: $f"
    done

Or run your project's i18n parity tests if they exist.
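For an orchestrator that prefers not to shell out, the same gap check can be sketched in Python. This is a minimal illustration, assuming the `<name>.<locale>.<ext>` naming convention used above; the function name `find_missing_translations` is hypothetical and not part of the script.

```python
from pathlib import Path


def find_missing_translations(root: Path, locale: str = "es", ext: str = ".md") -> list[Path]:
    """Return source files under root that have no <name>.<locale><ext> sibling."""
    missing = []
    for path in sorted(root.rglob(f"*{ext}")):
        # Skip files that are already locale variants, e.g. about.es.md
        if path.name.endswith(f".{locale}{ext}"):
            continue
        counterpart = path.with_name(f"{path.stem}.{locale}{ext}")
        if not counterpart.exists():
            missing.append(path)
    return missing
```

Calling `find_missing_translations(Path("content"))` would list the files to pass to the batch step.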

Step 2: Prepare the glossary

Create a glossary of terms that must be translated consistently. The glossary is a simple text block embedded in the prompt:

Glossary (EN -> ES)

"OB/GYN Physician" -> "Medico OB/GYN"
"High-risk pregnancy" -> "Embarazo de alto riesgo"
"Certified Nurse Midwife" -> "Enfermera Partera Certificada"


If your project has an existing glossary (Python dict, CSV, JSON), convert it to this format before calling the script. The script accepts a glossary file via `--glossary`.

Step 3: Run the batch translation

bash

bash gemini-translate.sh \
  --source-lang en \
  --target-lang es \
  --glossary glossary.txt \
  --model gemini-2.5-pro \
  --instructions "Use formal 'usted'. Latin American Spanish, not Spain. Warm but professional tone for patient-facing medical content." \
  file1.md file2.md file3.md

The script:

Reads all source files
Builds a single prompt with glossary + instructions + all file contents
Calls `gemini -p "..." -o json --approval-mode plan`
Parses the JSON response and prints each translation to stdout as a JSON array

Step 4: Claude validates and writes

After the script returns, Claude should:

Parse the JSON output
For each translated file:
- Verify frontmatter keys match the source exactly
- Verify links, image paths, and brand names are preserved
- Verify no uncertain-term flags from Gemini remain (or surface them to the user)
Write the `.es.*` files
Run the project's i18n tests
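The structural checks in this step can be approximated with a few lines of Python. This is a sketch under stated assumptions: frontmatter is a leading `---` block with simple `key: value` lines (no YAML parser is used), links and images use standard markdown `](target)` syntax, and the function names are hypothetical.

```python
import re

LINK_TARGET = re.compile(r"\]\(([^)\s]+)")  # captures the target of [text](target) and ![alt](target)


def frontmatter_keys(text: str) -> list[str]:
    """Top-level keys of a leading '---' frontmatter block, in order of appearance."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        match = re.match(r"^([A-Za-z0-9_-]+):", line)
        if match:
            keys.append(match.group(1))
    return keys


def validate_translation(source: str, translation: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if frontmatter_keys(source) != frontmatter_keys(translation):
        problems.append("frontmatter keys differ")
    if sorted(LINK_TARGET.findall(source)) != sorted(LINK_TARGET.findall(translation)):
        problems.append("link or image targets differ")
    return problems
```

A non-empty result is a reason to hold back the write and surface the file to the user rather than to fix it silently.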

Script Reference

gemini-translate.sh

Usage: gemini-translate.sh [OPTIONS] FILE [FILE...]

Options:
  --source-lang LANG    Source language code (default: en)
  --target-lang LANG    Target language code (default: es)
  --glossary FILE       Path to glossary file (term mappings, one per line)
  --instructions TEXT   Additional translation instructions for tone/style
  --model MODEL         Gemini model override (default: system default)
  --max-tokens N        Max estimated input tokens per batch (default: 80000)
  --gemini-bin PATH     Path to gemini binary (bypasses wrapper detection)
  --dry-run             Print the prompt without calling Gemini

Output: JSON array to stdout
  [
    {"file": "about.md", "translation": "---\ntitle: Acerca de\n---\n..."},
    {"file": "careers.md", "translation": "---\ntitle: Carreras\n---\n..."}
  ]

Exit codes:
  0  Success
  1  Gemini CLI not found or not authenticated
  2  No input files provided
  3  Gemini returned an error or unparseable output
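A caller might turn the exit code and stdout described above into either a mapping of translations or a clear error. The sketch below assumes the output shape and exit codes listed in this reference; `parse_script_result` is a hypothetical helper name.

```python
import json

EXIT_MEANINGS = {
    1: "Gemini CLI not found or not authenticated",
    2: "No input files provided",
    3: "Gemini returned an error or unparseable output",
}


def parse_script_result(returncode: int, stdout: str) -> dict[str, str]:
    """Map each source file name to its translated content, or raise on failure."""
    if returncode != 0:
        reason = EXIT_MEANINGS.get(returncode, "unknown failure")
        raise RuntimeError(f"gemini-translate.sh exited with {returncode}: {reason}")
    entries = json.loads(stdout)
    return {entry["file"]: entry["translation"] for entry in entries}
```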

Translation Quality Rules

These rules are embedded in the prompt sent to Gemini:

Preserve structure exactly: frontmatter keys, markdown formatting, links, HTML tags, image paths
Never translate: brand names, proper nouns, URLs, file paths, credentials (MD, DO, CNM, etc.)
Medical terms: Use the glossary. When a term is not in the glossary and you are uncertain, wrap it in a flag so it can be surfaced to the user
Tone: Match the source document's tone. For medical patient-facing content, be warm, reassuring, and professional
Output format: Return the complete translated file content (frontmatter + body), not just the changed parts

Adapting for Other Projects

This skill is project-agnostic. To use it on a new codebase:

File convention: Set your project's locale file naming pattern (`.es.md`, `.es.json`, `locales/es/`, etc.)
Glossary: Extract domain-specific terms into a glossary file
Instructions: Write a one-paragraph style guide for the target language
Validation: Point Claude at your project's i18n tests or write a simple key-comparison check
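For JSON locale files, a simple key-comparison check of the kind mentioned in the last item might look like this. It is a sketch that assumes nested dicts with string leaves; the function names are hypothetical.

```python
def key_paths(value, prefix: str = "") -> set[str]:
    """Collect dotted key paths for every leaf in a nested dict."""
    if not isinstance(value, dict):
        return {prefix}
    paths = set()
    for key, child in value.items():
        paths |= key_paths(child, f"{prefix}.{key}" if prefix else key)
    return paths


def compare_locale_keys(source: dict, target: dict) -> tuple[set[str], set[str]]:
    """Return (keys missing from target, keys only present in target)."""
    src, dst = key_paths(source), key_paths(target)
    return src - dst, dst - src
```

Both sets being empty means the two locale files are structurally symmetric.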

Gemini CLI Wrapper Compatibility

Many users have a shell wrapper (e.g., `~/bin/gemini`) that adds `--yolo` / `-y` by default. This conflicts with `--approval-mode`. The script avoids this by:

Preferring `pnpx @google/gemini-cli` (calls the package directly, no wrapper)
Falling back to `gemini` on PATH only if `pnpx` is unavailable
Accepting `--gemini-bin /path/to/binary` to override detection entirely

The script uses `-o json` for structured output, which returns a `{session_id, response, stats}` envelope. The embedded Python parser extracts the `response` field and handles markdown code fences, null bytes, and MCP warning prefixes automatically.
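The envelope handling described above could be sketched as follows. This is not the script's actual parser: it assumes any warning text precedes the first `{` of the envelope and contains no `{` itself, and that the `response` field holds the JSON array, optionally wrapped in a markdown code fence. The name `extract_translations` is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence marker, built here to keep this example readable
FENCED = re.compile(r"^" + FENCE + r"[a-zA-Z]*\s*\n(.*?)\n?" + FENCE + r"\s*$", re.DOTALL)


def extract_translations(raw: str) -> list[dict]:
    """Pull the translation array out of raw CLI output."""
    cleaned = raw.replace("\x00", "")  # drop null bytes
    start = cleaned.find("{")          # skip any warning lines before the envelope
    if start == -1:
        raise ValueError("no JSON envelope found in output")
    # raw_decode tolerates trailing text after the envelope
    envelope, _ = json.JSONDecoder().raw_decode(cleaned[start:])
    response = envelope["response"].strip()
    fenced = FENCED.match(response)
    if fenced:
        response = fenced.group(1)     # unwrap the fenced payload
    return json.loads(response)
```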

Token-Based Batching

Instead of a fixed file count, the script estimates input tokens (1 token ~ 4 chars) and stops adding files when the budget is reached. The default `--max-tokens 80000` leaves room for the translation output (roughly 1.2x the input for EN->ES). Files that exceed the budget are listed as skipped so the caller can run a follow-up batch.
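The budgeting rule can be illustrated with a short sketch. It applies the 4-characters-per-token estimate from the text and stops adding files at the first one that would exceed the budget; whether the real script also counts the glossary and instructions against the budget is not stated here, so this version counts file contents only. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token."""
    return len(text) // 4


def select_batch(files: dict[str, str], max_tokens: int = 80000) -> tuple[list[str], list[str]]:
    """Split file names into (batch, skipped) given name -> content and a token budget."""
    batch, skipped, used = [], [], 0
    full = False
    for name, content in files.items():
        cost = estimate_tokens(content)
        if full or used + cost > max_tokens:
            full = True  # once the budget is hit, everything after is deferred
            skipped.append(name)
        else:
            batch.append(name)
            used += cost
    return batch, skipped
```

The skipped list is what a caller would feed into the follow-up batch.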

Truncation Recovery

When Gemini hits its output token limit and truncates the JSON mid-entry, the script recovers by:

Detecting incomplete JSON
Progressively trimming from the end to find valid JSON boundaries
Dropping the last (likely truncated) entry
Reporting how many complete translations were recovered
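One way to implement the trimming steps above is to walk backwards over each closing brace and try to close the array there. The sketch below is an illustration of that idea, not the script's actual code; `recover_truncated_array` is a hypothetical name.

```python
import json


def recover_truncated_array(text: str) -> list[dict]:
    """Return the complete entries of a JSON array that may be cut off mid-entry."""
    try:
        return json.loads(text)  # nothing to recover if the JSON is already valid
    except json.JSONDecodeError:
        pass
    end = text.rfind("}")
    while end != -1:
        # Try ending the array right after this closing brace
        candidate = text[: end + 1] + "]"
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            end = text.rfind("}", 0, end)  # trim further back and retry
    return []
```

The length of the returned list gives the number of complete translations to report; the truncated entry's file can be retried in a smaller batch.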

Agentic Workflow & Vibe Coding

Iterative Translation: Do not expect perfect linguistic tone or structural preservation on the first batch run. Draft a small test batch, review the output for tone and formatting, isolate any consistent translation errors, refine the glossary or prompt instructions ONE variable at a time, and rerun the test before processing the entire project.
Vibe Coding: Commit your working source content and glossary updates locally before running the translation batch, and commit the generated `.es.*` files separately so you can easily revert if the model hallucinated structure.

Limitations

Single language pair per call: The script handles one source/target pair. For multi-language projects, run once per target language.
Gemini CLI startup: ~13s overhead per batch call. Batching amortizes this.
Output token limit: 80K input tokens is the default budget. If truncation occurs, reduce `--max-tokens`.
No streaming: The script waits for the full response. Large batches may take 30-60s of model time on top of startup.
Python 3 required: The JSON extraction uses an embedded Python script.
