gemini-translate
Gemini Translate
Batch-translate content files (markdown, JSON, YAML frontmatter) using Gemini CLI as a translation subagent. Claude orchestrates the pipeline: identifies gaps, builds prompts with glossary context, dispatches to Gemini in a single CLI call, validates output structure, and writes files.
Why Gemini CLI
- Uses your Google AI Ultra plan via OAuth (no API key needed)
- 1M token context fits entire glossaries + dozens of source files in one call
- Single startup cost (~13s) instead of per-file overhead
- Claude stays in control of orchestration, validation, and file writes
Prerequisites
- Gemini CLI installed and authenticated (`gemini` on PATH, OAuth configured)
- Source content files in a consistent structure (markdown with frontmatter, JSON, etc.)
Pipeline Overview
```
Claude: find translation gaps (missing .es.* files, parity tests)
   |
Claude: read source files + glossary + existing translations for tone
   |
Claude: build batch prompt and call gemini-translate.sh
   |
Gemini: translate all files in one shot, return JSON
   |
Claude: parse response, validate structure, write .es.* files
   |
Claude: run project tests (i18n symmetry, coverage)
```
Usage
Step 1: Identify gaps
Find content files missing their locale counterpart:
```bash
# Generic pattern -- adjust paths and extensions for your project
shopt -s globstar nullglob            # bash 4+: make ** recurse; skip the loop if nothing matches
for f in content/**/*.md; do
  base=$(basename "$f")
  [[ "$base" == *.es.* ]] && continue # skip files that are already translations
  name="${base%.*}"                   # strip the extension: about.md -> about
  dir=$(dirname "$f")
  [ ! -f "$dir/${name}.es.md" ] && echo "MISSING: $f"
done
```
Or run your project's i18n parity tests if they exist.
Step 2: Prepare the glossary
Create a glossary of terms that must be translated consistently. The glossary is a simple text block embedded in the prompt:
```
Glossary (EN -> ES)
- "OB/GYN Physician" -> "Médico OB/GYN"
- "High-risk pregnancy" -> "Embarazo de alto riesgo"
- "Certified Nurse Midwife" -> "Enfermera Partera Certificada"
```
If your project has an existing glossary (Python dict, CSV, JSON), convert it to this format before calling the script. The script accepts a glossary file via `--glossary`.
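If the existing glossary lives in a Python dict or a two-column CSV, a small converter along these lines can produce the mapping lines. This is a sketch and is not part of the script. The helper names are hypothetical. The CSV is assumed to be headerless, with the English term first and the target term second. The converter emits only the `- "term" -> "translation"` lines, because `--glossary` is documented as taking one mapping per line.

```python
import csv
import io


def glossary_lines(pairs):
    """Render (source_term, target_term) pairs as '- "src" -> "dst"' lines."""
    return "".join(f'- "{src}" -> "{dst}"\n' for src, dst in pairs)


def pairs_from_csv(text):
    """Read a headerless two-column CSV: source term, then target term."""
    rows = csv.reader(io.StringIO(text))
    return [(row[0].strip(), row[1].strip()) for row in rows if len(row) >= 2]


# Hypothetical example: a glossary kept as a Python dict
terms = {
    "High-risk pregnancy": "Embarazo de alto riesgo",
    "Certified Nurse Midwife": "Enfermera Partera Certificada",
}
print(glossary_lines(terms.items()), end="")
```

Redirect the printed lines to a file such as `glossary.txt` and pass that path via `--glossary`.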
Step 3: Run the batch translation
```bash
bash gemini-translate.sh \
  --source-lang en \
  --target-lang es \
  --glossary glossary.txt \
  --model gemini-2.5-pro \
  --instructions "Use formal 'usted'. Latin American Spanish, not Spain. Warm but professional tone for patient-facing medical content." \
  file1.md file2.md file3.md
```
The script:
- Reads all source files
- Builds a single prompt with glossary + instructions + all file contents
- Calls `gemini -p "..." -o json --approval-mode plan`
- Parses the JSON response and prints the translations to stdout as a JSON array
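The single-prompt assembly can be sketched as below, which helps when reasoning about what Gemini actually receives. This is an illustration, not the script's code. The rule wording is condensed from the Translation Quality Rules. The section labels, the `=== FILE: ... ===` delimiters, and the `build_prompt` name are assumptions, chosen to match the documented `{"file", "translation"}` output.

```python
RULES = """\
- Preserve structure exactly: frontmatter keys, markdown formatting, links, HTML tags, image paths.
- Never translate brand names, proper nouns, URLs, file paths, or credentials (MD, DO, CNM, etc.).
- Medical terms: use the glossary; wrap uncertain terms not in it as <!-- REVIEW: original term -->.
- Match the source document's tone.
- Return the complete translated file content (frontmatter + body), not just changed parts."""


def build_prompt(files, source_lang, target_lang, glossary="", instructions=""):
    """Assemble one prompt covering every (name, content) pair in `files`."""
    parts = [
        f"Translate each file below from {source_lang} to {target_lang}.",
        "Rules:\n" + RULES,
    ]
    if glossary.strip():
        parts.append("Glossary:\n" + glossary.strip())
    if instructions.strip():
        parts.append("Style instructions:\n" + instructions.strip())
    parts.append(
        "Reply with only a JSON array of objects of the form "
        '{"file": "<file name>", "translation": "<full translated content>"}.'
    )
    # File contents go last, each behind a delimiter line naming the file
    for name, content in files:
        parts.append(f"=== FILE: {name} ===\n{content}")
    return "\n\n".join(parts)


# Usage (hypothetical paths):
# from pathlib import Path
# files = [(p.name, p.read_text(encoding="utf-8")) for p in map(Path, ["about.md", "careers.md"])]
# prompt = build_prompt(files, "en", "es", glossary=Path("glossary.txt").read_text(encoding="utf-8"))
```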
Step 4: Claude validates and writes
After the script returns, Claude should:
- Parse the JSON output
- For each translated file:
  - Verify frontmatter keys match the source exactly
  - Verify links, image paths, and brand names are preserved
  - Check for `<!-- REVIEW: -->` flags from Gemini and surface any to the user
- Write the `.es.*` files
- Run the project's i18n tests
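A minimal version of these checks might look like the following. It assumes flat `key: value` frontmatter and markdown-style `[text](target)` links. The function names are hypothetical. Brand-name preservation is not checked here, because that needs a project-specific term list.

```python
import json
import re

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.S)
LINK_TARGET = re.compile(r"\]\(([^)\s]+)")          # targets of [text](url) and ![alt](path)
REVIEW_FLAG = re.compile(r"<!--\s*REVIEW:(.*?)-->", re.S)


def frontmatter_keys(text):
    """Top-level keys of simple `key: value` frontmatter, in order (nested YAML is skipped)."""
    m = FRONTMATTER.match(text)
    if not m:
        return []
    return [line.split(":", 1)[0].strip()
            for line in m.group(1).splitlines()
            if line and not line[0].isspace() and ":" in line]


def validate(source, translation):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if frontmatter_keys(source) != frontmatter_keys(translation):
        problems.append("frontmatter keys differ")
    lost = set(LINK_TARGET.findall(source)) - set(LINK_TARGET.findall(translation))
    if lost:
        problems.append(f"links/images lost: {sorted(lost)}")
    for term in REVIEW_FLAG.findall(translation):
        problems.append(f"REVIEW flag: {term.strip()}")
    return problems


def target_path(name, lang="es"):
    """about.md -> about.es.md (a name without an extension gets a .lang suffix)."""
    stem, dot, ext = name.rpartition(".")
    return f"{stem}.{lang}.{ext}" if dot else f"{name}.{lang}"


def check_batch(stdout, sources, lang="es"):
    """Map each output file's target path to its list of problems."""
    return {target_path(item["file"], lang): validate(sources[item["file"]], item["translation"])
            for item in json.loads(stdout)}
```

Write each translation to its `target_path` only when its problem list is empty. Otherwise, surface the problems to the user.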
Script Reference
gemini-translate.sh
```
Usage: gemini-translate.sh [OPTIONS] FILE [FILE...]

Options:
  --source-lang LANG   Source language code (default: en)
  --target-lang LANG   Target language code (default: es)
  --glossary FILE      Path to glossary file (term mappings, one per line)
  --instructions TEXT  Additional translation instructions for tone/style
  --model MODEL        Gemini model override (default: system default)
  --max-tokens N       Max estimated input tokens per batch (default: 80000)
  --gemini-bin PATH    Path to gemini binary (bypasses wrapper detection)
  --dry-run            Print the prompt without calling Gemini

Output: JSON array to stdout
  [
    {"file": "about.md", "translation": "---\ntitle: Acerca de\n---\n..."},
    {"file": "careers.md", "translation": "---\ntitle: Carreras\n---\n..."}
  ]

Exit codes:
  0  Success
  1  Gemini CLI not found or not authenticated
  2  No input files provided
  3  Gemini returned an error or unparseable output
```
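A caller can turn the documented exit codes and stdout into either a `{file: translation}` mapping or an error. This sketch assumes the script is at `./gemini-translate.sh` and is invoked with `bash`. `run_batch` and `parse_result` are hypothetical helper names.

```python
import json
import subprocess

EXIT_CODES = {
    1: "Gemini CLI not found or not authenticated",
    2: "No input files provided",
    3: "Gemini returned an error or unparseable output",
}


def parse_result(returncode, stdout):
    """Turn the script's exit status and stdout into {file: translation}."""
    if returncode != 0:
        reason = EXIT_CODES.get(returncode, "unexpected exit code")
        raise RuntimeError(f"gemini-translate.sh exited {returncode}: {reason}")
    return {item["file"]: item["translation"] for item in json.loads(stdout)}


def run_batch(files, target_lang="es", script="./gemini-translate.sh"):
    """Run one batch through the script and return its translations."""
    proc = subprocess.run(
        ["bash", script, "--target-lang", target_lang, *files],
        stdout=subprocess.PIPE,   # stderr is not captured, so it still reaches the terminal
        text=True,
    )
    return parse_result(proc.returncode, proc.stdout)
```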
Translation Quality Rules
These rules are embedded in the prompt sent to Gemini:
- Preserve structure exactly: frontmatter keys, markdown formatting, links, HTML tags, image paths
- Never translate: brand names, proper nouns, URLs, file paths, credentials (MD, DO, CNM, etc.)
- Medical terms: Use the glossary. When a term is not in the glossary and you are uncertain, wrap it in `<!-- REVIEW: original term -->`
- Tone: Match the source document's tone. For medical patient-facing content, be warm, reassuring, and professional
- Output format: Return the complete translated file content (frontmatter + body), not just the changed parts
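The "never translate" rule can also be spot-checked after the batch returns. This sketch rests on several assumptions. The caller supplies the protected-term list. Matching uses regex word boundaries, so terms should begin and end with letters or digits. URLs are found with a deliberately simple pattern. The function names are hypothetical.

```python
import re

URL = re.compile(r"https?://[^\s)\"'<>]+")


def urls(text):
    """URLs in text, minus trailing sentence punctuation."""
    return {u.rstrip(".,;:") for u in URL.findall(text)}


def untranslated_violations(source, translation, protected):
    """Protected terms or URLs present in the source but missing from the translation."""
    def has(term, text):
        # Word boundaries stop "DO" from matching inside "DOCTOR"
        return re.search(rf"\b{re.escape(term)}\b", text) is not None

    missing = [t for t in protected if has(t, source) and not has(t, translation)]
    missing += sorted(urls(source) - urls(translation))
    return missing
```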
Adapting for Other Projects
This skill is project-agnostic. To use it on a new codebase:
- File convention: Set your project's locale file naming pattern (`.es.md`, `.es.json`, `locales/es/`, etc.)
- Glossary: Extract domain-specific terms into a glossary file
- Instructions: Write a one-paragraph style guide for the target language
- Validation: Point Claude at your project's i18n tests or write a simple key-comparison check
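For JSON locale files, a simple key-comparison check can be as small as the following sketch. It flattens nested objects into dotted paths and reports keys that are missing from, or extra in, the translation. It assumes both files are JSON objects and treats arrays as leaf values. The function names are hypothetical.

```python
import json


def key_paths(obj, prefix=()):
    """Flatten nested JSON objects into dotted key paths; non-objects are leaves."""
    if isinstance(obj, dict) and obj:
        paths = set()
        for key, value in obj.items():
            paths |= key_paths(value, prefix + (key,))
        return paths
    return {".".join(prefix)}


def compare_keys(source_json, target_json):
    """Report keys missing from, or extra in, the translated locale file."""
    src = key_paths(json.loads(source_json))
    dst = key_paths(json.loads(target_json))
    return {"missing": sorted(src - dst), "extra": sorted(dst - src)}
```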
Gemini CLI Wrapper Compatibility
Many users have a shell wrapper (e.g., `~/bin/gemini`) that adds `--yolo`/`-y` by default. This conflicts with `--approval-mode`. The script avoids this by:
- Preferring `pnpx @google/gemini-cli` (calls the package directly, no wrapper)
- Falling back to `gemini` on PATH only if `pnpx` is unavailable
- Accepting `--gemini-bin /path/to/binary` to override detection entirely

The script uses `-o json` for structured output, which returns a `{session_id, response, stats}` envelope. The embedded Python parser extracts the `response` field and handles markdown code fences, null bytes, and MCP warning prefixes automatically.
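The parsing step described above might look roughly like the following. This is a sketch, not the script's embedded parser. It assumes three things: any MCP warning text sits on lines before the JSON envelope begins, the envelope is the first line starting with `{`, and the model may wrap its array in a markdown code fence inside `response`.

```python
import json
import re

# A markdown fence (optionally tagged json) wrapping the whole response
FENCE = re.compile(r"\A`{3}(?:json)?\s*\n(.*?)\n`{3}\s*\Z", re.S)


def extract_translations(raw):
    """Return the translations array from `gemini -o json` stdout."""
    raw = raw.replace("\x00", "")                       # strip stray null bytes
    lines = raw.splitlines()
    # Skip any warning lines printed before the JSON envelope starts
    start = next((i for i, line in enumerate(lines) if line.lstrip().startswith("{")), None)
    if start is None:
        raise ValueError("no JSON envelope found in Gemini output")
    envelope, _ = json.JSONDecoder().raw_decode("\n".join(lines[start:]).lstrip())
    text = envelope["response"].strip()
    fenced = FENCE.match(text)                          # unwrap a markdown code fence if present
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```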
Token-Based Batching
Instead of a fixed file count, the script estimates input tokens (1 token ~ 4 chars) and stops adding files when the budget is reached. The default `--max-tokens 80000` leaves room for the translation output (roughly 1.2x the input for EN->ES). Files that do not fit within the budget are listed as skipped, so the caller can run a follow-up batch.
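The budgeting rule can be sketched as a single greedy pass over the files in order. This sketch makes several assumptions. It uses the 4-characters-per-token heuristic described above and ignores the tokens taken by the glossary and instructions. `plan_batch` is a hypothetical name. It stops at the first file that does not fit, mirroring the description, rather than searching further down the list for smaller files.

```python
import math


def plan_batch(files, max_tokens=80_000, chars_per_token=4):
    """Fill one batch from (name, content) pairs without exceeding the token budget.

    Returns (batch_names, skipped_names, estimated_tokens_used).
    """
    batch, used = [], 0
    for i, (name, content) in enumerate(files):
        cost = math.ceil(len(content) / chars_per_token)   # ~4 characters per token
        if used + cost > max_tokens:
            # Stop at the first file that does not fit; it and everything after are skipped
            return batch, [n for n, _ in files[i:]], used
        batch.append(name)
        used += cost
    return batch, [], used
```

Re-running with the skipped files produces the follow-up batch.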
Truncation Recovery
When Gemini hits its output token limit and truncates the JSON mid-entry, the script recovers by:
- Detecting incomplete JSON
- Progressively trimming from the end to find valid JSON boundaries
- Dropping the last (likely truncated) entry
- Reporting how many complete translations were recovered
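The recovery steps above can be sketched as follows. First try a normal parse. On failure, walk backwards through each `}` in the text and close the array there until a prefix parses. An entry cut off mid-string never yields a valid prefix, so it is dropped. This is an illustrative reimplementation that assumes the response is a flat JSON array of objects. The script's own parser may differ.

```python
import json


def recover_entries(text):
    """Parse a JSON array of entries, salvaging complete ones if the tail was cut off.

    Returns (entries, truncated).
    """
    text = text.strip()
    try:
        return json.loads(text), False
    except json.JSONDecodeError:
        pass
    end = len(text)
    while True:
        # Try closing the array right after each '}' from the end backwards
        end = text.rfind("}", 0, end)
        if end == -1:
            return [], True          # nothing recoverable
        try:
            return json.loads(text[: end + 1] + "]"), True
        except json.JSONDecodeError:
            continue
```

When `truncated` is true, `len(entries)` is the number of complete translations recovered.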
Agentic Workflow & Vibe Coding
- Iterative Translation: Do not expect perfect linguistic tone or structural preservation on the first batch run. Draft a small test batch, review the output for tone and formatting, isolate any consistent translation errors, refine the glossary or prompt instructions ONE variable at a time, and rerun the test before processing the entire project.
- Vibe Coding: Commit your working source content and glossary updates locally before running the translation batch, and commit the generated `.es.*` files separately so you can easily revert if the model hallucinated structure.
Limitations
- Single language pair per call: The script handles one source/target pair. For multi-language projects, run once per target language.
- Gemini CLI startup: ~13s overhead per batch call. Batching amortizes this.
- Output token limit: The default budget of 80K input tokens leaves room for the translated output. If truncation still occurs, reduce `--max-tokens`.
- No streaming: The script waits for the full response. Large batches may take 30-60s of model time on top of startup.
- Python 3 required: The JSON extraction uses an embedded Python script.