gemini-translate

Gemini Translate

Batch-translate content files (markdown, JSON, YAML frontmatter) using Gemini CLI as a translation subagent. Claude orchestrates the pipeline: identifies gaps, builds prompts with glossary context, dispatches to Gemini in a single CLI call, validates output structure, and writes files.

Why Gemini CLI

  • Uses your Google AI Ultra plan via OAuth (no API key needed)
  • 1M token context fits entire glossaries + dozens of source files in one call
  • Single startup cost (~13s) instead of per-file overhead
  • Claude stays in control of orchestration, validation, and file writes

Prerequisites

  • Gemini CLI installed and authenticated (`gemini` on PATH, OAuth configured)
  • Source content files in a consistent structure (markdown with frontmatter, JSON, etc.)

Pipeline Overview

Claude: find translation gaps (missing .es.* files, parity tests)
  |
Claude: read source files + glossary + existing translations for tone
  |
Claude: build batch prompt and call gemini-translate.sh
  |
Gemini: translate all files in one shot, return JSON
  |
Claude: parse response, validate structure, write .es.* files
  |
Claude: run project tests (i18n symmetry, coverage)

Usage

Step 1: Identify gaps

Find content files missing their locale counterpart:
bash
# Generic pattern -- adjust paths and extensions for your project
shopt -s globstar nullglob   # bash needs globstar for ** to recurse
for f in content/**/*.md; do
  base=$(basename "$f")
  [[ "$base" == *.es.* ]] && continue   # skip files that are already translations
  name="${base%.*}"                     # strip the extension
  dir=$(dirname "$f")
  [ ! -f "$dir/${name}.es.md" ] && echo "MISSING: $f"
done

Or run your project's i18n parity tests if they exist.
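The same gap check can be sketched in Python, which may be easier to reuse from a test suite. This is a minimal sketch: the function name `find_missing_translations` and its parameters are hypothetical, and it assumes the `<name>.<locale>.md` sibling convention used in the shell loop.

```python
from pathlib import Path


def find_missing_translations(root, locale="es", suffix=".md"):
    """Return source files under `root` with no `<name>.<locale><suffix>` sibling.

    Hypothetical helper: assumes translations live next to their source file.
    """
    missing = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        # Skip files that are already locale variants, e.g. about.es.md
        if path.name.endswith(f".{locale}{suffix}"):
            continue
        counterpart = path.with_name(f"{path.stem}.{locale}{suffix}")
        if not counterpart.is_file():
            missing.append(path)
    return missing
```

Calling `find_missing_translations("content")` returns a list of `Path` objects that can be passed straight to the batch step.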

Step 2: Prepare the glossary

Create a glossary of terms that must be translated consistently. The glossary is a simple text block embedded in the prompt:

Glossary (EN -> ES)

  • "OB/GYN Physician" -> "Médico OB/GYN"
  • "High-risk pregnancy" -> "Embarazo de alto riesgo"
  • "Certified Nurse Midwife" -> "Enfermera Partera Certificada"

If your project has an existing glossary (Python dict, CSV, JSON), convert it to this format before calling the script. The script accepts a glossary file via `--glossary`.
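As an illustration of that conversion, the sketch below renders a Python dict in the line-per-term format. The helper name `glossary_to_text` is hypothetical, and the leading `- ` on each line is an assumption about how the entries are written in the glossary file.

```python
def glossary_to_text(terms, source="EN", target="ES"):
    """Render a {source_term: target_term} mapping, one mapping per line.

    Assumption: each entry is written as `- "source" -> "target"`.
    """
    lines = [f"Glossary ({source} -> {target})"]
    for src, dst in terms.items():
        lines.append(f'- "{src}" -> "{dst}"')
    return "\n".join(lines) + "\n"
```

Writing the returned string to `glossary.txt` gives a file that can be passed with `--glossary`.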

Step 3: Run the batch translation

bash
bash gemini-translate.sh \
  --source-lang en \
  --target-lang es \
  --glossary glossary.txt \
  --model gemini-2.5-pro \
  --instructions "Use formal 'usted'. Latin American Spanish, not Spain. Warm but professional tone for patient-facing medical content." \
  file1.md file2.md file3.md
The script:
  1. Reads all source files
  2. Builds a single prompt with glossary + instructions + all file contents
  3. Calls `gemini -p "..." -o json --approval-mode plan`
  4. Parses the JSON response and prints each translation to stdout as a JSON array
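Step 2 of that list, building one prompt from glossary, instructions, and file contents, might look roughly like the sketch below. The script's actual prompt text is not shown here, so the wording, the `=== FILE: ... ===` delimiter, and the function name `build_batch_prompt` are all assumptions made for illustration.

```python
def build_batch_prompt(files, glossary="", instructions="",
                       source_lang="en", target_lang="es"):
    """Assemble a single prompt from a {filename: content} mapping.

    Hypothetical layout: header, glossary, instructions, then each file
    behind a delimiter line so the model can tell the files apart.
    """
    parts = [
        f"Translate the following files from {source_lang} to {target_lang}.",
        'Return a JSON array of objects with "file" and "translation" keys.',
    ]
    if glossary:
        parts.append("Glossary:\n" + glossary.strip())
    if instructions:
        parts.append("Instructions:\n" + instructions.strip())
    for name, content in files.items():
        parts.append(f"=== FILE: {name} ===\n{content}")
    return "\n\n".join(parts)
```

Keeping the glossary and instructions ahead of the file contents means they are stated once per batch rather than once per file.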

Step 4: Claude validates and writes

After the script returns, Claude should:
  1. Parse the JSON output
  2. For each translated file:
    • Verify frontmatter keys match the source exactly
    • Verify links, image paths, and brand names are preserved
    • Verify there are no `<!-- REVIEW: -->` flags from Gemini (or surface any to the user)
  3. Write the `.es.*` files
  4. Run the project's i18n tests
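Two of these checks, frontmatter key parity and `REVIEW` flags, are easy to automate. The sketch below is one possible approach and not the skill's own validator: the names are hypothetical, and the frontmatter parsing is deliberately naive (it only looks at unindented `key:` lines between the `---` markers).

```python
import re

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
REVIEW_RE = re.compile(r"<!--\s*REVIEW:(.*?)-->", re.DOTALL)


def frontmatter_keys(text):
    """Return top-level frontmatter keys in order (naive line-based parse)."""
    match = FRONTMATTER_RE.match(text)
    if not match:
        return []
    keys = []
    for line in match.group(1).splitlines():
        key = re.match(r"([A-Za-z0-9_-]+):", line)  # indented lines do not match
        if key:
            keys.append(key.group(1))
    return keys


def validate_translation(source, translation):
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    if frontmatter_keys(source) != frontmatter_keys(translation):
        problems.append("frontmatter keys differ from source")
    for flagged in REVIEW_RE.findall(translation):
        problems.append(f"REVIEW flag: {flagged.strip()}")
    return problems
```

A non-empty result is a reason to hold the file back and show the problems to the user before writing anything.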

Script Reference

gemini-translate.sh

Usage: gemini-translate.sh [OPTIONS] FILE [FILE...]

Options:
  --source-lang LANG    Source language code (default: en)
  --target-lang LANG    Target language code (default: es)
  --glossary FILE       Path to glossary file (term mappings, one per line)
  --instructions TEXT   Additional translation instructions for tone/style
  --model MODEL         Gemini model override (default: system default)
  --max-tokens N        Max estimated input tokens per batch (default: 80000)
  --gemini-bin PATH     Path to gemini binary (bypasses wrapper detection)
  --dry-run             Print the prompt without calling Gemini

Output: JSON array to stdout
  [
    {"file": "about.md", "translation": "---\ntitle: Acerca de\n---\n..."},
    {"file": "careers.md", "translation": "---\ntitle: Carreras\n---\n..."}
  ]

Exit codes:
  0  Success
  1  Gemini CLI not found or not authenticated
  2  No input files provided
  3  Gemini returned an error or unparseable output
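
A caller can turn that JSON array into locale files with a few lines of Python. The sketch below assumes the `<name>.<locale>.<ext>` sibling naming convention; `locale_path` and `write_translations` are hypothetical helper names, not part of the script.

```python
import json
from pathlib import Path


def locale_path(source_path, locale="es"):
    """Map about.md -> about.es.md (assumed naming convention)."""
    p = Path(source_path)
    return p.with_name(f"{p.stem}.{locale}{p.suffix}")


def write_translations(script_stdout, base_dir=".", locale="es"):
    """Write each {"file", "translation"} entry next to its source file."""
    written = []
    for entry in json.loads(script_stdout):
        target = Path(base_dir) / locale_path(entry["file"], locale)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(entry["translation"], encoding="utf-8")
        written.append(target)
    return written
```

In practice the validation step would run between parsing and writing, so that only translations that pass are saved.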

Translation Quality Rules

These rules are embedded in the prompt sent to Gemini:
  1. Preserve structure exactly: frontmatter keys, markdown formatting, links, HTML tags, image paths
  2. Never translate: brand names, proper nouns, URLs, file paths, credentials (MD, DO, CNM, etc.)
  3. Medical terms: Use the glossary. When a term is not in the glossary and you are uncertain, wrap it in `<!-- REVIEW: original term -->`
  4. Tone: Match the source document's tone. For medical patient-facing content, be warm, reassuring, and professional
  5. Output format: Return the complete translated file content (frontmatter + body), not just the changed parts
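Rule 1 can be spot-checked after the fact by comparing link and image targets between source and translation. This is a rough sketch with a hypothetical helper name; it only covers inline markdown links of the form `[text](target)` and `![alt](target)`, not reference-style links or raw HTML.

```python
import re

# Captures the target of [text](target) and ![alt](target)
LINK_TARGET_RE = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)")


def link_targets(markdown):
    """Return all inline link and image targets, sorted for comparison."""
    return sorted(LINK_TARGET_RE.findall(markdown))


def links_preserved(source, translation):
    """True when both texts contain exactly the same set of targets."""
    return link_targets(source) == link_targets(translation)
```

Link text is expected to change during translation, so only the targets are compared.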

Adapting for Other Projects

This skill is project-agnostic. To use it on a new codebase:
  1. File convention: Set your project's locale file naming pattern (`.es.md`, `.es.json`, `locales/es/`, etc.)
  2. Glossary: Extract domain-specific terms into a glossary file
  3. Instructions: Write a one-paragraph style guide for the target language
  4. Validation: Point Claude at your project's i18n tests or write a simple key-comparison check

Gemini CLI Wrapper Compatibility

Many users have a shell wrapper (e.g., `~/bin/gemini`) that adds `--yolo` / `-y` by default. This conflicts with `--approval-mode`. The script avoids this by:
  1. Preferring `pnpx @google/gemini-cli` (calls the package directly, no wrapper)
  2. Falling back to `gemini` on PATH only if `pnpx` is unavailable
  3. Accepting `--gemini-bin /path/to/binary` to override detection entirely
The script uses `-o json` for structured output, which returns a `{session_id, response, stats}` envelope. The embedded Python parser extracts the `response` field and handles markdown code fences, null bytes, and MCP warning prefixes automatically.
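
The envelope handling described above could be approximated as follows. This is a sketch and not the script's embedded parser: `extract_response` is a hypothetical name, and it assumes any warning text printed before the envelope contains no `{` character.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example nests cleanly in a fenced block
FENCED_RE = re.compile(FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE + r"\s*\Z", re.DOTALL)


def extract_response(raw):
    """Pull the translation array out of raw CLI output."""
    raw = raw.replace("\x00", "")            # drop stray null bytes
    start = raw.find("{")                    # skip any warning lines before the envelope
    if start == -1:
        raise ValueError("no JSON envelope found")
    envelope, _ = json.JSONDecoder().raw_decode(raw, start)
    text = envelope["response"].strip()
    fenced = FENCED_RE.match(text)           # unwrap a fenced json block if present
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```

`raw_decode` is used instead of `json.loads` so that trailing output after the envelope does not cause a parse failure.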

Token-Based Batching

Instead of a fixed file count, the script estimates input tokens (1 token ~ 4 chars) and stops adding files when the budget is reached. The default `--max-tokens 80000` leaves room for the translation output (roughly 1.2x the input for EN->ES). Files that exceed the budget are listed as skipped so the caller can run a follow-up batch.
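
The budgeting rule can be illustrated with a short sketch. The function names are hypothetical; the 4-characters-per-token estimate and the 80000 default come from the description above, and the sketch stops adding files as soon as one does not fit.

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token."""
    return len(text) // 4


def select_batch(files, max_tokens=80000):
    """Split (name, content) pairs into (batch, skipped) under a token budget.

    Once a file would push the batch over budget, it and all later files
    are reported as skipped so the caller can run a follow-up batch.
    """
    batch, skipped, used = [], [], 0
    budget_reached = False
    for name, content in files:
        cost = estimate_tokens(content)
        if not budget_reached and used + cost <= max_tokens:
            batch.append(name)
            used += cost
        else:
            budget_reached = True
            skipped.append(name)
    return batch, skipped
```

With a budget of 150 tokens, three files of 400, 400, and 40 characters yield a batch containing only the first file; the other two are skipped.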

Truncation Recovery

When Gemini hits its output token limit and truncates the JSON mid-entry, the script recovers by:
  1. Detecting incomplete JSON
  2. Progressively trimming from the end to find valid JSON boundaries
  3. Dropping the last (likely truncated) entry
  4. Reporting how many complete translations were recovered
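One way to implement this kind of recovery is to walk backwards through closing braces until the text up to that point, plus a closing bracket, parses as JSON. The sketch below follows that idea under the assumption that the output is a JSON array of objects; `recover_truncated_array` is a hypothetical name and the script's own recovery logic may differ.

```python
import json


def recover_truncated_array(text):
    """Return (entries, was_truncated) for a possibly cut-off JSON array."""
    try:
        return json.loads(text), False        # complete output, nothing to do
    except json.JSONDecodeError:
        pass
    end = text.rfind("}")
    while end != -1:
        candidate = text[: end + 1] + "]"     # close the array after this object
        try:
            return json.loads(candidate), True
        except json.JSONDecodeError:
            end = text.rfind("}", 0, end)     # try the previous closing brace
    return [], True
```

Because the candidate always ends at a closing brace, a partially written final entry is dropped and only complete entries are returned; `len(entries)` gives the recovered count.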

Agentic Workflow & Vibe Coding

  • Iterative Translation: Do not expect perfect linguistic tone or structural preservation on the first batch run. Draft a small test batch, review the output for tone and formatting, isolate any consistent translation errors, refine the glossary or prompt instructions ONE variable at a time, and rerun the test before processing the entire project.
  • Vibe Coding: Commit your working source content and glossary updates locally before running the translation batch, and commit the generated `.es.*` files separately so you can easily revert if the model hallucinates structure.

Limitations

  • Single language pair per call: The script handles one source/target pair. For multi-language projects, run once per target language.
  • Gemini CLI startup: ~13s overhead per batch call. Batching amortizes this.
  • Output token limit: The default budget is 80K input tokens. If Gemini truncates its output, reduce `--max-tokens`.
  • No streaming: The script waits for the full response. Large batches may take 30-60s of model time on top of startup.
  • Python 3 required: The JSON extraction uses an embedded Python script.
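
For the first limitation, a multi-language project can generate one invocation per target language. The sketch below only builds the argument lists, using flags from the script reference; `build_commands` and the per-language glossary mapping are hypothetical.

```python
def build_commands(files, target_langs, source_lang="en",
                   glossaries=None, script="gemini-translate.sh"):
    """Return one argv list per target language.

    `glossaries` is an optional {lang: glossary_path} mapping (an assumption:
    each target language keeps its own glossary file).
    """
    commands = []
    for lang in target_langs:
        cmd = ["bash", script, "--source-lang", source_lang, "--target-lang", lang]
        if glossaries and lang in glossaries:
            cmd += ["--glossary", glossaries[lang]]
        commands.append(cmd + list(files))
    return commands
```

Each list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; every run pays the startup cost once, so it still helps to keep the batches large.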