Gemini Translate
Batch-translate content files (markdown, JSON, YAML frontmatter) using Gemini CLI as a translation subagent. Claude orchestrates the pipeline: identifies gaps, builds prompts with glossary context, dispatches to Gemini in a single CLI call, validates output structure, and writes files.
Why Gemini CLI
- Uses your Google AI Ultra plan via OAuth (no API key needed)
- 1M token context fits entire glossaries + dozens of source files in one call
- Single startup cost (~13s) instead of per-file overhead
- Claude stays in control of orchestration, validation, and file writes
Prerequisites
- Gemini CLI installed and authenticated (gemini on PATH, OAuth configured)
- Source content files in a consistent structure (markdown with frontmatter, JSON, etc.)
Pipeline Overview
Claude: find translation gaps (missing .es.* files, parity tests)
|
Claude: read source files + glossary + existing translations for tone
|
Claude: build batch prompt and call gemini-translate.sh
|
Gemini: translate all files in one shot, return JSON
|
Claude: parse response, validate structure, write .es.* files
|
Claude: run project tests (i18n symmetry, coverage)
Usage
Step 1: Identify gaps
Find content files missing their locale counterpart:
bash
# Generic pattern -- adjust paths and extensions for your project
shopt -s globstar nullglob  # bash 4+: let ** recurse and skip unmatched globs
for f in content/**/*.md; do
  base=$(basename "$f")
  [[ "$base" == *.es.* ]] && continue   # already a translated file
  name="${base%.*}"
  dir=$(dirname "$f")
  [[ -f "$dir/${name}.es.md" ]] || echo "MISSING: $f"
done
Or run your project's i18n parity tests if they exist.
Step 2: Prepare the glossary
Create a glossary of terms that must be translated consistently. The glossary is a simple text block embedded in the prompt:
## Glossary (EN -> ES)
- "OB/GYN Physician" -> "Medico OB/GYN"
- "High-risk pregnancy" -> "Embarazo de alto riesgo"
- "Certified Nurse Midwife" -> "Enfermera Partera Certificada"
If your project has an existing glossary (Python dict, CSV, JSON), convert it to this format before calling the script. The script accepts a glossary file via --glossary.
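If the glossary already lives in a Python dict (or a JSON file loaded into one), the conversion could look roughly like the sketch below. The function name glossary_to_text is hypothetical and not part of gemini-translate.sh; including the header line in the file is an assumption based on the format shown above.

```python
from typing import Dict


def glossary_to_text(terms: Dict[str, str], source: str = "EN", target: str = "ES") -> str:
    """Render a term mapping in the '- "source term" -> "target term"' format."""
    # Header line mirrors the example glossary; whether the script needs it is an assumption.
    lines = [f"## Glossary ({source} -> {target})"]
    for src_term, dst_term in terms.items():
        lines.append(f'- "{src_term}" -> "{dst_term}"')
    return "\n".join(lines) + "\n"
```

Writing the returned string to glossary.txt gives a file that can be passed to the script.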
Step 3: Run the batch translation
bash
bash gemini-translate.sh \
--source-lang en \
--target-lang es \
--glossary glossary.txt \
--model gemini-2.5-pro \
--instructions "Use formal 'usted'. Latin American Spanish, not Spain. Warm but professional tone for patient-facing medical content." \
file1.md file2.md file3.md
The script:
- Reads all source files
- Builds a single prompt with glossary + instructions + all file contents
- Calls gemini -p "..." -o json --approval-mode plan
- Parses the JSON response and prints each translation to stdout as a JSON array
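To make the prompt-building step concrete, here is a minimal sketch of how glossary, instructions, and file contents might be combined into one prompt. The actual prompt text used by the script is not shown in this document, so the wording, the "=== FILE: ... ===" delimiter, and the function name build_prompt are all assumptions for illustration.

```python
from typing import Dict


def build_prompt(files: Dict[str, str], glossary: str, instructions: str,
                 source_lang: str = "en", target_lang: str = "es") -> str:
    """Assemble a single prompt holding the glossary, instructions, and every file."""
    parts = [
        f"Translate the following files from {source_lang} to {target_lang}.",
        'Return a JSON array of {"file": ..., "translation": ...} objects.',
        glossary.strip(),
        f"Instructions: {instructions}",
    ]
    for name, content in files.items():
        # Hypothetical delimiter so the model can tell where each file starts.
        parts.append(f"=== FILE: {name} ===\n{content}")
    return "\n\n".join(parts)
```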
Step 4: Claude validates and writes
After the script returns, Claude should:
- Parse the JSON output
- For each translated file:
  - Verify frontmatter keys match the source exactly
  - Verify links, image paths, and brand names are preserved
  - Check for REVIEW flags from Gemini (<!-- REVIEW: ... --> comments) and surface any to the user
- Write the files
- Run the project's i18n tests
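The frontmatter and link checks can be sketched with the standard library alone. This is an illustrative approximation, not the skill's own validator: it assumes frontmatter is a leading block delimited by --- lines, compares only top-level keys, and treats anything inside ](...) as a link or image target. All function names are hypothetical.

```python
import re
from typing import List

# Assumption: frontmatter is a leading block delimited by '---' lines.
FRONTMATTER_RE = re.compile(r"\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL)
TOP_LEVEL_KEY_RE = re.compile(r"^([A-Za-z_][\w-]*)\s*:", re.MULTILINE)
LINK_TARGET_RE = re.compile(r"\]\(([^)\s]+)")


def frontmatter_keys(text: str) -> List[str]:
    """Return top-level frontmatter keys in document order."""
    match = FRONTMATTER_RE.match(text)
    return TOP_LEVEL_KEY_RE.findall(match.group(1)) if match else []


def link_targets(text: str) -> List[str]:
    """Return sorted targets of markdown links and images."""
    return sorted(LINK_TARGET_RE.findall(text))


def structural_problems(source: str, translation: str) -> List[str]:
    """List structural differences between a source file and its translation."""
    problems = []
    if frontmatter_keys(source) != frontmatter_keys(translation):
        problems.append("frontmatter keys differ from source")
    if link_targets(source) != link_targets(translation):
        problems.append("link or image targets differ from source")
    return problems
```

An empty list means the translation passed both checks; anything else is worth surfacing before the file is written.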
Script Reference
Usage: gemini-translate.sh [OPTIONS] FILE [FILE...]
Options:
--source-lang LANG Source language code (default: en)
--target-lang LANG Target language code (default: es)
--glossary FILE Path to glossary file (term mappings, one per line)
--instructions TEXT Additional translation instructions for tone/style
--model MODEL Gemini model override (default: system default)
--max-tokens N Max estimated input tokens per batch (default: 80000)
--gemini-bin PATH Path to gemini binary (bypasses wrapper detection)
--dry-run Print the prompt without calling Gemini
Output: JSON array to stdout
[
{"file": "about.md", "translation": "---\ntitle: Acerca de\n---\n..."},
{"file": "careers.md", "translation": "---\ntitle: Carreras\n---\n..."}
]
Exit codes:
0 Success
1 Gemini CLI not found or not authenticated
2 No input files provided
3 Gemini returned an error or unparseable output
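A caller might turn the exit code and stdout into either a list of translations or a readable error, along these lines. The helper name interpret_result is hypothetical; the exit-code meanings and the output shape are taken from the reference above.

```python
import json
from typing import Dict, List

# Exit codes as listed in the script reference.
EXIT_MEANINGS: Dict[int, str] = {
    1: "Gemini CLI not found or not authenticated",
    2: "No input files provided",
    3: "Gemini returned an error or unparseable output",
}


def interpret_result(returncode: int, stdout: str) -> List[dict]:
    """Return the parsed translations, or raise with the documented failure reason."""
    if returncode != 0:
        reason = EXIT_MEANINGS.get(returncode, "unknown error")
        raise RuntimeError(f"gemini-translate.sh exited with {returncode}: {reason}")
    entries = json.loads(stdout)
    for entry in entries:
        # Every entry should carry both documented fields.
        if not {"file", "translation"} <= set(entry):
            raise ValueError(f"malformed entry: {entry!r}")
    return entries
```

With subprocess.run(..., capture_output=True, text=True), the returncode and stdout attributes of the result can be passed straight in.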
Translation Quality Rules
These rules are embedded in the prompt sent to Gemini:
- Preserve structure exactly: frontmatter keys, markdown formatting, links, HTML tags, image paths
- Never translate: brand names, proper nouns, URLs, file paths, credentials (MD, DO, CNM, etc.)
- Medical terms: Use the glossary. When a term is not in the glossary and you are uncertain, wrap it in <!-- REVIEW: original term -->
- Tone: Match the source document's tone. For medical patient-facing content, be warm, reassuring, and professional
- Output format: Return the complete translated file content (frontmatter + body), not just the changed parts
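Because uncertain terms come back wrapped in REVIEW comments, the caller can collect them with a small regular expression. A minimal sketch, with review_flags as a hypothetical helper name:

```python
import re
from typing import List

# Matches <!-- REVIEW: some term --> and captures the term.
REVIEW_RE = re.compile(r"<!--\s*REVIEW:\s*(.*?)\s*-->", re.DOTALL)


def review_flags(translation: str) -> List[str]:
    """Return every term the model flagged for human review, in order."""
    return REVIEW_RE.findall(translation)
```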
Adapting for Other Projects
This skill is project-agnostic. To use it on a new codebase:
- File convention: Set your project's locale file naming pattern (for example, the .es.md suffix used in the steps above)
- Glossary: Extract domain-specific terms into a glossary file
- Instructions: Write a one-paragraph style guide for the target language
- Validation: Point Claude at your project's i18n tests or write a simple key-comparison check
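For projects without i18n tests, a simple key-comparison check for JSON locale files might look like the following. This is a sketch under the assumption that locale files are nested dicts of strings; key_paths and compare_keys are hypothetical names.

```python
from typing import Any, Dict, List, Set


def key_paths(data: Any, prefix: str = "") -> Set[str]:
    """Collect dotted paths for every key in a nested dict."""
    paths: Set[str] = set()
    if isinstance(data, dict):
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def compare_keys(source: dict, target: dict) -> Dict[str, List[str]]:
    """Report keys missing from the target and keys only the target has."""
    src, dst = key_paths(source), key_paths(target)
    return {"missing": sorted(src - dst), "extra": sorted(dst - src)}
```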
Gemini CLI Wrapper Compatibility
Many users have a shell wrapper around gemini that adds extra flags by default. Those flags can conflict with the ones the script passes (gemini -p "..." -o json --approval-mode plan). The script avoids this by:
- Preferring a direct call to the package (no wrapper involved)
- Falling back to gemini on PATH only if the direct call is unavailable
- Accepting --gemini-bin /path/to/binary to override detection entirely
The script uses -o json for structured output, which returns a {session_id, response, stats} envelope. The embedded Python parser extracts the response field and handles markdown code fences, null bytes, and MCP warning prefixes automatically.
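The envelope handling can be approximated as below. This is not the script's embedded parser, only a sketch of the same idea: it assumes any warning text precedes the first "{" and contains no braces of its own, and that the response holds a JSON array, optionally inside a markdown code fence. The name extract_translations is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example
FENCED_RE = re.compile(FENCE + r"(?:json)?\s*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_translations(raw: str) -> list:
    """Pull the translation array out of the CLI's JSON envelope."""
    raw = raw.replace("\x00", "")
    start = raw.find("{")  # skip warning lines printed before the envelope
    if start == -1:
        raise ValueError("no JSON envelope found")
    # raw_decode tolerates trailing text after the envelope.
    envelope, _ = json.JSONDecoder().raw_decode(raw[start:])
    response = envelope["response"].replace("\x00", "")
    fenced = FENCED_RE.search(response)
    if fenced:
        response = fenced.group(1)
    return json.loads(response)
```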
Token-Based Batching
Instead of a fixed file count, the script estimates input tokens (roughly 4 characters per token) and stops adding files when the budget is reached. The default --max-tokens value of 80000 leaves room for the translation output (roughly 1.2x the input for EN->ES). Files that exceed the budget are listed as skipped so the caller can run a follow-up batch.
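The batching rule can be illustrated with a short sketch. It assumes only file contents count toward the estimate (the script may also count the glossary and instructions) and that every file after the first one that does not fit is skipped; plan_batch and estimate_tokens are hypothetical names.

```python
from typing import Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    """Rough estimate: one token per four characters."""
    return len(text) // 4


def plan_batch(files: Dict[str, str], max_tokens: int = 80000) -> Tuple[List[str], List[str]]:
    """Split files into those that fit the token budget and those to retry later."""
    batch: List[str] = []
    skipped: List[str] = []
    used = 0
    full = False
    for name, content in files.items():
        cost = estimate_tokens(content)
        if full or used + cost > max_tokens:
            full = True  # stop adding once the budget is reached
            skipped.append(name)
        else:
            batch.append(name)
            used += cost
    return batch, skipped
```

The skipped list is what a caller would feed into the follow-up batch.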
Truncation Recovery
When Gemini hits its output token limit and truncates the JSON mid-entry, the script recovers by:
- Detecting incomplete JSON
- Progressively trimming from the end to find valid JSON boundaries
- Dropping the last (likely truncated) entry
- Reporting how many complete translations were recovered
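One way to implement this kind of recovery is to walk backwards over closing braces until a prefix parses as a JSON array. The sketch below follows that idea and is not the script's actual code; recover_truncated is a hypothetical name.

```python
import json
from typing import List, Tuple


def recover_truncated(text: str) -> Tuple[List[dict], bool]:
    """Return (entries, was_truncated), keeping only complete entries."""
    try:
        return json.loads(text), False
    except json.JSONDecodeError:
        pass
    end = text.rfind("}")
    while end != -1:
        # Cut after a candidate closing brace and close the array.
        candidate = text[: end + 1] + "]"
        try:
            return json.loads(candidate), True
        except json.JSONDecodeError:
            # That brace was inside the truncated entry; try the previous one.
            end = text.rfind("}", 0, end)
    return [], True
```

The length of the returned list is the number of complete translations recovered.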
Agentic Workflow & Vibe Coding
- Iterative Translation: Do not expect perfect linguistic tone or structural preservation on the first batch run. Draft a small test batch, review the output for tone and formatting, isolate any consistent translation errors, refine the glossary or prompt instructions ONE variable at a time, and rerun the test before processing the entire project.
- Vibe Coding: Commit your working source content and glossary updates locally before running the translation batch, and commit the generated files separately so you can easily revert if the model hallucinated structure.
Limitations
- Single language pair per call: The script handles one source/target pair. For multi-language projects, run once per target language.
- Gemini CLI startup: ~13s overhead per batch call. Batching amortizes this.
- Output token limit: The default budget is 80K input tokens. If truncation occurs, reduce --max-tokens.
- No streaming: The script waits for the full response. Large batches may take 30-60s of model time on top of startup.
- Python 3 required: The JSON extraction uses an embedded Python script.