Gemini Translate

Batch-translate content files (markdown, JSON, YAML frontmatter) using Gemini CLI as a translation subagent. Claude orchestrates the pipeline: identifies gaps, builds prompts with glossary context, dispatches to Gemini in a single CLI call, validates output structure, and writes files.

Why Gemini CLI

Uses your Google AI Ultra plan via OAuth (no API key needed)
1M token context fits entire glossaries + dozens of source files in one call
Single startup cost (~13s) instead of per-file overhead
Claude stays in control of orchestration, validation, and file writes

Prerequisites

Gemini CLI installed and authenticated (`gemini` on PATH, OAuth configured)
Source content files in a consistent structure (markdown with frontmatter, JSON, etc.)

Pipeline Overview

Claude: find translation gaps (missing .es.* files, parity tests)
  |
Claude: read source files + glossary + existing translations for tone
  |
Claude: build batch prompt and call gemini-translate.sh
  |
Gemini: translate all files in one shot, return JSON
  |
Claude: parse response, validate structure, write .es.* files
  |
Claude: run project tests (i18n symmetry, coverage)

Usage

Step 1: Identify gaps

Find content files missing their locale counterpart:

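```bash
# Generic pattern -- adjust paths and extensions for your project
# globstar makes ** recurse into subdirectories (bash 4+); nullglob skips the loop when nothing matches
shopt -s globstar nullglob
for f in content/**/*.md; do
  base=$(basename "$f")
  [[ "$base" == *.es.* ]] && continue   # skip files that are already translations
  name="${base%.*}"                     # about.md -> about
  dir=$(dirname "$f")
  [[ -f "$dir/${name}.es.md" ]] || echo "MISSING: $f"
done
```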

Or run your project's i18n parity tests if they exist.

Step 2: Prepare the glossary

Create a glossary of terms that must be translated consistently. The glossary is a simple text block embedded in the prompt:

```markdown
## Glossary (EN -> ES)
- "OB/GYN Physician" -> "Médico OB/GYN"
- "High-risk pregnancy" -> "Embarazo de alto riesgo"
- "Certified Nurse Midwife" -> "Enfermera Partera Certificada"
```

If your project has an existing glossary (Python dict, CSV, JSON), convert it to this format before calling the script. The script accepts a glossary file via `--glossary`.
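Here is a minimal conversion sketch in Python, the language of the script's embedded parser. The input layouts are assumptions: either a flat JSON object of English-to-Spanish pairs, or a CSV with a header row naming `en` and `es` columns. The function names are hypothetical. The output follows the one-mapping-per-line format shown in the glossary example above.

```python
import csv
import io
import json


def load_json_glossary(text: str) -> dict[str, str]:
    """Assumes a flat JSON object, e.g. {"High-risk pregnancy": "Embarazo de alto riesgo"}."""
    data = json.loads(text)
    return {str(src): str(dst) for src, dst in data.items()}


def load_csv_glossary(text: str) -> dict[str, str]:
    """Assumes a CSV whose header row names the columns "en" and "es"."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["en"].strip(): row["es"].strip() for row in reader}


def to_glossary_text(mapping: dict[str, str]) -> str:
    """One mapping per line, in the '- "EN" -> "ES"' style used in the prompt."""
    lines = [f'- "{src}" -> "{dst}"' for src, dst in sorted(mapping.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    csv_text = "en,es\nHigh-risk pregnancy,Embarazo de alto riesgo\n"
    print(to_glossary_text(load_csv_glossary(csv_text)), end="")
```

To produce the file for `--glossary`, write the result out, for example with `Path("glossary.txt").write_text(to_glossary_text(mapping), encoding="utf-8")`.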

Step 3: Run the batch translation

```bash
bash gemini-translate.sh \
  --source-lang en \
  --target-lang es \
  --glossary glossary.txt \
  --model gemini-2.5-pro \
  --instructions "Use formal 'usted'. Latin American Spanish, not Spain. Warm but professional tone for patient-facing medical content." \
  file1.md file2.md file3.md
```

The script:

Reads all source files
Builds a single prompt with glossary + instructions + all file contents
Calls `gemini -p "..." -o json --approval-mode plan`
Parses the JSON response and prints each translation to stdout as a JSON array
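Prompt assembly can be sketched in Python as follows. This is an illustration, not the script's actual prompt. The section headings, the `=== FILE: ... ===` delimiter, and the names `build_prompt` and `read_files` are hypothetical. The rules are condensed from the Translation Quality Rules in this document, and the requested output shape matches the JSON array the script prints.

```python
from pathlib import Path

# Condensed from the Translation Quality Rules; the wording here is illustrative.
RULES = "\n".join([
    "- Preserve frontmatter keys, markdown formatting, links, HTML tags, and image paths exactly.",
    "- Never translate brand names, proper nouns, URLs, file paths, or credentials (MD, DO, CNM, etc.).",
    "- Use the glossary for domain terms.",
    "- Return the complete translated file content (frontmatter + body), not just the changed parts.",
])


def read_files(paths: list[str]) -> dict[str, str]:
    """Map each path, as given on the command line, to its UTF-8 contents."""
    return {p: Path(p).read_text(encoding="utf-8") for p in paths}


def build_prompt(files: dict[str, str], glossary: str, instructions: str,
                 source_lang: str = "en", target_lang: str = "es") -> str:
    """Assemble rules + glossary + instructions + every file into one prompt."""
    parts = [
        f"Translate each file below from {source_lang} to {target_lang}.",
        "## Rules\n" + RULES,
        "## Glossary\n" + glossary.strip(),
        "## Style instructions\n" + instructions.strip(),
        "## Output format\nReturn only a JSON array: "
        '[{"file": "<name>", "translation": "<complete translated text>"}]',
    ]
    for name, content in files.items():
        # A visible delimiter lets the model echo each file name back in its JSON entry.
        parts.append(f"=== FILE: {name} ===\n{content}")
    return "\n\n".join(parts)
```

Printing this string without calling Gemini is what `--dry-run` is for: you can inspect the prompt before spending a batch call.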

Step 4: Claude validates and writes

After the script returns, Claude should:

Parse the JSON output
For each translated file:
- Verify frontmatter keys match the source exactly
- Verify links, image paths, and brand names are preserved
- Verify there are no uncertainty flags from Gemini (or surface them to the user)
Write the `.es.*` files
Run the project's i18n tests
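Here is a minimal sketch of these checks using only the Python standard library. It makes three simplifying assumptions. First, frontmatter is a `---`-delimited block at the top of the file, and only its top-level keys are compared (no YAML parser). Second, "links and image paths" means markdown `[text](target)` and `![alt](target)` targets. Third, the uncertainty markers are passed in as parameters, because the exact marker is not specified here; the `[?` and `?]` markers in the test are hypothetical placeholders. Brand-name checks are omitted because they need a project-specific list.

```python
import re

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
TOP_LEVEL_KEY_RE = re.compile(r"^([A-Za-z0-9_-]+):", re.MULTILINE)
LINK_TARGET_RE = re.compile(r"\]\(([^)\s]+)")


def frontmatter_keys(doc: str) -> list[str]:
    """Top-level keys of a ---delimited frontmatter block (indented, nested keys are ignored)."""
    match = FRONTMATTER_RE.match(doc)
    return TOP_LEVEL_KEY_RE.findall(match.group(1)) if match else []


def link_targets(doc: str) -> list[str]:
    """Targets of markdown links and images: [text](target) and ![alt](target)."""
    return sorted(LINK_TARGET_RE.findall(doc))


def find_flags(doc: str, open_marker: str, close_marker: str) -> list[str]:
    """Terms the model wrapped in the (project-specific, hypothetical here) uncertainty markers."""
    pattern = re.escape(open_marker) + r"(.*?)" + re.escape(close_marker)
    return re.findall(pattern, doc)


def validate(source: str, translation: str, open_marker: str, close_marker: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if frontmatter_keys(source) != frontmatter_keys(translation):
        problems.append("frontmatter keys differ")
    if link_targets(source) != link_targets(translation):
        problems.append("link or image targets differ")
    flags = find_flags(translation, open_marker, close_marker)
    if flags:
        problems.append(f"uncertain terms flagged: {flags}")
    return problems
```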

Script Reference

gemini-translate.sh

```text
Usage: gemini-translate.sh [OPTIONS] FILE [FILE...]

Options:
  --source-lang LANG    Source language code (default: en)
  --target-lang LANG    Target language code (default: es)
  --glossary FILE       Path to glossary file (term mappings, one per line)
  --instructions TEXT   Additional translation instructions for tone/style
  --model MODEL         Gemini model override (default: system default)
  --max-tokens N        Max estimated input tokens per batch (default: 80000)
  --gemini-bin PATH     Path to gemini binary (bypasses wrapper detection)
  --dry-run             Print the prompt without calling Gemini

Output: JSON array to stdout
  [
    {"file": "about.md", "translation": "---\ntitle: Acerca de\n---\n..."},
    {"file": "careers.md", "translation": "---\ntitle: Carreras\n---\n..."}
  ]

Exit codes:
  0  Success
  1  Gemini CLI not found or not authenticated
  2  No input files provided
  3  Gemini returned an error or unparseable output
```
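The sketch below shows how a caller might turn the JSON array output into `.es.*` files. It assumes each `file` value is a path relative to a known source directory, and that the locale suffix goes before the extension (`about.md` -> `about.es.md`). The function names are hypothetical. Writes are planned first and applied separately, so the Step 4 checks can run in between.

```python
import json
from pathlib import Path


def locale_path(source: Path, lang: str = "es") -> Path:
    """Insert the locale before the extension: about.md -> about.es.md."""
    return source.with_name(f"{source.stem}.{lang}{source.suffix}")


def plan_writes(stdout: str, source_dir: Path, lang: str = "es") -> dict[Path, str]:
    """Map each target path to its translated text without touching the disk."""
    return {
        locale_path(source_dir / entry["file"], lang): entry["translation"]
        for entry in json.loads(stdout)
    }


def write_all(plan: dict[Path, str]) -> None:
    """Write files only after the Step 4 checks have passed."""
    for target, text in plan.items():
        target.write_text(text, encoding="utf-8")
```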

Translation Quality Rules

These rules are embedded in the prompt sent to Gemini:

Preserve structure exactly: frontmatter keys, markdown formatting, links, HTML tags, image paths
Never translate: brand names, proper nouns, URLs, file paths, credentials (MD, DO, CNM, etc.)
Medical terms: Use the glossary. When a term is not in the glossary and you are uncertain, wrap it in an uncertainty flag marker
Tone: Match the source document's tone. For medical patient-facing content, be warm, reassuring, and professional
Output format: Return the complete translated file content (frontmatter + body), not just the changed parts

Adapting for Other Projects

This skill is project-agnostic. To use it on a new codebase:

File convention: Set your project's locale file naming pattern (`.es.md`, `.es.json`, `locales/es/`, etc.)
Glossary: Extract domain-specific terms into a glossary file
Instructions: Write a one-paragraph style guide for the target language
Validation: Point Claude at your project's i18n tests or write a simple key-comparison check
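For JSON locale files, a key-comparison check can be as small as the sketch below. It assumes nested objects whose leaves are strings. Arrays and empty objects are treated as leaves, and the function names are hypothetical.

```python
import json
from typing import Any


def key_paths(node: Any, prefix: str = "") -> set[str]:
    """Flatten nested keys into dotted paths: {"nav": {"home": "..."}} -> {"nav.home"}."""
    if not isinstance(node, dict) or not node:
        # Strings, numbers, arrays, and empty objects count as leaves.
        return {prefix} if prefix else set()
    paths: set[str] = set()
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        paths |= key_paths(value, path)
    return paths


def compare_locales(source_text: str, target_text: str) -> tuple[set[str], set[str]]:
    """Return (missing_in_target, extra_in_target)."""
    src = key_paths(json.loads(source_text))
    tgt = key_paths(json.loads(target_text))
    return src - tgt, tgt - src
```

A non-empty `missing` set means the translation dropped keys. A non-empty `extra` set usually means a key name was translated instead of its value.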

Gemini CLI Wrapper Compatibility

Many users have a shell wrapper (e.g., `~/bin/gemini`) that adds `--yolo` (`-y`) by default. This conflicts with `--approval-mode`. The script avoids this by:

Preferring `pnpx @google/gemini-cli` (calls the package directly, no wrapper)
Falling back to `gemini` on PATH only if `pnpx` is unavailable
Accepting `--gemini-bin /path/to/binary` to override detection entirely
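Here is the detection order above, sketched in Python for illustration. The real script is a shell script, and `resolve_gemini_command` is a hypothetical name. Passing `which` as a parameter makes the order testable without touching your PATH.

```python
import shutil
from typing import Callable, Optional


def resolve_gemini_command(
    gemini_bin: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix used to launch Gemini, following the order described above."""
    if gemini_bin:
        return [gemini_bin]  # --gemini-bin bypasses detection entirely
    if which("pnpx"):
        # Runs the npm package directly, so a ~/bin/gemini wrapper that adds --yolo is never hit.
        return ["pnpx", "@google/gemini-cli"]
    if which("gemini"):
        return ["gemini"]  # fallback: whatever `gemini` resolves to on PATH
    raise FileNotFoundError("Gemini CLI not found (the script reports this as exit code 1)")
```

The returned list would then be extended with the flags the script passes, e.g. `cmd + ["-p", prompt, "-o", "json", "--approval-mode", "plan"]`, and handed to `subprocess.run`.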

The script uses `-o json` for structured output, which returns a `{session_id, response, stats}` envelope. The embedded Python parser extracts the `response` field and handles markdown code fences, null bytes, and MCP warning prefixes automatically.
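The following is a sketch of that clean-up, not the script's actual parser. It assumes MCP warnings appear as plain text before the envelope, so everything before the first `{` is discarded. It also assumes the model may wrap its JSON array in a single markdown code fence. `extract_response` and `strip_code_fence` are hypothetical names.

```python
import json

FENCE = "`" * 3  # a markdown code fence


def strip_code_fence(text: str) -> str:
    """Remove one surrounding fence (e.g. a json-tagged block) if the model added it."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        newline = stripped.find("\n")
        stripped = stripped[newline + 1:] if newline != -1 else ""
        if stripped.rstrip().endswith(FENCE):
            stripped = stripped.rstrip()[: -len(FENCE)]
    return stripped.strip()


def extract_response(raw: str) -> str:
    """Return the model text from the `-o json` envelope printed by the CLI."""
    cleaned = raw.replace("\x00", "")  # drop stray null bytes
    start = cleaned.find("{")          # assume any MCP warnings precede the envelope
    if start == -1:
        raise ValueError("no JSON envelope found in Gemini output")
    # raw_decode parses one JSON value starting at `start` and ignores whatever trails it
    envelope, _ = json.JSONDecoder().raw_decode(cleaned, start)
    return strip_code_fence(envelope["response"])
```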

Token-Based Batching

Instead of using a fixed file count, the script estimates input tokens (1 token ~ 4 chars) and stops adding files when the budget is reached. The default `--max-tokens 80000` leaves room for the translation output (roughly 1.2x the input for EN->ES). Files that would push the batch past the budget are listed as skipped, so the caller can run a follow-up batch.
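At 4 characters per token, the default budget of 80,000 tokens corresponds to roughly 320,000 characters of source text. A 1.2x expansion implies about 96,000 tokens of translated output. The greedy selection can be sketched as below. Counting prompt overhead (glossary, instructions) against the budget via `overhead_tokens` is an assumption, and the names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic from the text: 1 token ~ 4 characters


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def plan_batch(files: dict[str, str], max_tokens: int = 80_000,
               overhead_tokens: int = 0) -> tuple[list[str], list[str]]:
    """Add files in order until the next one would exceed the budget; the rest are skipped."""
    used = overhead_tokens
    batch: list[str] = []
    skipped: list[str] = []
    for name, content in files.items():
        cost = estimate_tokens(content)
        # Once one file does not fit, stop adding, so skipped files keep their input order.
        if not skipped and used + cost <= max_tokens:
            batch.append(name)
            used += cost
        else:
            skipped.append(name)
    return batch, skipped
```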

Truncation Recovery

When Gemini hits its output token limit and truncates the JSON mid-entry, the script recovers by:

Detecting incomplete JSON
Progressively trimming from the end to find valid JSON boundaries
Dropping the last (likely truncated) entry
Reporting how many complete translations were recovered
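Here is a sketch of that recovery, assuming the truncated text is a JSON array of flat `{"file", "translation"}` objects. It scans backwards for `}` characters, closes the array after each one, and keeps the first prefix that parses. Following the description above, it drops the last recovered entry by default. A prefix only parses when its final object is closed, so `drop_last=False` keeps that entry instead. The function name is hypothetical.

```python
import json


def recover_truncated_array(text: str, drop_last: bool = True) -> list:
    """Salvage complete entries from a JSON array that was cut off mid-entry."""
    try:
        return json.loads(text)  # output was complete after all
    except json.JSONDecodeError:
        pass
    end = len(text)
    while True:
        # Next candidate object boundary, scanning backwards from the previous one.
        end = text.rfind("}", 0, end)
        if end == -1:
            return []
        try:
            entries = json.loads(text[: end + 1] + "]")
        except json.JSONDecodeError:
            continue  # this "}" was inside a string or left the JSON unbalanced
        if isinstance(entries, list):
            # Mirror the described behavior: treat the last recovered entry as suspect.
            return entries[:-1] if drop_last else entries
```

Comparing `len(recovered)` with the number of files sent gives the count to report. The missing files can go into a follow-up batch.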

Agentic Workflow & Vibe Coding

Iterative Translation: Do not expect perfect linguistic tone or structural preservation on the first batch run. Draft a small test batch, review the output for tone and formatting, isolate any consistent translation errors, refine the glossary or prompt instructions ONE variable at a time, and rerun the test before processing the entire project.
Vibe Coding: Commit your working source content and glossary updates locally before running the translation batch. Commit the generated `.es.*` files separately so you can easily revert if the model hallucinated structure.

Limitations

Single language pair per call: The script handles one source/target pair. For multi-language projects, run once per target language.
Gemini CLI startup: ~13s overhead per batch call. Batching amortizes this.
Output token limit: The default input budget is 80K tokens (`--max-tokens 80000`), sized to leave room for the output. If Gemini truncates its output, lower `--max-tokens`.
No streaming: The script waits for the full response. Large batches may take 30-60s of model time on top of startup.
Python 3 required: The JSON extraction uses an embedded Python script.
