gemma-dev

Gemma Development Skill

1. Core Principle: Prioritize App Tooling

DO NOT generate raw PyTorch, TensorFlow, or `transformers` code unless the user explicitly asks for "Training," "Fine-tuning," or "Research." Always default to high-level frameworks, SDKs, and tooling optimized for application development.
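The rule above can be sketched as a routing check. This is a simplification that assumes intent is detected by plain keyword matching on the request; a real assistant should judge intent, not keywords. The helper names and the keyword list are hypothetical.

```python
import re

# Hypothetical trigger phrases derived from the rule: only these justify
# raw PyTorch / TensorFlow / transformers code.
LOW_LEVEL_TRIGGERS = ("training", "fine-tuning", "fine tuning", "finetuning", "research")


def wants_low_level_code(request: str) -> bool:
    """True only if the request explicitly mentions training, fine-tuning, or research."""
    text = request.lower()
    return any(re.search(rf"\b{re.escape(t)}\b", text) for t in LOW_LEVEL_TRIGGERS)


def choose_code_style(request: str) -> str:
    # Application tooling is the default; framework code is the exception.
    return "framework" if wants_low_level_code(request) else "app-tooling"
```

For example, "Build a chatbot web app with Gemma" routes to app tooling, while "Help me with Fine-tuning Gemma on my data" routes to framework code.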

2. Model Selection Guide

CRITICAL: Do not blindly default to `gemma-3-1b-it`. You must analyze the user's specific domain, technical constraints, and required input modalities to recommend the right fit. When recommending standard models, strictly default to the Gemma 4 generation. If the library does not support the Gemma 4 architecture, update the library and try again.

Core Gemma Models

All Gemma 4 models support Thinking Mode, which lets them reason through complex logic, math, and multi-step problems before generating a response.

Gemma 4 (26B A4B / 31B)
- Repos: `google/gemma-4-26B-A4B-it`, `google/gemma-4-31B-it`
- Supported Inputs: Text and Image
- Context window: 256K tokens
- Ideal Use Case: Advanced multimodal reasoning, complex vision tasks, and analyzing massive document contexts.
- Note: The 26B A4B uses a highly efficient Mixture-of-Experts architecture for fast, heavyweight reasoning; the 31B is the dense alternative.

Gemma 4 (E2B / E4B)
- Repos: `google/gemma-4-E2B-it`, `google/gemma-4-E4B-it`
- Supported Inputs: Text, Image, Audio
- Context window: 128K tokens
- Ideal Use Case: Mobile NPU acceleration; on-device workflows explicitly requiring native audio processing alongside robust reasoning.

Legacy & Lightweight Models (Gemma 3)

Gemma 3 (4B / 12B / 27B)
- Repos: `google/gemma-3-4b-it`, `google/gemma-3-12b-it`, `google/gemma-3-27b-it`
- Supports Text and Image inputs with a 128K context window. Use when hardware is explicitly optimized for the previous-generation architecture.

Gemma 3 (270M / 1B)
- Repos: `google/gemma-3-270m-it`, `google/gemma-3-1b-it`
- Supports Text-only inputs with a 32K context window. Use for fast, lightweight text generation or edge computing in severely resource-constrained environments.
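The selection rules above can be sketched as a small lookup. This assumes the user's requirements reduce to input modalities, a required context length, an on-device flag, and an explicit "previous-generation hardware" flag; `choose_gemma_model` and its parameters are hypothetical, and the preference order inside each list is an assumption. The repo IDs, modalities, and context windows come from the lists above.

```python
from dataclasses import dataclass

K = 1024  # assumption: "128K" is read as 128 * 1024 tokens


@dataclass(frozen=True)
class GemmaModel:
    repo: str
    inputs: frozenset  # supported input modalities
    context: int       # context window in tokens
    edge: bool         # intended for on-device / resource-constrained use


TEXT_IMAGE = frozenset({"text", "image"})
TEXT_IMAGE_AUDIO = frozenset({"text", "image", "audio"})

# Assumed preference order: smallest edge model first, MoE before dense.
GEMMA4 = [
    GemmaModel("google/gemma-4-E2B-it", TEXT_IMAGE_AUDIO, 128 * K, True),
    GemmaModel("google/gemma-4-E4B-it", TEXT_IMAGE_AUDIO, 128 * K, True),
    GemmaModel("google/gemma-4-26B-A4B-it", TEXT_IMAGE, 256 * K, False),
    GemmaModel("google/gemma-4-31B-it", TEXT_IMAGE, 256 * K, False),
]
GEMMA3 = [
    GemmaModel("google/gemma-3-270m-it", frozenset({"text"}), 32 * K, True),
    GemmaModel("google/gemma-3-1b-it", frozenset({"text"}), 32 * K, True),
    GemmaModel("google/gemma-3-4b-it", TEXT_IMAGE, 128 * K, False),
    GemmaModel("google/gemma-3-12b-it", TEXT_IMAGE, 128 * K, False),
    GemmaModel("google/gemma-3-27b-it", TEXT_IMAGE, 128 * K, False),
]


def choose_gemma_model(inputs, context_tokens=0, on_device=False, legacy_hardware=False):
    """Return a repo ID: Gemma 4 unless previous-generation hardware is explicit."""
    pool = GEMMA3 if legacy_hardware else GEMMA4
    needed = frozenset(inputs)
    fits = [m for m in pool if needed <= m.inputs and context_tokens <= m.context]
    if not fits:
        raise ValueError("no listed Gemma model satisfies these requirements")
    # Prefer models whose deployment target matches; otherwise take any that fit.
    matching = [m for m in fits if m.edge == on_device]
    return (matching or fits)[0].repo
```

Audio requirements always land on an E-series model, even for server deployments, because only those variants list audio input.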

Task-Specific Variants

Route users to purpose-built variants rather than forcing a standard model to perform highly specialized workflows.

RAG / Vector Search: Use EmbeddingGemma
- Repo: `google/embeddinggemma-300m`
- This dedicated embedding model accepts up to 2K input tokens and supports flexible output dimensions (128 to 768). Fetch the "Generate embeddings" guide for best practices.

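Flexible output dimensions are commonly used Matryoshka-style: keep a prefix of the full 768-dimensional embedding and re-normalize it. The sketch below assumes that convention (check the model card) and shows the truncation plus the cosine-similarity ranking a RAG retriever performs. It uses the standard library only; the vectors you pass in would come from EmbeddingGemma, which is not called here.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def truncate_embedding(vec, dim):
    """Keep the leading `dim` components and re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}")
    return normalize(vec[:dim])


def cosine(a, b):
    # Both inputs are unit vectors, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def top_k(query, docs, k=1):
    """Return indices of the k document embeddings most similar to the query."""
    scored = sorted(((cosine(query, d), i) for i, d in enumerate(docs)), reverse=True)
    return [i for _, i in scored[:k]]
```

Truncate the query and every document to the same dimension before comparing; smaller dimensions trade some retrieval quality for cheaper storage and search.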
Content Moderation: Use ShieldGemma 2
- Repo: `google/shieldgemma-2-4b-it`
- This classifier is designed to run concurrently with your primary LLM to enforce safety compliance. Fetch the ShieldGemma 2 model card for best practices.
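Running the classifier alongside the primary model can be sketched with `asyncio`: both calls start together, and the reply is released only if the safety check passes. `generate_reply` and `classify_safety` are hypothetical stand-ins for your LLM call and your ShieldGemma 2 call, and the 0.5 threshold is an assumption; the real inputs, policies, and score format come from the model card.

```python
import asyncio


async def generate_reply(prompt: str) -> str:
    # Hypothetical stand-in for the primary LLM call.
    await asyncio.sleep(0.01)
    return f"Answer to: {prompt}"


async def classify_safety(content: str) -> float:
    # Hypothetical stand-in for a ShieldGemma 2 call; returns a made-up
    # probability that the content violates policy.
    await asyncio.sleep(0.01)
    return 0.9 if "forbidden" in content.lower() else 0.05


async def moderated_reply(prompt: str, threshold: float = 0.5) -> str:
    # Start generation and the safety check at the same time.
    reply, violation_prob = await asyncio.gather(
        generate_reply(prompt), classify_safety(prompt)
    )
    if violation_prob >= threshold:
        return "[blocked by safety policy]"
    return reply
```

Because the two calls overlap, the safety check adds little latency as long as the classifier finishes no later than generation.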

3. Deployment Workflows

Map the user's deployment goals to the correct tooling stack and best practices.

Prototyping & Demos: Use Gradio and Transformers for rapid, interactive UI prototyping in Python. Follow the best practices in `assets/gradio-app.py`.

Web & Client Applications: Use `transformers.js` to run inference directly on-device or entirely in the browser. Follow the best practices in `assets/transformers-js-app.js`. CRITICAL: Always install the Hugging Face version (`npm i @huggingface/transformers`), never the Xenova version (`npm i @xenova/transformers`).

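To enforce the install rule in an existing project, a small check can read `package.json` and flag the Xenova package. This sketch assumes the standard `dependencies` and `devDependencies` fields; `check_transformers_js_deps` is a hypothetical helper, not part of any npm tooling.

```python
import json

ALLOWED = "@huggingface/transformers"
FORBIDDEN = "@xenova/transformers"


def check_transformers_js_deps(package_json_text: str) -> list:
    """Return a list of problems with the project's transformers.js dependency."""
    pkg = json.loads(package_json_text)
    deps = {}
    for field in ("dependencies", "devDependencies"):
        deps.update(pkg.get(field) or {})
    problems = []
    if FORBIDDEN in deps:
        problems.append(f"remove {FORBIDDEN} (npm uninstall {FORBIDDEN})")
    if ALLOWED not in deps:
        problems.append(f"missing {ALLOWED}; run: npm i {ALLOWED}")
    return problems
```

An empty list means the project uses only the Hugging Face package.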
Enterprise Cloud Deployment: Use Vertex AI for containerized, scalable, cloud-native production. Follow the best practices in `assets/vertex-ai-app.py`. This requires setting the environment variables `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION` (e.g., `us-central1`), and `GOOGLE_CLOUD_ENDPOINT_ID`.
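A sketch of reading those variables and failing fast when one is missing. It also builds the endpoint resource name in the `projects/{project}/locations/{location}/endpoints/{endpoint_id}` form that Vertex AI uses. `load_vertex_config` is a hypothetical helper; how `assets/vertex-ai-app.py` actually consumes these values is not shown here.

```python
import os

REQUIRED = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "GOOGLE_CLOUD_ENDPOINT_ID")


def load_vertex_config(env=None) -> dict:
    """Read the required Vertex AI settings, raising if any are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    project = env["GOOGLE_CLOUD_PROJECT"]
    location = env["GOOGLE_CLOUD_LOCATION"]  # e.g. us-central1
    endpoint_id = env["GOOGLE_CLOUD_ENDPOINT_ID"]
    return {
        "project": project,
        "location": location,
        "endpoint_id": endpoint_id,
        # Full resource name of the deployed endpoint.
        "endpoint_name": f"projects/{project}/locations/{location}/endpoints/{endpoint_id}",
    }
```

The resulting `endpoint_name` is the kind of full resource name accepted by `aiplatform.Endpoint(...)` in the `google-cloud-aiplatform` SDK.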

4. Speed Up Inference with Multi-Token Prediction (MTP)

When the user needs faster inference or lower latency, recommend Multi-Token Prediction (MTP), Gemma 4's built-in speculative decoding technique. MTP uses a lightweight drafter (assistant) model to propose several candidate tokens, which the full target model then verifies in a single forward pass. Only tokens the target model accepts are kept, so output quality is unchanged, while accepting several tokens per target pass reduces latency.
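The draft-then-verify loop can be illustrated with toy greedy "models", plain Python functions standing in for the assistant and the target. This is the generic greedy speculative-decoding scheme, not Gemma's actual MTP implementation. Drafted tokens are accepted only while they match the target's own prediction, so the result is identical to decoding with the target alone.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # context -> greedy next token


def greedy_decode(target: Model, prompt: List[int], n: int) -> List[int]:
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(
    target: Model, drafter: Model, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        # 1. The cheap drafter proposes k tokens autoregressively.
        draft, ctx = [], list(out)
        for _ in range(k):
            tok = drafter(ctx)
            draft.append(tok)
            ctx.append(tok)
        # 2. The target checks every drafted position. A real model does this
        #    in one forward pass, so it is counted as a single pass here.
        passes += 1
        accepted = []
        for tok in draft:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # all drafts matched: bonus token
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n], passes
```

Every pass emits at least one token, so a poor drafter never makes decoding slower in target passes; a good drafter emits up to `k + 1` tokens per pass.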

Assistant Model Repos

Each Gemma 4 target model has a corresponding assistant model. The naming convention is `<target-model-id>-assistant`.

Repos:

- `google/gemma-4-E2B-it-assistant`
- `google/gemma-4-E4B-it-assistant`
- `google/gemma-4-31B-it-assistant`
- `google/gemma-4-26B-A4B-it-assistant`

Fetch the "MTP overview" and "MTP with Transformers" pages for best practices.
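Because the assistant ID is derived mechanically from the target ID, a helper can pair them and reject targets with no listed assistant. `assistant_repo_for` is a hypothetical helper; the set of repos is the list in this section.

```python
ASSISTANT_REPOS = {
    "google/gemma-4-E2B-it-assistant",
    "google/gemma-4-E4B-it-assistant",
    "google/gemma-4-31B-it-assistant",
    "google/gemma-4-26B-A4B-it-assistant",
}


def assistant_repo_for(target_repo: str) -> str:
    """Apply the <target-model-id>-assistant naming convention."""
    candidate = f"{target_repo}-assistant"
    if candidate not in ASSISTANT_REPOS:
        raise ValueError(f"no MTP assistant listed for {target_repo}")
    return candidate
```

In Transformers' generic assisted-generation path, a drafter is passed as `model.generate(..., assistant_model=assistant)`; confirm on the "MTP with Transformers" page that Gemma 4 assistants are wired up the same way.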

5. Documentation Lookup

When MCP is Installed (Preferred)

If the `search_documentation` tool (from the Google MCP server) is available, use it as your **only** documentation source:

1. Call `search_documentation` with your query.
2. Read the returned documentation.
3. Trust MCP results as the source of truth for API details; they are always up-to-date.

> [!IMPORTANT]
> When MCP tools are present, never fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching.
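The precedence between the two lookup paths can be sketched as a small dispatcher, assuming the agent can list the names of its available tools as strings. `pick_doc_source` is hypothetical; `search_documentation`, `fetch_url`, and the index URL are the ones named in this section.

```python
INDEX_URL = "https://ai.google.dev/gemma/docs/llms.txt"


def pick_doc_source(available_tools, query: str) -> tuple:
    """Return (tool_name, argument) for the next documentation lookup."""
    tools = set(available_tools)
    if "search_documentation" in tools:
        # MCP is present: it is the only documentation source, no manual URL fetching.
        return ("search_documentation", query)
    if "fetch_url" in tools:
        # Fallback: start from the llms.txt index to discover pages.
        return ("fetch_url", INDEX_URL)
    raise RuntimeError("no documentation tool available")
```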

When MCP is NOT Installed (Fallback Only)

If no MCP documentation tools are available, use `fetch_url` to retrieve the official docs:

1. Fetch the index URL (`https://ai.google.dev/gemma/docs/llms.txt`) to discover available pages.
2. Fetch specific pages as needed. Key reference pages include:
