gemma-dev
Gemma Development Skill
1. Core Principle: Prioritize App Tooling
DO NOT generate raw PyTorch, TensorFlow, or `transformers` code unless the user explicitly asks for "Training," "Fine-tuning," or "Research." Always default to high-level frameworks, SDKs, and tooling optimized for application development.
2. Model Selection Guide
CRITICAL: Do not blindly default to `gemma-3-1b-it`. You must analyze the user's specific domain, technical constraints, and required input modalities to recommend the exact right fit. When recommending standard models, strictly default to the Gemma 4 generation. If the library does not support the Gemma 4 architecture, update the library and try again.
Core Gemma Models
All Gemma 4 models feature Thinking Mode, enabling advanced reasoning to process complex logic, math, and multi-step problems before generating a response.
- Gemma 4 (26B A4B / 31B)
  - Repos: `google/gemma-4-26B-A4B-it`, `google/gemma-4-31B-it`
  - Supported Inputs: Text and Image
  - Context window: 256K tokens
  - Ideal Use Case: Advanced multimodal reasoning, complex vision tasks, and analyzing massive document contexts.
  - Note: The 26B A4B uses a highly efficient Mixture-of-Experts architecture for fast, heavyweight reasoning; the 31B is the dense variant.
- Gemma 4 (E2B / E4B)
  - Repos: `google/gemma-4-E2B-it`, `google/gemma-4-E4B-it`
  - Supported Inputs: Text, Image, Audio
  - Context window: 128K tokens
  - Ideal Use Case: Mobile NPU acceleration; on-device workflows explicitly requiring native audio processing alongside robust reasoning.
Legacy & Lightweight Models (Gemma 3)
- Gemma 3 (4B / 12B / 27B)
  - Repos: `google/gemma-3-4b-it`, `google/gemma-3-12b-it`, `google/gemma-3-27b-it`
  - Supports Text and Image inputs with a 128K context window. Use when hardware is explicitly optimized for the previous-generation architecture.
- Gemma 3 (270M / 1B)
  - Repos: `google/gemma-3-270m-it`, `google/gemma-3-1b-it`
  - Supports Text-only inputs with a 32K context window. Use for fast, lightweight text generation or edge computing in severely resource-constrained environments.
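The selection rules in this section can be sketched as a small routing function. This is only an illustration: the `Requirements` fields, the decision order, the decimal token thresholds, and the choice of E4B over E2B (and 31B over 26B A4B) are assumptions, not part of the skill itself. Only the repo IDs, modalities, and context windows come from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical summary of what the application needs."""
    needs_audio: bool = False
    needs_image: bool = False
    on_device: bool = False
    severely_constrained: bool = False
    context_tokens: int = 8_000


def recommend_model(req: Requirements) -> str:
    """Map requirements to one repo ID from the lists above (assumed decision order)."""
    if req.context_tokens > 256_000:
        raise ValueError("requested context exceeds the largest listed window (256K)")
    if req.needs_audio or req.on_device:
        # Only the E2B / E4B models list audio input; their window is 128K tokens.
        if req.context_tokens > 128_000:
            raise ValueError("E2B / E4B models are limited to a 128K context window")
        return "google/gemma-4-E4B-it"
    if (req.severely_constrained and not req.needs_image
            and req.context_tokens <= 32_000):
        # Text-only Gemma 3 models are reserved for severely constrained setups.
        return "google/gemma-3-1b-it"
    # Everything else defaults to the Gemma 4 generation.
    return "google/gemma-4-31B-it"


print(recommend_model(Requirements(needs_image=True, context_tokens=200_000)))
```

Note that the function never returns `gemma-3-1b-it` unless the caller states a severe resource constraint, which mirrors the rule against defaulting to it blindly.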
Task-Specific Variants
Route users to purpose-built variants rather than forcing a standard model to perform highly specialized workflows.
- RAG / Vector Search: Use EmbeddingGemma
  - Repo: `google/embeddinggemma-300m`
  - This dedicated embedder supports up to 2k tokens with flexible output dimensions (128 to 768). Fetch the "Generate embeddings" documentation page for best practices.
- Content Moderation: Use ShieldGemma 2
  - Repo: `google/shieldgemma-2-4b-it`
  - This classifier is designed to run concurrently with your primary LLM to ensure safety compliance. Fetch the ShieldGemma 2 model card for best practices.
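For the RAG / vector search case, the retrieval step itself is independent of the embedder: once vectors exist, documents are ranked by similarity to the query vector. The sketch below shows that step with cosine similarity in plain Python. The toy 3-dimensional vectors and the `rank` helper are hypothetical stand-ins; real EmbeddingGemma vectors would have 128 to 768 dimensions and would come from the model, which is not called here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Return document keys sorted from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda k: cosine(query_vec, doc_vecs[k]), reverse=True)


# Toy vectors standing in for real embeddings.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.7, 0.3, 0.1],
    "tax law": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
print(rank(query, docs))
```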
3. Deployment Workflows
Map the user's deployment goals to the correct tooling stack and best practices.
- Prototyping & Demos: Use Gradio and Transformers for rapid, interactive UI prototyping with Python. Follow the [assets/gradio-app.py] best practice.
- Web & Client Applications: Use `transformers.js` to run inference directly on-device or entirely in the browser. Follow the [assets/transformers-js-app.js] best practice. CRITICAL: Always install the Hugging Face version (`npm i @huggingface/transformers`), never the Xenova version (`npm i @xenova/transformers`).
- Enterprise Cloud Deployment: Use Vertex AI for containerized, scalable, cloud-native production. Follow the [assets/vertex-ai-app.py] best practice. This requires setting the environment variables `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION` (e.g., `us-central1`), and `GOOGLE_CLOUD_ENDPOINT_ID`.
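Before calling a Vertex AI endpoint, it can help to fail fast when any of the three environment variables is missing. The following is a minimal sketch using only the standard library; the helper name `load_vertex_config` is hypothetical and is not taken from `assets/vertex-ai-app.py`.

```python
import os

REQUIRED_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_CLOUD_ENDPOINT_ID",
)


def load_vertex_config(env=None) -> dict:
    """Collect the required variables, raising one error that names every missing one."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests, for example `load_vertex_config({"GOOGLE_CLOUD_PROJECT": "demo", "GOOGLE_CLOUD_LOCATION": "us-central1", "GOOGLE_CLOUD_ENDPOINT_ID": "123"})`.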
4. Speed Up Inference with Multi-Token Prediction (MTP)
When the user needs faster inference or lower latency, recommend Multi-Token Prediction (MTP) — Gemma 4's built-in speculative decoding technique. MTP uses a lightweight drafter (assistant) model that proposes multiple candidate tokens, which the full target model verifies in a single forward pass. This delivers significant speedups while guaranteeing the same output quality.
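The draft-then-verify loop described above can be illustrated with a toy example. This sketch is not Gemma's implementation: `target_next` and `draft_next` are hypothetical stand-in functions over integer tokens, decoding is greedy, and the per-position verification loop stands in for what a real target model would compute in one forward pass. It shows why the output matches target-only decoding while needing fewer target passes.

```python
def target_next(seq: list[int]) -> int:
    """Stand-in for the full target model: deterministic next token."""
    return (seq[-1] * 3 + 1) % 7


def draft_next(seq: list[int]) -> int:
    """Stand-in for the drafter: agrees with the target unless the last token is 5."""
    return 0 if seq[-1] == 5 else target_next(seq)


def greedy_decode(prompt: list[int], n_new: int) -> list[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def speculative_decode(prompt: list[int], n_new: int, k: int = 4):
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    seq, passes, goal = list(prompt), 0, len(prompt) + n_new
    while len(seq) < goal:
        # 1. The drafter proposes k candidate tokens autoregressively.
        draft: list[int] = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. The target checks every drafted position (one pass in a real model).
        passes += 1
        accepted: list[int] = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            # Every draft token was accepted, so the target adds one bonus token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], passes


tokens, passes = speculative_decode([1], n_new=12)
print(tokens == greedy_decode([1], 12), passes)
```

Every token that enters the sequence is one the target itself would have produced, so a weak drafter costs speed, never correctness.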
Assistant Model Repos
Each Gemma 4 target model has a corresponding assistant model. The naming convention is `<target-model-id>-assistant`:
- Repos:
  - `google/gemma-4-E2B-it-assistant`
  - `google/gemma-4-E4B-it-assistant`
  - `google/gemma-4-31B-it-assistant`
  - `google/gemma-4-26B-A4B-it-assistant`
Fetch the "MTP overview" and "MTP with Transformers" pages for best practices.
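The naming convention can be applied mechanically. The helper below is a hypothetical sketch: it appends the `-assistant` suffix and rejects any target that is not one of the four Gemma 4 models listed in this document, since no assistant is listed for other models.

```python
GEMMA4_TARGETS = (
    "google/gemma-4-E2B-it",
    "google/gemma-4-E4B-it",
    "google/gemma-4-31B-it",
    "google/gemma-4-26B-A4B-it",
)


def assistant_repo(target_id: str) -> str:
    """Return the assistant (drafter) repo ID for a listed Gemma 4 target model."""
    if target_id not in GEMMA4_TARGETS:
        raise ValueError(f"no assistant model listed for {target_id!r}")
    return f"{target_id}-assistant"


print(assistant_repo("google/gemma-4-31B-it"))
```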
5. Documentation Lookup
When MCP is Installed (Preferred)
If the `search_documentation` tool (from the Google MCP server) is available, use it as your only documentation source:
- Call `search_documentation` with your query
- Read the returned documentation
- Trust MCP results as the source of truth for API details; they are always up to date.
[!IMPORTANT] When MCP tools are present, never fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching.
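The preference order can be expressed as a small selection function. This is a sketch under stated assumptions: the set of available tool names is a hypothetical input supplied by the caller, and `fetch_url` is the fallback tool named in the next subsection.

```python
def choose_doc_tool(available_tools: set[str]) -> str:
    """Pick the documentation tool: MCP search first, URL fetching only as a fallback."""
    if "search_documentation" in available_tools:
        # When MCP is present it is the only source; URLs are never fetched manually.
        return "search_documentation"
    if "fetch_url" in available_tools:
        return "fetch_url"
    raise LookupError("no documentation tool is available")


print(choose_doc_tool({"fetch_url", "search_documentation"}))
```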
When MCP is NOT Installed (Fallback Only)
If no MCP documentation tools are available, use `fetch_url` to retrieve official docs:
- Fetch the index URL (`https://ai.google.dev/gemma/docs/llms.txt`) to discover available pages.
- Fetch specific pages as needed. Key reference pages include: