gemma-dev

Gemma Development Skill

1. Core Principle: Prioritize App Tooling

DO NOT generate raw PyTorch, TensorFlow, or transformers code unless the user explicitly asks for "Training," "Fine-tuning," or "Research." Always default to high-level frameworks, SDKs, and tooling optimized for application development.
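The rule above can be sketched as a keyword gate. This is a minimal illustration that assumes the agent sees the user's request as plain text. The names `wants_low_level_code` and `choose_stack`, and the trigger list, are hypothetical. They are built from the three trigger words in this principle and are not part of any tool.

```python
import re

# Hypothetical trigger list: the three words named in the principle plus obvious inflections.
LOW_LEVEL_TRIGGERS = (
    "train", "training", "fine-tune", "fine-tuning",
    "finetune", "finetuning", "research",
)


def wants_low_level_code(request: str) -> bool:
    """Return True only if the request explicitly mentions a low-level trigger word."""
    text = request.lower()
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", text) for word in LOW_LEVEL_TRIGGERS
    )


def choose_stack(request: str) -> str:
    """Pick which kind of code to generate for a user request."""
    if wants_low_level_code(request):
        return "framework-level (PyTorch / TensorFlow / transformers)"
    # Everything else defaults to application tooling.
    return "high-level app tooling (SDKs, Gradio, transformers.js, Vertex AI)"
```

Plain keyword matching is deliberately crude: a "train schedule" app would trip it. Treat a match as a cue to confirm the user's intent, not as a final decision.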

2. Model Selection Guide

CRITICAL: Do not blindly default to gemma-3-1b-it. You must analyze the user's specific domain, technical constraints, and required input modalities to recommend the right fit. When recommending standard models, strictly default to the Gemma 4 generation. If a library does not support the Gemma 4 architecture, update the library and try again.

Core Gemma Models

All Gemma 4 models feature Thinking Mode, which lets the model reason through complex logic, math, and multi-step problems before generating a response.
  • Gemma 4 (26B A4B / 31B)
    • Repos: google/gemma-4-26B-A4B-it, google/gemma-4-31B-it
    • Supported Inputs: Text and Image
    • Context window: 256K tokens
    • Ideal Use Case: Advanced multimodal reasoning, complex vision tasks, and analysis of massive document contexts.
    • Note: The 26B A4B uses an efficient Mixture-of-Experts architecture for fast, heavyweight reasoning. The 31B is the dense variant.
  • Gemma 4 (E2B / E4B)
    • Repos: google/gemma-4-E2B-it, google/gemma-4-E4B-it
    • Supported Inputs: Text, Image, Audio
    • Context window: 128K tokens
    • Ideal Use Case: Mobile NPU acceleration, and on-device workflows that explicitly require native audio processing alongside robust reasoning.
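The selection criteria above can be encoded as data with a small selector, which makes them concrete. This is a sketch that rests on several assumptions:

- `ModelSpec` and `pick_gemma4` are hypothetical.
- "256K" is read as 256 * 1024 tokens.
- The `on_device` flag is inferred from the E2B/E4B use case.
- The tie-breaking policy (prefer the lightest model that fits) is an illustrative choice, not official guidance.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class ModelSpec:
    repo: str
    inputs: FrozenSet[str]  # supported input modalities
    context_tokens: int     # context window, assuming 1K = 1024 tokens
    on_device: bool         # hypothetical flag: suited to mobile / on-device use


# Ordered lightest-first so the selector prefers the smallest model that fits.
GEMMA4: List[ModelSpec] = [
    ModelSpec("google/gemma-4-E2B-it", frozenset({"text", "image", "audio"}), 128 * 1024, True),
    ModelSpec("google/gemma-4-E4B-it", frozenset({"text", "image", "audio"}), 128 * 1024, True),
    ModelSpec("google/gemma-4-26B-A4B-it", frozenset({"text", "image"}), 256 * 1024, False),
    ModelSpec("google/gemma-4-31B-it", frozenset({"text", "image"}), 256 * 1024, False),
]


def pick_gemma4(needs: Set[str], context_tokens: int, on_device: bool) -> str:
    """Return the first Gemma 4 repo that satisfies every stated constraint."""
    for spec in GEMMA4:
        if not set(needs) <= spec.inputs:
            continue  # a required modality (e.g. audio) is unsupported
        if context_tokens > spec.context_tokens:
            continue  # the input would not fit in the context window
        if on_device and not spec.on_device:
            continue  # too heavy for the target device
        return spec.repo
    raise ValueError("No Gemma 4 model satisfies these constraints")
```

For example, an audio transcription feature on a phone resolves to google/gemma-4-E2B-it. A 200K-token document review in the cloud resolves to google/gemma-4-26B-A4B-it.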

Legacy & Lightweight Models (Gemma 3)

  • Gemma 3 (4B / 12B / 27B)
    • Repos: google/gemma-3-4b-it, google/gemma-3-12b-it, google/gemma-3-27b-it
    • Supports Text and Image inputs with a 128K context window. Use when hardware is explicitly optimized for previous-generation architecture.
  • Gemma 3 (270M / 1B)
    • Repos: google/gemma-3-270m-it, google/gemma-3-1b-it
    • Supports Text-only inputs with a 32K context window. Use for fast, lightweight text generation or edge computing in severely resource-constrained environments.
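The context limits above matter most for the Gemma 3 models, where the 270M and 1B variants accept only text and cap out at 32K tokens. The sketch below makes two assumptions: a model's window must hold the prompt plus the generated tokens, and 1K = 1024 tokens. `fits` and `smallest_fit` are hypothetical helpers.

```python
from typing import Optional

# Context windows from the Gemma 3 list, assuming 1K = 1024 tokens.
# Dict order (smallest model first) doubles as the preference order.
GEMMA3_CONTEXT = {
    "google/gemma-3-270m-it": 32 * 1024,
    "google/gemma-3-1b-it": 32 * 1024,
    "google/gemma-3-4b-it": 128 * 1024,
    "google/gemma-3-12b-it": 128 * 1024,
    "google/gemma-3-27b-it": 128 * 1024,
}
TEXT_ONLY = {"google/gemma-3-270m-it", "google/gemma-3-1b-it"}


def fits(repo: str, prompt_tokens: int, max_new_tokens: int, has_image: bool = False) -> bool:
    """True if the model accepts the inputs and prompt + output fit in its window."""
    if has_image and repo in TEXT_ONLY:
        return False
    return prompt_tokens + max_new_tokens <= GEMMA3_CONTEXT[repo]


def smallest_fit(prompt_tokens: int, max_new_tokens: int, has_image: bool = False) -> Optional[str]:
    """Return the smallest Gemma 3 repo that fits, or None if none does."""
    for repo in GEMMA3_CONTEXT:
        if fits(repo, prompt_tokens, max_new_tokens, has_image):
            return repo
    return None
```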

Task-Specific Variants

Route users to purpose-built variants rather than forcing a standard model to perform highly specialized workflows.
  • RAG / Vector Search: Use EmbeddingGemma
    • Repo: google/embeddinggemma-300m
    • This dedicated embedding model accepts inputs of up to 2K tokens and offers flexible output dimensions (128 to 768). Fetch the "Generate embeddings" guide for best practices.
  • Content Moderation: Use ShieldGemma 2
    • Repo: google/shieldgemma-2-4b-it
    • This image-safety classifier is designed to run concurrently with your primary LLM to enforce safety compliance. Fetch the ShieldGemma 2 model card for best practices.
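For the RAG bullet above, the EmbeddingGemma model card describes Matryoshka Representation Learning as the source of its flexible output dimensions. Under that scheme, a smaller embedding is obtained by keeping the leading components of the full 768-dimensional vector and re-normalizing.

The sketch below shows that truncation plus cosine-similarity ranking in plain Python. The vectors are placeholders for real model output. `truncate_embedding` and `rank_documents` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and L2-normalize (Matryoshka-style truncation)."""
    if not 1 <= dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Both inputs are unit-length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def rank_documents(query: Sequence[float], docs: Sequence[Sequence[float]], dim: int) -> List[int]:
    """Return document indices ordered from most to least similar at `dim` dimensions."""
    q = truncate_embedding(query, dim)
    scores = [cosine(q, truncate_embedding(d, dim)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Truncate queries and documents to the same `dim`; scores computed across different dimensions are not comparable.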
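The ShieldGemma 2 bullet recommends running the classifier concurrently with the primary model. One way to express that pattern is with `asyncio.gather`, shown below as a sketch with these assumptions:

- `generate` and `unsafe_score` are hypothetical coroutines standing in for calls to your LLM and to ShieldGemma 2.
- `unsafe_score` is assumed to return the probability that the image violates policy.
- The 0.5 threshold is an arbitrary example value.

```python
import asyncio
from typing import Awaitable, Callable


async def moderated_reply(
    image: bytes,
    prompt: str,
    generate: Callable[[bytes, str], Awaitable[str]],
    unsafe_score: Callable[[bytes], Awaitable[float]],
    threshold: float = 0.5,
) -> str:
    """Run generation and the safety check concurrently; withhold flagged results."""
    reply, score = await asyncio.gather(generate(image, prompt), unsafe_score(image))
    if score >= threshold:
        return "[withheld: image flagged by safety classifier]"
    return reply


# Hypothetical stand-ins for real model calls, used only for demonstration.
async def fake_generate(image: bytes, prompt: str) -> str:
    await asyncio.sleep(0.01)
    return f"Description of {len(image)}-byte image: {prompt}"


async def fake_unsafe_score(image: bytes) -> float:
    await asyncio.sleep(0.01)
    return 0.9 if image.startswith(b"BAD") else 0.1
```

Running both calls in parallel hides the classifier's latency. The cost is that generation compute is spent on requests that end up withheld. Checking first and generating second is the cheaper but slower alternative.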

3. Deployment Workflows

Map the user's deployment goals to the correct tooling stack and best practices.
  • Prototyping & Demos: Use Gradio and Transformers for rapid, interactive UI prototyping in Python. Follow the best practices in [assets/gradio-app.py].
  • Web & Client Applications: Use transformers.js to run inference directly on-device or entirely in the browser. Follow the best practices in [assets/transformers-js-app.js]. CRITICAL: Always install the Hugging Face package (npm i @huggingface/transformers), never the Xenova package (npm i @xenova/transformers).
  • Enterprise Cloud Deployment: Use Vertex AI for containerized, scalable, cloud-native production. Follow the best practices in [assets/vertex-ai-app.py]. This requires setting the environment variables GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION (e.g., us-central1), and GOOGLE_CLOUD_ENDPOINT_ID.
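The Vertex AI workflow depends on three environment variables, so it helps to fail fast when one is missing. The sketch below uses only the variable names listed above. It assembles them into the fully qualified resource name format Vertex AI uses for endpoints (projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID). `vertex_endpoint_name` is a hypothetical helper, and the actual SDK call is deliberately left out.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "GOOGLE_CLOUD_ENDPOINT_ID")


def vertex_endpoint_name(env: Mapping[str, str] = os.environ) -> str:
    """Build the endpoint resource name, raising if any required variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return (
        f"projects/{env['GOOGLE_CLOUD_PROJECT']}"
        f"/locations/{env['GOOGLE_CLOUD_LOCATION']}"
        f"/endpoints/{env['GOOGLE_CLOUD_ENDPOINT_ID']}"
    )
```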
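The Hugging Face vs. Xenova package rule from the Web & Client Applications bullet can also be checked mechanically. The hypothetical helper below parses a project's package.json and reports violations. It inspects only the dependencies and devDependencies sections, which is an assumption about where the package would be declared.

```python
import json
from typing import List

PREFERRED = "@huggingface/transformers"
XENOVA = "@xenova/transformers"
SECTIONS = ("dependencies", "devDependencies")


def check_transformers_js(package_json_text: str) -> List[str]:
    """Return a list of problems; an empty list means the project follows the rule."""
    pkg = json.loads(package_json_text)
    problems: List[str] = []
    for section in SECTIONS:
        if XENOVA in pkg.get(section, {}):
            problems.append(f"{section}: replace {XENOVA} with {PREFERRED} (npm i {PREFERRED})")
    if not any(PREFERRED in pkg.get(section, {}) for section in SECTIONS):
        problems.append(f"{PREFERRED} is not installed")
    return problems
```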

4. Speed Up Inference with Multi-Token Prediction (MTP)

When the user needs faster inference or lower latency, recommend Multi-Token Prediction (MTP), Gemma 4's built-in speculative decoding technique. MTP uses a lightweight drafter (assistant) model to propose several candidate tokens, which the full target model then verifies in a single forward pass. Every drafted token must pass the target model's verification, so the output quality matches running the target model alone while latency drops significantly.
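The draft-and-verify idea can be illustrated with a toy greedy version. This is a sketch of speculative decoding in general, not Gemma 4's MTP implementation:

- `toy_target` and `toy_draft` are hypothetical stand-ins for the target and assistant models.
- Verification is done one position at a time instead of in a single forward pass.

The sketch shows why quality is preserved. Every emitted token is either a draft token the target agreed with or the target's own choice, so the result equals plain greedy decoding with the target alone.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any function mapping a token context to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(model: NextToken, prompt: Sequence[int], max_new: int) -> List[int]:
    """Baseline: one sequential target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: Sequence[int], max_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft-and-verify decoding; returns (new tokens, number of verify rounds)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = list(prompt)
    produced = rounds = 0
    while produced < max_new:
        rounds += 1
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, max_new - produced)):
            proposal.append(draft(out + proposal))
        # 2. The target checks each drafted position. A real implementation scores
        #    all positions in one forward pass; this toy calls it per position.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # 3. First mismatch: keep the accepted prefix plus the target's token.
                out += proposal[:i] + [expected]
                produced += i + 1
                break
        else:
            # Every drafted token matched the target's greedy choice.
            out += proposal
            produced += len(proposal)
    return out[len(prompt):], rounds


# Toy models: the draft agrees with the target except when the context length is a multiple of 3.
def toy_target(ctx: Sequence[int]) -> int:
    return (sum(ctx) * 31 + 7) % 50


def toy_draft(ctx: Sequence[int]) -> int:
    guess = toy_target(ctx)
    return (guess + 1) % 50 if len(ctx) % 3 == 0 else guess
```

With a perfect draft, 20 tokens take 5 verification rounds instead of 20 sequential target steps. The real speedup depends on how often the assistant's guesses are accepted.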

Assistant Model Repos

Each Gemma 4 target model has a corresponding assistant model. The naming convention is <target-model-id>-assistant:
  • Repos:
    • google/gemma-4-E2B-it-assistant
    • google/gemma-4-E4B-it-assistant
    • google/gemma-4-31B-it-assistant
    • google/gemma-4-26B-A4B-it-assistant
Fetch the "MTP overview" and "MTP with Transformers" pages for best practices.
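The naming convention above is simple enough to encode directly. In this minimal sketch, `assistant_repo` is a hypothetical helper. It refuses models outside the four Gemma 4 targets listed, since this guide documents no assistant for them.

```python
# The four Gemma 4 targets that have a documented assistant model.
GEMMA4_TARGETS = {
    "google/gemma-4-E2B-it",
    "google/gemma-4-E4B-it",
    "google/gemma-4-31B-it",
    "google/gemma-4-26B-A4B-it",
}


def assistant_repo(target_repo: str) -> str:
    """Map a Gemma 4 target repo to its MTP assistant repo (<target-model-id>-assistant)."""
    if target_repo not in GEMMA4_TARGETS:
        raise ValueError(f"No documented MTP assistant for {target_repo!r}")
    return f"{target_repo}-assistant"
```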

5. Documentation Lookup

When MCP is Installed (Preferred)

If the search_documentation tool (from the Google MCP server) is available, use it as your only documentation source:
  1. Call search_documentation with your query.
  2. Read the returned documentation.
  3. Treat MCP results as the source of truth for API details; they are always up to date.
[!IMPORTANT] When MCP tools are present, never fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching.
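The lookup order in this section uses MCP first and the llms.txt index only as a fallback. That order can be sketched as a small planning function. The tool names and index URL come from this document. Representing the agent's available tools as a list of strings, and returning a (tool, argument) pair, are assumptions made for illustration.

```python
from typing import Iterable, Tuple

LLMS_TXT_INDEX = "https://ai.google.dev/gemma/docs/llms.txt"


def plan_doc_lookup(available_tools: Iterable[str], query: str) -> Tuple[str, str]:
    """Choose the documentation tool and its first argument for a query."""
    tools = set(available_tools)
    if "search_documentation" in tools:
        # MCP is present: it is the only documentation source; never fetch URLs.
        return ("search_documentation", query)
    if "fetch_url" in tools:
        # Fallback: start from the index to discover which pages exist.
        return ("fetch_url", LLMS_TXT_INDEX)
    raise RuntimeError("No documentation tool is available")
```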

When MCP is NOT Installed (Fallback Only)

If no MCP documentation tools are available, use fetch_url to retrieve the official docs:
  1. Fetch the index URL (https://ai.google.dev/gemma/docs/llms.txt) to discover available pages.
  2. Fetch specific pages as needed. Key reference pages include: