gemini-sdk-expert

🤖 Skill: gemini-sdk-expert (v1.3.0)

Executive Summary

`gemini-sdk-expert` is a high-tier skill focused on mastering the Google Gemini ecosystem. In 2026, building with AI isn't just about prompts; it's about Structural Integrity, Context Optimization, and Multimodal Orchestration. This skill provides the blueprint for building ultra-reliable, cost-effective, and powerful AI applications with the `@google/genai` SDK.

📋 Table of Contents

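Core Capabilities
The "Do Not" List (Anti-Patterns)
Quick Start: JSON Enforcement
Standard Production Patterns
Advanced Agentic Patterns
Context Caching Strategy
Multimodal Integration
Safety and Responsible AI
Reference Library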

🚀 Core Capabilities

Strict Structured Output: Leveraging `responseSchema` for reliable, schema-conformant JSON generation.
Agentic Function Calling: Enabling models to interact with private APIs and tools.
Long-Form Context Management: Using Context Caching for massive datasets (2M+ tokens).
Native Multimodal Reasoning: Processing video, audio, and documents as first-class inputs.
Latency Optimization: Strategic model selection (Flash vs. Pro) and streaming responses.
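
On the client side, the streaming part of Latency Optimization means rendering text as each chunk arrives instead of waiting for the full response. The sketch below is minimal. It assumes only that every streamed chunk exposes an optional `text` field, which matches the responses yielded by `ai.models.generateContentStream` in `@google/genai`. `collectStream` is a hypothetical helper, and `fakeStream` stands in for a real API call so the sketch runs offline.

```typescript
// Minimal shape of a streamed chunk (assumption: mirrors the `text` getter on
// GenerateContentResponse in @google/genai).
interface StreamChunk {
  text?: string;
}

// Hypothetical helper: forwards each partial text to `onDelta` as soon as it arrives
// and returns the fully assembled text once the stream ends.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
  onDelta: (delta: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    const delta = chunk.text ?? "";
    if (delta.length === 0) continue; // some chunks may carry no text (e.g. metadata only)
    onDelta(delta);
    full += delta;
  }
  return full;
}

// Stand-in for a real stream so the sketch runs without network access.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { text: "Hello" };
  yield {};
  yield { text: ", world" };
}

// With the real SDK (not executed here):
// const stream = await ai.models.generateContentStream({ model: "gemini-3-flash", contents: "..." });
// const text = await collectStream(stream, (d) => process.stdout.write(d));
```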

🚫 The "Do Not" List (Anti-Patterns)

Anti-Pattern	Why it fails in 2026	Modern Alternative
Regex Parsing	Fragile; breaks whenever free-form output drifts from the expected format.	Use `responseSchema` (Controlled Output).
Old SDK (`@google/generative-ai`)	Deprecated legacy SDK; lacks newer features.	Use `@google/genai` exclusively.
Uncached Large Contexts	Extremely expensive and slow.	Use Context Caching for repetitive queries.
Hardcoded API Keys	Security risk: keys leak through source control and logs.	Load keys from secure environment variables (e.g. `GEMINI_API_KEY`) or a secret manager.
Single-Model Bias	Pro is overkill for simple extraction.	Use Gemini 3 Flash for speed- or cost-sensitive tasks.
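
For the "Hardcoded API Keys" row, the usual fix is to read the key at startup and fail fast when it is missing. This is a minimal sketch. `loadApiKey` is a hypothetical helper, and `GEMINI_API_KEY` is the variable name the Gemini API docs use. A secret manager that populates the environment works the same way.

```typescript
// Hypothetical helper: reads the key from an environment-like map so it never appears
// in source code. In Node.js, pass `process.env`.
function loadApiKey(
  env: Record<string, string | undefined>,
  name = "GEMINI_API_KEY",
): string {
  const key = env[name]?.trim();
  if (!key) {
    // Fail fast at startup instead of on the first API call; never echo the key itself.
    throw new Error(`Missing ${name}; set it via your secret store or environment.`);
  }
  return key;
}

// Usage in Node.js (not executed here):
// const ai = new GoogleGenAI({ apiKey: loadApiKey(process.env) });
```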

⚡ Quick Start: JSON Enforcement

The #1 rule in 2026: Structure at the Source.

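```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Read the key from the environment; never hardcode it.
// Optional: pin an API version with httpOptions, e.g. httpOptions: { apiVersion: "v1alpha" }.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const schema = {
  type: Type.OBJECT,
  properties: {
    status: { type: Type.STRING, enum: ["COMPLETE", "PENDING", "ERROR"] },
    summary: { type: Type.STRING },
    priority: { type: Type.NUMBER }
  },
  required: ["status", "summary"]
};

// Always set the MIME type to application/json together with the schema
const response = await ai.models.generateContent({
  model: "gemini-3-flash",
  contents: [{ role: "user", parts: [{ text: "Evaluate task X..." }] }],
  config: {
    responseMimeType: "application/json",
    responseSchema: schema
  }
});

// response.text holds the JSON string (undefined if nothing was generated)
if (!response.text) throw new Error("No JSON returned");
const result = JSON.parse(response.text);
```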
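
Controlled output should parse as JSON that matches the schema. The text can still be empty or truncated, for example when generation is blocked or hits the token limit, so treat it as untrusted input. Below is a minimal sketch of a runtime guard for the schema above. `TaskEvaluation` and `parseTaskEvaluation` are hypothetical names, and the argument is the JSON string the model returns (`response.text` in `@google/genai`).

```typescript
// Type mirroring the responseSchema above: status and summary required, priority optional.
type TaskStatus = "COMPLETE" | "PENDING" | "ERROR";

interface TaskEvaluation {
  status: TaskStatus;
  summary: string;
  priority?: number;
}

const STATUSES: readonly string[] = ["COMPLETE", "PENDING", "ERROR"];

// Hypothetical guard: parses the model's JSON and checks it against the expected shape,
// throwing on empty, truncated, or non-conforming output.
function parseTaskEvaluation(text: string | undefined): TaskEvaluation {
  if (!text) throw new Error("Empty response (generation may have been blocked or cut off)");
  const data: unknown = JSON.parse(text); // throws SyntaxError on truncated JSON
  if (typeof data !== "object" || data === null) throw new Error("Expected a JSON object");
  const obj = data as Record<string, unknown>;
  if (typeof obj.status !== "string" || !STATUSES.includes(obj.status)) {
    throw new Error(`Invalid status: ${String(obj.status)}`);
  }
  if (typeof obj.summary !== "string") throw new Error("Missing summary");
  const result: TaskEvaluation = { status: obj.status as TaskStatus, summary: obj.summary as string };
  if (typeof obj.priority === "number") {
    result.priority = obj.priority;
  } else if (obj.priority !== undefined) {
    throw new Error("priority must be a number");
  }
  return result;
}
```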

🛠 Standard Production Patterns

Pattern A: The Data Extractor (Flash)

Best for processing thousands of documents quickly and cheaply.

Model: `gemini-3-flash`
Config: High `topP`, low `temperature` for more repeatable extraction (a low temperature reduces run-to-run variation but does not eliminate it).
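
Processing thousands of documents is mostly a throughput problem. Send requests concurrently, but cap how many are in flight so you stay under rate limits. This is a minimal sketch. `mapWithConcurrency` is a hypothetical helper, and the `worker` callback stands in for an `ai.models.generateContent` call with `model: "gemini-3-flash"`. The `temperature`/`topP` values and the limit of 8 are illustrative placeholders to tune against your own quota.

```typescript
// Extraction settings as described above (values are illustrative assumptions).
const extractionConfig = {
  temperature: 0.1,
  topP: 0.95,
  responseMimeType: "application/json",
};

// Hypothetical helper: runs `worker` over all items with at most `limit` calls in flight,
// preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be >= 1");
  const results: R[] = new Array(items.length);
  let next = 0; // next unclaimed index; safe because JS callbacks run on one thread
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => runner()));
  return results;
}

// Usage with the real SDK (not executed here):
// const extracted = await mapWithConcurrency(docs, 8, (doc) =>
//   ai.models
//     .generateContent({ model: "gemini-3-flash", contents: doc, config: extractionConfig })
//     .then((r) => r.text ?? ""));
```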

Pattern B: The Complex Reasoner (Pro)

Best for architectural decisions, coding assistance, and deep media analysis.

Model: `gemini-3-pro`
Config: Enable Strict Mode in schemas for maximum schema adherence.
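
Patterns A and B combine into a simple router that sends each task to the cheapest model that can handle it. This addresses the "Single-Model Bias" anti-pattern. The sketch is minimal: the task categories and `routeModel` are hypothetical, and the model IDs are the ones this document uses.

```typescript
// Hypothetical task categories; adapt them to your workload.
type TaskKind = "extraction" | "classification" | "architecture" | "coding" | "media-analysis";

type ModelId = "gemini-3-flash" | "gemini-3-pro";

// Pattern B (Pro) tasks: architectural decisions, coding assistance, deep media analysis.
// Everything else defaults to Pattern A (Flash) for speed and cost.
const PRO_TASKS: ReadonlySet<TaskKind> = new Set<TaskKind>(["architecture", "coding", "media-analysis"]);

function routeModel(kind: TaskKind, opts: { forcePro?: boolean } = {}): ModelId {
  if (opts.forcePro || PRO_TASKS.has(kind)) return "gemini-3-pro";
  return "gemini-3-flash";
}

// Usage (not executed here):
// await ai.models.generateContent({ model: routeModel("extraction"), contents: doc });
```

An explicit `forcePro` escape hatch lets callers escalate a single request, for example after a Flash answer fails validation, without changing the routing table.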

🧩 Advanced Agentic Patterns

Parallel Function Calling

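Reduce round-trips by letting the model request several tool calls in a single turn. Your code executes them (concurrently, when they are independent) and returns all results in the next turn. See References: Function Calling for implementation.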
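
Below is a minimal sketch of the dispatch step. The `FunctionCall` and response-part shapes are subsets modeled on the `functionCall`/`functionResponse` parts in `@google/genai`. `dispatchCalls`, the tool registry, and the two example tools are hypothetical.

```typescript
// Subsets of the FunctionCall and functionResponse Part shapes used by @google/genai.
interface FunctionCall {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

interface FunctionResponsePart {
  functionResponse: { id?: string; name: string; response: Record<string, unknown> };
}

type Tool = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical dispatcher: runs every requested call concurrently and returns one
// response part per call, in the same order. A failing or unknown tool yields an
// `error` payload instead of rejecting the whole batch, so the model can react to it.
async function dispatchCalls(
  calls: FunctionCall[],
  tools: Record<string, Tool>,
): Promise<FunctionResponsePart[]> {
  return Promise.all(
    calls.map(async (call): Promise<FunctionResponsePart> => {
      const name = call.name ?? "";
      let response: Record<string, unknown>;
      try {
        if (!Object.prototype.hasOwnProperty.call(tools, name)) {
          throw new Error(`Unknown tool: ${name}`);
        }
        response = { output: await tools[name](call.args ?? {}) };
      } catch (err) {
        response = { error: err instanceof Error ? err.message : String(err) };
      }
      return { functionResponse: { id: call.id, name, response } };
    }),
  );
}

// Hypothetical tools standing in for private APIs.
const tools: Record<string, Tool> = {
  getWeather: async (args) => ({ city: args.city, tempC: 21 }),
  getTime: async () => "12:00",
};

// With the SDK (not executed here): append the model's turn, then the tool results.
// contents.push(response.candidates![0].content!);
// contents.push({ role: "user", parts: await dispatchCalls(response.functionCalls ?? [], tools) });
```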

Semantic Caching

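Store embeddings of common queries together with their responses. When a new query's embedding is close enough to a stored one, return the cached response and bypass the LLM. Unlike exact-match caching, this also catches paraphrased requests, not just identical ones.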
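
Below is a minimal in-memory sketch of the lookup step. It assumes query embeddings are computed elsewhere, for example with `ai.models.embedContent` in `@google/genai`. `SemanticCache` and its 0.92 default threshold are hypothetical choices to tune on your own data. A production system would use a vector index rather than a linear scan.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: readonly number[], b: readonly number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical cache: returns the stored response of the most similar past query if its
// similarity clears the threshold; otherwise returns undefined and the caller hits the LLM.
class SemanticCache {
  private readonly threshold: number;
  private entries: { embedding: number[]; response: string }[] = [];

  constructor(threshold = 0.92) {
    this.threshold = threshold;
  }

  lookup(embedding: number[]): string | undefined {
    let best: { score: number; response: string } | undefined;
    for (const e of this.entries) {
      const score = cosine(embedding, e.embedding);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: e.response };
      }
    }
    return best?.response;
  }

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}
```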

💾 Context Caching Strategy

In 2026, we don't re-upload. We cache.

Warm-up Phase: Initial context upload.
Persistence Phase: Referencing the cache via `cachedContent`.
Cleanup Phase: Managing TTLs to optimize storage costs.

See References: Context Caching for more.
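
The three phases map onto calls in `@google/genai`:

- Warm-up: `ai.caches.create`.
- Persistence: pass the cache name as `config.cachedContent` to `ai.models.generateContent`.
- Cleanup: `ai.caches.update` to change the TTL, `ai.caches.delete` to remove the cache.

The minimal sketch below wraps these calls so the cache is always deleted. `CacheApi` and `ModelsApi` are hypothetical structural subsets of `ai.caches` and `ai.models`, so the sketch can run against fakes. `withCachedContext` is hypothetical too, and the 600s TTL is a placeholder. For simplicity it caches a plain-text context; check the SDK reference for the full parameter types, including file parts.

```typescript
// Structural subsets of `ai.caches` and `ai.models` from @google/genai (assumed shapes;
// verify against the SDK's type definitions).
interface CacheApi {
  create(params: { model: string; config?: { contents?: string; ttl?: string } }): Promise<{ name?: string }>;
  update(params: { name: string; config: { ttl: string } }): Promise<unknown>;
  delete(params: { name: string }): Promise<unknown>;
}

interface ModelsApi {
  generateContent(params: {
    model: string;
    contents: string;
    config?: { cachedContent?: string };
  }): Promise<{ text?: string }>;
}

interface CachedSession {
  ask: (prompt: string) => Promise<string>;
  extend: (ttl: string) => Promise<void>;
}

// Hypothetical helper covering warm-up, persistence, and cleanup for a text context.
async function withCachedContext<T>(
  ai: { caches: CacheApi; models: ModelsApi },
  model: string,
  context: string,
  run: (session: CachedSession) => Promise<T>,
  ttl = "600s",
): Promise<T> {
  // Warm-up: upload the large context once. The TTL is a safety net if cleanup never runs.
  const cache = await ai.caches.create({ model, config: { contents: context, ttl } });
  const name = cache.name;
  if (!name) throw new Error("Cache creation returned no name");

  const session: CachedSession = {
    // Persistence: each prompt references the cache by name instead of resending the context.
    ask: async (prompt) => {
      const res = await ai.models.generateContent({ model, contents: prompt, config: { cachedContent: name } });
      return res.text ?? "";
    },
    // TTL management: extend the cache's lifetime for long-running sessions.
    extend: async (newTtl) => {
      await ai.caches.update({ name, config: { ttl: newTtl } });
    },
  };

  try {
    return await run(session);
  } finally {
    // Cleanup: delete explicitly instead of paying for storage until the TTL expires.
    await ai.caches.delete({ name });
  }
}

// Usage with the real SDK (not executed here):
// const summary = await withCachedContext(ai, "gemini-3-pro", transcript, ({ ask }) => ask("Summarize"));
```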

📸 Multimodal Integration

Gemini 3 accepts images, video, audio, and documents as native inputs alongside text.

Video: Scene detection and temporal reasoning.
Audio: Sentiment, tone, and environment detection.
Document: Visual layout and OCR.

See References: Multimodal Mastery for details.
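
Media reaches the model as a `Part`. Small files can be sent inline as base64 `inlineData`. Large ones, such as most video, go through the Files API (`ai.files.upload`) and are referenced by URI in `fileData`. Below is a minimal sketch of that decision. `toMediaPart`, the extension table, and the 20 MB inline cutoff are assumptions, so check the current request-size limits. The caller is expected to upload large files separately and pass the resulting URI.

```typescript
// Part shapes used by @google/genai for media (subset).
type MediaPart =
  | { inlineData: { mimeType: string; data: string } } // base64 bytes inside the request
  | { fileData: { mimeType: string; fileUri: string } }; // file uploaded via the Files API

// Assumed extension-to-MIME table; extend it for the formats you accept.
const MIME_BY_EXT: Record<string, string> = {
  mp4: "video/mp4",
  mov: "video/quicktime",
  mp3: "audio/mp3",
  wav: "audio/wav",
  pdf: "application/pdf",
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
};

// Assumed inline cutoff; larger payloads should be uploaded first.
const MAX_INLINE_BYTES = 20 * 1024 * 1024;

function mimeTypeFor(path: string): string {
  const ext = path.split(".").pop()?.toLowerCase() ?? "";
  const mime = MIME_BY_EXT[ext];
  if (!mime) throw new Error(`Unsupported file type: ${path}`);
  return mime;
}

// Hypothetical helper: inlines small media (encoding lazily), otherwise references an
// already-uploaded file by URI.
function toMediaPart(path: string, sizeBytes: number, getBase64: () => string, uploadedUri?: string): MediaPart {
  const mimeType = mimeTypeFor(path);
  if (sizeBytes <= MAX_INLINE_BYTES) return { inlineData: { mimeType, data: getBase64() } };
  if (!uploadedUri) throw new Error(`${path} exceeds the inline limit; upload it with ai.files.upload first`);
  return { fileData: { mimeType, fileUri: uploadedUri } };
}

// Usage in Node.js (not executed here):
// const bytes = await readFile("clip.mp4");
// const part = toMediaPart("clip.mp4", bytes.length, () => bytes.toString("base64"));
// await ai.models.generateContent({ model: "gemini-3-pro",
//   contents: [{ role: "user", parts: [part, { text: "When does the speaker change?" }] }] });
```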

📖 Reference Library

Detailed deep-dives into Gemini SDK excellence:

Structured Output: Nested schemas and validation.
Function Calling: Tools, execution loops, and security.
Context Caching: Reducing cost and latency.
Multimodal 2026: Video, audio, and PDF mastery.

Updated: January 31, 2026 - 10:45
