gemma-gem-browser-ai

Gemma Gem Browser AI

Skill by ara.so — Daily 2026 Skills collection.
Gemma Gem is a Chrome extension that runs Google's Gemma 4 model entirely on-device via WebGPU. It injects a chat overlay into every page and exposes a tool-calling agent loop that can read pages, click elements, fill forms, execute JavaScript, and take screenshots — all without sending data to any server.

Architecture Overview

```
Offscreen Document          Service Worker           Content Script
(Gemma 4 + Agent Loop)  <-> (Message Router)    <-> (Chat UI + DOM Tools)
       |                         |
  WebGPU inference          Screenshot capture
  Token streaming           JS execution
```

- Offscreen document (`offscreen/`): Loads the ONNX model via `@huggingface/transformers`, runs the agent loop, streams tokens.
- Service worker (`background/`): Routes messages, handles `take_screenshot` and `run_javascript`.
- Content script (`content/`): Injects shadow DOM chat UI, executes DOM tools.
- `agent/`: Zero-dependency module defining the `ModelBackend` and `ToolExecutor` interfaces; extractable as a standalone library.

Install & Build

```bash
# Prerequisites: Node.js 18+, pnpm
pnpm install

# Development build (logging active, source maps)
pnpm build

# Production build (errors only, minified)
pnpm build:prod
```

Load the extension:
1. Open `chrome://extensions`
2. Enable **Developer mode**
3. Click **Load unpacked** → select `.output/chrome-mv3-dev/`

**Model download** happens automatically on first chat open:
- `onnx-community/gemma-4-E2B-it-ONNX` (~500 MB, default)
- `onnx-community/gemma-4-E4B-it-ONNX` (~1.5 GB)

Models are cached in the browser's cache storage after the first run.

Key Interfaces (`agent/`)

ModelBackend

```typescript
// agent/types.ts
export interface ModelBackend {
  generate(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options: GenerateOptions
  ): AsyncGenerator<StreamChunk>;
}

export interface ToolDefinition {
  name: string;
  description: string;
  parameters: JSONSchema;
}

export interface GenerateOptions {
  maxNewTokens?: number;
  thinking?: boolean;
}
```

ToolExecutor

```typescript
// agent/types.ts
export interface ToolExecutor {
  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;
}
```

Agent Loop

```typescript
// agent/loop.ts (simplified illustration)
export async function* runAgentLoop(
  userMessage: string,
  history: ChatMessage[],
  model: ModelBackend,
  tools: ToolExecutor,
  toolDefs: ToolDefinition[],
  maxIterations: number
): AsyncGenerator<AgentEvent> {
  const messages: ChatMessage[] = [...history, { role: "user", content: userMessage }];

  // Each iteration is one model turn. A turn that ends without a "done" chunk
  // (for example after a tool call) is followed by another turn, up to maxIterations.
  for (let i = 0; i < maxIterations; i++) {
    for await (const chunk of model.generate(messages, toolDefs, {})) {
      if (chunk.type === "token") yield { type: "token", token: chunk.token };
      if (chunk.type === "tool_call") {
        yield { type: "tool_start", name: chunk.name };
        const result = await tools.execute(chunk.name, chunk.args);
        yield { type: "tool_result", name: chunk.name, result };
        // Feed the tool output back so the next turn can use it.
        messages.push({ role: "tool", name: chunk.name, content: String(result) });
      }
      if (chunk.type === "done") return;
    }
  }
}
```
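
To see how the two interfaces fit together without loading a model, the sketch below drives a loop of the same shape with a scripted backend and a fake executor. The concrete `ChatMessage`, `StreamChunk`, and `AgentEvent` shapes, the trimmed-down `generate` signature, and the helpers `scriptedBackend`, `runLoop`, and `collect` are assumptions made for illustration; the real definitions in `agent/` may differ.

```typescript
// Assumed shapes for illustration only.
type ChatMessage = { role: "user" | "assistant" | "tool"; content: string; name?: string };
type StreamChunk =
  | { type: "token"; token: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> }
  | { type: "done" };
type AgentEvent =
  | { type: "token"; token: string }
  | { type: "tool_start"; name: string }
  | { type: "tool_result"; name: string; result: unknown };

interface ModelBackend {
  generate(messages: ChatMessage[]): AsyncGenerator<StreamChunk>;
}
interface ToolExecutor {
  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical backend: replays one scripted turn per generate() call.
function scriptedBackend(turns: StreamChunk[][]): ModelBackend {
  let turn = 0;
  return {
    async *generate(_messages: ChatMessage[]): AsyncGenerator<StreamChunk> {
      const chunks: StreamChunk[] = turns[turn++] ?? [{ type: "done" }];
      for (const chunk of chunks) yield chunk;
    },
  };
}

// Same control flow as the simplified loop above, minus history and tool definitions.
async function* runLoop(
  userMessage: string,
  model: ModelBackend,
  tools: ToolExecutor,
  maxIterations: number
): AsyncGenerator<AgentEvent> {
  const messages: ChatMessage[] = [{ role: "user", content: userMessage }];
  for (let i = 0; i < maxIterations; i++) {
    for await (const chunk of model.generate(messages)) {
      if (chunk.type === "token") yield { type: "token", token: chunk.token };
      if (chunk.type === "tool_call") {
        yield { type: "tool_start", name: chunk.name };
        const result = await tools.execute(chunk.name, chunk.args);
        yield { type: "tool_result", name: chunk.name, result };
        messages.push({ role: "tool", name: chunk.name, content: String(result) });
      }
      if (chunk.type === "done") return;
    }
  }
}

// Drains the event stream into an array.
async function collect(events: AsyncGenerator<AgentEvent>): Promise<AgentEvent[]> {
  const out: AgentEvent[] = [];
  for await (const event of events) out.push(event);
  return out;
}

const backend = scriptedBackend([
  [{ type: "tool_call", name: "get_page_title", args: {} }], // turn 1: request a tool
  [{ type: "token", token: "Title: Example" }, { type: "done" }], // turn 2: final answer
]);
const fakeTools: ToolExecutor = {
  execute: async (name) => (name === "get_page_title" ? "Example" : null),
};
```

Calling `collect(runLoop("What is this page?", backend, fakeTools, 5))` yields a `tool_start`, a `tool_result`, and then a `token` event, which makes the turn boundaries easy to unit test.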

Built-in Tools

| Tool | Location | Description |
| --- | --- | --- |
| `read_page_content` | Content script | Read page text/HTML or a CSS selector |
| `take_screenshot` | Service worker | Capture visible tab as PNG |
| `click_element` | Content script | Click by CSS selector |
| `type_text` | Content script | Type into input by CSS selector |
| `scroll_page` | Content script | Scroll by pixel amount |
| `run_javascript` | Service worker | Execute JS in page context |
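
The Location column determines which context has to run a tool. One way to make that explicit is a lookup keyed by tool name; the sketch below is an assumption about how such routing could be written, and `TOOL_LOCATIONS` and `resolveLocation` are hypothetical names, not part of the extension.

```typescript
type ToolLocation = "content_script" | "service_worker";

// Mirrors the Location column of the built-in tools table.
const TOOL_LOCATIONS = new Map<string, ToolLocation>([
  ["read_page_content", "content_script"],
  ["take_screenshot", "service_worker"],
  ["click_element", "content_script"],
  ["type_text", "content_script"],
  ["scroll_page", "content_script"],
  ["run_javascript", "service_worker"],
]);

// Returns where a tool must run, or throws for names the table does not list.
function resolveLocation(toolName: string): ToolLocation {
  const location = TOOL_LOCATIONS.get(toolName);
  if (location === undefined) throw new Error(`Unknown tool: ${toolName}`);
  return location;
}
```

Failing loudly on unknown names keeps a misspelled tool call from the model from being silently forwarded to the wrong context.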

Adding a New Tool

Tools live in two places: the definition (in the offscreen agent) and the executor (in content script or service worker).

Step 1 — Define the tool schema

```typescript
// offscreen/tools/definitions.ts
export const MY_TOOL_DEFINITION: ToolDefinition = {
  name: "get_page_title",
  description: "Returns the document title of the current page.",
  parameters: {
    type: "object",
    properties: {},
    required: [],
  },
};
```
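
Because the arguments come from the model, it can help to check them against the schema's `required` list before executing anything. The following is a minimal sketch under assumptions: the `ObjectSchema` shape stands in for the `JSONSchema` type (which is not shown here), the `click_element` definition is invented for illustration, and `missingArgs` is a hypothetical helper.

```typescript
// Assumed minimal schema shape; the real JSONSchema type may be richer.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required: string[];
}

interface ToolDefinition {
  name: string;
  description: string;
  parameters: ObjectSchema;
}

// Hypothetical definition; the extension's actual click_element schema may differ.
const CLICK_ELEMENT_DEFINITION: ToolDefinition = {
  name: "click_element",
  description: "Clicks the first element matching a CSS selector.",
  parameters: {
    type: "object",
    properties: { selector: { type: "string", description: "CSS selector of the target" } },
    required: ["selector"],
  },
};

// Lists required parameters that the model left out of its tool call.
function missingArgs(def: ToolDefinition, args: Record<string, unknown>): string[] {
  return def.parameters.required.filter((key) => args[key] === undefined);
}
```

An empty result means the call can proceed; a non-empty one can be returned to the model as a tool error so it has a chance to retry.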

Step 2 — Register in the tool list

```typescript
// offscreen/tools/index.ts
import { MY_TOOL_DEFINITION } from "./definitions";

export const ALL_TOOLS: ToolDefinition[] = [
  // ...existing tools
  MY_TOOL_DEFINITION,
];
```

Step 3 — Implement execution in the content script

```typescript
// content/tools/executor.ts
export async function executeContentTool(
  name: string,
  args: Record<string, unknown>
): Promise<unknown> {
  switch (name) {
    case "get_page_title":
      return document.title;

    case "read_page_content": {
      const selector = args.selector as string | undefined;
      if (selector) {
        return document.querySelector(selector)?.textContent ?? "Not found";
      }
      return document.body.innerText;
    }

    case "click_element": {
      const el = document.querySelector<HTMLElement>(args.selector as string);
      if (!el) throw new Error(`Element not found: ${args.selector}`);
      el.click();
      return "clicked";
    }

    case "type_text": {
      const input = document.querySelector<HTMLInputElement>(args.selector as string);
      if (!input) throw new Error(`Input not found: ${args.selector}`);
      input.focus();
      input.value = args.text as string;
      // Notify listeners on the page that the value changed.
      input.dispatchEvent(new Event("input", { bubbles: true }));
      input.dispatchEvent(new Event("change", { bubbles: true }));
      return "typed";
    }

    default:
      throw new Error(`Unknown content tool: ${name}`);
  }
}
```

Step 4 — Handle service-worker-side tools

```typescript
// background/tools.ts
export async function executeSwTool(
  name: string,
  args: Record<string, unknown>,
  tabId: number
): Promise<unknown> {
  switch (name) {
    case "take_screenshot": {
      // Resolves to a data: URL of the visible area of the active tab.
      const dataUrl = await chrome.tabs.captureVisibleTab({ format: "png" });
      return dataUrl;
    }

    case "run_javascript": {
      // The extension CSP blocks new Function() inside the service worker, so
      // pass the source as an argument and evaluate it in the page's main world.
      // Pages with a strict CSP may still refuse to evaluate it.
      const results = await chrome.scripting.executeScript({
        target: { tabId },
        world: "MAIN",
        args: [args.code as string],
        func: (code: string) => new Function(code)() as unknown,
      });
      return results[0]?.result ?? null;
    }

    default:
      return null; // not a SW tool: forward to content script
  }
}
```

Message Routing Pattern

The service worker acts as a message bus. Other extension contexts reach it with `chrome.runtime.sendMessage`, and it forwards to content scripts with `chrome.tabs.sendMessage`.

```typescript
// Message types (shared/messages.ts)
export type ExtMessage =
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "TOKEN"; token: string }
  | { type: "AGENT_DONE" }
  | { type: "AGENT_ERROR"; error: string };

// Offscreen → SW
chrome.runtime.sendMessage<ExtMessage>({
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId: currentTabId,
});

// SW → Content script
chrome.tabs.sendMessage<ExtMessage>(tabId, {
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId,
});
```
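
Messages arrive untyped in a `chrome.runtime.onMessage` listener, so a runtime guard is a reasonable companion to the `ExtMessage` union. The sketch below is one possible guard, not the extension's own code; `isExtMessage` is a hypothetical name and the union is repeated only to keep the example self-contained.

```typescript
type ExtMessage =
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "TOKEN"; token: string }
  | { type: "AGENT_DONE" }
  | { type: "AGENT_ERROR"; error: string };

// Narrows an unknown payload to ExtMessage by checking each variant's fields.
function isExtMessage(value: unknown): value is ExtMessage {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as Record<string, unknown>;
  switch (msg.type) {
    case "TOOL_CALL":
      return (
        typeof msg.name === "string" &&
        typeof msg.tabId === "number" &&
        typeof msg.args === "object" &&
        msg.args !== null
      );
    case "TOOL_RESULT":
      return typeof msg.name === "string" && "result" in msg;
    case "TOKEN":
      return typeof msg.token === "string";
    case "AGENT_DONE":
      return true;
    case "AGENT_ERROR":
      return typeof msg.error === "string";
    default:
      return false;
  }
}
```

A listener can then return early for anything that fails the guard, which also filters out messages sent by unrelated code paths.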

Model Configuration

```typescript
// offscreen/model.ts: loading with transformers.js
import { pipeline, TextGenerationPipeline } from "@huggingface/transformers";

const MODEL_IDS = {
  E2B: "onnx-community/gemma-4-E2B-it-ONNX",
  E4B: "onnx-community/gemma-4-E4B-it-ONNX",
} as const;

export type ModelSize = keyof typeof MODEL_IDS;

export async function loadModel(
  size: ModelSize,
  onProgress: (progress: number) => void
): Promise<TextGenerationPipeline> {
  return pipeline("text-generation", MODEL_IDS[size], {
    dtype: "q4f16",
    device: "webgpu",
    // Not every callback event carries a progress value, so forward only numbers.
    progress_callback: (p: { status: string; progress?: number }) => {
      if (typeof p.progress === "number") onProgress(p.progress);
    },
  });
}
```

Settings & Persistence

Settings are stored via `chrome.storage.sync`:

```typescript
export interface GemmaGemSettings {
  modelSize: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

const DEFAULT_SETTINGS: GemmaGemSettings = {
  modelSize: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

export async function getSettings(): Promise<GemmaGemSettings> {
  const stored = await chrome.storage.sync.get("settings");
  return { ...DEFAULT_SETTINGS, ...(stored.settings ?? {}) };
}

export async function saveSettings(patch: Partial<GemmaGemSettings>): Promise<void> {
  const current = await getSettings();
  await chrome.storage.sync.set({ settings: { ...current, ...patch } });
}

// Disable extension on current host
async function disableOnCurrentSite() {
  const host = new URL(location.href).hostname;
  const settings = await getSettings();
  if (!settings.disabledHosts.includes(host)) {
    await saveSettings({ disabledHosts: [...settings.disabledHosts, host] });
  }
}
```
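
The merge-with-defaults step and the host check are pure logic, so they can be separated from `chrome.storage` and tested on their own. The sketch below assumes the content script consults `disabledHosts` before injecting the overlay; `mergeSettings` and `isHostDisabled` are hypothetical helpers, and the match is on the exact hostname because that is what `disableOnCurrentSite` stores.

```typescript
interface GemmaGemSettings {
  modelSize: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

const DEFAULT_SETTINGS: GemmaGemSettings = {
  modelSize: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

// Fills in defaults for any key missing from the stored object.
function mergeSettings(stored: Partial<GemmaGemSettings> | undefined): GemmaGemSettings {
  return { ...DEFAULT_SETTINGS, ...(stored ?? {}) };
}

// True when the URL's hostname is on the disabled list (exact match, no subdomain logic).
function isHostDisabled(settings: GemmaGemSettings, url: string): boolean {
  return settings.disabledHosts.includes(new URL(url).hostname);
}
```

With this split, `getSettings` would only need to read storage and hand the raw value to `mergeSettings`.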

Shadow DOM Chat UI Pattern

The content script injects a shadow DOM to isolate styles:

```typescript
// content/ui.ts
export function injectChatOverlay(): ShadowRoot {
  const host = document.createElement("div");
  host.id = "gemma-gem-host";
  // Prevent page styles from leaking in
  const shadow = host.attachShadow({ mode: "closed" });

  // Inject styles
  const style = document.createElement("style");
  style.textContent = CHAT_STYLES; // imported CSS string
  shadow.appendChild(style);

  // Inject chat container
  const container = document.createElement("div");
  container.id = "gemma-gem-container";
  shadow.appendChild(container);

  document.body.appendChild(host);
  return shadow;
}
```

Debugging

All logs use the `[Gemma Gem]` prefix. Development builds log info/debug/warn; production builds log only errors.

```
# Service worker logs
chrome://extensions → Gemma Gem → "Inspect views: service worker"

# Offscreen document (most useful: model loading, prompts, tool calls)
chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"

# Content script logs
DevTools on any page → Console (filter: [Gemma Gem])

# All extension contexts
chrome://inspect#other
```

Key things to check in offscreen document logs:
- Model download progress
- Full prompt construction
- Token counts per turn
- Raw model output (before tool call parsing)
- Tool execution results
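
The prefix and the per-build log levels described in this section could be implemented with a small wrapper like the one below. This is a sketch of the idea rather than the extension's logger: `createLogger`, `consoleSink`, and the `isProduction` flag are assumptions, and how the build mode is actually detected is not shown in this document.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type Sink = (level: LogLevel, line: string) => void;

// Default sink: forwards to the matching console method.
const consoleSink: Sink = (level, line) => console[level](line);

// Builds a logger that prefixes every line and drops non-errors in production.
function createLogger(isProduction: boolean, sink: Sink = consoleSink) {
  const emit = (level: LogLevel, message: string): void => {
    if (isProduction && level !== "error") return;
    sink(level, `[Gemma Gem] ${message}`);
  };
  return {
    debug: (message: string) => emit("debug", message),
    info: (message: string) => emit("info", message),
    warn: (message: string) => emit("warn", message),
    error: (message: string) => emit("error", message),
  };
}
```

Passing the sink as a parameter keeps the level filtering testable without touching the real console.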

Common Patterns & Gotchas

WebGPU availability check:

```typescript
if (!navigator.gpu) {
  throw new Error("WebGPU not supported. Use Chrome 113+ with hardware acceleration enabled.");
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("No WebGPU adapter found.");
```

Offscreen document lifecycle: Chrome may tear down the offscreen document. Check that it exists, and recreate it if needed, before sending messages:

```typescript
async function ensureOffscreen() {
  const existing = await chrome.offscreen.hasDocument();
  if (!existing) {
    await chrome.offscreen.createDocument({
      url: "offscreen.html",
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: "Run Gemma 4 inference via WebGPU",
    });
  }
}
```

Context window management: Gemma 4 supports 128K tokens, but inference slows with long contexts. Clear history per page with `clear_context`, or limit stored turns:

```typescript
const MAX_HISTORY_TURNS = 20;
function trimHistory(messages: ChatMessage[]): ChatMessage[] {
  if (messages.length <= MAX_HISTORY_TURNS * 2) return messages;
  return messages.slice(-MAX_HISTORY_TURNS * 2);
}
```
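
If the conversation starts with a system message, plain slicing would eventually drop it. The variant below keeps system messages and trims only the rest. It assumes a `ChatMessage` with a `system` role, which this document does not confirm, and `trimKeepingSystem` is a hypothetical name.

```typescript
// Assumed message shape; the real ChatMessage type may differ.
type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Keeps every system message plus the most recent maxTurns * 2 other messages.
function trimKeepingSystem(messages: ChatMessage[], maxTurns: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  const budget = maxTurns * 2;
  if (rest.length <= budget) return [...system, ...rest];
  // Compute the start index explicitly: slice(-0) would return the whole array.
  return [...system, ...rest.slice(rest.length - budget)];
}
```

Counting two messages per turn is the same approximation `trimHistory` makes; turns that include tool messages will use the budget faster.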

Tool call parsing: Gemma 4 emits tool calls in a structured format. If adding custom parsing, guard against partial/streamed JSON:

```typescript
function safeParseToolCall(raw: string): { name: string; args: Record<string, unknown> } | null {
  try {
    return JSON.parse(raw);
  } catch {
    return null; // still streaming
  }
}
```
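
When the payload arrives in fragments, the guard needs somewhere to accumulate them. The sketch below buffers fragments until they parse and also checks the parsed shape. It assumes the tool call is a single JSON object of the form `{ name, args }`; the actual Gemma 4 tool-call format may differ, and `createToolCallBuffer` and `isToolCall` are hypothetical names.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// Checks that a parsed value has a string name and a plain-object args field.
function isToolCall(value: unknown): value is ToolCall {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.name === "string" &&
    typeof candidate.args === "object" &&
    candidate.args !== null &&
    !Array.isArray(candidate.args)
  );
}

// Accumulates streamed fragments and returns a tool call once the JSON is complete.
function createToolCallBuffer() {
  let buffer = "";
  return {
    push(fragment: string): ToolCall | null {
      buffer += fragment;
      try {
        const parsed: unknown = JSON.parse(buffer);
        if (!isToolCall(parsed)) return null; // valid JSON, but not a tool call
        buffer = ""; // ready for the next call
        return parsed;
      } catch {
        return null; // incomplete JSON: keep buffering
      }
    },
  };
}
```

Each `push` returns `null` until the buffered text forms a complete, well-shaped object, at which point the buffer resets.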

CSS selector safety for DOM tools:

```typescript
function safeQuerySelector(selector: string): Element | null {
  try {
    return document.querySelector(selector);
  } catch {
    return null; // invalid selector from model
  }
}
```