desktop-test-agent-tauri
Desktop Test Agent (Tauri / Electron)
Two desktop surfaces:
| Engine | When |
|---|---|
| tauri-docker-testing | Cicero Tauri app in Docker — AppImage extraction, WebKitGTK virtual display, Gemini Computer Use automation, DOCX export verification |
| agent-browser electron subcommand | Electron desktop apps via |
⚙️ Default Workflow (start here)
When invoked, say:
"I'll start with the default workflow and assess what stage we're at, then continue from there. If everything is done, I'll come back and ask for your decisions. I can also do A/B/C alternatives — let me know if you want me to lay out capabilities and trade-offs."
Default flow for a Tauri/Electron app:
- Probe the build — does the AppImage / Electron binary exist and launch?
- Set up environment — virtual display (Xvfb), required system packages, env vars.
- Launch the app under instrumentation (Gemini Computer Use or agent-browser CDP).
- Capture baseline — screenshot of initial window, log dump.
- Run the user's specific check (or "does the export feature work end-to-end" if none specified).
- Diagnose — distinguish app crashes vs WebKitGTK issues vs missing system deps.
- Iterate — fix and re-run (max 2 retry cycles).
Wait between every step before moving to the next. Don't batch.
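The "Iterate" step caps the loop at two retry cycles. A minimal sketch of that control flow, assuming hypothetical `check` and `fix` callables (neither name comes from the original tooling):

```python
from typing import Callable

MAX_RETRY_CYCLES = 2  # taken from the "Iterate" step: max 2 retry cycles


def run_with_retries(
    check: Callable[[], bool],
    fix: Callable[[], None],
    max_retries: int = MAX_RETRY_CYCLES,
) -> bool:
    """Run check once; on failure, apply fix and re-run up to max_retries times."""
    if check():
        return True
    for _ in range(max_retries):
        fix()  # hypothetical remediation step, e.g. install a missing package
        if check():
            return True
    return False  # give up and report back instead of looping forever
```

With the default of two retries, `check` runs at most three times: the first attempt plus one re-run per retry cycle.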
Watching for human comments while waiting
When a step is waiting on the user, a CI run, or any external event, set up a polling watcher:
```bash
/loop 10m "check for new comments on PR #<N> via gh CLI; if none, re-ping reviewers"
/schedule "in 20 minutes, re-check comments and continue"
```
Cadence: start at 10 minutes, back off to 30 minutes if nothing lands. If 30 minutes pass with no comments, repeat the request (re-ping the code reviewer, re-ask the user) and continue iterating.
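The cadence rule (start at 10 minutes, back off to 30) can be sketched as a generator of wait times. The 10-minute increment between polls is an assumption; the text only fixes the start and the ceiling.

```python
from itertools import islice
from typing import Iterator


def polling_intervals(
    start_min: int = 10, max_min: int = 30, step_min: int = 10
) -> Iterator[int]:
    """Yield wait times in minutes, growing from start_min and capped at max_min."""
    wait = start_min
    while True:
        yield wait
        # step_min is an assumed back-off increment, not taken from the source
        wait = min(wait + step_min, max_min)


first_four = list(islice(polling_intervals(), 4))
print(first_four)  # [10, 20, 30, 30]
```

Once the ceiling is reached, every later poll waits 30 minutes, which is the point at which the request should be repeated.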
A / B / C alternative approaches
| Path | Capability | Trade-off |
|---|---|---|
| A — Docker + Xvfb + Gemini Computer Use | Fully reproducible CI; works on headless Linux | Slow startup; Gemini token cost |
| B — Local native execution + agent-browser CDP | Fastest iteration; uses real Chrome of the app | Only works on a graphical session; OS-specific |
| C — Manual screenshot + visual diff | Lowest infra cost | Brittle; doesn't catch interaction bugs |
Default to A for Cicero Tauri (Docker-reproducible), B for Electron apps you're developing locally.
Tauri Docker Testing
Test Cicero's Tauri AppImage inside a Docker container with virtual display and Gemini Computer Use for vision-driven UI automation.
Step-by-Step: The Full Pipeline
Follow these steps IN ORDER. Every command is copy-pasteable. Do NOT skip steps.
STEP 1: Build the AppImage
```bash
cd /home/arthrod/workspace/potion_deploy
git checkout main && git pull
```
Build the Vite frontend first (REQUIRED before cargo build):
```bash
NODE_ENV=production VITE_ENVIRONMENT=production bun run build:tauri
```
Then build the Tauri AppImage:
```bash
cd src-tauri && cargo tauri build --bundles appimage
```
Output will be at `src-tauri/target/release/bundle/appimage/Cicero_0.1.0_amd64.AppImage` (~83MB).
If cargo build fails with "beforeBuildCommand" error: The Vite build above didn't run. Run it again.
STEP 2: Build the Docker Image
```bash
cd /home/arthrod/workspace/potion_deploy
docker build -t cicero-test -f Dockerfile.computer-use .
```
If "no such file" error: You're in the wrong directory. `cd` to the repo root.
What the Dockerfile does:
- Installs Ubuntu 24.04 + Xvfb + VNC + noVNC + Chromium + WebKitGTK deps
- Copies the AppImage into `/app/`
- Extracts it with `--appimage-extract` (FUSE doesn't work in Docker, so never try to run the AppImage directly)
- Registers `cicero://` deep link scheme via xdg-mime
- Sets up supervisor to manage Xvfb, fluxbox, x11vnc, noVNC, dbus, and the Cicero app
STEP 3: Start the Container
```bash
docker rm -f cicero-test 2>/dev/null
docker run -d --name cicero-test -p 5901:5900 -p 6081:6080 --shm-size=1g cicero-test
```
Port 5900 in use? That's why we use 5901:5900. If 5901 is also taken, use any free port.
`--shm-size=1g`
STEP 4: Wait for the App to Load
The app takes ~15-30 seconds to start. Supervisor will show `cicero (exit status 101)` warnings. THIS IS NORMAL. The app crashes 2-4 times because Xvfb/dbus aren't ready yet. Supervisor retries and it eventually starts.
```bash
sleep 30
```
Check if the app is running:
```bash
docker exec cicero-test ps aux | grep cicero_desktop
```
You should see `cicero_desktop` in the process list. If not, start it manually:
```bash
docker exec -d -e DISPLAY=:99 -e DBUS_SESSION_BUS_ADDRESS=unix:path=/tmp/dbus-session \
  -e XDG_RUNTIME_DIR=/tmp/runtime-agent -e NO_AT_BRIDGE=1 \
  -e WEBKIT_DISABLE_DMABUF_RENDERER=1 -u agent cicero-test /app/cicero/AppRun
sleep 15
```
ALL of these env vars are REQUIRED:
- `DISPLAY=:99`: virtual display
- `DBUS_SESSION_BUS_ADDRESS=unix:path=/tmp/dbus-session`: WebKitGTK needs dbus
- `XDG_RUNTIME_DIR=/tmp/runtime-agent`: XDG runtime
- `NO_AT_BRIDGE=1`: suppress accessibility warnings
- `WEBKIT_DISABLE_DMABUF_RENDERER=1`: CRITICAL, without this the app crashes with GPU/DMA errors
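Instead of a fixed `sleep 30`, the wait can poll the process list until `cicero_desktop` shows up. The sketch below is illustrative only: the 60-second timeout and 5-second interval are assumptions, and `process_running` expects the `docker` CLI on the host.

```python
import subprocess
import time
from typing import Callable


def process_running(container: str, name: str) -> bool:
    """Return True if `name` appears in the container's `ps aux` output."""
    result = subprocess.run(
        ["docker", "exec", container, "ps", "aux"],
        capture_output=True, text=True, check=False,
    )
    return name in result.stdout


def wait_for_process(
    is_running: Callable[[], bool],
    timeout_s: float = 60.0,   # assumed ceiling, roughly twice the stated startup time
    interval_s: float = 5.0,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_running until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_running():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A call such as `wait_for_process(lambda: process_running("cicero-test", "cicero_desktop"))` returns `False` when the app never appears, which is the cue to fall back to the manual start command.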
STEP 5: Close WebKit Inspector (CRITICAL)
THE APP OPENS WEBKIT INSPECTOR BY DEFAULT. Inspector steals ALL keyboard focus. If you skip this step, `xdotool type` and Gemini Computer Use typing will go to the inspector console, NOT the app.
```bash
docker exec -e DISPLAY=:99 -u agent cicero-test xdotool key F12
sleep 1
```
STEP 6: Take a Screenshot to Verify
```bash
docker exec -e DISPLAY=:99 -u agent cicero-test scrot /tmp/verify.png
docker cp cicero-test:/tmp/verify.png ./verify.png
```
You should see the Cicero sign-in page: "Contracts from the future" on the left, email/password form on the right.
If you see a blank/gray desktop: The app didn't start. Go back to Step 4 and start manually.
If you see the WebKit Inspector taking up half the screen: Go back to Step 5.
STEP 7: Run Gemini Computer Use Agent
This is the main automation tool. It takes screenshots, sends them to Gemini, and executes the model's actions via xdotool.
Prerequisites:
- Python package: `google-genai` (`uv pip install google-genai` or `pip install google-genai`)
- `GEMINI_API_KEY` env var set
```bash
GEMINI_API_KEY=$GEMINI_API_KEY python3 tooling/scripts/tauri-computer-use.py \
  --container cicero-test \
  --goal "Create a new account with email test@cicero.im, password TDDisthesolution, first name Test, last name User. After login, click the WRITE card to create a document. Type a haiku in the editor. Then export as DOCX." \
  --model gemini-3-flash-preview \
  --max-turns 20
```
The agent will:
- Find the Sign Up link and click it
- Fill in the sign-up form fields
- Accept terms and submit
- Click the WRITE card on the dashboard
- Type text in the editor
- Find the export button and trigger DOCX download
- Handle the GTK save dialog
If the agent gets stuck in a safety confirmation loop: The Gemini CU model keeps asking for confirmation on downloads. The script auto-confirms, but sometimes the model loops. Kill it (Ctrl+C) and handle the remaining steps manually.
If `response.candidates` is None: Safety block from Gemini. The script handles this gracefully and retries.
STEP 8: Manual Typing Fallback
If the Computer Use agent can't type in the Plate editor (text doesn't appear), do it manually:
```bash
# MUST close inspector first (Step 5)
docker exec -e DISPLAY=:99 -u agent cicero-test bash -c "
  xdotool mousemove 600 300 && sleep 0.3 && xdotool click 1 && sleep 0.5
  xdotool type --delay 30 'Contracts from the past'
  xdotool key Return
  xdotool type --delay 30 'AI writes the future now'
  xdotool key Return
  xdotool type --delay 30 'Cicero guides all'
"
```
**MUST use `xdotool type --delay 30 'text'`.** Individual `xdotool key X` calls do NOT trigger Slate/Plate input events. The `type` command fires the proper IME/input pipeline that Plate.js listens to.
**`xdotool type` mangles uppercase**: it uses `--clearmodifiers` which strips Shift. "AI" becomes "ai", "Cicero" becomes "cicero". This is cosmetic and acceptable for testing.
STEP 9: DOCX Export
After typing content in the editor:
- Use the Computer Use agent to click the export button:
```bash
GEMINI_API_KEY=$GEMINI_API_KEY python3 tooling/scripts/tauri-computer-use.py \
  --container cicero-test \
  --goal "Click the export/download icon in the toolbar and export as DOCX" \
  --model gemini-3-flash-preview \
  --max-turns 5
```
- If the agent gets stuck, do it manually. The export flow is:
  - Click export icon in toolbar → "Download" dialog appears (format: WORD)
  - Click red "Download" button → "Export to DOCX" confirmation dialog
  - Click red "Continue" button → GTK "Save File" dialog
  - Press Enter to save with default filename
- Check if the DOCX was saved:
```bash
docker exec cicero-test find / -name "*.docx" -type f 2>/dev/null
```
- Copy it out:
```bash
docker cp cicero-test:/path/to/file.docx ./output.docx
```
- Verify it's valid:
```bash
python3 -c "
import zipfile, re
z = zipfile.ZipFile('./output.docx')
doc = z.read('word/document.xml').decode('utf-8')
texts = re.findall(r'<w:t[^>]*>([^<]+)</w:t>', doc)
for t in texts: print(t)
"
```
If DOCX export fails (yellow warning in Download dialog):
- Check `fs:allow-write-file` is in `src-tauri/capabilities/default.json`
- Without this permission, the GTK save dialog appears but `writeFile()` silently fails
- This was the bug we found: `fs:default` only grants READ, not write
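The inline verification one-liner can also be written as a reusable function. This is a sketch, not part of the original tooling: it swaps the regex for an XML parser so escaped characters such as `&amp;` are decoded, and `make_minimal_docx` is a hypothetical helper that only builds a tiny in-memory archive for demonstration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the <w:t> text elements
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text_runs(source) -> list[str]:
    """Return the text of every <w:t> element in word/document.xml.

    `source` is a path or a file-like object. zipfile raises BadZipFile for a
    non-zip file and KeyError if word/document.xml is missing.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    return [node.text for node in root.iter(f"{{{W_NS}}}t") if node.text]


def make_minimal_docx(text: str) -> io.BytesIO:
    """Hypothetical demo helper: a zip holding only a one-run document.xml."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f"<w:t>{escape(text)}</w:t></w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


print(docx_text_runs(make_minimal_docx("Cicero guides all")))  # ['Cicero guides all']
```

For the real export, pass the copied file path, for example `docx_text_runs("./output.docx")`, and compare the result against the text typed into the editor.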
STEP 10: Verify & Clean Up
Take final screenshot:
```bash
docker exec -e DISPLAY=:99 -u agent cicero-test scrot /tmp/final.png
docker cp cicero-test:/tmp/final.png ./final.png
```
Stop container:
```bash
docker rm -f cicero-test
```
Gemini Computer Use: How `tauri-computer-use.py` Works
Architecture
```
Host machine                         Docker container (Ubuntu 24.04)
┌──────────────────┐                 ┌─────────────────────────────┐
│ tauri-computer-  │  docker exec    │ Xvfb :99 (virtual display)  │
│ use.py           │ ──────────────> │ Cicero AppImage (WebKitGTK) │
│                  │  scrot → PNG    │ x11vnc (VNC on :5900)       │
│ Gemini CU API    │ <────────────── │ noVNC (web on :6080)        │
│ (google-genai)   │  xdotool cmds   │ fluxbox (window manager)    │
│                  │ ──────────────> │ dbus-daemon                 │
└──────────────────┘                 └─────────────────────────────┘
```
Agent Loop
- Takes screenshot from Docker via `docker exec scrot`
- Sends screenshot + goal to Gemini Computer Use API (native `google-genai` SDK)
- Model returns `function_call` actions with normalized coordinates (0-999)
- Script denormalizes: `actual_x = x / 1000 * 1440`, `actual_y = y / 1000 * 960`
- Executes via `xdotool` (click, type, scroll, key combos)
- Takes new screenshot, sends back as `FunctionResponse` with:
  - `url`: `"cicero://localhost"` (REQUIRED, 400 error without it)
  - `safety_acknowledgement`: `"true"` (REQUIRED when model sends `require_confirmation`)
  - Screenshot as `FunctionResponsePart` with blob `inline_data`
- Loop until model says done or max turns reached
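The denormalization step in the loop above can be sketched as follows. The function names are hypothetical, and truncating to an integer (rather than rounding) is an assumption about how the script converts coordinates.

```python
SCREEN_WIDTH = 1440   # virtual display width used in the formula above
SCREEN_HEIGHT = 960   # virtual display height used in the formula above


def denormalize(
    x: int, y: int, width: int = SCREEN_WIDTH, height: int = SCREEN_HEIGHT
) -> tuple[int, int]:
    """Map the model's 0-999 normalized coordinates to screen pixels."""
    # int() truncates; the original script may round instead (assumption)
    return int(x / 1000 * width), int(y / 1000 * height)


def click_command(container: str, x: int, y: int) -> list[str]:
    """Build a docker exec argv that moves the pointer and left-clicks."""
    px, py = denormalize(x, y)
    return [
        "docker", "exec", "-e", "DISPLAY=:99", "-u", "agent", container,
        "xdotool", "mousemove", str(px), str(py), "click", "1",
    ]


print(denormalize(500, 500))  # (720, 480)
```

Returning an argv list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting concerns.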
Supported Gemini Models
| Model | Use Case |
|---|---|
| Fast, good for most tasks |
| Dedicated CU model, more accurate |
What DOESN'T Work for Automation
| Approach | Why It Fails | What Happens |
|---|---|---|
| Midscene | Uses OpenAI-compatible endpoint ( | Returns "empty content from AI model" on every Gemini model |
| agent-browser via noVNC | VNC canvas is a single | agent-browser sees noVNC controls (disconnect, clipboard) but NOT the app UI inside |
| xdotool coordinate guessing | No vision — you're guessing pixel positions | Breaks when layout changes, wastes time iterating |
| Individual key events don't trigger Slate/Plate input | Text never appears in the editor contenteditable |
Dockerfile Requirements
`Dockerfile.computer-use`:
```dockerfile
RUN apt-get install -y \
    # Virtual display + window manager
    xvfb fluxbox xterm \
    # VNC + noVNC (browser-based VNC viewer)
    x11vnc novnc websockify \
    # OAuth browser (for Google OAuth roundtrip)
    chromium-browser \
    # WebKitGTK: Tauri's Linux rendering engine
    libwebkit2gtk-4.1-0 libgtk-3-0t64 \
    libappindicator3-1 librsvg2-2 libsoup-3.0-0 \
    # D-Bus + Accessibility (WebKitGTK REFUSES to start without dbus)
    dbus-x11 at-spi2-core libatk-bridge2.0-0 libatspi2.0-0 \
    # AppImage extraction (FUSE doesn't work in Docker)
    libfuse2t64 \
    # UI automation tools
    scrot xdotool wmctrl \
    # Deep link registration (cicero:// scheme)
    xdg-utils desktop-file-utils \
    # Midscene deps (if you want to try Midscene: it won't work for CU but connect/screenshot work)
    nodejs npm imagemagick x11-xserver-utils \
    # Process manager
    supervisor \
    # Fonts (without these, screenshots show boxes instead of text)
    fonts-liberation fonts-noto-color-emoji fonts-dejavu \
    # Misc
    curl ca-certificates
```
Deep Link Registration
Register `cicero://` so OAuth deep links work:
```dockerfile
RUN cat > /usr/share/applications/cicero-handler.desktop << 'EOF'
[Desktop Entry]
Name=Cicero Deep Link Handler
Exec=/app/cicero/AppRun %u
Type=Application
MimeType=x-scheme-handler/cicero;
NoDisplay=true
EOF
RUN update-desktop-database /usr/share/applications/
```
Supervisor Config
```ini
[program:cicero]
command=/app/cicero/AppRun
environment=DISPLAY=":99",DBUS_SESSION_BUS_ADDRESS="unix:path=/tmp/dbus-session",XDG_RUNTIME_DIR="/tmp/runtime-agent",NO_AT_BRIDGE="1",WEBKIT_DISABLE_DMABUF_RENDERER="1"
autorestart=true
user=agent
priority=40
startsecs=5
startretries=10
```
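Because every one of the five environment variables is required, a quick check of the supervisor config can catch a missing one before the container is built. This is a sketch under assumptions: `missing_env_vars` is a hypothetical helper, and it expects the `KEY="value"` quoting style shown in the config above.

```python
import configparser
import re

# The five variables Step 4 lists as required
REQUIRED_ENV = {
    "DISPLAY",
    "DBUS_SESSION_BUS_ADDRESS",
    "XDG_RUNTIME_DIR",
    "NO_AT_BRIDGE",
    "WEBKIT_DISABLE_DMABUF_RENDERER",
}


def missing_env_vars(config_text: str, section: str = "program:cicero") -> set[str]:
    """Return required variables absent from the section's environment= line."""
    # interpolation=None so any literal % in values is not treated as a placeholder
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    environment = parser.get(section, "environment", fallback="")
    # Matches KEY="value" pairs; unquoted values are not handled in this sketch
    defined = {key for key, _ in re.findall(r'(\w+)="([^"]*)"', environment)}
    return REQUIRED_ENV - defined
```

An empty set means the section defines all five variables; any names returned should be added to both the supervisor config and the manual start command.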
Tauri Capabilities (Permissions)
File: `src-tauri/capabilities/default.json`
These permissions MUST be present:
```json
{
  "permissions": [
    "core:default",
    "deep-link:default",
    "dialog:default",
    "fs:default",
    "fs:allow-write-file",
    "fs:allow-write-text-file",
    "http:default",
    "opener:default",
    "os:default",
    "sql:default",
    "sql:allow-execute",
    "clipboard-manager:default",
    "notification:default",
    "store:default",
    "upload:default",
    "websocket:default"
  ]
}
```
Without `fs:allow-write-file`:
- The GTK "Save File" dialog appears (from `@tauri-apps/plugin-dialog save()`)
- User picks a filename and clicks Save
- `@tauri-apps/plugin-fs writeFile()` silently fails: no error, no file
- The DOCX export appears to work but the file is never written
ALL app methods are CLIENT-SIDE. DOCX export, file save, clipboard, notifications: these all use Tauri plugins (`@tauri-apps/plugin-fs`, `@tauri-apps/plugin-dialog`). They do NOT depend on Cloudflare Worker bindings. The server only provides auth, document CRUD, and AI endpoints.
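Since a missing write permission fails silently, it can help to check the capabilities file before building. The sketch below is illustrative: `missing_permissions` is a hypothetical helper, and the default list covers only the permissions involved in the save flow, not the full set above. It also accepts object entries with an `identifier` key, which is how scoped permissions are usually written in Tauri v2 capability files (treat that as an assumption about your config).

```python
import json

# Subset relevant to the DOCX save flow (assumed default for this sketch)
SAVE_FLOW_PERMISSIONS = ("dialog:default", "fs:default", "fs:allow-write-file")


def missing_permissions(
    capabilities_json: str, required: tuple[str, ...] = SAVE_FLOW_PERMISSIONS
) -> list[str]:
    """Return the required permission identifiers absent from the capabilities JSON."""
    data = json.loads(capabilities_json)
    granted = set()
    for entry in data.get("permissions", []):
        if isinstance(entry, str):
            granted.add(entry)
        elif isinstance(entry, dict) and "identifier" in entry:
            granted.add(entry["identifier"])  # scoped permission object
    return [name for name in required if name not in granted]
```

Running it on the contents of `src-tauri/capabilities/default.json` and getting back `["fs:allow-write-file"]` matches the failure described above: the dialog works but nothing is written.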
All Known Pitfalls (Complete List)
| # | Pitfall | Symptom | Fix |
|---|---|---|---|
| 1 | WebKit Inspector open | Typing goes to console, not editor | Press F12 before any text input |
| 2 | Missing | App crashes immediately with GPU error | Add to env vars in supervisor and manual start |
| 3 | Missing | WebKitGTK refuses to start, exit 101 | Install in Dockerfile |
| 4 | AppImage run directly (not extracted) | "FUSE not available" error | Always |
| 5 | | DOCX save dialog works but file never writes | Add |
| 6 | | Text never appears in editor | Use |
| 7 | Port 5900 already in use | Container fails to start | Use |
| 8 | Missing | Chromium crashes | Add to |
| 9 | Supervisor gives up on cicero | | Increase |
| 10 | Midscene for computer use | "empty content from AI model" | Use |
| 11 | agent-browser via noVNC | Can only see canvas, not app UI | Use Gemini CU or direct xdotool |
| 12 | Missing | Gemini returns 400 INVALID_ARGUMENT | Add |
| 13 | | Safety block or rate limit | Handle gracefully, retry with new screenshot |
| 14 | Tab once from email to password | Lands on "Forgot password?" link | Tab TWICE |
| 15 | Protocol detection | Auth shim not installed, CORS blocked, splash hangs | Use |
| 16 | | Gets 404 instead of processing callback | Add |
| 17 | Special chars in password via xdotool | | Use Gemini CU or escape properly |
| 18 | | "Invalid email or password" | Create new account via sign-up instead |
| 19 | | "AI" becomes "ai" due to | Cosmetic — acceptable for testing |
| 20 | Gemini CU calls | "Unimplemented action" in script | These are implemented in the script now |