omni-vu

omni.vu - Visual Understanding & Automation

Overview

omni.vu gives you eyes and hands on the user's macOS screen. Use it to:

See what the user sees (screen capture)
Understand UI state with AI vision
Detect changes and wait for events
Act with mouse and keyboard automation

When to Use This Skill

Proactively Use omni.vu When:

Debugging UI issues - "The button isn't working" → capture and analyze
Verifying changes - After modifying UI code, check if it rendered correctly
Waiting for operations - Build finishing, deployment completing, tests running
Understanding context - User describes something on screen you can't see
Automating repetitive tasks - Clicking through UI flows, filling forms
Documentation - Capturing screenshots of features

Do NOT Use When:

Reading/writing files (use Read/Write tools)
Running terminal commands (use Bash)
Making API calls (use appropriate tools)
User hasn't granted screen recording permission

Tool Reference

Capture Tools

vu - Full Screen Capture

vu(monitor=0, save_to_history=True)

Captures the entire screen. Returns base64 image.

Use when: You need to see everything on screen.

vu_window - Window Capture

vu_window(window_id=None, title="VS Code", include_frame=False)

Captures a specific window by ID or title (partial match).

Use when: You only need one application's content.

Workflow:

1. Call vu_list_windows to see available windows
2. Find the window_id or use title matching
3. Call vu_window with that ID/title

vu_region - Region Capture

vu_region(x=100, y=100, width=500, height=300)

Captures a specific rectangle of the screen.

Use when: You need a precise area (error message, specific component).

vu_list_windows - List Windows

vu_list_windows(filter_app="Chrome")

Lists all visible windows with metadata.

Returns: List of {window_id, title, owner, x, y, width, height}
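
The partial title matching used by vu_window can be previewed on the client side against the list that vu_list_windows returns. The sketch below is a minimal illustration, assuming each entry is a dict with the fields listed above and that matching is a case-insensitive substring test. find_window is a hypothetical helper, not part of omni.vu, and the tool's real matching rules may differ.

```python
from typing import Optional


def find_window(windows: list[dict], title: str) -> Optional[dict]:
    """Return the first window whose title contains `title` (case-insensitive)."""
    needle = title.lower()
    for window in windows:
        if needle in window.get("title", "").lower():
            return window
    return None


# Sample data shaped like the documented return value (all values are made up).
sample = [
    {"window_id": 11, "title": "README.md - Visual Studio Code", "owner": "Code",
     "x": 0, "y": 25, "width": 1440, "height": 875},
    {"window_id": 12, "title": "Inbox - Google Chrome", "owner": "Google Chrome",
     "x": 100, "y": 100, "width": 1200, "height": 800},
]

match = find_window(sample, "chrome")
print(match["window_id"] if match else "no match")
```

Passing the resulting window_id to vu_window avoids ambiguity when several titles share the same substring.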

vu_list_monitors - List Displays

vu_list_monitors()

Lists connected displays with resolution and scale factor.

Vision Tools

vu_describe - AI Vision Analysis

vu_describe(
    prompt="What errors are visible?",
    provider="claude",  # or openai, gemini, ollama
    max_tokens=1024,
    capture_first=True
)

Captures the screen and analyzes it with AI.

Best prompts:

"Describe what you see on screen"
"Are there any error messages visible?"
"What is the state of the build/test output?"
"Is the login form filled correctly?"
"What color is the status indicator?"

Providers:

Provider	Speed	Quality	Cost
claude	Medium	Excellent	$$
openai	Fast	Very Good	$$
gemini	Fast	Good	$
ollama	Slow	Varies	Free

Utility Tools

vu_diff - Change Detection

vu_diff(threshold=0.02, monitor=0)

Detects whether the screen has changed since the last capture.

Returns: {changed: bool, diff_percentage: float}

Use for polling patterns:

import time

# Wait for build to finish
while True:
    result = vu_diff(threshold=0.05)
    if result["changed"]:
        # Screen changed, check what happened
        analysis = vu_describe(prompt="Did the build succeed or fail?")
        break
    # Wait before checking again (the 1-second pause is illustrative)
    time.sleep(1)
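
A polling loop like this one never exits if the screen stays unchanged, so a timeout is usually worth adding. The following is a minimal sketch, assuming the diff call returns a dict with a "changed" key as documented. wait_for_change and fake_diff are hypothetical names introduced for illustration. The diff function is passed in as a parameter so the example runs without omni.vu installed.

```python
import time
from typing import Callable, Optional


def wait_for_change(
    diff_fn: Callable[[], dict],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> Optional[dict]:
    """Poll diff_fn until it reports a change; return that result, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = diff_fn()
        if result["changed"]:
            return result
        time.sleep(interval_s)
    return None


# Stand-in for vu_diff: reports a change on the third call.
calls = {"n": 0}


def fake_diff() -> dict:
    calls["n"] += 1
    changed = calls["n"] >= 3
    return {"changed": changed, "diff_percentage": 0.08 if changed else 0.0}


result = wait_for_change(fake_diff, timeout_s=5.0, interval_s=0.01)
print(result)
```

In real use, diff_fn would wrap the actual tool call, for example lambda: vu_diff(threshold=0.05).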

vu_history - Capture History

vu_history(limit=10, capture_type="full_screen")

Gets recent capture metadata.

vu_last - Last Capture

vu_last()

Returns the most recent capture with image data.

Use when: You need to re-analyze without re-capturing.

vu_status - System Status

vu_status()

Returns system info: monitors, providers, safety settings.

Automation Tools

vu_click - Mouse Click

vu_click(x=500, y=300, button="left", count=1)

Clicks at screen coordinates.

Buttons: left, right, middle

Count: 1 (single), 2 (double), 3 (triple)

Safety: Coordinates validated against screen bounds.
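
The same checks can be run on the caller's side before a click is issued, which gives clearer errors than a rejected tool call. This is a minimal sketch, assuming a single display whose origin is at (0, 0). validate_click is a hypothetical helper, and the screen size would normally come from vu_list_monitors. How omni.vu itself validates coordinates is not specified here.

```python
VALID_BUTTONS = ("left", "right", "middle")
VALID_COUNTS = (1, 2, 3)


def validate_click(x: int, y: int, screen_width: int, screen_height: int,
                   button: str = "left", count: int = 1) -> dict:
    """Return keyword arguments for vu_click, or raise ValueError if any is invalid."""
    if not (0 <= x < screen_width and 0 <= y < screen_height):
        raise ValueError(f"({x}, {y}) is outside {screen_width}x{screen_height}")
    if button not in VALID_BUTTONS:
        raise ValueError(f"unknown button: {button!r}")
    if count not in VALID_COUNTS:
        raise ValueError(f"unsupported click count: {count}")
    return {"x": x, "y": y, "button": button, "count": count}


# Example screen size is made up for illustration.
args = validate_click(500, 300, screen_width=1440, screen_height=900)
print(args)
```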

vu_move - Move Cursor

vu_move(x=500, y=300, duration_ms=0)

Moves the cursor to a position. Use duration_ms for animated movement.

vu_drag - Drag Operation

vu_drag(start_x=100, start_y=100, end_x=500, end_y=300, duration_ms=500)

Drags from start to end position.

Safety: Maximum 2000px drag distance.
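
To see whether a planned drag stays under the limit, the distance can be computed ahead of time. The sketch below assumes the limit applies to the straight-line distance between the start and end points, which the text above does not confirm. drag_distance and is_drag_allowed are hypothetical helpers.

```python
import math

MAX_DRAG_PX = 2000  # limit stated in the safety note above


def drag_distance(start_x: int, start_y: int, end_x: int, end_y: int) -> float:
    """Straight-line distance between the start and end of a drag."""
    return math.hypot(end_x - start_x, end_y - start_y)


def is_drag_allowed(start_x: int, start_y: int, end_x: int, end_y: int) -> bool:
    """True if the drag is within MAX_DRAG_PX (assumed to mean Euclidean distance)."""
    return drag_distance(start_x, start_y, end_x, end_y) <= MAX_DRAG_PX


# The example drag from the signature above.
print(round(drag_distance(100, 100, 500, 300), 2))
print(is_drag_allowed(100, 100, 500, 300))
```

A longer move can be split into several shorter drags if it exceeds the limit.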

vu_scroll - Scroll

vu_scroll(direction="down", amount=3, x=None, y=None)

Scrolls at current or specified position.

Directions: up, down, left, right

Amount: Lines to scroll (1-20)

vu_type - Type Text

vu_type(text="Hello World", delay_between_ms=0)

Types text character by character.

Note: Click the target field first so that it has keyboard focus!

vu_hotkey - Keyboard Shortcut

vu_hotkey(keys="cmd+s")

Executes keyboard shortcuts.

Format: modifier+key (e.g., cmd+c, ctrl+shift+s, cmd+opt+i)

Modifiers: cmd, ctrl, alt, opt, shift

Blocked hotkeys (for safety):

cmd+q (quit)
cmd+shift+q (logout)
cmd+opt+esc (force quit)
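
A caller can check a shortcut against the blocked list before sending it. This is a minimal sketch, assuming that key order and letter case do not matter and that alt and opt are aliases for the same Option key. Both are assumptions about how omni.vu parses the string. normalize_hotkey and is_blocked are hypothetical helpers.

```python
def normalize_hotkey(keys: str) -> frozenset:
    """Split 'cmd+shift+s' into a case-insensitive, order-insensitive key set."""
    parts = [part.strip().lower() for part in keys.split("+")]
    # Assumption: 'alt' and 'opt' both refer to the macOS Option key.
    return frozenset("opt" if part == "alt" else part for part in parts)


# The three blocked shortcuts listed above.
BLOCKED = {normalize_hotkey(k) for k in ("cmd+q", "cmd+shift+q", "cmd+opt+esc")}


def is_blocked(keys: str) -> bool:
    """True if the shortcut matches one of the blocked combinations."""
    return normalize_hotkey(keys) in BLOCKED


for combo in ("cmd+s", "cmd+q", "cmd+alt+esc"):
    print(combo, "blocked" if is_blocked(combo) else "allowed")
```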

Common Workflows

1. Debug UI Issue

User: "The submit button doesn't do anything when I click it"

1. vu_describe(prompt="Describe the form and submit button state")
2. Analyze: Is the button disabled? Is there a validation error?
3. If needed: vu_window(title="Chrome") for focused capture
4. Check console: vu_describe(prompt="Are there any errors in the developer console?")

2. Verify Build/Deploy

User: "Run the build and let me know when it's done"

1. Run: npm run build (via Bash)
2. Loop: vu_diff(threshold=0.05)
3. When changed: vu_describe(prompt="What is the build status? Did it succeed or fail?")
4. Report result

3. Fill a Form

User: "Fill out the registration form with test data"

1. vu_describe(prompt="What form fields are visible?")
2. vu_click(x=field_x, y=field_y)  # Click first field
3. vu_type(text="testuser@example.com")
4. vu_hotkey(keys="tab")  # Move to next field
5. vu_type(text="Test User")
6. Continue for each field...
7. vu_click(x=submit_x, y=submit_y)  # Submit
8. vu_describe(prompt="Was the form submitted successfully?")

4. Navigate UI

User: "Open the settings panel in VS Code"

1. vu_list_windows(filter_app="Code")
2. vu_window(title="Code")  # Capture VS Code
3. vu_hotkey(keys="cmd+,")  # Open settings
4. vu_diff()  # Wait for settings to open
5. vu_describe(prompt="Are the VS Code settings now visible?")

5. Monitor for Changes

User: "Watch the dashboard and tell me when the status changes"

1. vu()  # Initial capture
2. Loop every 5 seconds:
   - vu_diff(threshold=0.02)
   - If changed: vu_describe(prompt="What changed on the dashboard?")
3. Report changes to user

Best Practices

1. Capture Before Acting

Always capture and analyze before clicking/typing:

❌ Bad: vu_click(x=500, y=300)  # Hope it's the right spot

✅ Good:
1. vu_describe(prompt="Where is the submit button?")
2. Parse coordinates from description
3. vu_click(x=parsed_x, y=parsed_y)
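
Step 2 depends on how the vision model phrases its answer, which is not fixed. As a minimal sketch, the helper below assumes the prompt asked for coordinates in the form "(x, y)" and pulls out the first such pair. parse_coordinates is a hypothetical helper and the sample description is invented.

```python
import re
from typing import Optional

# Matches the first "(x, y)" pair of non-negative integers.
_COORD_PATTERN = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_coordinates(description: str) -> Optional[tuple]:
    """Return (x, y) from the first '(x, y)' pair in the text, or None if absent."""
    match = _COORD_PATTERN.search(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


text = "The submit button is at approximately (512, 340), below the form."
print(parse_coordinates(text))
```

When the helper returns None, it is safer to re-prompt than to click at a guessed position.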

2. Use Appropriate Capture Scope

Full screen → General context, multi-app workflows
Window → Single app focus, cleaner output
Region → Specific element, faster/smaller

3. Specific Vision Prompts

❌ Vague: "What do you see?"

✅ Specific:
- "Is there an error message visible? If so, what does it say?"
- "What is the status of the test runner? Passing, failing, or running?"
- "List all form fields visible and their current values"

4. Rate Limit Automation

Don't spam clicks. Wait between actions:

vu_click(x=100, y=100)
# Wait for UI response
vu_diff(threshold=0.01)
vu_click(x=200, y=200)

5. Verify After Actions

Always verify automation succeeded:

vu_type(text="hello@example.com")
vu_describe(prompt="What text is now in the email field?")

Troubleshooting

"Permission denied" or blank captures

→ Grant Screen Recording permission: System Settings > Privacy & Security > Screen Recording

Automation not working

→ Grant Accessibility permission: System Settings > Privacy & Security > Accessibility

Vision analysis failing

→ Check API keys in environment variables
→ Try a different provider: vu_describe(provider="openai")
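
Switching providers by hand can be replaced with a simple fallback chain. The sketch below is one possible approach, assuming a failed analysis raises an exception, which is not confirmed for omni.vu. describe_with_fallback and fake_describe are hypothetical names. The describe function is passed in so the example runs without the tool.

```python
from typing import Callable, Sequence


def describe_with_fallback(
    describe: Callable[..., str],
    prompt: str,
    providers: Sequence[str] = ("claude", "openai", "gemini", "ollama"),
) -> tuple:
    """Try each provider in order; return (provider, answer) for the first success."""
    errors = {}
    for provider in providers:
        try:
            return provider, describe(prompt=prompt, provider=provider)
        except Exception as exc:  # broad on purpose: any failure moves to the next provider
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {sorted(errors)}")


# Stand-in for vu_describe: the first provider fails, the second answers.
def fake_describe(prompt: str, provider: str) -> str:
    if provider == "claude":
        raise RuntimeError("missing API key")
    return f"[{provider}] no errors visible"


used, answer = describe_with_fallback(fake_describe, "What errors are visible?")
print(used, answer)
```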

Coordinates seem off

→ Retina display? Coordinates are in logical points, not pixels
→ Multi-monitor? Check vu_list_monitors() for display arrangement
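
When a position is measured in image pixels, for example from a screenshot, it has to be divided by the display's scale factor before it is used as a click coordinate. This is a minimal sketch, assuming a single display at the origin and a scale factor like the one vu_list_monitors reports. pixels_to_points is a hypothetical helper.

```python
def pixels_to_points(px_x: float, px_y: float, scale_factor: float) -> tuple:
    """Convert pixel coordinates to logical points for a given display scale factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return round(px_x / scale_factor), round(px_y / scale_factor)


# On a 2x Retina display, pixel (1000, 600) corresponds to point (500, 300).
print(pixels_to_points(1000, 600, 2.0))
```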

Hotkey blocked

→ Some dangerous hotkeys (such as cmd+q) are blocked for safety
→ Unblock in config if absolutely necessary

Configuration

Environment variables:

OMNI_VU_ANTHROPIC_API_KEY=sk-ant-...    # For Claude vision
OMNI_VU_OPENAI_API_KEY=sk-...           # For GPT-4 vision
OMNI_VU_DEFAULT_PROVIDER=claude          # Default AI provider
OMNI_VU_SAFETY_LEVEL=medium              # low/medium/high

History stored at:

~/.omni.vu/captures/
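
A script that wraps omni.vu may want to read these variables itself, for instance to fail early when a key is missing. The sketch below shows one way to do that. VuConfig and load_config are hypothetical names, and the fallback values "claude" and "medium" are assumptions taken from the example values above, not documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SAFETY_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class VuConfig:
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    default_provider: str
    safety_level: str


def load_config(env: Mapping[str, str] = os.environ) -> VuConfig:
    """Read the OMNI_VU_* variables from env, applying assumed fallback values."""
    safety = env.get("OMNI_VU_SAFETY_LEVEL", "medium")  # assumed fallback
    if safety not in SAFETY_LEVELS:
        raise ValueError(f"OMNI_VU_SAFETY_LEVEL must be one of {SAFETY_LEVELS}")
    return VuConfig(
        anthropic_api_key=env.get("OMNI_VU_ANTHROPIC_API_KEY"),
        openai_api_key=env.get("OMNI_VU_OPENAI_API_KEY"),
        default_provider=env.get("OMNI_VU_DEFAULT_PROVIDER", "claude"),  # assumed fallback
        safety_level=safety,
    )


# A plain dict stands in for the real environment so the example is reproducible.
cfg = load_config({"OMNI_VU_OPENAI_API_KEY": "sk-test", "OMNI_VU_SAFETY_LEVEL": "high"})
print(cfg.default_provider, cfg.safety_level)
```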
