omni-vu
omni.vu - Visual Understanding & Automation
Overview
omni.vu gives you eyes and hands on the user's macOS screen. Use it to:
- See what the user sees (screen capture)
- Understand UI state with AI vision
- Detect changes and wait for events
- Act with mouse and keyboard automation
When to Use This Skill
Proactively Use omni.vu When:
- Debugging UI issues - "The button isn't working" → capture and analyze
- Verifying changes - After modifying UI code, check if it rendered correctly
- Waiting for operations - Build finishing, deployment completing, tests running
- Understanding context - User describes something on screen you can't see
- Automating repetitive tasks - Clicking through UI flows, filling forms
- Documentation - Capturing screenshots of features
Do NOT Use When:
- Reading/writing files (use Read/Write tools)
- Running terminal commands (use Bash)
- Making API calls (use appropriate tools)
- User hasn't granted screen recording permission
Tool Reference
Capture Tools
vu - Full Screen Capture
vu(monitor=0, save_to_history=True)
Captures the entire screen. Returns a base64-encoded image.
Use when: You need to see everything on screen.
vu_window - Window Capture
vu_window(window_id=None, title="VS Code", include_frame=False)
Captures a specific window by ID or title (partial match).
Use when: You only need one application's content.
Workflow:
- Call vu_list_windows to see available windows
- Find the window_id or use title matching
- Call vu_window with that ID/title
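The workflow above can be sketched as a small helper that picks one record out of the list vu_list_windows returns. This is a minimal sketch, not the tool's own matching logic: find_window is a hypothetical helper, the sample records are made up, and case-insensitive matching is an assumption.

```python
from typing import Optional


def find_window(windows: list[dict], title: str) -> Optional[dict]:
    """Return the first window whose title contains `title`, ignoring case."""
    needle = title.lower()
    for window in windows:
        if needle in window.get("title", "").lower():
            return window
    return None


# Made-up records in the shape documented for vu_list_windows.
windows = [
    {"window_id": 101, "title": "README.md - VS Code", "owner": "Code",
     "x": 0, "y": 25, "width": 1280, "height": 775},
    {"window_id": 202, "title": "Inbox - Chrome", "owner": "Google Chrome",
     "x": 100, "y": 100, "width": 1024, "height": 768},
]

match = find_window(windows, "vs code")
print(match["window_id"] if match else "no matching window")
```

The returned window_id could then be passed to vu_window in place of a title.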
vu_region - Region Capture
vu_region(x=100, y=100, width=500, height=300)
Captures a specific rectangle of the screen.
Use when: You need a precise area (error message, specific component).
vu_list_windows - List Windows
vu_list_windows(filter_app="Chrome")
Lists all visible windows with metadata.
Returns: List of {window_id, title, owner, x, y, width, height}
vu_list_monitors - List Displays
vu_list_monitors()
Lists connected displays with resolution and scale factor.
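The scale factor matters because automation coordinates are in logical points rather than pixels (see Troubleshooting). A minimal sketch of the conversion, assuming captured images are in physical pixels and that the reported scale factor is a plain multiplier (2.0 on a typical Retina display); pixels_to_points is a hypothetical helper, not part of omni.vu.

```python
def pixels_to_points(px_x: float, px_y: float, scale_factor: float) -> tuple[int, int]:
    """Convert a position in image pixels to logical points."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return round(px_x / scale_factor), round(px_y / scale_factor)


# A button found at pixel (1000, 600) in a capture from a 2x display.
point = pixels_to_points(1000, 600, 2.0)
print(point)
```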
Vision Tools
vu_describe - AI Vision Analysis
vu_describe(
    prompt="What errors are visible?",
    provider="claude",  # or openai, gemini, ollama
    max_tokens=1024,
    capture_first=True
)
Captures the screen and analyzes it with AI.
Best prompts:
- "Describe what you see on screen"
- "Are there any error messages visible?"
- "What is the state of the build/test output?"
- "Is the login form filled correctly?"
- "What color is the status indicator?"
Providers:
| Provider | Speed | Quality | Cost |
|---|---|---|---|
| claude | Medium | Excellent | $$ |
| openai | Fast | Very Good | $$ |
| gemini | Fast | Good | $ |
| ollama | Slow | Varies | Free |
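When one provider is unavailable, the providers in the table can be tried in turn. The following is a sketch under stated assumptions: describe_with_fallback is a hypothetical wrapper, the describe callable stands in for vu_describe, and it is assumed that a failed call raises an exception and a successful one returns text.

```python
from typing import Callable, Sequence


def describe_with_fallback(
    describe: Callable[..., str],
    prompt: str,
    providers: Sequence[str] = ("claude", "openai", "gemini", "ollama"),
) -> tuple[str, str]:
    """Try each provider in order; return (provider, answer) for the first success."""
    errors = []
    for provider in providers:
        try:
            return provider, describe(prompt=prompt, provider=provider)
        except Exception as exc:  # assumption: failures surface as exceptions
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in for vu_describe so the sketch runs without the tool.
def fake_describe(prompt: str, provider: str) -> str:
    if provider == "claude":
        raise RuntimeError("missing API key")
    return f"[{provider}] no errors visible"


used, answer = describe_with_fallback(fake_describe, "Are there any error messages visible?")
print(used, answer)
```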
Utility Tools
vu_diff - Change Detection
vu_diff(threshold=0.02, monitor=0)
Detects whether the screen changed since the last capture.
Returns: {changed: bool, diff_percentage: float}
Use for polling patterns:
```python
import time

# Wait for build to finish
while True:
    result = vu_diff(threshold=0.05)
    if result["changed"]:
        # Screen changed, check what happened
        analysis = vu_describe(prompt="Did the build succeed or fail?")
        break
    # Wait before checking again (the 1-second interval is illustrative)
    time.sleep(1)
```
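The loop above never exits if the screen stays the same, so a timeout is worth considering. A minimal sketch, assuming vu_diff can be wrapped in a zero-argument callable that returns the documented {changed, diff_percentage} result; wait_for_change and its parameters are hypothetical names, not part of omni.vu.

```python
import time
from typing import Callable, Optional


def wait_for_change(
    diff: Callable[[], dict],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> Optional[dict]:
    """Poll `diff` until it reports a change; return None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = diff()
        if result["changed"]:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


# Stand-in for vu_diff so the sketch runs without the tool.
readings = iter([
    {"changed": False, "diff_percentage": 0.0},
    {"changed": False, "diff_percentage": 0.01},
    {"changed": True, "diff_percentage": 0.12},
])
result = wait_for_change(lambda: next(readings), timeout_s=5.0, interval_s=0.0)
print(result)
```

A None result means the timeout expired, which is a reasonable point to report back to the user instead of continuing to wait.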
vu_history - Capture History
vu_history(limit=10, capture_type="full_screen")
Gets recent capture metadata.
vu_last - Last Capture
vu_last()
Returns the most recent capture with image data.
Use when: You need to re-analyze without re-capturing.
vu_status - System Status
vu_status()
Returns system info: monitors, providers, safety settings.
Automation Tools
vu_click - Mouse Click
vu_click(x=500, y=300, button="left", count=1)
Clicks at screen coordinates.
Buttons: left, right, middle
Count: 1 (single), 2 (double), 3 (triple)
Safety: Coordinates are validated against screen bounds.
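The exact bounds check is not specified. One plausible reading, assuming a single display whose origin is (0, 0) and whose size is given in logical points, is sketched below; in_bounds is a hypothetical helper and the 1440x900 size is only an example.

```python
def in_bounds(x: int, y: int, screen_width: int, screen_height: int) -> bool:
    """Return True if (x, y) lies inside a screen with origin (0, 0)."""
    return 0 <= x < screen_width and 0 <= y < screen_height


# Example display of 1440x900 logical points.
print(in_bounds(500, 300, 1440, 900))   # inside the display
print(in_bounds(1440, 300, 1440, 900))  # one point past the right edge
```

Running a check like this before calling vu_click avoids sending a click that the tool would reject.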
vu_move - Move Cursor
vu_move(x=500, y=300, duration_ms=0)
Moves the cursor to a position. Use duration_ms for animated movement.
vu_drag - Drag Operation
vu_drag(start_x=100, start_y=100, end_x=500, end_y=300, duration_ms=500)
Drags from the start position to the end position.
Safety: Maximum 2000px drag distance.
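The drag limit can be checked before calling vu_drag. This sketch assumes the 2000px limit applies to the straight-line (Euclidean) distance between the start and end points, which the source does not state; drag_distance and drag_allowed are hypothetical helpers.

```python
import math

MAX_DRAG_PX = 2000  # limit quoted in the safety note above


def drag_distance(start_x: float, start_y: float, end_x: float, end_y: float) -> float:
    """Straight-line distance between the drag start and end points."""
    return math.hypot(end_x - start_x, end_y - start_y)


def drag_allowed(start_x: float, start_y: float, end_x: float, end_y: float) -> bool:
    """Return True if the drag stays within the assumed distance limit."""
    return drag_distance(start_x, start_y, end_x, end_y) <= MAX_DRAG_PX


# The example call from the reference: (100, 100) to (500, 300).
distance = drag_distance(100, 100, 500, 300)
print(round(distance, 1), drag_allowed(100, 100, 500, 300))
```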
vu_scroll - Scroll
vu_scroll(direction="down", amount=3, x=None, y=None)
Scrolls at the current or a specified position.
Directions: up, down, left, right
Amount: Lines to scroll (1-20)
vu_type - Type Text
vu_type(text="Hello World", delay_between_ms=0)
Types text character by character.
Note: Click on the target field first!
vu_hotkey - Keyboard Shortcut
vu_hotkey(keys="cmd+s")
Executes keyboard shortcuts.
Format: modifier+key (e.g., cmd+c, ctrl+shift+s, cmd+opt+i)
Modifiers: cmd, ctrl, alt/opt, shift
Blocked hotkeys (for safety):
- cmd+q (quit)
- cmd+shift+q (logout)
- cmd+opt+esc (force quit)
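A caller can screen a shortcut against the blocked list before sending it. The sketch below is an illustration rather than the tool's own check: normalize_hotkey and is_blocked are hypothetical helpers, and it assumes that matching ignores case and key order and that alt and opt name the same modifier.

```python
ALIASES = {"alt": "opt"}  # the reference lists alt/opt as one modifier

BLOCKED = {
    frozenset({"cmd", "q"}),           # quit
    frozenset({"cmd", "shift", "q"}),  # logout
    frozenset({"cmd", "opt", "esc"}),  # force quit
}


def normalize_hotkey(keys: str) -> frozenset:
    """Split 'cmd+shift+s' into a lowercase, alias-resolved set of keys."""
    parts = [part.strip().lower() for part in keys.split("+") if part.strip()]
    return frozenset(ALIASES.get(part, part) for part in parts)


def is_blocked(keys: str) -> bool:
    """Return True if the shortcut matches one of the blocked combinations."""
    return normalize_hotkey(keys) in BLOCKED


for combo in ("cmd+s", "cmd+q", "Cmd+Alt+Esc"):
    print(combo, "blocked" if is_blocked(combo) else "allowed")
```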
Common Workflows
1. Debug UI Issue
User: "The submit button doesn't do anything when I click it"
1. vu_describe(prompt="Describe the form and submit button state")
2. Analyze: Is button disabled? Is there validation error?
3. If needed: vu_window(title="Chrome") for focused capture
4. Check console: vu_describe(prompt="Are there any errors in the developer console?")
2. Verify Build/Deploy
User: "Run the build and let me know when it's done"
1. Run: npm run build (via Bash)
2. Loop: vu_diff(threshold=0.05)
3. When changed: vu_describe(prompt="What is the build status? Did it succeed or fail?")
4. Report result
3. Fill a Form
User: "Fill out the registration form with test data"
1. vu_describe(prompt="What form fields are visible?")
2. vu_click(x=field_x, y=field_y) # Click first field
3. vu_type(text="testuser@example.com")
4. vu_hotkey(keys="tab") # Move to next field
5. vu_type(text="Test User")
6. Continue for each field...
7. vu_click(x=submit_x, y=submit_y) # Submit
8. vu_describe(prompt="Was the form submitted successfully?")
4. Navigate UI
User: "Open the settings panel in VS Code"
1. vu_list_windows(filter_app="Code")
2. vu_window(title="Code") # Capture VS Code
3. vu_hotkey(keys="cmd+,") # Open settings
4. vu_diff() # Wait for settings to open
5. vu_describe(prompt="Are the VS Code settings now visible?")
5. Monitor for Changes
User: "Watch the dashboard and tell me when the status changes"
1. vu() # Initial capture
2. Loop every 5 seconds:
- vu_diff(threshold=0.02)
- If changed: vu_describe(prompt="What changed on the dashboard?")
3. Report changes to user
Best Practices
1. Capture Before Acting
Always capture and analyze before clicking/typing:
❌ Bad: vu_click(x=500, y=300) # Hope it's the right spot
✅ Good:
1. vu_describe(prompt="Where is the submit button?")
2. Parse coordinates from description
3. vu_click(x=parsed_x, y=parsed_y)
2. Use Appropriate Capture Scope
Full screen → General context, multi-app workflows
Window → Single app focus, cleaner output
Region → Specific element, faster/smaller
3. Specific Vision Prompts
❌ Vague: "What do you see?"
✅ Specific:
- "Is there an error message visible? If so, what does it say?"
- "What is the status of the test runner? Passing, failing, or running?"
- "List all form fields visible and their current values"
4. Rate Limit Automation
Don't spam clicks. Wait between actions:
vu_click(x=100, y=100)
# Wait for UI response
vu_diff(threshold=0.01)
vu_click(x=200, y=200)
5. Verify After Actions
Always verify automation succeeded:
vu_type(text="hello@example.com")
vu_describe(prompt="What text is now in the email field?")
Troubleshooting
"Permission denied" or blank captures
→ Grant Screen Recording permission: System Settings > Privacy & Security > Screen Recording
Automation not working
→ Grant Accessibility permission: System Settings > Privacy & Security > Accessibility
Vision analysis failing
→ Check API keys in environment variables
→ Try a different provider: vu_describe(provider="openai")
Coordinates seem off
→ Retina display? Coordinates are in logical points, not pixels
→ Multi-monitor? Check vu_list_monitors() for display arrangement
Hotkey blocked
→ Some dangerous hotkeys (cmd+q) are blocked for safety
→ Unblock in config if absolutely necessary
Configuration
Environment variables:
```bash
OMNI_VU_ANTHROPIC_API_KEY=sk-ant-...   # For Claude vision
OMNI_VU_OPENAI_API_KEY=sk-...          # For GPT-4 vision
OMNI_VU_DEFAULT_PROVIDER=claude        # Default AI provider
OMNI_VU_SAFETY_LEVEL=medium            # low/medium/high
```
History stored at: ~/.omni.vu/captures/
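For scripts that want to check the configuration before calling any tools, the variables above can be read with the standard library. This is a minimal sketch: VuConfig and load_config are hypothetical names, and falling back to "claude" and "medium" when a variable is unset is an assumption taken from the example values, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SAFETY_LEVELS = {"low", "medium", "high"}


@dataclass
class VuConfig:
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    default_provider: str
    safety_level: str


def load_config(env: Mapping[str, str] = os.environ) -> VuConfig:
    """Read the OMNI_VU_* variables from `env` and validate the safety level."""
    safety = env.get("OMNI_VU_SAFETY_LEVEL", "medium").lower()  # assumed default
    if safety not in SAFETY_LEVELS:
        raise ValueError(f"unknown safety level: {safety!r}")
    return VuConfig(
        anthropic_api_key=env.get("OMNI_VU_ANTHROPIC_API_KEY"),
        openai_api_key=env.get("OMNI_VU_OPENAI_API_KEY"),
        default_provider=env.get("OMNI_VU_DEFAULT_PROVIDER", "claude"),  # assumed default
        safety_level=safety,
    )


# A plain dict stands in for the real environment here.
config = load_config({"OMNI_VU_DEFAULT_PROVIDER": "openai", "OMNI_VU_SAFETY_LEVEL": "HIGH"})
print(config.default_provider, config.safety_level)
```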