Screen capture, AI vision analysis, and GUI automation for macOS. Use when you need to see what's on screen, analyze UI state, detect changes, or automate mouse/keyboard actions.
npx skill4agent add jmsktm/claude-settings omni-vu

vu
vu(monitor=0, save_to_history=True)

vu_window
vu_window(window_id=None, title="VS Code", include_frame=False)
See also: vu_list_windows, which returns the window_id values that vu_window accepts.

vu_region
vu_region(x=100, y=100, width=500, height=300)

vu_list_windows
vu_list_windows(filter_app="Chrome")
Returns: {window_id, title, owner, x, y, width, height}

vu_list_monitors
vu_list_monitors()

vu_describe
vu_describe(
    prompt="What errors are visible?",
    provider="claude",  # or openai, gemini, ollama
    max_tokens=1024,
    capture_first=True
)

| Provider | Speed | Quality | Cost |
|---|---|---|---|
| claude | Medium | Excellent | $$ |
| openai | Fast | Very Good | $$ |
| gemini | Fast | Good | $ |
| ollama | Slow | Varies | Free |
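
When a script has to choose a provider, the trade-offs in the table can be written down as data. The sketch below is illustrative only: `PROVIDERS`, `pick_provider`, and the rank dictionaries are hypothetical names, not part of omni-vu, and ranking "Varies" as the lowest quality is an assumption.

```python
# Hypothetical helper built from the provider table; not part of omni-vu.
PROVIDERS = {
    "claude": {"speed": "Medium", "quality": "Excellent", "cost": "$$"},
    "openai": {"speed": "Fast", "quality": "Very Good", "cost": "$$"},
    "gemini": {"speed": "Fast", "quality": "Good", "cost": "$"},
    "ollama": {"speed": "Slow", "quality": "Varies", "cost": "Free"},
}

# Orderings of the table's labels. Treating "Varies" as lowest is an assumption.
COST_RANK = {"Free": 0, "$": 1, "$$": 2}
QUALITY_RANK = {"Varies": 0, "Good": 1, "Very Good": 2, "Excellent": 3}


def pick_provider(max_cost="$$", need_fast=False):
    """Return the highest-quality provider within the cost and speed limits."""
    candidates = [
        name
        for name, props in PROVIDERS.items()
        if COST_RANK[props["cost"]] <= COST_RANK[max_cost]
        and (not need_fast or props["speed"] == "Fast")
    ]
    if not candidates:
        raise ValueError("no provider satisfies the given constraints")
    return max(candidates, key=lambda n: QUALITY_RANK[PROVIDERS[n]["quality"]])
```

The result could then be passed as the `provider` argument, for example `vu_describe(prompt="What errors are visible?", provider=pick_provider(need_fast=True))`.
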

vu_diff
vu_diff(threshold=0.02, monitor=0)
Returns: {changed: bool, diff_percentage: float}

# Wait for build to finish
import time

while True:
    result = vu_diff(threshold=0.05)
    if result["changed"]:
        # Screen changed, check what happened
        analysis = vu_describe(prompt="Did the build succeed or fail?")
        break
    # Wait before checking again (the 1-second interval is an assumption)
    time.sleep(1)

vu_history
vu_history(limit=10, capture_type="full_screen")

vu_last
vu_last()

vu_status
vu_status()

vu_click
vu_click(x=500, y=300, button="left", count=1)
Buttons: left, right, middle

vu_move
vu_move(x=500, y=300, duration_ms=0)
Parameter of note: duration_ms

vu_drag
vu_drag(start_x=100, start_y=100, end_x=500, end_y=300, duration_ms=500)

vu_scroll
vu_scroll(direction="down", amount=3, x=None, y=None)
Directions: up, down, left, right

vu_type
vu_type(text="Hello World", delay_between_ms=0)

vu_hotkey
vu_hotkey(keys="cmd+s")
Format: modifier+key (for example cmd+c, ctrl+shift+s, cmd+opt+i)
Modifiers: cmd, ctrl, alt, opt, shift
Also listed: cmd+q, cmd+shift+q, cmd+opt+esc

User: "The submit button doesn't do anything when I click it"
1. vu_describe(prompt="Describe the form and submit button state")
2. Analyze: Is button disabled? Is there validation error?
3. If needed: vu_window(title="Chrome") for focused capture
4. Check console: vu_describe(prompt="Are there any errors in the developer console?")

User: "Run the build and let me know when it's done"
1. Run: npm run build (via Bash)
2. Loop: vu_diff(threshold=0.05)
3. When changed: vu_describe(prompt="What is the build status? Did it succeed or fail?")
4. Report result

User: "Fill out the registration form with test data"
1. vu_describe(prompt="What form fields are visible?")
2. vu_click(x=field_x, y=field_y) # Click first field
3. vu_type(text="testuser@example.com")
4. vu_hotkey(keys="tab") # Move to next field
5. vu_type(text="Test User")
6. Continue for each field...
7. vu_click(x=submit_x, y=submit_y) # Submit
8. vu_describe(prompt="Was the form submitted successfully?")

User: "Open the settings panel in VS Code"
1. vu_list_windows(filter_app="Code")
2. vu_window(title="Code") # Capture VS Code
3. vu_hotkey(keys="cmd+,") # Open settings
4. vu_diff() # Wait for settings to open
5. vu_describe(prompt="Are the VS Code settings now visible?")

User: "Watch the dashboard and tell me when the status changes"
1. vu() # Initial capture
2. Loop every 5 seconds:
   - vu_diff(threshold=0.02)
   - If changed: vu_describe(prompt="What changed on the dashboard?")
3. Report changes to user

❌ Bad: vu_click(x=500, y=300) # Hope it's the right spot
✅ Good:
1. vu_describe(prompt="Where is the submit button?")
2. Parse coordinates from description
3. vu_click(x=parsed_x, y=parsed_y)

Full screen → General context, multi-app workflows
Window → Single app focus, cleaner output
Region → Specific element, faster/smaller

❌ Vague: "What do you see?"
✅ Specific:
- "Is there an error message visible? If so, what does it say?"
- "What is the status of the test runner? Passing, failing, or running?"
- "List all form fields visible and their current values"

vu_click(x=100, y=100)
# Wait for UI response
vu_diff(threshold=0.01)
vu_click(x=200, y=200)

vu_type(text="hello@example.com")
vu_describe(prompt="What text is now in the email field?")

vu_describe(provider="openai")
vu_list_monitors()

OMNI_VU_ANTHROPIC_API_KEY=sk-ant-... # For Claude vision
OMNI_VU_OPENAI_API_KEY=sk-... # For GPT-4 vision
OMNI_VU_DEFAULT_PROVIDER=claude # Default AI provider
OMNI_VU_SAFETY_LEVEL=medium # low/medium/high

~/.omni.vu/captures/
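
The environment variables listed above can be sanity-checked before starting a session. The following is a minimal sketch, not omni-vu's own loader: `check_omni_vu_env` is a hypothetical helper, and the mapping from provider to API key variable is inferred from the comments next to each variable. How omni-vu treats unset variables is not documented here, so unset values are simply skipped.

```python
import os

VALID_PROVIDERS = {"claude", "openai", "gemini", "ollama"}
VALID_SAFETY_LEVELS = {"low", "medium", "high"}

# Inferred from the comments on the variables; an assumption, not a documented rule.
KEY_FOR_PROVIDER = {
    "claude": "OMNI_VU_ANTHROPIC_API_KEY",
    "openai": "OMNI_VU_OPENAI_API_KEY",
}


def check_omni_vu_env(env=None):
    """Return a list of problems found; an empty list means nothing looked wrong."""
    env = os.environ if env is None else env
    problems = []

    provider = env.get("OMNI_VU_DEFAULT_PROVIDER", "")
    if provider and provider not in VALID_PROVIDERS:
        problems.append(f"unknown provider: {provider!r}")

    safety = env.get("OMNI_VU_SAFETY_LEVEL", "")
    if safety and safety not in VALID_SAFETY_LEVELS:
        problems.append(f"unknown safety level: {safety!r}")

    # Only check for a key when the chosen provider is known to need one.
    needed_key = KEY_FOR_PROVIDER.get(provider)
    if needed_key and not env.get(needed_key):
        problems.append(f"{needed_key} is not set for provider {provider!r}")

    return problems
```

Calling `check_omni_vu_env()` with no arguments inspects the current process environment; passing a dictionary makes the check easy to try out without touching real settings.
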