desktop-control
Skill: desktop-control
When to Use
Use this skill when the user asks to:
- Click somewhere on the screen
- Move the mouse to a position
- Type text into an application
- Press keyboard shortcuts or hotkeys
- Read what's on the current screen (accessibility tree)
- Get information about the frontmost window
- Automate desktop interactions
- Control the computer (mouse, keyboard, screen)
- Scroll up/down in an application
- Drag and drop elements
IMPORTANT: This skill requires Accessibility permissions for the terminal/IDE. On macOS, go to System Settings > Privacy & Security > Accessibility and enable the running application.
Bundled Scripts
| Script | Type | Description |
|---|---|---|
| `mouse.py` | Python | Mouse movement, clicking, dragging, scrolling |
| `keyboard.py` | Python | Text typing, key presses, hotkeys |
| `screen.py` | Python | Screen info, capture, accessibility tree reading |

All scripts auto-install `pyautogui` if needed.

Mouse Control
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| action | Yes | Action to perform: `move`, `click`, `doubleclick`, `rightclick`, `drag`, `scroll`, `position` | click |
| `--x` | For most | X coordinate (pixels from left) | 500 |
| `--y` | For most | Y coordinate (pixels from top) | 300 |
| button | No | Mouse button | left |
| `--to-x` | For drag | Destination X coordinate | 700 |
| `--to-y` | For drag | Destination Y coordinate | 400 |
| `--amount` | For scroll | Scroll amount (positive=up, negative=down) | -3 |
Script Usage
```bash
# Move mouse
python3 skills/desktop-control/scripts/mouse.py move --x 500 --y 300

# Click at position
python3 skills/desktop-control/scripts/mouse.py click --x 500 --y 300

# Double click
python3 skills/desktop-control/scripts/mouse.py doubleclick --x 500 --y 300

# Right click
python3 skills/desktop-control/scripts/mouse.py rightclick --x 500 --y 300

# Drag from one position to another
python3 skills/desktop-control/scripts/mouse.py drag --x 100 --y 100 --to-x 500 --to-y 500

# Scroll down 3 clicks
python3 skills/desktop-control/scripts/mouse.py scroll --amount -3

# Scroll up 5 clicks at specific position
python3 skills/desktop-control/scripts/mouse.py scroll --x 500 --y 300 --amount 5

# Get current mouse position
python3 skills/desktop-control/scripts/mouse.py position
```
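When these calls are generated from another program, it can help to check the per-action requirements before spawning the script. The following is a minimal sketch, not part of the skill itself: `mouse_command` is a hypothetical helper, and the rule that `move`, `click`, `doubleclick`, `rightclick`, and `drag` all need both coordinates is an assumption read from the "For most" entries in the parameter table. It only builds the argument list and does not move the mouse.

```python
from typing import List, Optional

MOUSE_SCRIPT = "skills/desktop-control/scripts/mouse.py"

# Assumption: these actions need --x and --y ("For most" in the table).
NEEDS_XY = {"move", "click", "doubleclick", "rightclick", "drag"}
# Actions that appear in the usage examples.
KNOWN_ACTIONS = NEEDS_XY | {"scroll", "position"}


def mouse_command(
    action: str,
    x: Optional[int] = None,
    y: Optional[int] = None,
    to_x: Optional[int] = None,
    to_y: Optional[int] = None,
    amount: Optional[int] = None,
) -> List[str]:
    """Build the argv for one mouse.py call (hypothetical helper)."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action in NEEDS_XY and (x is None or y is None):
        raise ValueError(f"{action} requires x and y")
    if action == "drag" and (to_x is None or to_y is None):
        raise ValueError("drag requires to_x and to_y")
    if action == "scroll" and amount is None:
        raise ValueError("scroll requires amount (positive=up, negative=down)")

    argv = ["python3", MOUSE_SCRIPT, action]
    flags = (("--x", x), ("--y", y), ("--to-x", to_x),
             ("--to-y", to_y), ("--amount", amount))
    for flag, value in flags:
        if value is not None:
            argv += [flag, str(value)]
    return argv
```

For example, `mouse_command("scroll", amount=-3)` yields the same arguments as the "Scroll down 3 clicks" command above, and the resulting list can be passed to `subprocess.run`.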
---

Keyboard Control
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| action | Yes | Action to perform: `type`, `press`, `hotkey` | type |
| `--text` | For type | Text to type | Hello World |
| `--key` | For press | Key name to press | enter |
| `--keys` | For hotkey | Key combination, plus-separated | command+c |
| `--interval` | No | Delay between keystrokes in seconds (default: 0.02) | 0.05 |
Script Usage
```bash
# Type text
python3 skills/desktop-control/scripts/keyboard.py type --text "Hello World"

# Type slowly
python3 skills/desktop-control/scripts/keyboard.py type --text "Hello" --interval 0.1

# Press a single key
python3 skills/desktop-control/scripts/keyboard.py press --key enter
python3 skills/desktop-control/scripts/keyboard.py press --key tab
python3 skills/desktop-control/scripts/keyboard.py press --key escape

# Keyboard shortcuts (hotkeys)
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+c"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+shift+s"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "alt+tab"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+space"
```
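The `--keys` value is a plus-separated combination. As a rough illustration of how such a string might be split into individual key names, here is a small sketch. `parse_hotkey` is a hypothetical function, and whether `keyboard.py` parses its argument this way is an assumption; the source does not show the script's internals.

```python
from typing import List


def parse_hotkey(combo: str) -> List[str]:
    """Split a plus-separated combination such as "command+shift+s"
    into a list of lowercase key names (hypothetical helper)."""
    keys = [part.strip().lower() for part in combo.split("+")]
    if not all(keys):
        # Catches inputs like "", "command+", or "command++c".
        raise ValueError(f"malformed key combination: {combo!r}")
    return keys


# The resulting list could then be unpacked into pyautogui's
# hotkey function, e.g. pyautogui.hotkey(*parse_hotkey("command+c")),
# which presses the keys in order and releases them in reverse.
```

One limitation of this approach is that a literal `+` key cannot be expressed in the combination, since `+` is the separator.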
Common Key Names

`enter`, `return`, `tab`, `space`, `backspace`, `delete`, `escape`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, `command`, `ctrl`, `alt`, `shift`, `capslock`

Screen Reading
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| action | Yes | Action to perform: `info`, `capture`, `read-ui` | read-ui |
| `--output` | For capture | Screenshot output path | /tmp/screen.png |
| `--x`, `--y`, `--width`, `--height` | For capture region | Region to capture | `--x 0 --y 0 --width 800 --height 600` |
Script Usage
```bash
# Get screen size and mouse position
python3 skills/desktop-control/scripts/screen.py info

# Take a screenshot
python3 skills/desktop-control/scripts/screen.py capture --output /tmp/screen.png

# Capture a specific region
python3 skills/desktop-control/scripts/screen.py capture --x 0 --y 0 --width 800 --height 600 --output /tmp/region.png

# Read the accessibility tree of the frontmost application (MOST USEFUL)
python3 skills/desktop-control/scripts/screen.py read-ui

# Read accessibility tree with depth limit
python3 skills/desktop-control/scripts/screen.py read-ui --depth 3
```

The `read-ui` command uses AppleScript to read the accessibility tree of the frontmost application, returning window titles, buttons, text fields, menus, and other UI elements. This is the primary way to understand what's on screen before interacting.
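To make the effect of a depth limit concrete, the sketch below prunes a nested UI tree to a fixed number of levels. Everything about the data shape is hypothetical: the source does not document the output format of `read-ui`, so the `role`, `title`, and `children` keys, and the choice to count the root as level 1, are assumptions made only for illustration.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def prune(node: Node, depth: int) -> Node:
    """Return a copy of `node` keeping at most `depth` levels.

    The root counts as level 1 (an assumption; the real --depth
    option may count differently).
    """
    pruned = {key: value for key, value in node.items() if key != "children"}
    if depth > 1 and "children" in node:
        pruned["children"] = [prune(child, depth - 1) for child in node["children"]]
    return pruned


# Hypothetical tree: a window containing a group containing a button.
tree = {
    "role": "window",
    "title": "Notes",
    "children": [
        {"role": "group", "children": [{"role": "button", "title": "Save"}]},
    ],
}

shallow = prune(tree, 2)  # keeps the window and the group, drops the button
```

A shallow read keeps the output small for a first look at the window; a deeper read is useful once the region of interest is known.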
---

Typical Workflow
- Read the screen to understand what's visible:
  ```bash
  python3 skills/desktop-control/scripts/screen.py read-ui
  ```
- Identify targets from the accessibility tree output
- Interact using mouse/keyboard:
  ```bash
  python3 skills/desktop-control/scripts/mouse.py click --x 500 --y 300
  python3 skills/desktop-control/scripts/keyboard.py type --text "search query"
  python3 skills/desktop-control/scripts/keyboard.py press --key enter
  ```
- Verify by reading the screen again
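The read, interact, verify sequence above can be sketched as a single Python function that shells out to the bundled scripts. This is an illustrative sketch under assumptions: `run_script` and `search_workflow` are hypothetical names, the scripts are assumed to print their results to standard output, and the target coordinates are passed in by the caller rather than extracted from the tree, since the `read-ui` output format is not documented here.

```python
import subprocess
from typing import Callable, Sequence, Tuple

SCRIPTS = "skills/desktop-control/scripts"


def run_script(argv: Sequence[str]) -> str:
    """Run one script and return its stdout (assumed to carry the result)."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def search_workflow(
    query: str,
    x: int,
    y: int,
    run: Callable[[Sequence[str]], str] = run_script,
) -> Tuple[str, str]:
    """Read the UI, click a field, type a query, press enter, read again."""
    # Step 1: read the screen before acting.
    before = run(["python3", f"{SCRIPTS}/screen.py", "read-ui"])
    # Steps 2-3: the caller has identified the target at (x, y); interact with it.
    run(["python3", f"{SCRIPTS}/mouse.py", "click", "--x", str(x), "--y", str(y)])
    run(["python3", f"{SCRIPTS}/keyboard.py", "type", "--text", query])
    run(["python3", f"{SCRIPTS}/keyboard.py", "press", "--key", "enter"])
    # Step 4: read the screen again so the caller can verify the result.
    after = run(["python3", f"{SCRIPTS}/screen.py", "read-ui"])
    return before, after
```

The `run` parameter is injectable so the sequence can be exercised with a stub runner, without touching the real mouse or keyboard.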
Example
click on the search bar
type "hello" into the text field
press command+s to save
what's on the screen right now
read the UI elements of the current window
move the mouse to the center of the screen
scroll down in this window