desktop-control

Skill: desktop-control

When to Use

Use this skill when the user asks to:
  • Click somewhere on the screen
  • Move the mouse to a position
  • Type text into an application
  • Press keyboard shortcuts or hotkeys
  • Read what's on the current screen (accessibility tree)
  • Get information about the frontmost window
  • Automate desktop interactions
  • Control the computer (mouse, keyboard, screen)
  • Scroll up/down in an application
  • Drag and drop elements
IMPORTANT: This skill requires Accessibility permissions for the terminal/IDE. On macOS, go to System Settings > Privacy & Security > Accessibility and enable the running application.

Bundled Scripts

| Script | Type | Description |
|---|---|---|
| `scripts/mouse.py` | Python | Mouse movement, clicking, dragging, scrolling |
| `scripts/keyboard.py` | Python | Text typing, key presses, hotkeys |
| `scripts/screen.py` | Python | Screen info, capture, accessibility tree reading |

All scripts auto-install `pyautogui` if needed.
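
How the bundled scripts perform this auto-install is not shown here. One common pattern for an on-demand install is sketched below; `ensure_module` and `pip_install` are hypothetical names, and nothing in this sketch is taken from the actual scripts.

```python
import importlib
import subprocess
import sys


def pip_install(package):
    """Install a package with the pip that belongs to the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_module(name, installer=pip_install):
    """Import a module, calling the installer once if the first import fails."""
    try:
        return importlib.import_module(name)
    except ImportError:
        installer(name)
        # Make sure the import system notices the freshly installed package.
        importlib.invalidate_caches()
        return importlib.import_module(name)
```

A script could then start with `pyautogui = ensure_module("pyautogui")`. The `installer` parameter exists only so the behavior can be exercised without touching the network.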

Mouse Control

Input Parameters

| Parameter | Required | Description | Example |
|---|---|---|---|
| `action` | Yes | `move`, `click`, `doubleclick`, `rightclick`, `drag`, `scroll` | `click` |
| `x` | For most | X coordinate (pixels from left) | `500` |
| `y` | For most | Y coordinate (pixels from top) | `300` |
| `button` | No | Mouse button: `left` (default), `right`, `middle` | `left` |
| `to_x` | For drag | Destination X coordinate | `700` |
| `to_y` | For drag | Destination Y coordinate | `400` |
| `amount` | For scroll | Scroll amount (positive=up, negative=down) | `-3` |

Script Usage

Move mouse

python3 skills/desktop-control/scripts/mouse.py move --x 500 --y 300

Click at position

python3 skills/desktop-control/scripts/mouse.py click --x 500 --y 300

Double click

python3 skills/desktop-control/scripts/mouse.py doubleclick --x 500 --y 300

Right click

python3 skills/desktop-control/scripts/mouse.py rightclick --x 500 --y 300

Drag from one position to another

python3 skills/desktop-control/scripts/mouse.py drag --x 100 --y 100 --to-x 500 --to-y 500

Scroll down 3 clicks

python3 skills/desktop-control/scripts/mouse.py scroll --amount -3

Scroll up 5 clicks at specific position

python3 skills/desktop-control/scripts/mouse.py scroll --x 500 --y 300 --amount 5

Get current mouse position

python3 skills/desktop-control/scripts/mouse.py position
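
When these mouse commands are driven from another program, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of the skill itself: `build_mouse_command`, `run_mouse`, and the `REQUIRED` mapping are hypothetical names, and the per-action requirements are inferred from the parameter table and the usage lines rather than from the script's source. Passing `button` through as `--button` is likewise an assumption.

```python
import subprocess

MOUSE_SCRIPT = "skills/desktop-control/scripts/mouse.py"

# Assumption: flags each action needs, inferred from the table and usage lines.
REQUIRED = {
    "move": ("x", "y"),
    "click": ("x", "y"),
    "doubleclick": ("x", "y"),
    "rightclick": ("x", "y"),
    "drag": ("x", "y", "to_x", "to_y"),
    "scroll": ("amount",),
    "position": (),
}


def build_mouse_command(action, **params):
    """Return the argv list for one mouse.py call, suitable for subprocess.run."""
    if action not in REQUIRED:
        raise ValueError(f"unknown mouse action: {action!r}")
    missing = [name for name in REQUIRED[action] if params.get(name) is None]
    if missing:
        raise ValueError(f"{action} requires: {', '.join(missing)}")
    argv = ["python3", MOUSE_SCRIPT, action]
    for name, value in params.items():
        if value is not None:
            # to_x becomes --to-x, matching the flags in the usage lines.
            argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def run_mouse(action, **params):
    """Execute the command; needs the skill's scripts and Accessibility permission."""
    return subprocess.run(build_mouse_command(action, **params), check=True)
```

For example, `build_mouse_command("scroll", amount=-3)` produces the argument list for scrolling down three clicks, and a `drag` call without `to_x` and `to_y` fails before anything is sent to the desktop.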

---

Keyboard Control

Input Parameters

| Parameter | Required | Description | Example |
|---|---|---|---|
| `action` | Yes | `type`, `press`, `hotkey` | `type` |
| `text` | For type | Text to type | `Hello World` |
| `key` | For press | Key name to press | `enter` |
| `keys` | For hotkey | Key combination, plus-separated | `command+c` |
| `interval` | No | Delay between keystrokes in seconds (default: 0.02) | `0.05` |

Script Usage

Type text

python3 skills/desktop-control/scripts/keyboard.py type --text "Hello World"

Type slowly

python3 skills/desktop-control/scripts/keyboard.py type --text "Hello" --interval 0.1

Press a single key

python3 skills/desktop-control/scripts/keyboard.py press --key enter
python3 skills/desktop-control/scripts/keyboard.py press --key tab
python3 skills/desktop-control/scripts/keyboard.py press --key escape

Keyboard shortcuts (hotkeys)

python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+c"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+shift+s"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "alt+tab"
python3 skills/desktop-control/scripts/keyboard.py hotkey --keys "command+space"

Common Key Names

`enter`, `return`, `tab`, `space`, `backspace`, `delete`, `escape`, `up`, `down`, `left`, `right`, `home`, `end`, `pageup`, `pagedown`, `f1`-`f12`, `command`, `ctrl`, `alt`, `shift`, `capslock`
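
The `keys` value for `hotkey` is a plus-separated string built from these names. A minimal sketch of how such a string could be split and checked before it is sent; `parse_hotkey` and `KNOWN_KEYS` are hypothetical, and accepting any single character (letters, digits) as a valid key is an assumption rather than documented behavior of `keyboard.py`.

```python
# Key names from the Common Key Names list, with f1-f12 expanded explicitly.
KNOWN_KEYS = {
    "enter", "return", "tab", "space", "backspace", "delete", "escape",
    "up", "down", "left", "right", "home", "end", "pageup", "pagedown",
    "command", "ctrl", "alt", "shift", "capslock",
} | {f"f{n}" for n in range(1, 13)}


def parse_hotkey(keys):
    """Split a combination such as "command+shift+s" into lowercase key names."""
    parts = [part.strip().lower() for part in keys.split("+")]
    for part in parts:
        # Assumption: single characters are valid keys in addition to KNOWN_KEYS.
        if part not in KNOWN_KEYS and len(part) != 1:
            raise ValueError(f"unrecognized key name: {part!r}")
    return parts
```

A list produced this way could be handed to `pyautogui.hotkey(*parts)`; whether `keyboard.py` does exactly that is not stated here. The sketch does not handle a combination that contains the plus key itself.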

Screen Reading

Input Parameters

| Parameter | Required | Description | Example |
|---|---|---|---|
| `action` | Yes | `info`, `capture`, `read-ui` | `read-ui` |
| `output` | For capture | Screenshot output path | `/tmp/screen.png` |
| `x`, `y`, `width`, `height` | For capture region | Region to capture | |

Script Usage

Get screen size and mouse position

python3 skills/desktop-control/scripts/screen.py info

Take a screenshot

python3 skills/desktop-control/scripts/screen.py capture --output /tmp/screen.png

Capture a specific region

python3 skills/desktop-control/scripts/screen.py capture --x 0 --y 0 --width 800 --height 600 --output /tmp/region.png

Read the accessibility tree of the frontmost application (MOST USEFUL)

python3 skills/desktop-control/scripts/screen.py read-ui

Read accessibility tree with depth limit

python3 skills/desktop-control/scripts/screen.py read-ui --depth 3

The `read-ui` command uses AppleScript to read the accessibility tree of the frontmost application, returning window titles, buttons, text fields, menus, and other UI elements. This is the primary way to understand what's on screen before interacting.
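
Since `read-ui` is meant to run before each interaction, a small wrapper can keep the call in one place. This sketch assumes the script prints the tree to standard output, which is not stated here; `read_ui_command`, `read_ui`, and the `runner` parameter are hypothetical names.

```python
import subprocess

SCREEN_SCRIPT = "skills/desktop-control/scripts/screen.py"


def read_ui_command(depth=None):
    """Return the argv list for screen.py read-ui, optionally depth-limited."""
    argv = ["python3", SCREEN_SCRIPT, "read-ui"]
    if depth is not None:
        argv += ["--depth", str(depth)]
    return argv


def read_ui(depth=None, runner=subprocess.run):
    """Run read-ui and return its stdout (assumed to hold the accessibility tree)."""
    result = runner(read_ui_command(depth), capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `read_ui(depth=3)` corresponds to the depth-limited command shown earlier. The `runner` argument defaults to `subprocess.run` and is only there so the wrapper can be tested without a desktop session.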

---

Typical Workflow

  1. Read the screen to understand what's visible:
    python3 skills/desktop-control/scripts/screen.py read-ui
  2. Identify targets from the accessibility tree output
  3. Interact using mouse/keyboard:
    python3 skills/desktop-control/scripts/mouse.py click --x 500 --y 300
    python3 skills/desktop-control/scripts/keyboard.py type --text "search query"
    python3 skills/desktop-control/scripts/keyboard.py press --key enter
  4. Verify by reading the screen again
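
The four steps can also be chained from a single Python function. This is a sketch under stated assumptions: `script_call` and `search_workflow` are hypothetical helpers, `read-ui` is assumed to write the tree to standard output, and step 2 (choosing coordinates from the tree) is left to the caller because the output format is not described here.

```python
import subprocess

SCRIPTS = "skills/desktop-control/scripts"


def script_call(script, action, **flags):
    """Build the argv list for one bundled script, e.g. mouse.py click --x 500."""
    argv = ["python3", f"{SCRIPTS}/{script}.py", action]
    for name, value in flags.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def search_workflow(x, y, query, runner=subprocess.run):
    """Read the UI, click, type, press enter, then read the UI again."""
    def read():
        done = runner(script_call("screen", "read-ui"),
                      capture_output=True, text=True, check=True)
        return done.stdout

    before = read()  # step 1: read the screen
    # Step 2: the caller is expected to have picked x and y from the tree.
    runner(script_call("mouse", "click", x=x, y=y), check=True)  # step 3
    runner(script_call("keyboard", "type", text=query), check=True)
    runner(script_call("keyboard", "press", key="enter"), check=True)
    after = read()  # step 4: verify by reading again
    return before, after
```

Comparing `before` and `after` gives a simple way to confirm that the interaction changed what is on screen.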

Example

click on the search bar
type "hello" into the text field
press command+s to save
what's on the screen right now
read the UI elements of the current window
move the mouse to the center of the screen
scroll down in this window