agent-rdp


Windows Remote Desktop Control with agent-rdp

Quick start

bash
agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation
agent-rdp automate snapshot -i              # See interactive elements
agent-rdp automate click "@e5"              # Click button by ref
agent-rdp automate fill "@e7" "Hello"       # Type into field
agent-rdp disconnect

Core workflow


  1. Connect with automation:
    agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation
  2. Snapshot:
    agent-rdp automate snapshot -i
    (get accessibility tree with refs)
  3. Act:
    agent-rdp automate click @e5
    or
    agent-rdp automate fill @e7 "text"
  4. Repeat: snapshot → act → snapshot → act...
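
The loop above can be sketched as a small shell helper. This is a minimal sketch, not part of agent-rdp: the function names `rdp` and `snapshot_then` and the `DRY_RUN` variable are hypothetical, introduced only for illustration. With `DRY_RUN=1` the commands are printed instead of executed, which lets a script be checked without a live session.

```shell
# Hypothetical wrapper: run agent-rdp, or only print the command when DRY_RUN=1.
rdp() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "agent-rdp $*"
  else
    agent-rdp "$@"
  fi
}

# Hypothetical helper mirroring one snapshot -> act cycle:
# take an interactive snapshot first, then perform a single automate action.
snapshot_then() {
  rdp automate snapshot -i && rdp automate "$@"
}
```

For example, `snapshot_then click "@e5"` followed by `snapshot_then fill "@e7" "text"` repeats the cycle twice. In a real run the ref would be chosen after reading the snapshot output rather than hard-coded.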

Troubleshooting

Element not in snapshot with -i

Try without the -i flag. Some elements aren't marked as interactive but are still actionable:
bash
agent-rdp automate snapshot              # Full tree, no filtering
agent-rdp automate snapshot -d 5         # Limit depth if too large

Element not in accessibility tree at all


Some UI elements (WebView content, certain dialogs, toast notifications) don't appear in the accessibility tree. Use OCR as a last resort:
  1. Take screenshot to identify what you need:
    agent-rdp screenshot -o screen.png
  2. Use locate to find coordinates:
    agent-rdp locate "Button Text"
  3. Click using returned coordinates:
    agent-rdp mouse click <x> <y>

Commands

Connection

bash
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp connect --host 192.168.1.100 -u Admin --password-stdin  # Read password from stdin
agent-rdp connect --host 192.168.1.100 --width 1920 --height 1080
agent-rdp connect --host 192.168.1.100 --drive /tmp/share:Share   # Map local directory
agent-rdp disconnect

Screenshot


bash
agent-rdp screenshot                      # Save to ./screenshot.png
agent-rdp screenshot -o desktop.png       # Save to specific file
agent-rdp screenshot --format jpeg        # JPEG format

Mouse


bash
agent-rdp mouse click 500 300             # Left click at (500, 300)
agent-rdp mouse right-click 500 300       # Right click
agent-rdp mouse double-click 500 300      # Double click
agent-rdp mouse move 100 200              # Move cursor
agent-rdp mouse drag 100 100 500 500      # Drag from (100,100) to (500,500)

Keyboard


bash
agent-rdp keyboard type "Hello World"     # Type text (supports Unicode)
agent-rdp keyboard press "ctrl+c"         # Key combination
agent-rdp keyboard press "alt+tab"        # Switch windows
agent-rdp keyboard press "ctrl+shift+esc" # Task manager
agent-rdp keyboard press "win+r"          # Run dialog
agent-rdp keyboard press enter            # Single key (use press, not key)
agent-rdp keyboard press escape
agent-rdp keyboard press f5

Scroll


bash
agent-rdp scroll up --amount 3            # Scroll up 3 notches
agent-rdp scroll down --amount 5          # Scroll down 5 notches
agent-rdp scroll left
agent-rdp scroll right

Clipboard


bash
agent-rdp clipboard set "Text to paste"   # Set clipboard (paste on Windows)
agent-rdp clipboard get                   # Get clipboard (after copy on Windows)

Drive mapping

bash
# Map at connect time
agent-rdp connect --host <ip> -u <user> -p <pass> --drive /local/path:DriveName

# List mapped drives
agent-rdp drive list

Session management


bash
agent-rdp session list                    # List active sessions
agent-rdp session info                    # Current session info
agent-rdp --session work connect ...      # Named session
agent-rdp --session work screenshot       # Use named session

Wait


bash
agent-rdp wait 2000                       # Wait 2 seconds

Locate (OCR)


bash
agent-rdp locate "Cancel"                 # Find lines containing "Cancel"
agent-rdp locate "Save*" --pattern        # Glob pattern matching
agent-rdp locate --all                    # Get all text on screen
agent-rdp locate "OK" --json              # JSON output with coordinates
Returns text lines with bounding boxes and center coordinates for clicking:
Found 1 line(s) containing 'Cancel':
  'Cancel' at (650, 420) size 45x14 - center: (672, 427)

To click the first match: agent-rdp mouse click 672 427
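
To feed the center coordinates into `mouse click` from a script, the text output can be parsed. The following is a minimal sketch that assumes the line format shown in the sample above (`... - center: (x, y)`); the helper name `first_center` is hypothetical. The `--json` output is likely a more robust source, but its schema is not shown here, so the sketch sticks to the documented text format.

```shell
# Hypothetical helper: read `agent-rdp locate` text output on stdin and
# print "x y" for the center of the first match.
# Assumes the documented line format: ... - center: (672, 427)
first_center() {
  sed -n 's/.*center: (\([0-9][0-9]*\), \([0-9][0-9]*\)).*/\1 \2/p' | head -n 1
}
```

Usage would look like `agent-rdp locate "Cancel" | first_center`, which prints nothing when no line carries a center coordinate.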

UI Automation

bash
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation

# Snapshot - get accessibility tree (refs always included)
agent-rdp automate snapshot                 # Full desktop tree
agent-rdp automate snapshot -i              # Interactive elements only
agent-rdp automate snapshot -c              # Compact (remove empty elements)
agent-rdp automate snapshot -d 5            # Limit depth to 5 levels
agent-rdp automate snapshot -s "~Notepad"   # Scope to a window/element
agent-rdp automate snapshot -f              # Start from focused element
agent-rdp automate snapshot -i -c -d 3      # Combine options

# Pattern-based element operations (use selectors: @eN, #automationId, .className, or name)
agent-rdp automate click "#SaveButton"              # Click button
agent-rdp automate click "@e5"                      # Click by ref number
agent-rdp automate click "@e5" -d                   # Double-click (for file list items)
agent-rdp automate select "@e10"                    # Select item (SelectionItemPattern)
agent-rdp automate select "@e5" --item "Option 1"   # Select item by name in container
agent-rdp automate toggle "@e7"                     # Toggle checkbox (TogglePattern)
agent-rdp automate toggle "@e7" --state on          # Set specific state
agent-rdp automate expand "@e3"                     # Expand menu/tree (ExpandCollapsePattern)
agent-rdp automate collapse "@e3"                   # Collapse menu/tree
agent-rdp automate context-menu "@e5"               # Open context menu (Shift+F10)
agent-rdp automate focus <selector>                 # Focus element
agent-rdp automate get <selector>                   # Get element properties

# Text input
agent-rdp automate fill <selector> "text"   # Clear and fill text (ValuePattern)
agent-rdp automate clear <selector>         # Just clear

# Scrolling
agent-rdp automate scroll <selector> --direction down --amount 3

# Window operations
agent-rdp automate window list
agent-rdp automate window focus "Notepad"
agent-rdp automate window maximize
agent-rdp automate window minimize
agent-rdp automate window restore
agent-rdp automate window close "Notepad"

# Run commands/apps (best way to open apps)
agent-rdp automate run "notepad.exe"                                # Open Notepad
agent-rdp automate run "Start-Process ms-settings:" --wait          # Open Settings
agent-rdp automate run "calc.exe"                                   # Open Calculator
agent-rdp automate run "Get-Process" --wait --process-timeout 5000  # With 5s timeout

# Wait for element
agent-rdp automate wait-for <selector> --timeout 5000
agent-rdp automate wait-for <selector> --state visible

# Status
agent-rdp automate status

**Selector syntax:**
- `@e5` or `@5` - Reference number from snapshot (e prefix recommended)
- `#SaveButton` - Automation ID
- `.Edit` - Win32 class name
- `~*pattern*` - Name with wildcard
- `File` - Element name (exact match)

**Snapshot output format:**
  • Window "Notepad" [ref=e1, id=Notepad]
    • MenuBar "Application" [ref=e2]
      • MenuItem "File" [ref=e3]
    • Edit "Text Editor" [ref=e5, value="Hello"]
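
When scripting, the ref for a known element can be pulled out of the snapshot text instead of being hard-coded. This is a minimal sketch that assumes the text format shown above (`Type "Name" [ref=eN, ...]`); the helper name `ref_for` is hypothetical and not an agent-rdp command.

```shell
# Hypothetical helper: read snapshot text on stdin and print the selector
# (for example @e5) of the first element with the given type and name.
# $1 = element type (e.g. Edit), $2 = element name (e.g. "Text Editor").
# The name is placed inside a sed pattern, so regex metacharacters in it
# are not escaped; this is good enough for plain names only.
ref_for() {
  sed -n "s/.*$1 \"$2\" \[ref=\(e[0-9][0-9]*\).*/@\1/p" | head -n 1
}
```

A possible use is `ref=$(agent-rdp automate snapshot -i | ref_for Edit "Text Editor")` followed by `agent-rdp automate fill "$ref" "Hello"`.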

JSON output

Add --json for machine-readable output:
bash
agent-rdp --json clipboard get
agent-rdp --json session info
agent-rdp --json automate snapshot

Example: Open PowerShell and run command


bash
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp wait 3000                       # Wait for desktop
agent-rdp keyboard press "win+r"          # Open Run dialog
agent-rdp wait 1000
agent-rdp keyboard type "powershell"
agent-rdp keyboard press enter
agent-rdp wait 2000                       # Wait for PowerShell
agent-rdp keyboard type "Get-Process"
agent-rdp keyboard press enter
agent-rdp screenshot --output result.png
agent-rdp disconnect
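
The fixed `wait` values in this example are guesses about how long each step takes. One alternative is to poll until a check succeeds. The helper below is a generic shell sketch, not an agent-rdp feature; the name `retry` and its argument order are hypothetical.

```shell
# Hypothetical helper: retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt will follow.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For instance, `retry 5 1 sh -c 'agent-rdp locate "PS " | grep -q "center:"'` would wait until OCR sees a PowerShell prompt, relying on the `center:` marker from the locate output format. Whether OCR reads the prompt text reliably is an assumption to verify on the target machine.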

Example: File transfer via mapped drive

bash
# Connect with local directory mapped
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --drive /tmp/transfer:Transfer

# On Windows, access files at \\tsclient\Transfer
# (single quotes keep both leading backslashes intact in the shell)
agent-rdp keyboard press "win+r"
agent-rdp wait 500
agent-rdp keyboard type '\\tsclient\Transfer'
agent-rdp keyboard press enter

Example: Automate Notepad with UI Automation

bash
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation

# Open Notepad
agent-rdp automate run "notepad.exe"
agent-rdp wait 2000

# Get accessibility snapshot (refs are always included)
agent-rdp automate snapshot -i              # Interactive elements only

# Type text into the edit control (use ref from snapshot)
agent-rdp automate fill "@e5" "Hello from automation!"

# Use File menu to save - expand menu, then invoke menu item
agent-rdp automate expand "File"            # Expand menu (ExpandCollapsePattern)
agent-rdp wait 500
agent-rdp automate click "Save As..."       # Click menu item

# Wait for Save dialog
agent-rdp automate wait-for "#FileNameControlHost" --timeout 5000

# Fill filename and save
agent-rdp automate fill "#FileNameControlHost" "test.txt"
agent-rdp automate click "#1"               # Click Save button

Environment variables


bash
export AGENT_RDP_HOST=192.168.1.100
export AGENT_RDP_PORT=3389
export AGENT_RDP_USERNAME=Administrator
export AGENT_RDP_PASSWORD=secret
export AGENT_RDP_SESSION=default
agent-rdp connect    # Uses env vars for connection
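
Before relying on the environment for `connect`, a script can check that the variables are actually set. This is a minimal sketch; the helper name `require_env` is hypothetical, and which variables agent-rdp treats as mandatory is an assumption here.

```shell
# Hypothetical helper: succeed only if every named variable is non-empty.
# Names of missing variables are reported on stderr.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names must be trusted input).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A guarded connect could then read `require_env AGENT_RDP_HOST AGENT_RDP_USERNAME AGENT_RDP_PASSWORD && agent-rdp connect`.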

Debugging with WebSocket streaming

bash
# Enable streaming viewer on port 9224
agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret

# Open web viewer in browser
agent-rdp view --port 9224

# Or manually access WebSocket at ws://localhost:9224 (broadcasts JPEG frames)

Tips

Prefer automate fill over keyboard type when automation is enabled: it is lossless (no dropped characters) and faster.

Opening applications

Use automate run to launch apps directly:
bash
agent-rdp automate run "notepad.exe"
agent-rdp automate run "calc.exe"
agent-rdp automate run "Start-Process ms-settings:" --wait   # Settings
agent-rdp automate run "explorer.exe C:\\"                   # File Explorer

Limitations

IMPORTANT: Read these limitations carefully before attempting automation tasks.

UI Automation cannot access WebViews

  • The Windows Start menu search, Edge browser content, Electron app content, and other WebView-based UIs are NOT accessible via automate snapshot.
  • Workaround: Use Win+R (Run dialog) or automate run to launch programs directly instead of navigating through the Start menu.

UI Automation cannot handle UAC dialogs

  • User Account Control elevation prompts run on a secure desktop isolated from UI Automation.
  • UAC dialogs will NOT appear in automate snapshot output.
  • Workaround: Use the locate command (OCR) to find button text and the mouse click command to interact. This is unreliable but may work for simple Yes/No dialogs.

OCR (locate) is not highly reliable

  • The locate command uses OCR, which can misread characters, miss text entirely, or return imprecise coordinates.
  • Use it as a last resort when UI Automation cannot access elements.
  • Always verify coordinates before clicking critical buttons.

DO NOT estimate coordinates from screenshots (Claude only)


  • Claude models in non-computer-use mode (like Claude Code) are very bad at pixel counting.
  • Do NOT look at a screenshot and try to guess coordinates - the estimates will likely be wrong.
  • Note: Gemini models are generally good at pixel coordinate estimation.
  • If you need vision-based coordinate detection with Claude, the user must implement a harness using Claude's Computer Use Tool.

Recommended workflow when UI Automation fails

  1. First, always try automate snapshot (with and without the -i flag).
  2. If the element is not found, try locate "text" to find it via OCR.
  3. Use the coordinates from the locate output with mouse click.
  4. Never estimate coordinates by looking at screenshots.
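
A condensed version of this fallback can be sketched in shell. The helper name `click_with_fallback` and the `RDP` variable are hypothetical. The sketch assumes that `automate click` exits with a non-zero status when the element cannot be found, and that `locate` prints the `center: (x, y)` line format documented in the Locate section; neither assumption is confirmed here.

```shell
# Command to invoke; overridable so the flow can be exercised against a stub.
RDP=${RDP:-agent-rdp}

# Hypothetical helper: $1 = UI Automation selector, $2 = visible text for OCR.
click_with_fallback() {
  # Step 1: try UI Automation (assumes a non-zero exit status on failure).
  if "$RDP" automate click "$1"; then
    return 0
  fi
  # Step 2: fall back to OCR and take the center of the first match.
  coords=$("$RDP" locate "$2" |
    sed -n 's/.*center: (\([0-9][0-9]*\), \([0-9][0-9]*\)).*/\1 \2/p' | head -n 1)
  [ -n "$coords" ] || return 1
  # Step 3: click the OCR coordinates. $coords is left unquoted on purpose
  # so that "x y" splits into two arguments.
  "$RDP" mouse click $coords
}
```

For critical buttons, it is safer to print the coordinates and confirm them against a screenshot before clicking, since OCR coordinates can be imprecise.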