agent-rdp
Windows Remote Desktop Control with agent-rdp
Quick start
bash
agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation
agent-rdp automate snapshot -i # See interactive elements
agent-rdp automate click "@e5" # Click button by ref
agent-rdp automate fill "@e7" "Hello" # Type into field
agent-rdp disconnect
Core workflow
- Connect with automation: `agent-rdp connect --host <ip> -u <user> -p <pass> --enable-win-automation`
- Snapshot: `agent-rdp automate snapshot -i` (get accessibility tree with refs)
- Act: `agent-rdp automate click @e5` or `agent-rdp automate fill @e7 "text"`
- Repeat: snapshot → act → snapshot → act...
Troubleshooting
Element not in snapshot with -i
Try without the `-i` flag. Some elements aren't marked as interactive but are still actionable:
bash
agent-rdp automate snapshot # Full tree, no filtering
agent-rdp automate snapshot -d 5 # Limit depth if too large
Element not in accessibility tree at all
Some UI elements (WebView content, certain dialogs, toast notifications) don't appear in the accessibility tree. Use OCR as a last resort:
- Take a screenshot to identify what you need: `agent-rdp screenshot -o screen.png`
- Use `locate` to find coordinates: `agent-rdp locate "Button Text"`
- Click using the returned coordinates: `agent-rdp mouse click <x> <y>`
Commands
Connection
bash
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp connect --host 192.168.1.100 -u Admin --password-stdin # Read password from stdin
agent-rdp connect --host 192.168.1.100 --width 1920 --height 1080
agent-rdp connect --host 192.168.1.100 --drive /tmp/share:Share # Map local directory
agent-rdp disconnect
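Passing the password with `-p` puts it on the command line, where it can end up in shell history and process listings. A minimal sketch of using `--password-stdin` instead, assuming the flag reads the password from standard input as piped (whether a trailing newline is stripped is not stated here, so `printf '%s'` avoids sending one). `RDP_CLI` and `RDP_PASS` are hypothetical names introduced for this sketch, not part of agent-rdp:

```shell
# RDP_CLI is a hypothetical override (handy for dry runs); it defaults to the real CLI.
RDP_CLI=${RDP_CLI:-agent-rdp}

# Connect to host $1 as user $2, piping the password from $RDP_PASS (hypothetical variable).
# printf is a shell builtin, so the password never appears as a process argument.
connect_with_stdin_password() {
  printf '%s' "$RDP_PASS" | "$RDP_CLI" connect --host "$1" -u "$2" --password-stdin
}
```

With `RDP_PASS` already set (for example, read from a prompt or a secret store), the call would be `connect_with_stdin_password 192.168.1.100 Admin`.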
Screenshot
bash
agent-rdp screenshot # Save to ./screenshot.png
agent-rdp screenshot -o desktop.png # Save to specific file
agent-rdp screenshot --format jpeg # JPEG format
Mouse
bash
agent-rdp mouse click 500 300 # Left click at (500, 300)
agent-rdp mouse right-click 500 300 # Right click
agent-rdp mouse double-click 500 300 # Double click
agent-rdp mouse move 100 200 # Move cursor
agent-rdp mouse drag 100 100 500 500 # Drag from (100,100) to (500,500)
Keyboard
bash
agent-rdp keyboard type "Hello World" # Type text (supports Unicode)
agent-rdp keyboard press "ctrl+c" # Key combination
agent-rdp keyboard press "alt+tab" # Switch windows
agent-rdp keyboard press "ctrl+shift+esc" # Task manager
agent-rdp keyboard press "win+r" # Run dialog
agent-rdp keyboard press enter # Single key (use press, not key)
agent-rdp keyboard press escape
agent-rdp keyboard press f5
Scroll
bash
agent-rdp scroll up --amount 3 # Scroll up 3 notches
agent-rdp scroll down --amount 5 # Scroll down 5 notches
agent-rdp scroll left
agent-rdp scroll right
Clipboard
bash
agent-rdp clipboard set "Text to paste" # Set clipboard (paste on Windows)
agent-rdp clipboard get # Get clipboard (after copy on Windows)
Drive mapping
bash
# Map at connect time
agent-rdp connect --host <ip> -u <user> -p <pass> --drive /local/path:DriveName
# List mapped drives
agent-rdp drive list
Session management
bash
agent-rdp session list # List active sessions
agent-rdp session info # Current session info
agent-rdp --session work connect ... # Named session
agent-rdp --session work screenshot # Use named session
Wait
bash
agent-rdp wait 2000 # Wait 2 seconds
Locate (OCR)
bash
agent-rdp locate "Cancel" # Find lines containing "Cancel"
agent-rdp locate "Save*" --pattern # Glob pattern matching
agent-rdp locate --all # Get all text on screen
agent-rdp locate "OK" --json # JSON output with coordinates
Returns text lines with bounding boxes and center coordinates for clicking:
Found 1 line(s) containing 'Cancel':
'Cancel' at (650, 420) size 45x14 - center: (672, 427)
To click the first match: agent-rdp mouse click 672 427
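The center coordinates can be pulled out of this text output with a small filter. A minimal sketch, assuming every match line contains `center: (x, y)` as in the sample above; `locate_center` is a hypothetical helper name, not an agent-rdp command:

```shell
# Read `agent-rdp locate` text output on stdin and print "x y" for the first match.
# Assumes match lines contain "center: (x, y)" as in the sample output.
locate_center() {
  sed -n 's/.*center: (\([0-9][0-9]*\), *\([0-9][0-9]*\)).*/\1 \2/p' | head -n 1
}
```

For the sample output, `agent-rdp locate "Cancel" | locate_center` would print `672 427`, which can then be passed to `agent-rdp mouse click`.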
UI Automation
bash
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation
# Snapshot - get accessibility tree (refs always included)
agent-rdp automate snapshot # Full desktop tree
agent-rdp automate snapshot -i # Interactive elements only
agent-rdp automate snapshot -c # Compact (remove empty elements)
agent-rdp automate snapshot -d 5 # Limit depth to 5 levels
agent-rdp automate snapshot -s "~Notepad" # Scope to a window/element
agent-rdp automate snapshot -f # Start from focused element
agent-rdp automate snapshot -i -c -d 3 # Combine options
# Pattern-based element operations (use selectors: @eN, #automationId, .className, or name)
agent-rdp automate click "#SaveButton" # Click button
agent-rdp automate click "@e5" # Click by ref number
agent-rdp automate click "@e5" -d # Double-click (for file list items)
agent-rdp automate select "@e10" # Select item (SelectionItemPattern)
agent-rdp automate select "@e5" --item "Option 1" # Select item by name in container
agent-rdp automate toggle "@e7" # Toggle checkbox (TogglePattern)
agent-rdp automate toggle "@e7" --state on # Set specific state
agent-rdp automate expand "@e3" # Expand menu/tree (ExpandCollapsePattern)
agent-rdp automate collapse "@e3" # Collapse menu/tree
agent-rdp automate context-menu "@e5" # Open context menu (Shift+F10)
agent-rdp automate focus <selector> # Focus element
agent-rdp automate get <selector> # Get element properties
# Text input
agent-rdp automate fill <selector> "text" # Clear and fill text (ValuePattern)
agent-rdp automate clear <selector> # Just clear
# Scrolling
agent-rdp automate scroll <selector> --direction down --amount 3
# Window operations
agent-rdp automate window list
agent-rdp automate window focus "Notepad"
agent-rdp automate window maximize
agent-rdp automate window minimize
agent-rdp automate window restore
agent-rdp automate window close "Notepad"
# Run commands/apps (best way to open apps)
agent-rdp automate run "notepad.exe" # Open Notepad
agent-rdp automate run "Start-Process ms-settings:" --wait # Open Settings
agent-rdp automate run "calc.exe" # Open Calculator
agent-rdp automate run "Get-Process" --wait --process-timeout 5000 # With 5s timeout
# Wait for element
agent-rdp automate wait-for <selector> --timeout 5000
agent-rdp automate wait-for <selector> --state visible
# Status
agent-rdp automate status
**Selector syntax:**
- `@e5` or `@5` - Reference number from snapshot (e prefix recommended)
- `#SaveButton` - Automation ID
- `.Edit` - Win32 class name
- `~*pattern*` - Name with wildcard
- `File` - Element name (exact match)
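Two of these prefixes are special to the shell, so selectors should be quoted on the command line: in bash and POSIX sh scripts an unquoted word starting with `#` begins a comment, and an unquoted leading `~` is subject to tilde expansion. The sketch below only demonstrates the comment behavior with a hypothetical `count_args` helper; it does not call agent-rdp:

```shell
# count_args is a hypothetical helper that records how many arguments it received.
count_args() { argc=$#; }

# Unquoted: the shell treats "#SaveButton" as the start of a comment and drops it.
count_args click #SaveButton
unquoted=$argc

# Quoted: the selector is passed through as a second argument.
count_args click "#SaveButton"
quoted=$argc

echo "unquoted=$unquoted quoted=$quoted"
```

The unquoted call sees one argument and the quoted call sees two, which is why the examples in this document write `"#SaveButton"` and `"~Notepad"` in quotes.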
**Snapshot output format:**
- Window "Notepad" [ref=e1, id=Notepad]
  - MenuBar "Application" [ref=e2]
    - MenuItem "File" [ref=e3]
  - Edit "Text Editor" [ref=e5, value="Hello"]
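Because each line carries the element name and its ref, a ref can be looked up by name from the text output. A minimal sketch, assuming the line format shown above (a quoted name followed by `[ref=eN`); `snapshot_ref` is a hypothetical helper, not an agent-rdp command:

```shell
# Read snapshot text on stdin; print the selector (e.g. @e5) of the first
# element whose quoted name equals $1. Prints nothing if no line matches.
snapshot_ref() {
  awk -v name="$1" '
    index($0, "\"" name "\" [ref=") && match($0, /\[ref=e[0-9]+/) {
      print "@" substr($0, RSTART + 5, RLENGTH - 5)
      exit
    }'
}
```

For the sample tree, `agent-rdp automate snapshot -i | snapshot_ref "Text Editor"` would print `@e5`, ready to pass to `automate fill` or `automate click`.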
JSON output
Add `--json` for machine-readable output:
bash
agent-rdp --json clipboard get
agent-rdp --json session info
agent-rdp --json automate snapshot
Example: Open PowerShell and run command
bash
agent-rdp connect --host 192.168.1.100 -u Admin -p secret
agent-rdp wait 3000 # Wait for desktop
agent-rdp keyboard press "win+r" # Open Run dialog
agent-rdp wait 1000
agent-rdp keyboard type "powershell"
agent-rdp keyboard press enter
agent-rdp wait 2000 # Wait for PowerShell
agent-rdp keyboard type "Get-Process"
agent-rdp keyboard press enter
agent-rdp screenshot --output result.png
agent-rdp disconnect
Example: File transfer via mapped drive
bash
# Connect with local directory mapped
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --drive /tmp/transfer:Transfer
# On Windows, access files at \\tsclient\Transfer
agent-rdp keyboard press "win+r"
agent-rdp wait 500
agent-rdp keyboard type '\\tsclient\Transfer'
agent-rdp keyboard press enter
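The Windows-side path follows from the `--drive` argument: the name after the colon becomes the share under `\\tsclient`. A small sketch of deriving it, with `tsclient_path` as a hypothetical helper name:

```shell
# Given a --drive spec "/local/path:DriveName", print the UNC path
# where the mapped directory appears inside the Windows session.
tsclient_path() {
  printf '%s\n' "\\\\tsclient\\${1##*:}"
}
```

It could be used as `agent-rdp keyboard type "$(tsclient_path /tmp/transfer:Transfer)"`. Quoting matters here: inside double quotes the shell collapses `\\` to a single backslash, so a literal UNC path needs single quotes or doubled escapes.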
Example: Automate Notepad with UI Automation
bash
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation
# Open Notepad
agent-rdp automate run "notepad.exe"
agent-rdp wait 2000
# Get accessibility snapshot (refs are always included)
agent-rdp automate snapshot -i # Interactive elements only
# Type text into the edit control (use ref from snapshot)
agent-rdp automate fill "@e5" "Hello from automation!"
# Use File menu to save - expand menu, then invoke menu item
agent-rdp automate expand "File" # Expand menu (ExpandCollapsePattern)
agent-rdp wait 500
agent-rdp automate click "Save As..." # Click menu item
# Wait for Save dialog
agent-rdp automate wait-for "#FileNameControlHost" --timeout 5000
# Fill filename and save
agent-rdp automate fill "#FileNameControlHost" "test.txt"
agent-rdp automate click "#1" # Click Save button
Environment variables
bash
export AGENT_RDP_HOST=192.168.1.100
export AGENT_RDP_PORT=3389
export AGENT_RDP_USERNAME=Administrator
export AGENT_RDP_PASSWORD=secret
export AGENT_RDP_SESSION=default
agent-rdp connect # Uses env vars for connection
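Since `agent-rdp connect` then takes its settings from the environment, a wrapper script can check the variables before connecting. A minimal sketch; `require_env` is a hypothetical helper, and which variables are actually mandatory is an assumption:

```shell
# Return 0 if every named variable is set and non-empty; otherwise report
# the first missing one on stderr and return 1.
require_env() {
  for var in "$@"; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing environment variable: $var" >&2
      return 1
    fi
  done
}
```

A script could then run `require_env AGENT_RDP_HOST AGENT_RDP_USERNAME AGENT_RDP_PASSWORD && agent-rdp connect`, failing early with a clear message instead of a connection error.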
Debugging with WebSocket streaming
bash
# Enable streaming viewer on port 9224
agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret
# Open web viewer in browser
agent-rdp view --port 9224
# Or manually access WebSocket at ws://localhost:9224 (broadcasts JPEG frames)
Tips
Prefer `automate fill` over `keyboard type` when automation is enabled: it's lossless (no dropped characters) and faster.
Opening applications
Use `automate run` to launch apps directly:
bash
agent-rdp automate run "notepad.exe"
agent-rdp automate run "calc.exe"
agent-rdp automate run "Start-Process ms-settings:" --wait # Settings
agent-rdp automate run "explorer.exe C:\\" # File Explorer
Limitations
IMPORTANT: Read these limitations carefully before attempting automation tasks.
UI Automation cannot access WebViews
- The Windows Start menu search, Edge browser content, Electron app content, and other WebView-based UIs are NOT accessible via `automate snapshot`.
- Workaround: Use `Win+R` (Run dialog) or `automate run` to launch programs directly instead of navigating through the Start menu.
UI Automation cannot handle UAC dialogs
- User Account Control elevation prompts run on a secure desktop isolated from UI Automation.
- UAC dialogs will NOT appear in `automate snapshot` output.
- Workaround: Use the `locate` command (OCR) to find button text and `mouse click` to interact. This is unreliable but may work for simple Yes/No dialogs.
OCR (locate) is not highly reliable
- The `locate` command uses OCR, which can misread characters, miss text entirely, or return imprecise coordinates.
- Use it as a last resort when UI Automation cannot access elements.
- Always verify coordinates before clicking critical buttons.
DO NOT estimate coordinates from screenshots (Claude only)
- Claude models in non-computer-use mode (like Claude Code) are very bad at pixel counting.
- Do NOT look at a screenshot and try to guess coordinates - the estimates will likely be wrong.
- Note: Gemini models are generally good at pixel coordinate estimation.
- If you need vision-based coordinate detection with Claude, the user must implement a harness using Claude's Computer Use Tool.
Recommended workflow when UI Automation fails
- First, always try `automate snapshot` (with and without the `-i` flag)
- If the element is not found, try `locate "text"` to find it via OCR
- Use the coordinates from the `locate` output with `mouse click`
- Never estimate coordinates by looking at screenshots
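The fallback part of this workflow can be sketched as a wrapper: try the automation click first, and only on failure locate the text via OCR and click the reported center. This assumes `automate click` exits non-zero when the element is not found and that `locate` prints `center: (x, y)` as in the sample output in the Locate section; `RDP_CLI` and `click_with_fallback` are hypothetical names, not part of agent-rdp:

```shell
# RDP_CLI is a hypothetical override (handy for dry runs); it defaults to the real CLI.
RDP_CLI=${RDP_CLI:-agent-rdp}

# Click selector $1 via UI Automation; if that fails, find text $2 via OCR
# and click the center of the first match. Returns 1 if OCR finds nothing.
click_with_fallback() {
  # Assumption: a failed automation click is reported through the exit status.
  if "$RDP_CLI" automate click "$1"; then
    return 0
  fi
  center=$("$RDP_CLI" locate "$2" |
    sed -n 's/.*center: (\([0-9][0-9]*\), *\([0-9][0-9]*\)).*/\1 \2/p' | head -n 1)
  [ -n "$center" ] || return 1
  # Split "x y" into positional parameters and click there.
  set -- $center
  "$RDP_CLI" mouse click "$1" "$2"
}
```

A call such as `click_with_fallback "#SaveButton" "Save"` keeps coordinate clicks as the last resort and never relies on coordinates guessed from a screenshot.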