harmonyos-device-automation
HarmonyOS Device Automation
CRITICAL RULES — VIOLATIONS WILL BREAK THE WORKFLOW:
- Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
- Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
- Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, so they can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.
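One way to follow these rules from a shell is to run each invocation in the foreground with a generous time limit. The sketch below is an illustration only, not part of Midscene: `run_sync` and `MIDSCENE_TIMEOUT` are hypothetical names, and it assumes GNU `timeout` (coreutils) is installed.

```bash
#!/usr/bin/env bash
# Run exactly one command in the foreground and wait for it to finish.
# MIDSCENE_TIMEOUT is a hypothetical variable for this sketch, not a Midscene setting.
run_sync() {
  local limit="${MIDSCENE_TIMEOUT:-180}"
  local status=0
  # `timeout` blocks until the command exits or the limit (in seconds) passes.
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}

# Usage (one command per call, read its output before the next one):
#   run_sync npx @midscene/harmony@1 take_screenshot
```

Exit status 124 is what `timeout` reports when the limit is reached, so a caller can tell a slow command apart from a failing one.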
Automate HarmonyOS NEXT devices using `npx @midscene/harmony@1`. Each CLI command maps directly to an MCP tool. You (the AI agent) act as the brain, deciding which actions to take based on screenshots.
Prerequisites
Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured, either as system environment variables or in a `.env` file in the current working directory (Midscene loads `.env` automatically):
```bash
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
```
Example: Gemini (Gemini-3-Flash)
```bash
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
```
Example: Qwen 3.5
```bash
MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```
If using OpenRouter, set:
```bash
MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
MIDSCENE_MODEL_NAME="qwen/qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
```
Example: Doubao Seed 2.0 Lite
```bash
MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"
```
Commonly used models: Doubao Seed 2.0 Lite, Qwen 3.5, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.
If the model is not configured, ask the user to set it up. See Model Configuration for supported providers.
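Before running any command, it can help to confirm that the four variables listed above are actually set. The following is a minimal sketch, not something Midscene ships: the function name `missing_midscene_vars` is hypothetical, and the `.env` loading assumes the file contains only plain `KEY="value"` lines that a shell can source.

```bash
#!/usr/bin/env bash
# Print the required Midscene model variables that are unset or empty,
# separated by spaces. Prints an empty line when everything is configured.
missing_midscene_vars() {
  local name value missing=""
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
              MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # Read the variable whose name is stored in $name (names come from the fixed list above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s\n' "$missing"
}

# Assumption: .env holds plain KEY="value" lines, so sourcing it is safe.
if [ -f .env ]; then
  set -a; . ./.env; set +a
fi

missing="$(missing_midscene_vars)"
if [ -n "$missing" ]; then
  echo "Missing model configuration: $missing" >&2
fi
```

If the check reports missing names, that is the point at which to ask the user to complete the model setup.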
HDC Setup
HDC (HarmonyOS Device Connector) must be installed and accessible. Common setup:
- Install via DevEco Studio
- Or set the `HDC_HOME` environment variable to point to the HDC directory
Verify HDC is working:
```bash
hdc version
hdc list targets
```
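The two setup options above can be checked in one pass. This is a sketch under assumptions: `find_hdc` is a hypothetical helper, and it assumes the `hdc` binary sits directly inside the directory named by `HDC_HOME`, which may not match every installation.

```bash
#!/usr/bin/env bash
# Print the path of a usable hdc binary, or return 1 if none is found.
find_hdc() {
  if command -v hdc >/dev/null 2>&1; then
    # hdc is already on PATH (for example after a DevEco Studio install).
    command -v hdc
  elif [ -n "${HDC_HOME:-}" ] && [ -x "$HDC_HOME/hdc" ]; then
    # Assumption for this sketch: the binary lives directly in $HDC_HOME.
    printf '%s\n' "$HDC_HOME/hdc"
  else
    return 1
  fi
}

if hdc_bin="$(find_hdc)"; then
  "$hdc_bin" version
  "$hdc_bin" list targets
else
  echo "hdc not found: install it via DevEco Studio or set HDC_HOME" >&2
fi
```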
Commands
Connect to Device
```bash
npx @midscene/harmony@1 connect
npx @midscene/harmony@1 connect --deviceId 0123456789ABCDEF
```
Take Screenshot
```bash
npx @midscene/harmony@1 take_screenshot
```
After taking a screenshot, read the saved image file to understand the current screen state before deciding the next action.
Perform Action
Use `act` to interact with the device and get the result. It autonomously handles all UI interactions internally (tapping, typing, scrolling, swiping, waiting, and navigating), so give it complex, high-level tasks as a whole rather than breaking them into small steps. Describe what you want to do and the desired effect in natural language:
```bash
# specific instructions
npx @midscene/harmony@1 act --prompt "type hello world in the search field and press Enter"
npx @midscene/harmony@1 act --prompt "long press the message bubble and tap Delete in the popup menu"
# or target-driven instructions
npx @midscene/harmony@1 act --prompt "open Settings and navigate to Wi-Fi settings, tell me the connected network name"
```
Disconnect
```bash
npx @midscene/harmony@1 disconnect
```
Workflow Pattern
Since CLI commands are stateless between invocations, follow this pattern:
- Connect to establish a session.
- Launch the target app and take a screenshot to see the current state. Make sure the app is launched and visible on the screen.
- Execute the task with `act`, using either specific or target-driven instructions.
- Disconnect when done.
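Because each command has to be run on its own and its output reviewed before the next one, the pattern above is better written down as a checklist than as a script that executes everything. The sketch below only prints the commands in order; `print_plan` is a hypothetical helper, the HDC launch step follows the document's app launch example, and prompts containing double quotes are not escaped.

```bash
#!/usr/bin/env bash
# Print the commands for one automation pass, one per line, without running them.
# $1: bundle name of the target app, $2: natural-language task for `act`.
print_plan() {
  local bundle="$1" prompt="$2"
  printf '%s\n' \
    "hdc shell aa start -a EntryAbility -b $bundle" \
    "npx @midscene/harmony@1 connect" \
    "npx @midscene/harmony@1 take_screenshot" \
    "npx @midscene/harmony@1 act --prompt \"$prompt\"" \
    "npx @midscene/harmony@1 take_screenshot" \
    "npx @midscene/harmony@1 disconnect"
}

# Usage:
#   print_plan com.huawei.hmos.settings "scroll down the settings list and tap About device"
```

Run the printed lines one at a time, reading each screenshot before moving on.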
Best Practices
- Bring the target app to the foreground before using this skill: For best efficiency, launch the app using HDC (e.g., `hdc shell aa start -a EntryAbility -b <bundleName>`) before invoking any midscene commands. Then take a screenshot to confirm the app is actually in the foreground. Only after visual confirmation should you proceed with UI automation using this skill. HDC commands are significantly faster than using midscene to navigate to and open apps.
- Be specific about UI elements: Instead of vague descriptions, provide clear, specific details. Say "the Wi-Fi toggle switch on the right side" instead of "the toggle".
- Describe locations when possible: Help target elements by describing their position (e.g., "the search icon at the top right", "the third item in the list").
- Never run in background: Every midscene command must run synchronously; background execution breaks the screenshot-analyze-act loop.
- Batch related operations into a single command: When performing consecutive operations within the same app, combine them into one `act` prompt instead of splitting them into separate commands. For example, "open Settings, tap Wi-Fi, and toggle it on" should be a single `act` call, not three. This reduces round-trips, avoids unnecessary screenshot-analyze cycles, and is significantly faster.
- Summarize report files after completion: After finishing the automation task, collect and summarize all report files (screenshots, logs, output files, etc.) for the user. Present a clear summary of what was accomplished, what files were generated, and where they are located, making it easy for the user to review the results.
Example (app launch and interaction):
```bash
hdc shell aa start -a EntryAbility -b com.huawei.hmos.settings
npx @midscene/harmony@1 connect
npx @midscene/harmony@1 take_screenshot
npx @midscene/harmony@1 act --prompt "scroll down the settings list and tap About device"
npx @midscene/harmony@1 take_screenshot
npx @midscene/harmony@1 disconnect
```
Example (form interaction):
```bash
npx @midscene/harmony@1 act --prompt "fill in the username field with 'testuser' and the password field with 'pass123', then tap the Login button"
npx @midscene/harmony@1 take_screenshot
```
Common HarmonyOS Bundle Names
| App | Bundle Name |
|---|---|
| Settings | com.huawei.hmos.settings |
| Camera | com.huawei.hmos.camera |
| Gallery | com.huawei.hmos.photos |
| Calendar | com.huawei.hmos.calendar |
| Clock | com.huawei.hmos.clock |
| Calculator | com.huawei.hmos.calculator |
| Browser | com.huawei.hmos.browser |
| Weather | com.huawei.hmos.weather |
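The table can be turned into a small lookup so that the HDC launch command does not need the bundle name typed by hand. This is a sketch: `bundle_for` is a hypothetical helper covering only the apps listed above, and `EntryAbility` is taken from the document's launch example rather than verified for each app.

```bash
#!/usr/bin/env bash
# Map an app name from the table above to its bundle name.
# Returns 1 for apps that are not in the table.
bundle_for() {
  case "$1" in
    Settings)   echo com.huawei.hmos.settings ;;
    Camera)     echo com.huawei.hmos.camera ;;
    Gallery)    echo com.huawei.hmos.photos ;;
    Calendar)   echo com.huawei.hmos.calendar ;;
    Clock)      echo com.huawei.hmos.clock ;;
    Calculator) echo com.huawei.hmos.calculator ;;
    Browser)    echo com.huawei.hmos.browser ;;
    Weather)    echo com.huawei.hmos.weather ;;
    *)          echo "unknown app: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   hdc shell aa start -a EntryAbility -b "$(bundle_for Settings)"
```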
Troubleshooting
| Problem | Solution |
|---|---|
| HDC not found | Install via DevEco Studio or set `HDC_HOME`. |
| Device not listed | Check the USB connection, ensure USB debugging is enabled in Developer Options, and run `hdc list targets`. |
| Command timeout | The device screen may be off or locked. Wake the device and unlock it. |
| API key error | Check that `MIDSCENE_MODEL_API_KEY` is set, either in the environment or in the `.env` file. |
| Wrong device targeted | If multiple devices are connected, use `--deviceId <id>` with the `connect` command. |
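For the "Device not listed" and "Wrong device targeted" rows, the output of `hdc list targets` can be inspected before connecting. The sketch below is an assumption-laden illustration: `pick_device` is a hypothetical helper, and it assumes the command prints one device identifier per line and `[Empty]` when nothing is attached, which should be verified against the installed HDC version.

```bash
#!/usr/bin/env bash
# Read `hdc list targets` output on stdin and pick a device.
# Prints the identifier and returns 0 when exactly one device is listed,
# returns 1 when none is listed, and 2 when several are listed.
pick_device() {
  awk '
    # Keep non-blank lines, skipping the assumed "[Empty]" placeholder.
    NF && $1 != "[Empty]" { ids[++n] = $1 }
    END {
      if (n == 1) { print ids[1]; exit 0 }
      if (n == 0) { print "no device listed" > "/dev/stderr"; exit 1 }
      print "multiple devices, pass --deviceId explicitly:" > "/dev/stderr"
      for (i = 1; i <= n; i++) print "  " ids[i] > "/dev/stderr"
      exit 2
    }'
}

# Usage:
#   id="$(hdc list targets | pick_device)"
#   npx @midscene/harmony@1 connect --deviceId "$id"
```

When the helper returns 2, choose one of the printed identifiers and pass it to `connect --deviceId` yourself.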