HarmonyOS Device Automation

CRITICAL RULES (VIOLATIONS WILL BREAK THE WORKFLOW):
  1. Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
  2. Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
  3. Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.

Automate HarmonyOS NEXT devices using `npx @midscene/harmony@1`. Each CLI command maps directly to an MCP tool; you (the AI agent) act as the brain, deciding which actions to take based on screenshots.

Prerequisites

Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured, either as system environment variables or in a `.env` file in the current working directory (Midscene loads `.env` automatically):

```bash
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
```

Example: Gemini (Gemini-3-Flash)

```bash
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
```

Example: Qwen 3.5

```bash
MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```

If using Qwen 3.5 through OpenRouter, set these values instead:

```bash
MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
MIDSCENE_MODEL_NAME="qwen/qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
```

Example: Doubao Seed 2.0 Lite

```bash
MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"
```

Commonly used models: Doubao Seed 2.0 Lite, Qwen 3.5, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.

If the model is not configured, ask the user to set it up. See Model Configuration for supported providers.
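
Before running any midscene command, it can help to confirm that the four required variables are actually present. The following is a minimal sketch, not part of Midscene itself: the function name `missing_midscene_vars` is hypothetical, and it only inspects the current shell environment, so values that live solely in a `.env` file will be reported as missing.

```bash
# Hypothetical helper (not part of Midscene): print the name of every
# required variable that is unset or empty in the current shell.
# Returns 0 when all four are present, 1 otherwise.
missing_midscene_vars() {
  mm_status=0
  for mm_name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
                 MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # Indirectly read the variable whose name is stored in mm_name.
    eval "mm_value=\${$mm_name:-}"
    if [ -z "$mm_value" ]; then
      echo "$mm_name"
      mm_status=1
    fi
  done
  return "$mm_status"
}
```

Calling `missing_midscene_vars` prints one variable name per line for anything that still needs to be set; empty output means the shell environment is complete and the user does not need to be asked for configuration.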

HDC Setup

HDC (HarmonyOS Device Connector) must be installed and accessible. Common setup:
  • Install via DevEco Studio
  • Or set the `HDC_HOME` environment variable to point to the HDC directory

Verify HDC is working:

```bash
hdc version
hdc list targets
```
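
The two setup options can be checked with a short script. This is a sketch under one stated assumption: that `HDC_HOME` names the directory that directly contains the `hdc` binary. The function name `resolve_hdc` is hypothetical and is not something Midscene or HDC provides.

```bash
# Hypothetical helper: print the path of a usable hdc binary, or fail.
# Assumption: HDC_HOME is the directory that directly contains `hdc`.
resolve_hdc() {
  if [ -n "${HDC_HOME:-}" ] && [ -x "$HDC_HOME/hdc" ]; then
    # An explicit HDC_HOME takes precedence over whatever is on PATH.
    echo "$HDC_HOME/hdc"
  elif command -v hdc >/dev/null 2>&1; then
    command -v hdc
  else
    echo "hdc not found: install it via DevEco Studio or set HDC_HOME" >&2
    return 1
  fi
}
```

If `resolve_hdc` succeeds, the printed path can be used to run the `version` and `list targets` checks shown above.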

Commands

Connect to Device

```bash
npx @midscene/harmony@1 connect
npx @midscene/harmony@1 connect --deviceId 0123456789ABCDEF
```

Take Screenshot

```bash
npx @midscene/harmony@1 take_screenshot
```

After taking a screenshot, read the saved image file to understand the current screen state before deciding the next action.

Perform Action

Use `act` to interact with the device and get the result. It autonomously handles all UI interactions internally (tapping, typing, scrolling, swiping, waiting, and navigating), so you should give it complex, high-level tasks as a whole rather than breaking them into small steps. Describe what you want to do and the desired effect in natural language:

```bash
# specific instructions
npx @midscene/harmony@1 act --prompt "type hello world in the search field and press Enter"
npx @midscene/harmony@1 act --prompt "long press the message bubble and tap Delete in the popup menu"

# or target-driven instructions
npx @midscene/harmony@1 act --prompt "open Settings and navigate to Wi-Fi settings, tell me the connected network name"
```

Disconnect

```bash
npx @midscene/harmony@1 disconnect
```

Workflow Pattern

Since CLI commands are stateless between invocations, follow this pattern:
  1. Connect to establish a session.
  2. Launch the target app and take a screenshot to confirm that the app is running and visible on the screen.
  3. Use `act` to carry out the task, phrased either as specific instructions or as a target-driven goal.
  4. Disconnect when done.
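
Each step in this pattern has to run in the foreground and be given enough time to finish. One way to enforce both is a small wrapper around the coreutils `timeout` utility. This is only a sketch: the function name `run_step` is hypothetical, and the right limit depends on the model and the task.

```bash
# Hypothetical wrapper: run exactly one command in the foreground and wait
# for it, with an upper time limit in seconds.
# Usage: run_step <limit-seconds> <command> [args...]
# Relies on coreutils `timeout`, which exits with status 124 when the
# limit is reached; otherwise the command's own exit status is returned.
run_step() {
  step_limit="$1"
  shift
  echo "running: $*" >&2
  timeout "$step_limit" "$@"
}
```

For example, `run_step 180 npx @midscene/harmony@1 take_screenshot` waits up to three minutes for the screenshot. The 180-second value is an assumption chosen to sit comfortably above the roughly one-minute typical duration; complex `act` prompts may need a larger limit.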

Best Practices

  1. Bring the target app to the foreground before using this skill: For best efficiency, launch the app using HDC (e.g., `hdc shell aa start -a EntryAbility -b <bundleName>`) before invoking any midscene commands. Then take a screenshot to confirm the app is actually in the foreground. Only after visual confirmation should you proceed with UI automation using this skill. HDC commands are significantly faster than using midscene to navigate to and open apps.
  2. Be specific about UI elements: Instead of vague descriptions, provide clear, specific details. Say "the Wi-Fi toggle switch on the right side" instead of "the toggle".
  3. Describe locations when possible: Help target elements by describing their position (e.g., "the search icon at the top right", "the third item in the list").
  4. Never run in background: Every midscene command must run synchronously; background execution breaks the screenshot-analyze-act loop.
  5. Batch related operations into a single `act` command: When performing consecutive operations within the same app, combine them into one `act` prompt instead of splitting them into separate commands. For example, "open Settings, tap Wi-Fi, and toggle it on" should be a single `act` call, not three. This reduces round-trips, avoids unnecessary screenshot-analyze cycles, and is significantly faster.
  6. Summarize report files after completion: After finishing the automation task, collect and summarize all report files (screenshots, logs, output files, etc.) for the user. Present a clear summary of what was accomplished, what files were generated, and where they are located, making it easy for the user to review the results.

Example: app launch and interaction

```bash
hdc shell aa start -a EntryAbility -b com.huawei.hmos.settings
npx @midscene/harmony@1 connect
npx @midscene/harmony@1 take_screenshot
npx @midscene/harmony@1 act --prompt "scroll down the settings list and tap About device"
npx @midscene/harmony@1 take_screenshot
npx @midscene/harmony@1 disconnect
```

Example: form interaction

```bash
npx @midscene/harmony@1 act --prompt "fill in the username field with 'testuser' and the password field with 'pass123', then tap the Login button"
npx @midscene/harmony@1 take_screenshot
```
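
The batching advice in item 5 can be applied mechanically when the individual steps are already known. The sketch below joins several step descriptions into one prompt string; `join_steps` is a hypothetical helper, and the ", then" separator is just one plausible phrasing rather than anything Midscene requires.

```bash
# Hypothetical helper: combine several step descriptions into one prompt
# so that they can be sent as a single act call.
join_steps() {
  # At least one step is required.
  [ "$#" -gt 0 ] || return 1
  js_prompt="$1"
  shift
  for js_step in "$@"; do
    js_prompt="$js_prompt, then $js_step"
  done
  printf '%s\n' "$js_prompt"
}
```

The result can be passed straight to the CLI, for instance `npx @midscene/harmony@1 act --prompt "$(join_steps "open Settings" "tap Wi-Fi" "toggle it on")"`, which issues one `act` call instead of three.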

Common HarmonyOS Bundle Names

| App | Bundle Name |
| --- | --- |
| Settings | `com.huawei.hmos.settings` |
| Camera | `com.huawei.hmos.camera` |
| Gallery | `com.huawei.hmos.photos` |
| Calendar | `com.huawei.hmos.calendar` |
| Clock | `com.huawei.hmos.clock` |
| Calculator | `com.huawei.hmos.calculator` |
| Browser | `com.huawei.hmos.browser` |
| Weather | `com.huawei.hmos.weather` |
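
When launching apps by name, the table can be wrapped in a small lookup function. This is an illustrative sketch: `bundle_for` is a hypothetical name, and it covers only the eight entries listed here.

```bash
# Hypothetical helper: map an app name from the table above to its bundle
# name. Prints the bundle name, or returns 1 for an unknown app.
bundle_for() {
  case "$1" in
    Settings)   echo "com.huawei.hmos.settings" ;;
    Camera)     echo "com.huawei.hmos.camera" ;;
    Gallery)    echo "com.huawei.hmos.photos" ;;
    Calendar)   echo "com.huawei.hmos.calendar" ;;
    Clock)      echo "com.huawei.hmos.clock" ;;
    Calculator) echo "com.huawei.hmos.calculator" ;;
    Browser)    echo "com.huawei.hmos.browser" ;;
    Weather)    echo "com.huawei.hmos.weather" ;;
    *)          echo "unknown app: $1" >&2; return 1 ;;
  esac
}
```

A launch then becomes `hdc shell aa start -a EntryAbility -b "$(bundle_for Camera)"`. Whether every app in the table exposes an ability named `EntryAbility` is an assumption carried over from the generic launch command; if a launch fails, check the ability name for that app.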

Troubleshooting

| Problem | Solution |
| --- | --- |
| HDC not found | Install via DevEco Studio or set the `HDC_HOME` environment variable. |
| Device not listed | Check the USB connection, ensure USB debugging is enabled in Developer Options, and run `hdc list targets`. |
| Command timeout | The device screen may be off or locked. Wake the device and unlock it. |
| API key error | Check that the `.env` file contains `MIDSCENE_MODEL_API_KEY=<your-key>`. See Model Configuration. |
| Wrong device targeted | If multiple devices are connected, use the `--deviceId <id>` flag with the `connect` command. |
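
The "Device not listed" and "Wrong device targeted" rows both come down to how many devices HDC can see. The sketch below counts them from the output of `hdc list targets`. It assumes that HDC prints one device ID per line and the literal text `[Empty]` when nothing is connected; verify this against your HDC version. The function name `count_targets` is hypothetical.

```bash
# Hypothetical helper: count device IDs in `hdc list targets` output
# supplied on stdin.
# Assumption: one device ID per line, and "[Empty]" when none is connected.
count_targets() {
  awk 'NF && $0 !~ /^\[Empty\]/ { n++ } END { print n + 0 }'
}
```

Run it as `hdc list targets | count_targets`. A result of 0 points to the connection checks in the table, and a result greater than 1 means `connect` should be given an explicit `--deviceId <id>`.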