Multimodal UI understanding and single-step planning via OpenAI-compatible Responses APIs. Use when you need AIQuery/AIAssert and plan-next to extract UI element coordinates, validate UI assertions, summarize screenshots, or decide the next UI action from an image. External agents handle execution via adb/hdc and multi-step loops. Defaults to Doubao models but can be pointed at other multimodal providers via base URL, API key, and model name.
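Under the hood the skill talks to an OpenAI-compatible multimodal endpoint. As a rough sketch of what such a request might look like (the payload shape and the `buildVisionRequest` helper are illustrative assumptions, not the skill's actual code):

```typescript
// Sketch only: this follows the common OpenAI-compatible multimodal message
// format. The exact schema the skill sends is an assumption; see
// references/doubao-api.md for the provider's real API.

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Build a single-turn multimodal request from raw screenshot bytes.
function buildVisionRequest(
  model: string,
  screenshot: Buffer,
  prompt: string,
): VisionRequest {
  // Inline the screenshot as a base64 data URL alongside the text prompt.
  const dataUrl = `data:image/png;base64,${screenshot.toString("base64")}`;
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "image_url", image_url: { url: dataUrl } },
          { type: "text", text: prompt },
        ],
      },
    ],
  };
}
```

A payload like this would be POSTed to the configured base URL with the API key as a bearer token; the exact route and response shape are provider-specific.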
## Installation

```sh
npx skill4agent add httprunner/skills ai-vision
```

The skill is installed to `~/.agents/skills/ai-vision/`; screenshots are expected under `~/.eval/screenshots/`.

## Setup

Change into the skill directory and verify that the CLI runs:

```sh
cd ~/.agents/skills/ai-vision
npx tsx scripts/ai_vision.ts --help
```

Or run it from anywhere with a subshell:

```sh
(cd ~/.agents/skills/ai-vision && npx tsx scripts/ai_vision.ts --help)
```

## Configuration

- `ARK_BASE_URL`: API base URL (defaults to `https://ark.cn-beijing.volces.com/api/v3`)
- `ARK_API_KEY`: API key for the provider
- `ARK_MODEL_NAME`: model name (defaults to `doubao-seed-1-6-vision-250815`)

Each setting can also be overridden per invocation via the `--base-url`, `--api-key`, and `--model` flags.

## Usage

The entry point is `scripts/ai_vision.ts`:

```sh
npx tsx scripts/ai_vision.ts --help
npx tsx scripts/ai_vision.ts --log-level debug <command> [flags]
```

Pass `--log-json` for structured log output.

Query: extract UI element coordinates or summarize a screenshot.

```sh
npx tsx scripts/ai_vision.ts query \
  --screenshot ~/.eval/screenshots/ui_YYYYMMDD_HHMMSS.png \
  --prompt "Identify the 'Search' button on the screen and return its coordinates"
```

Assert: validate a UI assertion against a screenshot.

```sh
npx tsx scripts/ai_vision.ts assert \
  --screenshot ~/.eval/screenshots/ui_YYYYMMDD_HHMMSS.png \
  --prompt "The current page contains a search box"
```

Plan next: decide the next single UI action.

```sh
npx tsx scripts/ai_vision.ts plan-next \
  --screenshot ~/.eval/screenshots/ui_YYYYMMDD_HHMMSS.png \
  --prompt "Tap the magnifying glass icon to enter the search page"
```

## Notes

- `plan-next` only plans one step; an external agent executes the action (e.g. `adb shell input tap X Y`) and drives the multi-step loop.
- Use `--log-level debug` when troubleshooting.
- Doubao model names include `doubao-seed-1-8-251228` and the default `doubao-seed-1-6-vision-250815`.
- See `references/doubao-api.md` for provider API details.
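Since execution is left to the external agent, the glue between a `plan-next` result and adb is the agent's responsibility. A minimal sketch, assuming a hypothetical `{ action, x, y, text }` result shape (this is not the skill's documented output format; inspect the real JSON, e.g. with `--log-json`, before relying on it):

```typescript
// Hypothetical planned-action schema for illustration only.
interface PlannedAction {
  action: "tap" | "input";
  x?: number;
  y?: number;
  text?: string;
}

// Translate a planned action into the adb command an external agent would run.
function toAdbCommand(a: PlannedAction): string {
  switch (a.action) {
    case "tap":
      return `adb shell input tap ${a.x} ${a.y}`;
    case "input":
      // `adb shell input text` cannot take literal spaces; adb expects %s.
      return `adb shell input text ${(a.text ?? "").replace(/ /g, "%s")}`;
    default:
      throw new Error(`unsupported action: ${(a as PlannedAction).action}`);
  }
}
```

On HarmonyOS targets the same mapping would emit `hdc` commands instead; the loop itself (screenshot, plan, execute, repeat) stays in the agent.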