agent-browser

Agentic Browser

Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Quick Start

bash
# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
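
The examples in this document pipe CLI output through jq, which the install step does not cover. A minimal pre-flight sketch that reports missing tools; `missing_tools` is a hypothetical helper name, not part of the infsh CLI:

```bash
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report anything the later examples rely on.
for t in $(missing_tools infsh jq); do
  echo "Please install: $t" >&2
done
```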

Core Workflow

Every browser automation follows this pattern:
  1. Open - Navigate to URL, get @e refs for elements
  2. Interact - Use refs to click, fill, drag, etc.
  3. Re-snapshot - After navigation/changes, get fresh refs
  4. Close - End session (returns video if recording)
bash
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'
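
Every step of the workflow repeats the same long command prefix. A minimal sketch of a wrapper that uses only the flags shown in this document; the function name `ab` is hypothetical and not part of the infsh CLI:

```bash
# Hypothetical wrapper (not part of infsh) around the repeated command prefix.
# $1 = function name, $2 = session id or "new", $3 = JSON input (default: {})
ab() {
  ab_input=${3-}
  [ -n "$ab_input" ] || ab_input='{}'
  infsh app run agent-browser --function "$1" --session "$2" --input "$ab_input"
}

# The four workflow steps, expressed with the wrapper:
#   ab open new '{"url": "https://example.com/login"}'
#   ab interact "$SESSION_ID" '{"action": "click", "ref": "@e3"}'
#   ab snapshot "$SESSION_ID"
#   ab close "$SESSION_ID"
```

Passing the JSON as a single quoted argument keeps the payload intact regardless of the spaces inside it.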

Functions

| Function | Description |
|----------|-------------|
| open | Navigate to URL, configure browser (viewport, proxy, video recording) |
| snapshot | Re-fetch page state with @e refs after DOM changes |
| interact | Perform actions using @e refs (click, fill, drag, upload, etc.) |
| screenshot | Take page screenshot (viewport or full page) |
| execute | Run JavaScript code on the page |
| close | Close session, returns video if recording was enabled |

Interact Actions

| Action | Description | Required Fields |
|--------|-------------|-----------------|
| click | Click element | ref |
| dblclick | Double-click element | ref |
| fill | Clear and type text | ref, text |
| type | Type text (no clear) | text |
| press | Press key (Enter, Tab, etc.) | text |
| select | Select dropdown option | ref, text |
| hover | Hover over element | ref |
| check | Check checkbox | ref |
| uncheck | Uncheck checkbox | ref |
| drag | Drag and drop | ref, target_ref |
| upload | Upload file(s) | ref, file_paths |
| scroll | Scroll page | direction (up/down/left/right), scroll_amount |
| back | Go back in history | - |
| wait | Wait milliseconds | wait_ms |
| goto | Navigate to URL | url |
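
The required fields per action can be encoded as a small lookup so a script can check a payload before sending it. A sketch; `required_fields` is a hypothetical helper, and the mapping simply mirrors the Interact Actions table:

```bash
# Hypothetical helper: print the required input fields for an interact action.
# Returns non-zero for an action that the table does not list.
required_fields() {
  case $1 in
    click|dblclick|hover|check|uncheck) echo "ref" ;;
    fill|select)                        echo "ref text" ;;
    type|press)                         echo "text" ;;
    drag)                               echo "ref target_ref" ;;
    upload)                             echo "ref file_paths" ;;
    scroll)                             echo "direction scroll_amount" ;;
    wait)                               echo "wait_ms" ;;
    goto)                               echo "url" ;;
    back)                               : ;;  # no fields required
    *)                                  return 1 ;;
  esac
}
```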

Element Refs

Elements are returned with @e refs:
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
Important: Refs are invalidated after navigation. Always re-snapshot after:
  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading
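
Because refs change between snapshots, a script is more robust when it looks a ref up by its visible label after every snapshot instead of hardcoding a value such as @e3. A minimal sketch, assuming the element listing is available as plain text in the format shown above (how the CLI exposes the listing in its output is not specified here); `ref_for_label` is a hypothetical helper:

```bash
# Hypothetical helper: read an element listing on stdin and print the ref of
# the first line containing the given double-quoted label or attribute value.
ref_for_label() {
  grep -F -- "\"$1\"" | head -n 1 | cut -d ' ' -f 1
}

listing='@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"'

printf '%s\n' "$listing" | ref_for_label "Submit"   # prints @e3
```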

Features

Video Recording

Record browser sessions for debugging or documentation:
bash
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):
bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'
The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:
bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'
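
Proxy credentials hardcoded in a script are easy to leak. A sketch that assembles the same payload from environment variables; the variable names `PROXY_URL`, `PROXY_USER`, and `PROXY_PASS` and the helper `proxy_open_input` are assumptions made for this example, and values containing double quotes or backslashes would need JSON escaping first:

```bash
# Hypothetical helper: build the open payload for URL $1 from the environment.
# No JSON escaping is done, so values must not contain " or \ characters.
proxy_open_input() {
  printf '{"url": "%s", "proxy_url": "%s", "proxy_username": "%s", "proxy_password": "%s"}' \
    "$1" "$PROXY_URL" "$PROXY_USER" "$PROXY_PASS"
}

# Usage (placeholder values):
#   export PROXY_URL=http://proxy.example.com:8080 PROXY_USER=user PROXY_PASS=pass
#   infsh app run agent-browser --function open --session new \
#     --input "$(proxy_open_input https://example.com)"
```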

File Upload

Upload files to file inputs:
bash
infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'

Drag and Drop

Drag elements to targets:
bash
infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'

JavaScript Execution

Run custom JavaScript:
bash
infsh app run agent-browser --function execute --session $SESSION --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
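
The code value is a JSON string, so double quotes inside the JavaScript must be escaped, as in the h2 selector of the execute call. A sketch that does the escaping mechanically; `json_escape` and `execute_input` are hypothetical helpers, and only backslashes and double quotes are handled (newlines and other control characters are not):

```bash
# Hypothetical helper: escape \ and " so a value can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the --input payload for the execute function.
execute_input() {
  printf '{"code": "%s"}' "$(json_escape "$1")"
}

execute_input 'document.querySelectorAll("h2").length'
# prints {"code": "document.querySelectorAll(\"h2\").length"}
```

The JavaScript can then be written naturally in single quotes and passed as `--input "$(execute_input '...')"`.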

Deep-Dive Documentation

| Reference | Description |
|-----------|-------------|
| references/commands.md | Full function reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Session persistence, parallel sessions |
| references/authentication.md | Login flows, OAuth, 2FA handling |
| references/video-recording.md | Recording workflows for debugging |
| references/proxy-support.md | Proxy configuration, geo-testing |

Ready-to-Use Templates

| Template | Description |
|----------|-------------|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse session |
| templates/capture-workflow.sh | Content extraction with screenshots |

Examples

Form Submission

bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Search and Extract

bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Screenshot with Video

bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo $RESULT | jq '.video'

Sessions

Browser state persists within a session. Always:
  1. Start with --session new on first call
  2. Use returned session_id for subsequent calls
  3. Close session when done
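
Closing the session is easy to miss when a script fails halfway. A sketch that closes it from an exit trap; it uses only the close call shown in this document, and the function name `close_session` is hypothetical:

```bash
SESSION=""

# Close the session if one was opened; safe to call when SESSION is empty.
close_session() {
  [ -z "$SESSION" ] && return 0
  infsh app run agent-browser --function close --session "$SESSION" --input '{}'
  SESSION=""
}

# Run close_session whenever the script exits.
trap close_session EXIT

# SESSION=$(infsh app run agent-browser --function open --session new \
#   --input '{"url": "https://example.com"}' | jq -r '.session_id')
# ... subsequent calls use --session "$SESSION" ...
```

Resetting SESSION after the close call keeps a later manual call to close_session from closing the same session twice.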

Related Skills

bash
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models

Documentation
