agent-browser
Agentic Browser
Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple `@e` ref system for element interaction.
Quick Start

```bash
# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
```
Core Workflow
Every browser automation follows this pattern:
- Open - Navigate to URL, get `@e` refs for elements
- Interact - Use refs to click, fill, drag, etc.
- Re-snapshot - After navigation/changes, get fresh refs
- Close - End session (returns video if recording)
```bash
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session "$SESSION_ID" --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session "$SESSION_ID" --input '{}'
```
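
Every step in the workflow above repeats the same `infsh app run agent-browser` invocation with a different function, session, and input. One way to cut the repetition is a small wrapper function. This is only a sketch: the name `ab` is hypothetical and not part of the CLI, and it uses only the flags already shown above.

```bash
# Hypothetical helper (not provided by infsh): wraps the repeated invocation.
# Usage: ab FUNCTION SESSION [JSON_INPUT]
# JSON_INPUT defaults to an empty object when omitted.
ab() {
  ab_input=${3-}
  [ -n "$ab_input" ] || ab_input='{}'
  infsh app run agent-browser --function "$1" --session "$2" --input "$ab_input"
}
```

With this in place, a re-snapshot could be written as `ab snapshot "$SESSION_ID"` and a click as `ab interact "$SESSION_ID" '{"action": "click", "ref": "@e3"}'`.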
Functions
| Function | Description |
|---|---|
| `open` | Navigate to URL, configure browser (viewport, proxy, video recording) |
| `snapshot` | Re-fetch page state with `@e` refs after DOM changes |
| `interact` | Perform actions using `@e` refs |
| `screenshot` | Take page screenshot (viewport or full page) |
| `execute` | Run JavaScript code on the page |
| `close` | Close session, returns video if recording was enabled |
Interact Actions
| Action | Description | Required Fields |
|---|---|---|
| `click` | Click element | `ref` |
| | Double-click element | |
| `fill` | Clear and type text | `ref`, `text` |
| | Type text (no clear) | |
| `press` | Press key (Enter, Tab, etc.) | `text` |
| | Select dropdown option | |
| | Hover over element | |
| | Check checkbox | |
| | Uncheck checkbox | |
| `drag` | Drag and drop | `ref`, `target_ref` |
| `upload` | Upload file(s) | `ref`, `file_paths` |
| | Scroll page | |
| | Go back in history | - |
| `wait` | Wait milliseconds | `wait_ms` |
| | Navigate to URL | |
Element Refs
Elements are returned with `@e` refs:

```
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
```

Important: Refs are invalidated after navigation. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading
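
Because refs change after every re-snapshot, a script may want to look them up by label rather than hard-code `@e3`. The sketch below assumes the snapshot text arrives one element per line in the format shown above; the function names `list_refs` and `ref_for_label` are hypothetical helpers, not part of the tool.

```bash
# list_refs: print every ref (@e1, @e2, ...) that starts a line on stdin.
list_refs() {
  grep -oE '^@e[0-9]+'
}

# ref_for_label LABEL: print the first ref whose line contains "LABEL"
# in double quotes. This also matches quoted attribute values such as
# placeholder="Search", so labels should be chosen to be unambiguous.
ref_for_label() {
  grep -F -e "\"$1\"" | grep -oE '^@e[0-9]+' | head -n 1
}
```

For example, piping the sample output above into `ref_for_label Submit` prints `@e3`.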
Features
Video Recording
Record browser sessions for debugging or documentation:

```bash
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
# Returns: {"success": true, "video": <File>}
```
Cursor Indicator
Show a visible cursor in screenshots and video (useful for demos):

```bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'
```

The cursor appears as a red dot that follows mouse movements and shows click feedback.
Proxy Support
Route traffic through a proxy server:

```bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'
```

File Upload
Upload files to file inputs:

```bash
infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'
```

Drag and Drop
Drag elements to targets:

```bash
infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'
```

JavaScript Execution
Run custom JavaScript:

```bash
infsh app run agent-browser --function execute --session "$SESSION" --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
```
Deep-Dive Documentation
| Reference | Description |
|---|---|
| references/commands.md | Full function reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Session persistence, parallel sessions |
| references/authentication.md | Login flows, OAuth, 2FA handling |
| references/video-recording.md | Recording workflows for debugging |
| references/proxy-support.md | Proxy configuration, geo-testing |
Ready-to-Use Templates
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse session |
| templates/capture-workflow.sh | Content extraction with screenshots |
Examples
Form Submission

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "click", "ref": "@e4"}'
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
```
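
The form example passes literal JSON in single quotes, which works only while the field values are fixed. When the text comes from a shell variable, it has to be escaped before it is placed inside the JSON string. The following is a minimal sketch with hypothetical helper names (`json_escape`, `fill_input`); it covers backslashes and double quotes only, so values containing newlines or other control characters need a fuller tool such as `jq -n --arg`.

```bash
# json_escape VALUE: escape backslashes and double quotes so VALUE can sit
# inside a JSON string. Newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# fill_input REF TEXT: print the JSON input for a "fill" action.
fill_input() {
  printf '{"action": "fill", "ref": "%s", "text": "%s"}\n' "$1" "$(json_escape "$2")"
}
```

A call could then look like `infsh app run agent-browser --function interact --session "$SESSION" --input "$(fill_input @e3 "$MESSAGE")"`, where `MESSAGE` is whatever variable holds the text.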
Search and Extract

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "wait", "wait_ms": 2000}'
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
```

Screenshot with Video

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session "$SESSION" --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session "$SESSION" --input '{}')
echo "$RESULT" | jq '.video'
```
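
The Search and Extract example pauses for a fixed 2000 ms before re-snapshotting, which can be too short on a slow page and wasteful on a fast one. An alternative is to poll: re-snapshot, check for expected text, and wait only if it is missing. This sketch assumes the `snapshot` function prints element text to stdout; `wait_for_text` is a hypothetical helper name, and it uses only the `snapshot` function and the `wait` action shown in this document.

```bash
# wait_for_text SESSION TEXT [ATTEMPTS] [WAIT_MS]
# Re-snapshots until TEXT appears in the snapshot output or ATTEMPTS run out.
# Returns 0 when TEXT was found, 1 otherwise.
wait_for_text() {
  wft_session=$1
  wft_text=$2
  wft_attempts=${3:-5}
  wft_wait_ms=${4:-1000}
  wft_i=0
  while [ "$wft_i" -lt "$wft_attempts" ]; do
    if infsh app run agent-browser --function snapshot --session "$wft_session" --input '{}' \
        | grep -qF -e "$wft_text"; then
      return 0
    fi
    # Not there yet: pause inside the browser session before trying again.
    infsh app run agent-browser --function interact --session "$wft_session" \
      --input "{\"action\": \"wait\", \"wait_ms\": $wft_wait_ms}" >/dev/null
    wft_i=$((wft_i + 1))
  done
  return 1
}
```

Since each successful poll is itself a fresh snapshot, the refs from that output are the ones to use for the next interaction.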
Sessions
Browser state persists within a session. Always:
- Start with `--session new` on first call
- Use returned `session_id` for subsequent calls
- Close session when done
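
The last rule is easy to break when a step in the middle of a script fails. One way to guard against that is to route the work through a helper that closes the session regardless of the outcome. This is a sketch only: `run_then_close` is a hypothetical name, and it relies solely on the `close` function documented above.

```bash
# run_then_close SESSION COMMAND [ARGS...]
# Runs COMMAND, then closes SESSION whether or not COMMAND succeeded,
# and returns COMMAND's exit status.
run_then_close() {
  rtc_session=$1
  shift
  rtc_status=0
  "$@" || rtc_status=$?
  infsh app run agent-browser --function close --session "$rtc_session" --input '{}'
  return "$rtc_status"
}
```

For instance, `run_then_close "$SESSION_ID" my_steps` would run a user-defined function `my_steps` and still close the session if one of its commands fails.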
Related Skills

```bash
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
```
Documentation
- inference.sh Sessions - Session management
- Multi-function Apps - How functions work