Agentic Browser

Browser automation for AI agents via inference.sh.

Quick Start

bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agentic-browser --function open --input '{"url": "https://example.com"}' --session new

Core Workflow

Every browser automation follows this pattern:
  1. Open: Navigate to the URL and get element refs
  2. Snapshot: Re-fetch elements after DOM changes
  3. Interact: Use @e refs to click, fill, etc.
  4. Re-snapshot: After navigation, get fresh refs

bash
# Start session
RESULT=$(infsh app run agentic-browser --function open --session new --input '{ "url": "https://example.com/login" }')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')

# Elements returned like: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# Fill form
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{ "action": "fill", "ref": "@e1", "text": "user@example.com" }'
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{ "action": "fill", "ref": "@e2", "text": "password123" }'

# Click submit
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{ "action": "click", "ref": "@e3" }'

# Close when done
infsh app run agentic-browser --function close --session $SESSION_ID --input '{}'
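Every call in the workflow above repeats the same command prefix. A minimal sketch of a shell wrapper, assuming only the `infsh app run agentic-browser --function ... --session ... --input ...` form used in this document; the name `ab` and the variable `ab_input` are hypothetical and not part of the CLI:

```shell
# Hypothetical helper (not part of infsh): shortens the repeated command prefix.
# Usage: ab <function> <session> [json-input]
# The JSON input defaults to '{}' when the third argument is omitted.
ab() {
  ab_input=$3
  [ -n "$ab_input" ] || ab_input='{}'
  infsh app run agentic-browser --function "$1" --session "$2" --input "$ab_input"
}
```

With this in place, the click step could be written as `ab interact "$SESSION_ID" '{"action": "click", "ref": "@e3"}'` and the close step as `ab close "$SESSION_ID"`.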

Functions

open

Navigate to a URL and configure the browser. Returns a page snapshot with @e refs.
bash
infsh app run agentic-browser --function open --session new --input '{
  "url": "https://example.com",
  "width": 1280,
  "height": 720,
  "user_agent": "Mozilla/5.0..."
}'
Returns:
  • url: Current page URL
  • title: Page title
  • elements: List of interactive elements with @e refs
  • screenshot: Page screenshot (for vision agents)

snapshot

Re-fetch page state after DOM changes. Always call after clicks that navigate.
bash
infsh app run agentic-browser --function snapshot --session $SESSION_ID --input '{}'
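Since a snapshot is needed after any click that navigates, the two calls can be paired. A minimal sketch, assuming the output of the click call itself is not needed; `click_and_snapshot` is a hypothetical name, not a function of the app:

```shell
# Hypothetical helper: click an element, then immediately re-fetch page state.
# Usage: click_and_snapshot <session_id> <ref>
# The click output is discarded (an assumption; drop the redirect to keep it),
# and the snapshot only runs if the click call succeeded.
click_and_snapshot() {
  infsh app run agentic-browser --function interact --session "$1" \
    --input "{\"action\": \"click\", \"ref\": \"$2\"}" > /dev/null &&
  infsh app run agentic-browser --function snapshot --session "$1" --input '{}'
}
```

The refs printed by the snapshot call are the ones to use for the next interaction.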

interact

Interact with elements using @e refs from a snapshot.

| Action | Description | Required Fields |
|--------|-------------|-----------------|
| click | Click element | ref |
| fill | Clear and type text | ref, text |
| type | Type text (no clear) | text |
| press | Press key | text (e.g., "Enter") |
| select | Select dropdown | ref, text |
| hover | Hover over element | ref |
| scroll | Scroll page | direction (up/down) |
| back | Go back in history | - |
| wait | Wait milliseconds | wait_ms |

bash
# Click
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{ "action": "click", "ref": "@e5" }'

# Fill input
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{ "action": "fill", "ref": "@e1", "text": "hello@example.com" }'

# Press Enter
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{ "action": "press", "text": "Enter" }'

# Scroll down
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{ "action": "scroll", "direction": "down" }'
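The required fields listed in the action table can be checked before a call is sent. A minimal sketch; `required_fields` is a hypothetical helper that only restates that table and is not something the CLI provides:

```shell
# Hypothetical helper: print the required input fields for an interact action,
# as listed in the action table. Returns 1 for an action not in the table.
required_fields() {
  case "$1" in
    click|hover) echo "ref" ;;
    fill|select) echo "ref text" ;;
    type|press)  echo "text" ;;
    scroll)      echo "direction" ;;
    back)        echo "" ;;
    wait)        echo "wait_ms" ;;
    *)           echo "unknown action: $1" >&2; return 1 ;;
  esac
}
```

For example, `required_fields fill` prints `ref text`, which a script could compare against the keys it is about to send.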

screenshot

Take page screenshot.
bash
infsh app run agentic-browser --function screenshot --session $SESSION_ID --input '{
  "full_page": true
}'

execute

Run JavaScript on the page.
bash
infsh app run agentic-browser --function execute --session $SESSION_ID --input '{
  "code": "document.title"
}'

close

Close browser session.
bash
infsh app run agentic-browser --function close --session $SESSION_ID --input '{}'

Element Refs

Elements are returned with @e refs like:
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
Important: Refs are invalidated after navigation. Always re-snapshot after:
  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading
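Because refs change after navigation, looking an element up by its label after each snapshot is safer than hard-coding a number such as `@e3`. A minimal sketch, assuming the element list is available as text lines in the format shown above (the document does not specify the exact output shape of the CLI); `find_ref` is a hypothetical helper:

```shell
# Hypothetical helper: read element lines on stdin (one per line, in the
# '@eN [tag] "Label"' format shown above) and print the ref of the first
# line that contains the given label in double quotes.
# Prints nothing if no line matches.
find_ref() {
  grep -F -- "\"$1\"" | head -n 1 | cut -d ' ' -f 1
}
```

Given the four sample lines above on stdin, `find_ref Submit` prints `@e3`.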

Examples

Form Submission

bash
SESSION=$(infsh app run agentic-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea] "Message", @e4 [button] "Send"

infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

# Check result
infsh app run agentic-browser --function snapshot --session $SESSION --input '{}'
infsh app run agentic-browser --function close --session $SESSION --input '{}'

Search and Extract

bash
SESSION=$(infsh app run agentic-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

# Fill search box and submit
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

# Get results page
infsh app run agentic-browser --function snapshot --session $SESSION --input '{}'
infsh app run agentic-browser --function close --session $SESSION --input '{}'

Extract Data with JavaScript

bash
infsh app run agentic-browser --function execute --session $SESSION --input '{
  "code": "Array.from(document.querySelectorAll(\"h2\")).map(h => h.textContent)"
}'
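The example above escapes the inner double quotes by hand. A minimal sketch of doing that escaping in the shell so the JavaScript can be written unescaped; `json_escape` is a hypothetical helper, and it only handles backslashes and double quotes, so it suits one-line snippets without newlines or control characters:

```shell
# Hypothetical helper: escape backslashes and double quotes so a one-line
# JavaScript snippet can be embedded in the JSON "code" field.
# Newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Write the JavaScript without manual escaping, then build the JSON input.
js='Array.from(document.querySelectorAll("h2")).map(h => h.textContent)'
input="{\"code\": \"$(json_escape "$js")\"}"
```

The resulting `$input` can then be passed to the execute function as `--input "$input"`.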

Sessions

Browser state persists within a session. Always:
  1. Start with --session new on the first call
  2. Use the returned session_id for subsequent calls
  3. Close session when done
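The close step is easy to miss when a script fails partway through. A minimal sketch that registers the close call in an EXIT trap, assuming a POSIX-style shell; `close_on_exit` and `SESSION_TO_CLOSE` are hypothetical names, and the close command itself is the one shown in this document:

```shell
# Hypothetical helper: close the given session when the shell exits,
# whether the script finishes normally or exits early.
# Usage: close_on_exit "$SESSION_ID"   (call right after the open step)
close_on_exit() {
  SESSION_TO_CLOSE=$1
  trap 'infsh app run agentic-browser --function close --session "$SESSION_TO_CLOSE" --input "{}"' EXIT
}
```

Note that a shell keeps only one EXIT trap at a time, so a script that already sets one would need to combine the two handlers.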

Related Skills

bash
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models

Documentation
