mini-browser
Browser automation skill for AI agents using the mb CLI. Use when the agent needs to browse the web, take screenshots, scrape text, fill forms, click elements, record screencasts, run JS in pages, or audit designs. Triggers on: "browse", "open a page", "take a screenshot", "scrape", "fill form", "click button", "web automation", "record screen", "design audit", "accessibility check".
112 installs
Source: runablehq/mini-browser
NPX Install
npx skill4agent add runablehq/mini-browser mini-browser
SKILL.md Content
mini-browser (mb): Browser CLI for Agents
Setup (only if not already available)
Setup is only needed when mb is not installed or Chrome is not reachable.
Run these checks first. If both pass, skip straight to the Command Reference.
Check if ready
bash
# 1. Is mb installed?
which mb && echo "mb: ok" || echo "mb: MISSING"
# 2. Is Chrome listening on CDP?
curl -sf http://127.0.0.1:9222/json/version > /dev/null && echo "chrome: ok" || echo "chrome: NOT RUNNING"
If both print "ok", everything is ready. Go use mb commands directly.
Install (only if mb is missing)
bash
npm install -g @runablehq/mini-browser
Start Chrome (only if not running)
bash
mb-start-chrome
This launches Chrome with --remote-debugging-port=9222, a fresh profile, and a 1024×768 window. It no-ops if Chrome is already running.
To kill and relaunch:
bash
mb-restart-chrome
Verify
bash
mb go "https://example.com" && mb text
Environment Variables
| Variable | Default | Description |
|---|---|---|
| | | CDP port |
| | auto-detected | Path to Chrome/Chromium binary |
| | | PID file location |
| | | Chrome profile directory |
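The readiness checks and the Chrome launcher described in Setup can be combined into one guard that an agent runs before its first command. The following is a minimal sketch, not part of the mb CLI: the function name mb_ensure_ready is hypothetical, and it assumes the default CDP endpoint 127.0.0.1:9222 used in the checks.

```bash
# Hypothetical helper (not shipped with mb): verify the setup, start Chrome if needed.
# Assumes the default CDP endpoint 127.0.0.1:9222 from the "Check if ready" step.
mb_ensure_ready() {
  # 1. mb must be resolvable by the shell.
  if ! command -v mb > /dev/null 2>&1; then
    echo "mb: MISSING (install with: npm install -g @runablehq/mini-browser)" >&2
    return 1
  fi
  # 2. Chrome must answer on the CDP endpoint; launch it if it does not.
  if ! curl -sf http://127.0.0.1:9222/json/version > /dev/null; then
    mb-start-chrome || return 1
  fi
  echo "ready"
}
```

A typical call would be `mb_ensure_ready && mb go "https://example.com"`. If Chrome needs a moment to accept connections after launch, a short retry around the curl check may be worth adding.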
Command Reference
Navigation
| Command | Description |
|---|---|
| mb go | Navigate to URL (waits for networkidle) |
| | Print current URL |
| | Go back |
| | Go forward |
Observation
| Command | Description |
|---|---|
| mb text | Visible text content (default: body) |
| mb shot | Screenshot to PNG (default: ./shot.png) |
| mb snap | List interactive elements with coordinates |
Interaction
| Command | Description |
|---|---|
| mb click | Click at coordinates |
| mb type | Type text (with coords: selects first) |
| mb fill | Fill form fields by label/name/placeholder |
| mb key | Press keys (Enter, Tab, Meta+a) |
| | Hover at coordinates |
| | Drag between points |
| | Scroll (default: down 500) |
Recording
| Command | Description |
|---|---|
| mb record start | Start recording (.webm, .mp4, .gif) |
| mb record stop | Stop recording and save |
| | Check if recording is active |
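Because only one recording can be active at a time, it helps to make sure a recording is always stopped, even when the recorded step fails. The following is a minimal sketch using the mb record start and mb record stop commands from the screencast example in this document; mb_record_run is a hypothetical helper name, and the file name and URL in the usage comment are placeholders.

```bash
# Hypothetical helper: record while a command runs, and always stop afterwards.
mb_record_run() {
  out_file="$1"
  shift
  mb record start "$out_file" || return 1
  # Run the recorded command and remember its exit status.
  "$@"
  status=$?
  # Stop the recording regardless of whether the command succeeded.
  mb record stop
  return "$status"
}

# Usage (placeholders):
#   mb_record_run demo.mp4 mb go "https://example.com"
```

The helper returns the exit status of the recorded command, so a failed step is still visible to the caller after the recording has been saved.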
Tabs
| Command | Description |
|---|---|
| mb tab list | List open tabs |
| | Open new tab, print index |
| mb tab close | Close tab (default: last) |
Other
| Command | Description |
|---|---|
| mb js | Run JavaScript in page context |
| mb wait | Wait for ms / selector / networkidle / url:pattern |
| mb audit | Design audit (palette, typography, contrast, a11y, SEO) |
| mb logs | Stream console logs (Ctrl+C to stop) |
Flags
| Flag | Default | Description |
|---|---|---|
| | 30000 | Command timeout |
| | 0 | Target tab index |
| --json | false | Structured JSON output |
| | false | Right-click |
| | false | Double-click |
| --fps | 30 | Recording frame rate |
| --scale | 1 | Recording scale factor |
Usage Patterns
Observe → Act loop
The standard agent loop: snapshot the page, pick an element, act on it.
bash
mb snap # list interactive elements with (x, y)
mb click 512 380 # click the button at those coordinates
mb wait networkidle # wait for the page to settle
mb snap # observe again
Fill and submit a form
bash
mb go "https://example.com/login"
mb fill "Email=user@example.com" "Password=hunter2"
mb key Enter
mb wait url:/dashboard
Take a screenshot
bash
mb shot page.png
Extract text
bash
mb text "main" # text from <main>
mb text "#content" # text from #content
mb text # full body text
Run JavaScript
bash
mb js 'document.title'
echo 'document.querySelectorAll("a").length' | mb js -
Record a screencast
bash
mb record start demo.mp4 --fps 30 --scale 1
# ... interact with the page ...
mb record stop
Design audit
bash
mb audit # human-readable report
mb audit --json # structured JSON output
Dismiss overlays
Cookie banners and modals block clicks. Remove them with JS:
bash
mb js 'document.querySelector("[class*=cookie]")?.remove()'
Wait strategies
bash
mb wait 2000 # sleep 2 seconds
mb wait ".modal" # wait for selector to appear
mb wait networkidle # wait for no network activity
mb wait url:/dashboard # wait for URL to contain string
Important Notes
- Viewport is 1024×768. snap only returns elements in the current viewport; scroll and snap again to find more.
- text uses querySelector, so it returns the first match only. Use text "main" over text "p" for better results.
- go waits for networkidle. For heavy SPAs, follow up with wait ".selector".
- type with coordinates triple-clicks first to select existing text, then types the replacement.
- fill field matching order: aria-label → placeholder → name attr → id → label text → CSS selector (use #/./[ prefix).
- --json output: snap → [{role, name, x, y, state}], tab list → [{index, url, title}], logs → JSON lines, audit → full audit object.
- Recording state is stored in ~/.mb-recorder.json. Only one recording at a time.
- tab close cannot close the last remaining tab.
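The note about go and heavy SPAs suggests a small wrapper that navigates and then waits for a selector showing the page is actually usable. The following is a minimal sketch that relies only on the go and wait commands documented here; mb_open is a hypothetical helper name, and the URL and selectors in the usage comment are placeholders.

```bash
# Hypothetical helper: navigate, then optionally wait for a selector.
mb_open() {
  # $1 = URL, $2 = optional CSS selector to wait for after navigation
  mb go "$1" || return 1
  if [ -n "${2:-}" ]; then
    mb wait "$2" || return 1
  fi
}

# Usage (placeholders):
#   mb_open "https://example.com/app" ".dashboard" && mb text "main"
```

Reading text "main" afterwards follows the note on text above: a single container element is a better target than a selector that matches many nodes.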
Troubleshooting
| Problem | Fix |
|---|---|
| "Chrome not found" | Set |
| Connection refused | Run |
| Stale recording state | Delete ~/.mb-recorder.json |
| Chrome window wrong size | |
| Element not in snap output | Scroll and snap again (snap only returns elements in the current viewport) |
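For the stale recording state row, Important Notes says the recording state is stored in ~/.mb-recorder.json. The following is a minimal sketch of a cleanup helper under that assumption; mb_record_reset is a hypothetical name, and the optional path argument exists only so the helper can be tried on a scratch file.

```bash
# Hypothetical helper: remove the recorder state file if it exists.
# Defaults to the path named in Important Notes; pass another path to override.
mb_record_reset() {
  state_file="${1:-$HOME/.mb-recorder.json}"
  if [ -e "$state_file" ]; then
    rm -f "$state_file" && echo "removed $state_file"
  else
    echo "no recording state at $state_file"
  fi
}
```

It is presumably safest to run this only when no recording is in progress, since the file is how mb tracks the active recording.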