browser-screenshot
Skill: Browser Screenshot
Take focused screenshots of specific regions on web pages — a Reddit post, a tweet, an article section, a chart, etc. — not just a full-page dump.
Prerequisite: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.
Overview
This skill handles the full pipeline:
- Research the best page to screenshot (web search, fetch)
- Navigate to the right page in the browser
- Locate the target element/region on the page
- Capture a focused, cropped screenshot of just that region
Hard Rule: No Full-Screen Screenshots
NEVER output an uncropped full-viewport or full-page screenshot as a final result. Full screenshots contain too much noise (nav bars, sidebars, ads, unrelated content) and are unsuitable as article illustrations. Every screenshot MUST be cropped to a focused region.
Step 0: Research — Find the Right Page First
Before opening anything in the browser, figure out which page to screenshot. Use WebSearch and WebFetch tools (not the browser) for this research phase — they're faster and don't require tab management.
Page Selection Strategy
The right page depends on the context of the article and how recent/notable the subject is:
| Subject Type | Best Page to Find | How to Find It |
|---|---|---|
| New model/feature launch (< 6 months) | Official blog post announcing it | WebSearch |
| Established product (> 6 months) | Product landing page or docs overview | WebSearch |
| Open-source model | HuggingFace model card or GitHub repo | Direct URL: |
| API service | API documentation page | WebSearch |
What Makes a Good Screenshot Source
- Official blog posts are ideal: they have hero images, prominent titles, and concise descriptions designed for sharing
- Product landing pages work well: hero sections with taglines and key features
- HuggingFace model cards are reliable for open-source models: consistent layout, model name + description always at top
- API docs are acceptable fallback: show the product name and key specs
Pre-Flight URL Validation
Before opening in the browser, validate URLs with WebFetch (lightweight HEAD/GET) to avoid wasting time on 404s or redirects:
WebFetch: <candidate-url>
→ Check status code, title, and content snippet
→ If 404 or redirect to unrelated page, try next candidate
Region Selection Strategy
Think about what the article reader needs to see in this screenshot:
| Article Context | What to Capture | Target Region |
|---|---|---|
| Introducing a model in a lineup | Model name + key tagline/description | Blog hero section or HF model card header |
| Comparing capabilities | Feature highlights or spec table | Blog section showing specs/features |
| Discussing a specific feature | The feature description | Relevant section heading + 1-2 paragraphs |
| Showing a product/service | Brand identity + value prop | Landing page hero (title + subtitle + visual) |
The screenshot should make the reader think "ah, that's what this model/product is" — not "what am I looking at?"
Step 1: Navigate to the Target Page
Always Start by Listing Tabs
```bash
agent-browser --auto-connect tab list
```
Check whether the page is already open. Reuse existing tabs: they keep login sessions and the correct state.
Navigation by Input Type
| User Provides | Strategy |
|---|---|
| Direct URL | |
| Search query | |
| Platform + topic | Construct platform search URL (see below) → locate target content |
| Vague description | Google search → evaluate results → navigate to best match |
Platform-Specific Search URLs
| Platform | Search URL Pattern |
|---|---|
| Reddit | |
| X / Twitter | |
| LinkedIn | |
| Hacker News | |
| GitHub | |
| YouTube | |
Wait for Page Load
After navigation, wait for content to settle:
```bash
agent-browser --auto-connect wait --load networkidle
```
Note: Some sites (Reddit, X, LinkedIn) never reach `networkidle`. If `open` already shows the page title in its output, skip the wait. Use `wait 2000` as a safe alternative.
Step 2: Locate the Target Region
This is the critical step. The goal is to find a CSS selector that precisely wraps the content to capture.
Primary Method: DOM Selector Discovery
1. Take an annotated screenshot to understand the page layout:
   ```bash
   agent-browser --auto-connect screenshot --annotate
   ```
2. Take a snapshot to see the page's accessibility tree:
   ```bash
   agent-browser --auto-connect snapshot -i
   ```
3. Identify the target container element. Look for:
   - Semantic HTML containers: `<article>`, `<main>`, `<section>`
   - Platform-specific components (see Platform Selectors)
   - Data attributes: `[data-testid="..."]`, `[data-id="..."]`
4. Verify with `get box` to confirm the element has a reasonable bounding box:
   ```bash
   agent-browser --auto-connect get box "<selector>"
   ```
   This returns `{ x, y, width, height }`. Sanity-check:
   - Width should be > 100px and < viewport width
   - Height should be > 50px
   - If the box is the entire page, the selector is too broad; refine it
5. If the selector is hard to find, use `eval` to explore the DOM:
   ```bash
   agent-browser --auto-connect eval "document.querySelector('article')?.getBoundingClientRect()"
   ```
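The bounding-box sanity checks above can be scripted. The following is a minimal sketch, assuming the width, height, and viewport width have already been parsed out of the `get box` output; `box_is_sane` is a hypothetical helper name, not an agent-browser command.

```bash
# Hypothetical helper (not part of agent-browser).
# Args: width height viewport_width (viewport px; floats allowed)
# Exit status 0 when the box passes the sanity checks, 1 otherwise.
box_is_sane() {
  awk -v w="$1" -v h="$2" -v vw="$3" \
    'BEGIN { exit !(w > 100 && w < vw && h > 50) }'
}

# Example: a 680x520 box in a 1280px-wide viewport passes.
if box_is_sane 680 520 1280; then
  echo "box looks reasonable"
fi
```

A failing check is a hint to refine the selector, not proof that it is wrong; some legitimate targets are narrower than 100px.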
Platform Selectors
Common container selectors for popular platforms:
| Platform | Target | Typical Selector |
|---|---|---|
| Reddit | A post | |
| X / Twitter | A tweet | |
| LinkedIn | A feed post | |
| Hacker News | A story + comments | |
| GitHub | A repo card | |
| YouTube | Video player area | |
| Generic article | Main content | |
These selectors may change over time. Always verify with `get box` before using.
Multiple Matching Elements
If the selector matches multiple elements (e.g., multiple tweets on a timeline), narrow it down:
```bash
# Count matches
agent-browser --auto-connect get count "article[data-testid='tweet']"

# Use nth-child or :first-of-type, or a more specific selector
# Or use eval to find the right one by text content:
agent-browser --auto-connect eval --stdin <<'EOF'
const posts = document.querySelectorAll('article[data-testid="tweet"]');
for (let i = 0; i < posts.length; i++) {
  const text = posts[i].textContent.substring(0, 80);
  console.log(i, text);
}
EOF
```
Then target a specific one using `:nth-of-type(N)` or a unique parent selector.
---
Step 3: Capture the Focused Screenshot
Method A: Scroll + Viewport Screenshot (Preferred for Viewport-Sized Targets)
Best when the target element fits within the viewport.
```bash
# Scroll the target into view
agent-browser --auto-connect scrollintoview "<selector>"
agent-browser --auto-connect wait 500

# Take viewport screenshot
agent-browser --auto-connect screenshot /tmp/browser-screenshot-raw.png
```
Then crop using the bounding box (see [Cropping](#cropping)).
Method B: Full-Page Screenshot + Crop (For Any Size Target)
Best when the target might be larger than the viewport or when precise cropping is needed.
```bash
# Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/browser-screenshot-full.png

# Get the target element's bounding box
agent-browser --auto-connect get box "<selector>"
# Output: { x: 200, y: 450, width: 680, height: 520 }
```
Then crop (see [Cropping](#cropping)).
Cropping
Use ImageMagick (`magick` on IMv7; `convert` is deprecated) to crop the screenshot to the target region. Add padding for visual breathing room.
Retina Display Handling
Critical: On macOS Retina displays, screenshots are captured at 2x resolution. A 1728x940 viewport produces a 3456x1880 image. You MUST account for this:
1. Detect the scale factor: Compare viewport size vs actual image dimensions:
   ```bash
   # Check actual image dimensions
   magick identify /tmp/screenshot.png
   # → 3456x1880 means 2x scale on a 1728x940 viewport
   ```
2. Multiply `get box` coordinates by the scale factor before cropping:
   ```bash
   # get box returns viewport coordinates: { x: 200, y: 450, width: 680, height: 520 }
   # For 2x Retina, actual image coordinates are:
   SCALE=2
   X=$((200 * SCALE))
   Y=$((450 * SCALE))
   W=$((680 * SCALE))
   H=$((520 * SCALE))
   PADDING=$((16 * SCALE))
   ```
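The scale factor can be derived instead of hard-coded. This is a minimal sketch, assuming the device pixel ratio is a whole number (1x or 2x) and that both widths are already known; `scale_factor` is a hypothetical helper name. Widths are compared rather than heights because a full-page screenshot is usually taller than the viewport.

```bash
# Hypothetical helper: derive the screenshot scale from two widths.
# Args: image_width_px viewport_width_px
# Assumes a whole-number device pixel ratio; the result is rounded.
scale_factor() {
  awk -v iw="$1" -v vw="$2" 'BEGIN { printf "%d\n", int(iw / vw + 0.5) }'
}

# Example with the numbers from the text: a 3456px-wide image, 1728px viewport.
SCALE=$(scale_factor 3456 1728)
echo "$SCALE"
```

The image width itself can be read with `magick identify -format '%w' <file>`.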
Crop Command
```bash
magick /tmp/browser-screenshot-full.png \
  -crop $((W + PADDING*2))x$((H + PADDING*2))+$((X - PADDING))+$((Y - PADDING)) \
  +repage \
  <output-path>.png
```
Important: `get box` returns floating-point values. Round them to integers before passing to ImageMagick.
Padding: Use 12–20px (viewport px). Increase to ~30px if the target has a distinct visual boundary (card, bordered box). Use 0 if the user wants a tight crop.
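Rounding, scaling, and padding can be folded into one step. The sketch below is one possible way to build the `-crop` geometry string, assuming the box values were already extracted from `get box`; `crop_geometry` is a hypothetical helper name. It also clamps the offsets at 0 so a target near the page edge does not produce a negative offset, which is an addition not described in the text above.

```bash
# Hypothetical helper: build an ImageMagick crop geometry (WxH+X+Y).
# Args: x y width height scale padding  (box and padding in viewport px)
crop_geometry() {
  awk -v x="$1" -v y="$2" -v w="$3" -v h="$4" -v s="$5" -v p="$6" 'BEGIN {
    # Convert to image pixels and round to the nearest integer.
    X = int(x * s + 0.5); Y = int(y * s + 0.5)
    W = int(w * s + 0.5); H = int(h * s + 0.5)
    P = int(p * s + 0.5)
    # Clamp the top-left corner so it never falls outside the image.
    left = X - P; if (left < 0) left = 0
    top  = Y - P; if (top  < 0) top  = 0
    printf "%dx%d+%d+%d\n", W + 2 * P, H + 2 * P, left, top
  }'
}

# Example: the 2x Retina box from the text with 16px padding.
crop_geometry 200 450 680 520 2 16   # prints 1424x1104+368+868
```

The result can be passed directly, for example `magick in.png -crop "$(crop_geometry 200 450 680 520 2 16)" +repage out.png`, where `in.png` and `out.png` are placeholder paths.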
Output Path
- If the user specifies an output path, use that
- Otherwise, save to a descriptive name in the current directory, e.g., `reddit-post-screenshot.png`, `tweet-screenshot.png`
Step 4: Verify the Result
After cropping, read the output image to verify it captured the right content:
```bash
# Use the Read tool to visually inspect the cropped screenshot
```
If the crop is wrong (missed content, too much whitespace, wrong element), adjust the selector or bounding box and retry.
---
Fallback: Visual Highlight Confirmation
When DOM-based location is uncertain — the selector might be wrong, multiple candidates exist, or the target is ambiguous — use JS-injected highlighting to visually confirm before cropping.
How It Works
1. Inject a highlight border on the candidate element:
   ```bash
   agent-browser --auto-connect eval --stdin <<'EOF'
   (function() {
     const el = document.querySelector('<selector>');
     if (!el) { console.log('NOT_FOUND'); return; }
     el.style.outline = '4px solid red';
     el.style.outlineOffset = '2px';
     el.scrollIntoView({ block: 'center' });
   })();
   EOF
   ```
2. Take a screenshot and visually inspect:
   ```bash
   agent-browser --auto-connect screenshot /tmp/highlight-check.png
   ```
   Read the screenshot to check if the red border surrounds the correct content.
3. If correct, remove the highlight and proceed with cropping:
   ```bash
   agent-browser --auto-connect eval "document.querySelector('<selector>').style.outline = ''; document.querySelector('<selector>').style.outlineOffset = '';"
   ```
4. If wrong, try the next candidate or refine the selector, re-highlight, and re-check.
When to Use This Fallback
- The page has complex/nested components and you're not sure which container is right
- Multiple similar elements exist and you need to pick the correct one
- The user's description is vague ("that chart in the middle of the page")
- The `get box` result looks suspicious (too large, too small, zero-sized)
Page Preparation: Clean Up Before Capture
Before taking the final screenshot, clean up the page for a better result:
```bash
# Dismiss cookie banners, popups, overlays
agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
  // Common cookie/popup selectors
  const selectors = [
    '[class*="cookie"] button',
    '[class*="consent"] button',
    '[class*="banner"] [class*="close"]',
    '[class*="modal"] [class*="close"]',
    '[class*="popup"] [class*="close"]',
    '[aria-label="Close"]',
    '[data-testid="close"]'
  ];
  selectors.forEach(sel => {
    document.querySelectorAll(sel).forEach(el => {
      if (el.offsetParent !== null) el.click();
    });
  });
  // Hide fixed/sticky elements that overlay content (nav bars, banners)
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    if ((style.position === 'fixed' || style.position === 'sticky') && el.tagName !== 'HTML' && el.tagName !== 'BODY') {
      el.style.display = 'none';
    }
  });
})();
EOF
```
> **Use with caution**: Hiding fixed elements might remove important context. Only run this when overlays visibly obstruct the target region.
Cookie Banners That Won't Dismiss
Some cookie consent banners (e.g., Jina AI's Usercentrics) live in shadow DOM or iframes and cannot be dismissed via JS `click()` or `remove()`. Don't waste time with multiple JS attempts. Instead:
- Crop it out: if the banner is at the top or bottom, simply adjust the crop region to exclude it. This is the fastest and most reliable approach.
- Scroll past it: scroll the target content away from the banner area before capturing.
Viewport Sizing
For consistent, high-quality screenshots, set the viewport before capturing:
```bash
# Standard desktop viewport
agent-browser --auto-connect set viewport 1280 800

# Wider for dashboard/data-heavy pages
agent-browser --auto-connect set viewport 1440 900

# Narrower for mobile-like content (social media posts)
agent-browser --auto-connect set viewport 800 600
```
Choose a viewport width that makes the target content render cleanly: not too cramped, not too stretched.
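When the choice needs to be made in a script, the three presets can be wrapped in a small lookup. This is only a sketch: `viewport_for` and the category names `dashboard` and `social` are hypothetical, and the dimensions are simply the three presets listed in this section.

```bash
# Hypothetical helper: map a content category to "WIDTH HEIGHT".
# Unknown categories fall back to the standard desktop viewport.
viewport_for() {
  case "$1" in
    dashboard) echo "1440 900" ;;  # dashboard/data-heavy pages
    social)    echo "800 600"  ;;  # mobile-like content such as social posts
    *)         echo "1280 800" ;;  # standard desktop
  esac
}

# Example: print the preset for a social media post.
viewport_for social
```

The output is meant to be word-split into two arguments, for example `agent-browser --auto-connect set viewport $(viewport_for social)`.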
---
Complete Example: Screenshot a Reddit Post
User: "Screenshot the top post on r/programming"
```bash
# 1. List existing tabs
agent-browser --auto-connect tab list

# 2. Navigate to subreddit
agent-browser --auto-connect open https://www.reddit.com/r/programming/
agent-browser --auto-connect wait 2000

# 3. Find the first post container
agent-browser --auto-connect eval "document.querySelector('shreddit-post')?.getBoundingClientRect()"

# 4. Scroll it into view
agent-browser --auto-connect scrollintoview "shreddit-post"
agent-browser --auto-connect wait 500

# 5. Get bounding box
agent-browser --auto-connect get box "shreddit-post"
# → { x: 312, y: 80, width: 656, height: 420 }

# 6. Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/reddit-raw.png

# 7. Crop with padding (16px at 1x scale; on Retina, multiply every value by the scale factor)
magick /tmp/reddit-raw.png \
  -crop 688x452+296+64 +repage \
  reddit-post-screenshot.png

# 8. Verify by reading the output image
```
---
Key Commands Quick Reference
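| Command | Purpose |
|---|---|
| `tab list` | List open tabs |
| `open <url>` | Navigate to URL |
| `wait --load networkidle` | Wait for content to settle |
| `snapshot -i` | See interactive elements |
| `screenshot --annotate` | Visual overview with labels |
| `screenshot --full` | Full-page screenshot |
| `get box "<selector>"` | Get element bounding box |
| `scrollintoview "<selector>"` | Scroll element into view |
| `eval` | Run JavaScript in page |
| `set viewport <width> <height>` | Set viewport dimensions |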
Troubleshooting
`get box` returns null or zero-sized
- The selector doesn't match any element. Use `get count "<selector>"` to verify.
- The element may be hidden or not yet rendered. Try `wait 2000` and retry.
Cropped image is blank or wrong area
- The full-page screenshot coordinates may differ from viewport coordinates. Use `screenshot --full` with `get box` (they use the same coordinate system).
- Check if the page has horizontal scroll: `get box` x values may be offset.
Target element is inside an iframe
- `get box` and `snapshot -i` cannot see inside iframes.
- Use `eval` to access iframe content:
  ```bash
  agent-browser --auto-connect eval "document.querySelector('iframe').contentDocument.querySelector('<sel>').getBoundingClientRect()"
  ```
  Note: Only works for same-origin iframes.
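A rectangle measured inside an iframe's document is relative to the iframe's own viewport, so the iframe's position on the outer page has to be added before cropping. The sketch below assumes both rectangles were already read with `eval` and that the iframe has no border or padding (otherwise those offsets must be added too); `iframe_box_to_page` is a hypothetical helper name.

```bash
# Hypothetical helper: translate a box measured inside a same-origin iframe
# into the outer page's coordinate space.
# Args: iframe_x iframe_y inner_x inner_y width height
# Assumes the iframe element has no border or padding.
iframe_box_to_page() {
  awk -v fx="$1" -v fy="$2" -v x="$3" -v y="$4" -v w="$5" -v h="$6" \
    'BEGIN { printf "%g %g %g %g\n", fx + x, fy + y, w, h }'
}

# Example: iframe at (100, 300); inner element at (20, 40.5), sized 640x360.
iframe_box_to_page 100 300 20 40.5 640 360   # prints 120 340.5 640 360
```

The four output values are x, y, width, and height in the same space as the outer page's boxes.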
`open` succeeded but page content is wrong
- The browser may have switched to a different tab (e.g., a popup or redirect opened a new tab). Always verify after navigation:
  ```bash
  agent-browser --auto-connect eval "document.location.href"
  ```
- If the URL is wrong, use `tab list` to find the correct tab and `tab goto <N>` to switch.
Screenshot command times out on fonts
- Some pages (e.g., Google developer docs) hang on `document.fonts.ready`. Force-resolve it first:
  ```bash
  agent-browser --auto-connect eval "document.fonts.ready.then(() => 'ok')"
  ```
  Then retry the screenshot.
Page has lazy-loaded content
- Scroll down to trigger loading before taking the screenshot:
  ```bash
  agent-browser --auto-connect scroll down 1000
  agent-browser --auto-connect wait 1500
  agent-browser --auto-connect scroll up 1000
  ```