github-upload-image-to-pr
Upload Image to PR
Upload local images to a GitHub PR and embed them in the description or comments using browser automation tools.
How It Works
Since the GitHub API does not support direct image uploads, this skill uses the PR comment textarea as a staging area for GitHub's image hosting: it uploads files there to obtain persistent `user-attachments/assets/` URLs, then updates the PR description or posts a comment via the `gh` CLI.
Step 0: Resolve PR context
If the user didn't specify a PR number or URL, auto-detect it:

```bash
# Get PR number from the current branch
gh pr view --json number,url -q '"\(.number) \(.url)"'
```

If multiple repos or branches are involved, confirm with the user which PR to target.

Also, normalize the image paths to absolute paths. If a path contains special characters (e.g., Unicode narrow spaces from CleanShot X), copy the file to `/tmp/` first:

```bash
# e.g., to handle glob-matched paths with special chars
cp /path/to/CleanShotkeyword.png /tmp/screenshot.png
```

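The path normalization described above could be sketched as follows, assuming a Node.js environment on a POSIX system. `planUpload` and the `/tmp/screenshot-N` naming are hypothetical, and the check treats any character outside printable ASCII as "special", which is deliberately broader than the narrow-space case mentioned.

```javascript
const path = require("node:path");

// Hypothetical helper: decides whether an image path can be handed to the
// browser tool as-is or should first be copied to /tmp/ under a plain name.
function planUpload(imagePath, index = 0, cwd = process.cwd()) {
  // Resolve relative paths against cwd; absolute paths pass through unchanged.
  const absolute = path.resolve(cwd, imagePath);
  // Assumption: anything outside printable ASCII (for example U+202F,
  // a narrow no-break space) is treated as a special character.
  const hasSpecialChars = /[^\x20-\x7E]/.test(absolute);
  if (!hasSpecialChars) {
    return { source: absolute, uploadPath: absolute, needsCopy: false };
  }
  const ext = path.extname(absolute) || ".png";
  return {
    source: absolute,
    uploadPath: path.join("/tmp", `screenshot-${index}${ext}`),
    needsCopy: true,
  };
}
```

When `needsCopy` is true, the caller would copy `source` to `uploadPath` (for example with `fs.copyFileSync`) before uploading.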
Tool Detection and Selection
Priority Order
- Playwright MCP (MCP connection, `mcp__playwright__*`): connects to existing browser, login state preserved
- Chrome DevTools MCP (MCP connection, `mcp__chrome-devtools__*`): connects to existing browser, login state preserved
- agent-browser (CLI via Bash; fallback, login state preserved with `--profile`)

MCP-based tools connect to an already-running browser instance, so GitHub login state is automatically preserved. agent-browser can persist login state using `--profile ~/.agent-browser-github`.
Detection

```
# 1. Search for MCP-based browser tools (preferred)
ToolSearch: "browser navigate upload"

# 2. Fall back to agent-browser only if no MCP tools found
Bash: agent-browser --version
```

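The priority order can be expressed as a small selection function. This is only a sketch: `pickBrowserTool`, the returned labels, and the shape of `availableTools` (a list of tool names reported by the agent runtime) are assumptions, while the two prefixes come from the priority list.

```javascript
// Tool-name prefixes from the priority list, highest priority first.
const MCP_PREFIXES = [
  { tool: "playwright-mcp", prefix: "mcp__playwright__" },
  { tool: "chrome-devtools-mcp", prefix: "mcp__chrome-devtools__" },
];

// Hypothetical helper. `hasAgentBrowser` is the result of a check such as
// `agent-browser --version` exiting successfully.
function pickBrowserTool(availableTools, hasAgentBrowser) {
  for (const { tool, prefix } of MCP_PREFIXES) {
    if (availableTools.some((name) => name.startsWith(prefix))) return tool;
  }
  // Fall back to the CLI only when no MCP-based tool was found.
  return hasAgentBrowser ? "agent-browser" : null;
}
```

A `null` result corresponds to the "No browser tools found" case in Troubleshooting.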
Tool Compatibility Matrix
| Operation | Playwright MCP | Chrome DevTools MCP | agent-browser (CLI/Bash) |
|---|---|---|---|
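| Navigate | `browser_navigate` | `navigate_page` | `agent-browser open` |
| Snapshot | | | |
| Screenshot | | | |
| Click | | | |
| File Upload | `browser_file_upload` | `upload_file` | `agent-browser upload` |
| JS Eval | | | `agent-browser eval` |
| Login State | Preserved | Preserved | Preserved with `--profile` |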
Steps
Step 1: Navigate to PR page and check login state
Navigate to the PR page and immediately take a snapshot to verify login state.

```javascript
// Playwright MCP
browser_navigate({ url: "https://github.com/{owner}/{repo}/pull/{number}" })
// Chrome DevTools MCP
navigate_page({ url: "https://github.com/{owner}/{repo}/pull/{number}", type: "url" })
// agent-browser (use --profile to persist login state)
agent-browser --headed --profile ~/.agent-browser-github open "https://github.com/{owner}/{repo}/pull/{number}"
```

If SSO authentication screen appears: Take a snapshot, locate the "Continue" button, and click it.

If NOT logged in (agent-browser only):
- Navigate to `https://github.com/login`
- Ask the user to log in manually in the headed browser window.
- Wait for user confirmation, then navigate back to the PR page.

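The `{owner}`, `{repo}`, and `{number}` placeholders in the navigation calls can be filled from the PR URL that `gh pr view` reports in Step 0. A minimal sketch, assuming URLs on `github.com`; `parsePrUrl` is a hypothetical helper, not part of any tool named here.

```javascript
// Hypothetical helper: splits a PR URL into its parts and returns a
// canonical PR page URL, or null when the input is not a github.com PR URL.
function parsePrUrl(prUrl) {
  let url;
  try {
    url = new URL(prUrl);
  } catch {
    return null; // not a parseable URL at all
  }
  const match = url.pathname.match(/^\/([^/]+)\/([^/]+)\/pull\/(\d+)/);
  if (url.hostname !== "github.com" || !match) return null;
  const [, owner, repo, number] = match;
  return {
    owner,
    repo,
    number: Number(number),
    url: `https://github.com/${owner}/${repo}/pull/${number}`,
  };
}
```

Trailing segments such as `/files` are dropped, so the returned `url` always points at the main PR page where the comment form lives.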
Step 2: Locate the file upload input
Take a snapshot/screenshot and scroll to the bottom to find the comment area.
GitHub renders a file upload input in the comment form. Try these selectors in order (GitHub's UI can change — if one fails, try the next):

```javascript
// Shared JS for MCP-based tools: tries multiple known selectors
() => {
  const selectors = [
    'input[type="file"][id*="comment"]',
    'input[type="file"][id="fc-new_comment_field"]',
    '#new_comment_field',
    'input[type="file"]'
  ];
  for (const sel of selectors) {
    const el = document.querySelector(sel);
    if (el) return { found: true, id: el.id, selector: sel };
  }
  return { found: false };
}
```

For Chrome DevTools MCP, you can also take a snapshot to find the `uid` of the file upload element directly.
Step 3: Upload images one by one
Upload each image file using the detected tool. Wait 2–3 seconds between uploads to allow GitHub to process each file.
For multiple images, upload them all to the same comment textarea before extracting URLs — this is more efficient than navigating between uploads.

```javascript
// Chrome DevTools MCP: upload_file requires the uid of the input element
// Playwright MCP: browser_file_upload takes the element ref and file path(s) array
// agent-browser: agent-browser upload {ref} {absolute_path}
```

Important: Always use absolute file paths.
Step 4: Retrieve uploaded image URLs
Wait 3–5 seconds after the last upload, then read the textarea value. GitHub injects markdown image syntax like `![image](https://github.com/user-attachments/assets/...)` into the textarea:

```javascript
// Shared JS: tries both known textarea selectors
() => {
  const ta = document.getElementById('new_comment_field')
    || document.querySelector('textarea[id*="comment"]');
  return ta ? ta.value : 'textarea not found';
}
```

```bash
# agent-browser
agent-browser eval 'document.getElementById("new_comment_field")?.value || document.querySelector("textarea[id*=comment]")?.value || "not found"'
```

The response contains URLs in the `user-attachments/assets/` format. Extract all image URLs/markdown from the textarea value before clearing it.
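Pulling the URLs out of the textarea value can be done with a single regular expression. The sketch below assumes the value contains either markdown images or `<img>` tags with a double-quoted `src`; `extractImageUrls` is a hypothetical name, and entries with an empty URL are skipped.

```javascript
// Hypothetical helper: returns the unique image URLs found in the textarea
// value, in the order they appear.
function extractImageUrls(textareaValue) {
  // First alternative: markdown image, ![alt](url)
  // Second alternative: HTML image, <img ... src="url" ...>
  const pattern =
    /!\[[^\]]*\]\((https?:\/\/[^)\s]+)\)|<img\b[^>]*\bsrc="(https?:\/\/[^"]+)"/g;
  const urls = [];
  for (const match of textareaValue.matchAll(pattern)) {
    const url = match[1] || match[2];
    if (!urls.includes(url)) urls.push(url);
  }
  return urls;
}
```

An empty result after an upload suggests the textarea has not been populated yet, which matches the "wait and retry once" advice in Troubleshooting.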
Step 5: Clear the textarea (do not submit the comment)

```javascript
// MCP-based tools
() => {
  const ta = document.getElementById('new_comment_field')
    || document.querySelector('textarea[id*="comment"]');
  if (ta) { ta.value = ""; return "cleared"; }
  return "textarea not found";
}
```

```bash
# agent-browser (wrapped in a function so a missing textarea is reported instead of "cleared")
agent-browser eval '(() => { const ta = document.getElementById("new_comment_field") || document.querySelector("textarea[id*=comment]"); if (!ta) return "textarea not found"; ta.value = ""; return "cleared"; })()'
```

Step 6: Embed images in the PR
Option A: Update PR description (append images to existing body):

```bash
# {IMAGE_MARKDOWN} stands for the image markdown extracted in Step 4
EXISTING_BODY=$(gh pr view {PR_NUMBER} --json body -q .body)
gh pr edit {PR_NUMBER} --body "$(printf '%s\n\n## Screenshots\n\n%s' "$EXISTING_BODY" "{IMAGE_MARKDOWN}")"
```

Option B: Post as a new comment:

```bash
gh pr comment {PR_NUMBER} --body "## Screenshots

{IMAGE_MARKDOWN}"
```

Use Option A by default unless the user explicitly asks for a comment, or if the PR description is already long and a comment would be cleaner.
Step 7: Verify the result
Reload the page and take a screenshot to confirm the images are displayed correctly.
Tips
- Image sizing: Control display size via HTML `<img>` tags: `<img width="800" alt="description" src="..." />`
- Multiple images: Upload all images in one session to the same textarea; extract all URLs before clearing
- Prefer MCP tools: Always prefer Playwright or Chrome DevTools MCP over agent-browser for simpler setup
- agent-browser login persistence: Use `--profile ~/.agent-browser-github` to persist GitHub login across sessions
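For the image sizing tip, an extracted URL can be turned into a sized `<img>` tag with a small formatter. This is an illustrative sketch: `toSizedImgTag` is a hypothetical helper, and the default width of 800 simply mirrors the example in the tip.

```javascript
// Hypothetical helper: builds an <img> tag with an explicit width.
// Attribute values are HTML-escaped so quotes in alt text cannot break the tag.
function toSizedImgTag(url, alt = "", width = 800) {
  const escapeAttr = (value) =>
    String(value)
      .replace(/&/g, "&amp;")
      .replace(/"/g, "&quot;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
  return `<img width="${Number(width)}" alt="${escapeAttr(alt)}" src="${escapeAttr(url)}" />`;
}
```

The resulting string can be used in the PR body or comment anywhere a markdown image would otherwise go.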
Troubleshooting
| Issue | Solution |
|---|---|
| Not logged in (MCP tools) | SSO screen may appear: take a snapshot, find the "Continue" button, and click it |
| Not logged in (agent-browser) | Use `--headed --profile ~/.agent-browser-github`, open `https://github.com/login`, and have the user log in manually |
| Browser window not visible | For agent-browser, ensure `--headed` is passed |
| File path with special characters (e.g., Unicode narrow spaces from CleanShot) | Copy file to `/tmp/` first |
| File upload fails | Ensure the file path is absolute |
| Textarea doesn't contain URLs yet | Wait 3–5 seconds after upload before running JS eval; retry once if needed |
| Textarea selector not found | GitHub UI changes occasionally; use the multi-selector JS in Step 2 to find the current element |
| Chrome DevTools MCP disconnected | Reconnect the Chrome DevTools MCP server, then retry |
| agent-browser not found | Install agent-browser and confirm `agent-browser --version` runs |
| No browser tools found | Set up one of Playwright MCP, Chrome DevTools MCP, or agent-browser |
| PR not found / 404 | Private repos return 404 for unauthenticated users; check login state |
Notes
- GitHub `user-attachments/assets/` URLs are persistent: images remain accessible even without submitting the comment
- Editing the description directly in the browser UI is fragile due to GitHub UI structure changes; updating via `gh pr edit` is strongly preferred
- Multiple images can be uploaded in a single session before extracting URLs
- MCP-based tools connect to existing browser instances, preserving cookies and login sessions
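
Because updating via `gh pr edit` is preferred, the new description can be assembled as a plain string before it is handed to the CLI. A minimal sketch, assuming the existing body and the extracted URLs are already available; `appendScreenshots` and the `screenshot N` alt text are hypothetical choices.

```javascript
// Hypothetical helper: appends a "## Screenshots" section with one markdown
// image per URL to the existing PR body.
function appendScreenshots(existingBody, imageUrls, heading = "## Screenshots") {
  const images = imageUrls
    .map((url, i) => `![screenshot ${i + 1}](${url})`)
    .join("\n\n");
  const section = `${heading}\n\n${images}`;
  // Trim trailing whitespace so exactly one blank line separates old and new content.
  const base = (existingBody || "").trimEnd();
  return base ? `${base}\n\n${section}` : section;
}
```

One way to apply the result is to write it to a temporary file and pass that file to `gh pr edit` with `--body-file`, which avoids shell-quoting issues with multi-line bodies.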