web-search
Web Search & Scrape
You have three tools for web access. Use them in combination based on what the task needs.
The Stack
SearXNG — Search Engine
A local meta-search instance that aggregates 25+ engines (Google, Bing, DuckDuckGo, Brave, etc.). No tracking, no rate limits, and a JSON API.

```bash
# Basic search: print title, URL, and a 200-character snippet for the top 10 results
curl -s "http://localhost:8888/search?q=QUERY&format=json" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('results', [])[:10]:
    print(r.get('title', ''))
    print(r.get('url', ''))
    print((r.get('content') or '')[:200])
    print()
"
```
**Category search** — append `&categories=` with: `general`, `news`, `images`, `files`, `science`, `it`, `music`, `videos`

```bash
# News search
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=news"

# Multiple categories (comma-separated)
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=news,science"
```
**Pagination** — append `&pageno=2` (or 3, 4, etc) for more results.
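
The category and pagination parameters can also be assembled programmatically, which avoids hand-encoding the query. The following is a minimal Python sketch, assuming the local instance at `http://localhost:8888` and assuming that multiple categories are passed as a comma-separated list. `build_search_url` is a hypothetical helper, not part of SearXNG or the `web-search` CLI.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_search_url(
    query: str,
    categories: Optional[Sequence[str]] = None,
    pageno: int = 1,
    base: str = "http://localhost:8888",  # assumption: the local instance from this document
) -> str:
    """Build a SearXNG JSON search URL (hypothetical helper)."""
    params = {"q": query, "format": "json"}
    if categories:
        # Assumption: several categories are joined with commas.
        params["categories"] = ",".join(categories)
    if pageno > 1:
        params["pageno"] = pageno
    # urlencode percent-encodes the query; safe="," keeps the category list readable.
    return f"{base}/search?{urlencode(params, safe=',')}"
```

The resulting string can be passed straight to `curl -s`, for example `build_search_url("topic here", ["news"], pageno=2)`.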
Lightpanda — Fast Headless Fetch
Built in Zig. 10x faster than Chrome, tiny memory footprint. Use this as the default for fetching page content.

```bash
# Fetch as markdown (best for reading/summarizing)
lightpanda fetch --dump markdown https://example.com

# Fetch as HTML (when you need structure)
lightpanda fetch --dump html https://example.com

# Semantic tree (useful for understanding page layout)
lightpanda fetch --dump semantic_tree https://example.com

# Strip unnecessary elements
lightpanda fetch --dump markdown --strip_mode js,css https://example.com

# Include iframe content
lightpanda fetch --dump markdown --with_frames https://example.com
```
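
When Lightpanda is driven from a script, it can help to build the argument list in one place. The sketch below is a hypothetical Python helper (`lightpanda_argv` is not part of Lightpanda). It uses only the flags and dump formats shown in this section, so treat the set of accepted formats as an assumption rather than a complete list.

```python
from typing import List, Sequence

# Only the dump formats that appear in this document; there may be others.
DUMP_FORMATS = {"markdown", "html", "semantic_tree"}


def lightpanda_argv(
    url: str,
    dump: str = "markdown",
    strip_modes: Sequence[str] = (),
    with_frames: bool = False,
) -> List[str]:
    """Return the argv list for a `lightpanda fetch` call (hypothetical helper)."""
    if dump not in DUMP_FORMATS:
        raise ValueError(f"unsupported dump format: {dump!r}")
    argv = ["lightpanda", "fetch", "--dump", dump]
    if strip_modes:
        argv += ["--strip_mode", ",".join(strip_modes)]
    if with_frames:
        argv.append("--with_frames")
    argv.append(url)
    return argv
```

The list can be handed to `subprocess.run(argv, capture_output=True, text=True)`, which avoids shell quoting issues with unusual URLs.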
Agent-Browser — Full Browser Automation
Playwright-based. Use when Lightpanda can't handle the page (JS-heavy SPAs, login-required pages, dynamic content, form interactions).

```bash
# Open and snapshot
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser snapshot -i

# Get text content
agent-browser get text body

# Interact with elements
agent-browser fill @e1 "search query"
agent-browser click @e2

# Screenshot for visual inspection
agent-browser screenshot --annotate

# Always close when done
agent-browser close
```

Decision Guide
Need to find something? → SearXNG first. Always.
Need page content? → Lightpanda. It's fast, it returns clean markdown, and it handles 90% of pages.
Lightpanda returns garbage or empty content? → The page probably needs JavaScript to render. Switch to Agent-Browser.
Need to log in, fill forms, click through flows? → Agent-Browser. Save auth state for reuse:

```bash
agent-browser state save auth.json

# Later:
agent-browser state load auth.json
```
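
The "Lightpanda first, Agent-Browser as fallback" rule can be sketched in Python. This is a minimal illustration, not how the `web-search` CLI is implemented: `looks_empty`, `fetch_with_fallback`, and the 200-character threshold are all hypothetical, and only the commands already shown in this document are invoked.

```python
import subprocess
from typing import Callable, List

MIN_USEFUL_CHARS = 200  # hypothetical threshold; tune it for the pages you fetch


def run_cli(argv: List[str]) -> str:
    """Run a command and return its stdout, or '' if it fails or times out."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout if proc.returncode == 0 else ""


def looks_empty(text: str) -> bool:
    """Heuristic: very little non-whitespace text suggests the page did not render."""
    return len("".join(text.split())) < MIN_USEFUL_CHARS


def fetch_with_fallback(url: str, run: Callable[[List[str]], str] = run_cli) -> str:
    """Try Lightpanda first; switch to Agent-Browser if the output looks empty."""
    text = run(["lightpanda", "fetch", "--dump", "markdown", url])
    if not looks_empty(text):
        return text
    try:
        run(["agent-browser", "open", url])
        run(["agent-browser", "wait", "--load", "networkidle"])
        return run(["agent-browser", "get", "text", "body"])
    finally:
        run(["agent-browser", "close"])  # always close when done
```

The `run` parameter exists so the decision logic can be exercised with a stub instead of the real binaries.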
The `web-search` CLI

There's also a unified CLI at `~/.agents/tools/web-search` (also available as `web-search` on PATH) that chains these together:

```bash
# Search only
web-search "hospice compliance CMS 2026"

# Search + scrape top results
web-search "hospice compliance CMS 2026" --scrape -n 3

# Fetch a single URL
web-search --fetch https://example.com

# Use Agent-Browser for JS-heavy pages
web-search --fetch https://spa-app.com --browser

# News search + scrape
web-search "CMS hospice updates" --categories news --scrape
```

Common Patterns
Research a topic

```bash
# 1. Search
curl -s "http://localhost:8888/search?q=topic+here&format=json" > /tmp/results.json

# 2. Review results, pick the best URLs

# 3. Fetch the good ones
lightpanda fetch --dump markdown https://good-result.com
```
Get current/breaking info

```bash
# News category + recent results
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=news"
```

Deep scrape multiple pages

```bash
# Search, extract the top 5 URLs, fetch each as markdown
curl -s "http://localhost:8888/search?q=topic&format=json" |
  python3 -c "import json,sys; [print(r['url']) for r in json.load(sys.stdin)['results'][:5]]" |
  while read -r url; do
    echo "=== $url ==="
    lightpanda fetch --dump markdown "$url" 2>/dev/null
  done
```
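
The one-liner in this pipeline raises `KeyError` if the response has no `results` key or a result lacks `url`, and it may fetch the same page twice when several engines return it. A more tolerant extraction step might look like the following sketch; `top_urls` is a hypothetical helper, and the response shape (a `results` list of objects with a `url` field) is taken from the examples in this document.

```python
from typing import Any, Dict, List


def top_urls(data: Dict[str, Any], limit: int = 5) -> List[str]:
    """Return up to `limit` unique result URLs, preserving result order."""
    seen = set()
    urls: List[str] = []
    for result in data.get("results", []):
        url = result.get("url")
        if not url or url in seen:
            continue  # skip results without a URL and duplicates
        seen.add(url)
        urls.append(url)
        if len(urls) == limit:
            break
    return urls
```

Feed it the parsed JSON, for example `top_urls(json.load(open("/tmp/results.json")))`, then pass each URL to the fetch step.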
Handle a stubborn JS-heavy page

```bash
# Lightpanda returned nothing useful? Switch to agent-browser
agent-browser open https://stubborn-spa.com
agent-browser wait --load networkidle
agent-browser get text body > /tmp/page-content.txt
agent-browser close
```

Important Notes
- SearXNG runs at `http://localhost:8888`. If it's down, check `docker ps | grep searxng` and restart with `docker start searxng`.
- Lightpanda is at `/opt/homebrew/bin/lightpanda`.
- Agent-Browser is at `/opt/homebrew/bin/agent-browser` (v0.21.1).
- The `web-search` CLI is at `~/.agents/tools/web-search` and symlinked to `/opt/homebrew/bin/web-search`.
- When SearXNG returns results, the `content` field has a snippet, often enough to answer simple factual questions without fetching the full page.
- For URL encoding in curl, use Python: `python3 -c "import urllib.parse; print(urllib.parse.quote('my query'))"`
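
One detail of the URL-encoding note is worth illustrating: by default `urllib.parse.quote` leaves `/` unencoded, while `quote_plus` turns spaces into `+`. The sketch below shows the differences. `encode_query` is a hypothetical helper that encodes every reserved character, which is the conservative choice for a value placed after `q=`.

```python
from urllib.parse import quote, quote_plus


def encode_query(text: str) -> str:
    """Percent-encode a search query for use after `q=` (hypothetical helper)."""
    # safe="" also encodes "/", which quote() would otherwise leave as-is.
    return quote(text, safe="")


default_form = quote("a/b c")       # "/" kept, space becomes %20
plus_form = quote_plus("my query")  # space becomes "+"
strict_form = encode_query("C++ & Rust/Zig")
```

Either form is accepted in a query string; the point is to encode `&`, `+`, and `#` so they are not read as URL syntax.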
Bundled Resources
This skill includes everything needed to rebuild or troubleshoot the stack:
- `scripts/web-search` — The unified CLI script (also installed at `~/.agents/tools/web-search`)
- `references/infrastructure.md` — Full infrastructure docs: binary locations, SearXNG API reference, container management, OrbStack setup, troubleshooting guide. Read this if something breaks or you need to reconfigure.
- `references/searxng-settings.yml` — SearXNG config (engines, formats, API settings). Edit and copy to `~/.agents/searxng/config/settings.yml`, then run `docker restart searxng` to apply changes.
Related Skills
- [[agent-browser]] — full browser automation for JS-heavy pages and form interaction
- [[human-browser]] — stealth browsing with residential proxies for bot-protected sites
- [[seo]] — SEO audits and optimization that complement web research