web-search

Web Search & Scrape

You have three tools for web access. Use them in combination based on what the task needs.

The Stack

SearXNG — Search Engine

A local meta-search engine that aggregates results from 25+ engines (Google, Bing, DuckDuckGo, Brave, etc.). No tracking, no rate limits, and a JSON API.

```bash
# Basic search
curl -s "http://localhost:8888/search?q=QUERY&format=json" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('results', [])[:10]:
    print(r.get('title', ''))
    print(r.get('url', ''))
    print((r.get('content') or '')[:200])
    print()
"
```

**Category search**: append `&categories=` with any of `general`, `news`, `images`, `files`, `science`, `it`, `music`, `videos`.

```bash
# News search
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=news"

# Multiple categories (comma-separated)
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=news,science"
```

**Pagination**: append `&pageno=2` (or 3, 4, etc.) for more results.
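
You can also call the same JSON API from Python using only the standard library. This avoids shell quoting and handles URL encoding for you. The sketch below is minimal and assumes the local instance at `http://localhost:8888` with JSON output enabled. `build_search_url`, `summarize`, and `search` are hypothetical helper names and are not part of SearXNG or the `web-search` CLI.

```python
import json
import urllib.parse
import urllib.request

SEARXNG = "http://localhost:8888"  # assumed local instance, as described above


def build_search_url(query, categories=None, pageno=1, base=SEARXNG):
    """Build a SearXNG JSON API URL; urlencode escapes spaces, '&', etc."""
    params = {"q": query, "format": "json"}
    if categories:
        # SearXNG takes a comma-separated list of categories
        params["categories"] = ",".join(categories)
    if pageno > 1:
        params["pageno"] = pageno
    return f"{base}/search?{urllib.parse.urlencode(params)}"


def summarize(data, limit=10, snippet_chars=200):
    """Reduce a decoded SearXNG response to (title, url, snippet) tuples."""
    out = []
    for r in data.get("results", [])[:limit]:
        snippet = (r.get("content") or "")[:snippet_chars]
        out.append((r.get("title", ""), r.get("url", ""), snippet))
    return out


def search(query, **kwargs):
    """Run a live query (requires the SearXNG container to be running)."""
    with urllib.request.urlopen(build_search_url(query, **kwargs), timeout=15) as resp:
        return summarize(json.load(resp))
```

For example, `search("hospice compliance CMS 2026", categories=["news"], pageno=2)` returns up to ten `(title, url, snippet)` tuples from the second page of news results. `urlencode` sends the comma in `news,science` as `%2C`, and the server decodes it back to a comma.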

Lightpanda — Fast Headless Fetch

Built in Zig. 10x faster than Chrome, tiny memory footprint. Use this as the default for fetching page content.

```bash
# Fetch as markdown (best for reading/summarizing)
lightpanda fetch --dump markdown https://example.com

# Fetch as HTML (when you need structure)
lightpanda fetch --dump html https://example.com

# Semantic tree (useful for understanding page layout)
lightpanda fetch --dump semantic_tree https://example.com

# Strip unnecessary elements
lightpanda fetch --dump markdown --strip_mode js,css https://example.com

# Include iframe content
lightpanda fetch --dump markdown --with_frames https://example.com
```
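
When you script Lightpanda, it helps to build the command line in one place. The sketch below is minimal and uses only the flags listed above. `lightpanda_argv` and `lightpanda_fetch` are hypothetical helpers, and the 60-second timeout is an arbitrary assumption. Treating a non-zero exit status as an empty result is a design choice here, not documented Lightpanda behaviour.

```python
import shutil
import subprocess

DUMP_FORMATS = ("markdown", "html", "semantic_tree")  # the formats listed above


def lightpanda_argv(url, dump="markdown", strip=(), with_frames=False):
    """Build a `lightpanda fetch` command line using only the flags shown above."""
    if dump not in DUMP_FORMATS:
        raise ValueError(f"unsupported dump format: {dump!r}")
    argv = ["lightpanda", "fetch", "--dump", dump]
    if strip:
        # e.g. ("js", "css") -> --strip_mode js,css
        argv += ["--strip_mode", ",".join(strip)]
    if with_frames:
        argv.append("--with_frames")
    argv.append(url)
    return argv


def lightpanda_fetch(url, timeout=60, **opts):
    """Run Lightpanda and return stdout, or '' if it exits non-zero.

    Raises FileNotFoundError if the binary is missing and
    subprocess.TimeoutExpired if the fetch runs longer than `timeout` seconds.
    """
    if shutil.which("lightpanda") is None:
        raise FileNotFoundError("lightpanda is not on PATH")
    proc = subprocess.run(
        lightpanda_argv(url, **opts), capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout if proc.returncode == 0 else ""
```

For example, `lightpanda_fetch("https://example.com", strip=("js", "css"))` runs the same command as the "Strip unnecessary elements" line above.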

Agent-Browser — Full Browser Automation

Playwright-based. Use when Lightpanda can't handle the page (JS-heavy SPAs, login-required pages, dynamic content, form interactions).

```bash
# Open and snapshot
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser snapshot -i

# Get text content
agent-browser get text body

# Interact with elements (@e1, @e2 are element refs from the snapshot output)
agent-browser fill @e1 "search query"
agent-browser click @e2

# Screenshot for visual inspection
agent-browser screenshot --annotate

# Always close when done
agent-browser close
```
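
In a script, the easiest way to honour the "always close" rule is `try`/`finally`. The sketch below is minimal and shells out to the same commands as above. `run_cli` and `browser_text` are hypothetical helpers. The `run` parameter exists only so you can test the command sequence without a real browser.

```python
import subprocess


def run_cli(argv):
    """Run one agent-browser command and return its stdout (raises on failure)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def browser_text(url, run=run_cli):
    """Open `url`, wait for network idle, return the body text, and always close."""
    try:
        run(["agent-browser", "open", url])
        run(["agent-browser", "wait", "--load", "networkidle"])
        return run(["agent-browser", "get", "text", "body"])
    finally:
        # Runs even if open/wait/get raised, so no browser session is left behind
        run(["agent-browser", "close"])
```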

Decision Guide

  • Need to find something? → SearXNG first. Always.
  • Need page content? → Lightpanda. It's fast, it returns clean markdown, and it handles 90% of pages.
  • Lightpanda returns garbage or empty content? → The page probably needs JavaScript to render. Switch to Agent-Browser.
  • Need to log in, fill forms, or click through flows? → Agent-Browser. Save auth state for reuse:

```bash
agent-browser state save auth.json

# Later:
agent-browser state load auth.json
```
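
The first three rules of the guide add up to a try-fast-then-fall-back policy. The sketch below assumes that "garbage or empty" can be approximated by a minimum amount of non-whitespace text. The threshold, `looks_empty`, and `fetch_with_fallback` are all hypothetical. You pass in the two fetchers as plain functions, for example wrappers around `lightpanda fetch --dump markdown` and Agent-Browser's `get text body`.

```python
def looks_empty(text, min_chars=200):
    """Heuristic for "garbage or empty": too little non-whitespace text.

    The 200-character default is an arbitrary assumption; tune it per site.
    """
    return len("".join(text.split())) < min_chars


def fetch_with_fallback(url, fast_fetch, browser_fetch, min_chars=200):
    """Try the fast fetcher first and fall back to the full browser.

    Returns a (tool_used, text) pair so callers can log which path was taken.
    """
    text = fast_fetch(url)
    if not looks_empty(text, min_chars):
        return "lightpanda", text
    return "agent-browser", browser_fetch(url)
```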

The `web-search` CLI

There's also a unified CLI at `~/.agents/tools/web-search` (also available as `web-search` on PATH) that chains these tools together:

```bash
# Search only
web-search "hospice compliance CMS 2026"

# Search + scrape top results
web-search "hospice compliance CMS 2026" --scrape -n 3

# Fetch a single URL
web-search --fetch https://example.com

# Use Agent-Browser for JS-heavy pages
web-search --fetch https://spa-app.com --browser

# News search + scrape
web-search "CMS hospice updates" --categories news --scrape
```

Common Patterns

Research a topic

```bash
# 1. Search

# 2. Review results, pick the best URLs

# 3. Fetch the good ones
lightpanda fetch --dump markdown https://good-result.com
```
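
Step 2 is a judgement call, but a simple pre-filter helps. It keeps SearXNG's ranking order and avoids fetching several pages from the same site. The sketch below shows one such filter over the `results` list from the JSON API. `pick_urls` and the one-URL-per-host rule are illustrative assumptions. The `web-search` CLI is not documented to do this.

```python
from urllib.parse import urlsplit


def pick_urls(results, n=3, skip_hosts=()):
    """Return up to n result URLs in ranking order, at most one per host."""
    seen_hosts = set()
    picked = []
    for r in results:
        if len(picked) >= n:
            break
        url = r.get("url") or ""
        host = urlsplit(url).hostname  # lowercased; None for empty or relative URLs
        if not host or host in seen_hosts or host in skip_hosts:
            continue
        seen_hosts.add(host)
        picked.append(url)
    return picked
```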

Get current/breaking info

```bash
# News category + recent results
```
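
One way to ask for recent news is to combine `categories=news` with the `time_range` parameter of the SearXNG search API. Its documented values are `day`, `month`, and `year`, and only engines that support time filtering honour it. The sketch below builds the resulting URL. `recent_news_url` is a hypothetical helper, and you can fetch the URL with curl exactly like the basic search.

```python
from urllib.parse import urlencode

TIME_RANGES = ("day", "month", "year")  # values listed in SearXNG's search API docs


def recent_news_url(query, time_range="day", base="http://localhost:8888"):
    """Build a SearXNG JSON query limited to news from a recent time window."""
    if time_range not in TIME_RANGES:
        raise ValueError(f"unsupported time_range: {time_range!r}")
    params = {"q": query, "format": "json", "categories": "news", "time_range": time_range}
    return f"{base}/search?{urlencode(params)}"
```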

Deep scrape multiple pages

```bash
# Search, extract URLs, fetch each
curl -s "http://localhost:8888/search?q=topic&format=json" |
  python3 -c "import json,sys; [print(r['url']) for r in json.load(sys.stdin)['results'][:5]]" |
  while read -r url; do
    echo "=== $url ==="
    lightpanda fetch --dump markdown "$url" 2>/dev/null
  done
```
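
Here is the same pipeline in Python, for when you want per-page results instead of one concatenated stream. The sketch assumes the `lightpanda fetch --dump markdown` invocation above. The helper names, the 60-second timeout, and skipping duplicate URLs are all assumptions. Failed fetches come back as empty strings rather than being hidden by `2>/dev/null`.

```python
import subprocess


def lightpanda_markdown(url, timeout=60):
    """Fetch one page as markdown; return '' on failure or timeout."""
    try:
        proc = subprocess.run(
            ["lightpanda", "fetch", "--dump", "markdown", url],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError includes FileNotFoundError when lightpanda is not installed
        return ""
    return proc.stdout if proc.returncode == 0 else ""


def scrape_all(urls, fetch=lightpanda_markdown):
    """Fetch each URL once, in order; returns {url: text}."""
    pages = {}
    for url in urls:
        if url and url not in pages:
            pages[url] = fetch(url)
    return pages


def render(pages):
    """Mirror the shell loop's output: a '=== url ===' header before each page."""
    return "\n".join(f"=== {url} ===\n{text}" for url, text in pages.items())
```

Pass it the URLs from the SearXNG `results` list, for example the first five `r["url"]` values.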

Handle a stubborn JS-heavy page

```bash
# Lightpanda returned nothing useful? Switch to agent-browser
agent-browser open https://stubborn-spa.com
agent-browser wait --load networkidle
agent-browser get text body > /tmp/page-content.txt
agent-browser close
```

Important Notes

  • SearXNG runs at `http://localhost:8888`. If it's down, check `docker ps | grep searxng` and restart it with `docker start searxng`.
  • Lightpanda is at `/opt/homebrew/bin/lightpanda`.
  • Agent-Browser is at `/opt/homebrew/bin/agent-browser` (v0.21.1).
  • The `web-search` CLI is at `~/.agents/tools/web-search` and symlinked to `/opt/homebrew/bin/web-search`.
  • When SearXNG returns results, the `content` field holds a snippet, which is often enough to answer simple factual questions without fetching the full page.
  • For URL encoding in curl, use Python: `python3 -c "import urllib.parse; print(urllib.parse.quote('my query'))"`
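
Before a batch of searches, a quick probe keeps you from mistaking a stopped container for an empty result set. The sketch below uses only the standard library. `searxng_up` is a hypothetical helper, and it issues a real search for the word `test` because that is the endpoint this skill already relies on. If it returns `False`, run the `docker ps` / `docker start searxng` check above. It also returns `False` if the instance rejects JSON output.

```python
import http.client
import json
import urllib.request


def searxng_up(base="http://localhost:8888", timeout=3.0):
    """Return True if SearXNG answers a JSON search request with a results list."""
    url = f"{base}/search?q=test&format=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers a body that is not valid JSON
        return False
    return isinstance(data, dict) and "results" in data
```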

Bundled Resources

This skill includes everything needed to rebuild or troubleshoot the stack:
  • `scripts/web-search`: the unified CLI script (also installed at `~/.agents/tools/web-search`).
  • `references/infrastructure.md`: full infrastructure docs covering binary locations, the SearXNG API reference, container management, OrbStack setup, and a troubleshooting guide. Read this if something breaks or you need to reconfigure.
  • `references/searxng-settings.yml`: the SearXNG config (engines, formats, API settings). Edit it, copy it to `~/.agents/searxng/config/settings.yml`, then run `docker restart searxng` to apply the changes.

Related Skills

  • [[agent-browser]] — full browser automation for JS-heavy pages and form interaction
  • [[human-browser]] — stealth browsing with residential proxies for bot-protected sites
  • [[seo]] — SEO audits and optimization that complement web research