Web Search & Scrape

You have three tools for web access. Use them in combination based on what the task needs.

The Stack

SearXNG — Search Engine

Local meta-search engine aggregating 25+ engines (Google, Bing, DuckDuckGo, Brave, etc.). No tracking, no rate limits, and a JSON API.

```bash
# Basic search
curl -s "http://localhost:8888/search?q=QUERY&format=json" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('results', [])[:10]:
    print(r.get('title', ''))
    print(r.get('url', ''))
    print(r.get('content', '')[:200])
    print()
"
```


**Category search** — append `&categories=` with: `general`, `news`, `images`, `files`, `science`, `it`, `music`, `videos`

```bash
# News search
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=news"

# Multiple categories
curl -s "http://localhost:8888/search?q=QUERY&format=json&categories=science,it"
```

**Pagination** — append `&pageno=2` (or 3, 4, etc) for more results.
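
The query, category, and page parameters above can be assembled in one place so the query text is always encoded correctly. The following is a minimal sketch, assuming the endpoint shown in this document; the function name `build_search_url` is hypothetical and not part of SearXNG.

```python
from urllib.parse import urlencode

SEARXNG_URL = "http://localhost:8888/search"  # endpoint used throughout this document


def build_search_url(query, categories=None, pageno=1):
    """Return a SearXNG JSON search URL with the query safely encoded."""
    params = {"q": query, "format": "json"}
    if categories:
        # Categories are sent as one comma-separated value, e.g. "science,it".
        # urlencode percent-encodes the comma (%2C); the server decodes it back.
        params["categories"] = ",".join(categories)
    if pageno > 1:
        params["pageno"] = pageno
    return f"{SEARXNG_URL}?{urlencode(params)}"


print(build_search_url("hospice compliance", categories=["news"], pageno=2))
```

The resulting string can be passed directly to `curl -s`.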

Lightpanda — Fast Headless Fetch

Built in Zig. 10x faster than Chrome, tiny memory footprint. Use this as the default for fetching page content.

```bash
# Fetch as markdown (best for reading/summarizing)
lightpanda fetch --dump markdown https://example.com

# Fetch as HTML (when you need structure)
lightpanda fetch --dump html https://example.com

# Semantic tree (useful for understanding page layout)
lightpanda fetch --dump semantic_tree https://example.com

# Strip unnecessary elements
lightpanda fetch --dump markdown --strip_mode js,css https://example.com

# Include iframe content
lightpanda fetch --dump markdown --with_frames https://example.com
```
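
When calling Lightpanda from a script, it can help to build the argument list in one function. This sketch only assembles the command and uses only the flags shown above; the name `lightpanda_fetch_args` is hypothetical, and nothing is executed.

```python
def lightpanda_fetch_args(url, dump="markdown", strip_modes=None, with_frames=False):
    """Build the argv list for a `lightpanda fetch` call (does not run it)."""
    args = ["lightpanda", "fetch", "--dump", dump]
    if strip_modes:
        # e.g. ["js", "css"] becomes "--strip_mode js,css"
        args += ["--strip_mode", ",".join(strip_modes)]
    if with_frames:
        args.append("--with_frames")
    args.append(url)
    return args


print(lightpanda_fetch_args("https://example.com", strip_modes=["js", "css"]))
```

The returned list can be handed to `subprocess.run(args, capture_output=True, text=True)`. Passing a list rather than a shell string avoids quoting problems with unusual URLs.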

Agent-Browser — Full Browser Automation

Playwright-based. Use when Lightpanda can't handle the page (JS-heavy SPAs, login-required pages, dynamic content, form interactions).

```bash
# Open and snapshot
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser snapshot -i

# Get text content
agent-browser get text body

# Interact with elements
agent-browser fill @e1 "search query"
agent-browser click @e2

# Screenshot for visual inspection
agent-browser screenshot --annotate

# Always close when done
agent-browser close
```

Decision Guide

Need to find something? → SearXNG first. Always.

Need page content? → Lightpanda. It's fast, it returns clean markdown, and it handles 90% of pages.

Lightpanda returns garbage or empty content? → The page probably needs JavaScript to render. Switch to Agent-Browser.

Need to log in, fill forms, click through flows? → Agent-Browser. Save auth state for reuse:

```bash
agent-browser state save auth.json

# Later:
agent-browser state load auth.json
```
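
The "Lightpanda first, Agent-Browser on failure" rule can be sketched as a small fallback function. The 200-character threshold for "empty" is an assumption chosen for illustration and is not defined by either tool. The two fetchers are passed in as callables, so the names `fetch_light` and `fetch_browser` are hypothetical stand-ins for whatever wrappers you write.

```python
def looks_empty(text, min_chars=200):
    """Heuristic (an assumption): very little text suggests the page needs JavaScript."""
    return len(text.strip()) < min_chars


def fetch_with_fallback(url, fetch_light, fetch_browser, min_chars=200):
    """Try the fast fetcher first; fall back to the browser if the result looks empty."""
    try:
        text = fetch_light(url)
    except Exception:
        text = ""  # treat a failed fetch the same as an empty one
    if looks_empty(text, min_chars):
        return "agent-browser", fetch_browser(url)
    return "lightpanda", text


# Stand-in fetchers so the sketch runs without either tool installed.
tool, content = fetch_with_fallback(
    "https://example.com",
    fetch_light=lambda u: "",            # simulates an empty Lightpanda result
    fetch_browser=lambda u: "rendered",  # simulates Agent-Browser output
)
print(tool)
```

Returning the tool name alongside the content makes it easy to log how often the slower path is taken.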

The `web-search` CLI

There's also a unified CLI at `~/.agents/tools/web-search` (also available as `web-search` on PATH) that chains these together:

```bash
# Search only
web-search "hospice compliance CMS 2026"

# Search + scrape top results
web-search "hospice compliance CMS 2026" --scrape -n 3

# Fetch a single URL
web-search --fetch https://example.com

# Use Agent-Browser for JS-heavy pages
web-search --fetch https://spa-app.com --browser

# News search + scrape
web-search "CMS hospice updates" --categories news --scrape
```

Common Patterns

Research a topic

```bash
# 1. Search
curl -s "http://localhost:8888/search?q=topic+here&format=json" > /tmp/results.json

# 2. Review results, pick the best URLs

# 3. Fetch the good ones
lightpanda fetch --dump markdown https://good-result.com
```
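
For step 2, a numbered listing of the saved results makes it easier to pick URLs. This is a minimal sketch that reads the file written in step 1 and relies only on the `title`, `url`, and `content` fields used elsewhere in this document; the function name `list_results` is hypothetical.

```python
import json


def list_results(path, limit=10, snippet_chars=200):
    """Return numbered summaries from a saved SearXNG JSON response."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    lines = []
    for i, r in enumerate(data.get("results", [])[:limit], start=1):
        title = r.get("title", "")
        url = r.get("url", "")
        # `content` may be missing or null, so fall back to an empty snippet.
        snippet = (r.get("content") or "")[:snippet_chars]
        lines.append(f"{i}. {title}\n   {url}\n   {snippet}")
    return lines
```

Calling `print("\n".join(list_results("/tmp/results.json")))` after step 1 prints the candidates for review.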

Get current/breaking info

```bash
# News category + recent results
curl -s "http://localhost:8888/search?q=topic&format=json&categories=news"
```

Deep scrape multiple pages

```bash
# Search, extract URLs, fetch each
curl -s "http://localhost:8888/search?q=topic&format=json" |
python3 -c "import json,sys; [print(r['url']) for r in json.load(sys.stdin)['results'][:5]]" |
while read url; do
  echo "=== $url ==="
  lightpanda fetch --dump markdown "$url" 2>/dev/null
done
```
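
The inline `python3 -c` step above raises a `KeyError` if the response has no `results` key or a result lacks a `url`. A slightly more defensive version of the URL extraction might look like the sketch below; the function name `extract_urls` is hypothetical, and skipping duplicate URLs is an added design choice rather than something the pipeline above does.

```python
def extract_urls(data, limit=5):
    """Return up to `limit` unique result URLs, preserving the response order."""
    seen = set()
    urls = []
    for r in data.get("results", []):
        if len(urls) >= limit:
            break
        url = r.get("url")
        if not url or url in seen:
            continue  # skip results without a URL and repeated URLs
        seen.add(url)
        urls.append(url)
    return urls


sample = {
    "results": [
        {"url": "https://a.example"},
        {"title": "no url here"},
        {"url": "https://a.example"},
        {"url": "https://b.example"},
    ]
}
print(extract_urls(sample))
```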

Handle a stubborn JS-heavy page

```bash
# Lightpanda returned nothing useful? Switch to agent-browser
agent-browser open https://stubborn-spa.com
agent-browser wait --load networkidle
agent-browser get text body > /tmp/page-content.txt
agent-browser close
```

Important Notes

SearXNG runs at `http://localhost:8888`. If it's down, check `docker ps | grep searxng` and restart with `docker start searxng`.

Lightpanda is at `/opt/homebrew/bin/lightpanda`.

Agent-Browser is at `/opt/homebrew/bin/agent-browser` (v0.21.1).

The `web-search` CLI is at `~/.agents/tools/web-search` and symlinked to `/opt/homebrew/bin/web-search`.

When SearXNG returns results, the `content` field has a snippet, which is often enough to answer simple factual questions without fetching the full page.

For URL encoding in curl, use Python: `python3 -c "import urllib.parse; print(urllib.parse.quote('my query'))"`
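
One detail worth knowing about the encoding step: `urllib.parse.quote` leaves `/` unescaped by default, which can matter when the query itself contains slashes. The sketch below compares the standard-library options on a made-up query string.

```python
from urllib.parse import quote, quote_plus

query = "CMS hospice & palliative/2026"  # example query, chosen for illustration

# quote() encodes spaces as %20 but leaves "/" alone by default.
print(quote(query))

# safe="" also escapes "/", which is safer inside a query-string value.
print(quote(query, safe=""))

# quote_plus() escapes "/" and encodes spaces as "+", the usual query-string form.
print(quote_plus(query))
```

Either of the last two forms is suitable for the `q=` parameter.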

Bundled Resources

This skill includes everything needed to rebuild or troubleshoot the stack:

`scripts/web-search`: the unified CLI script (also installed at `~/.agents/tools/web-search`).

`references/infrastructure.md`: full infrastructure docs covering binary locations, the SearXNG API reference, container management, OrbStack setup, and a troubleshooting guide. Read this if something breaks or you need to reconfigure.

`references/searxng-settings.yml`: SearXNG config (engines, formats, API settings). Edit and copy to `~/.agents/searxng/config/settings.yml`, then run `docker restart searxng` to apply changes.
