hasdata-cli
hasdata
Use the `hasdata` CLI for real-time web data. One subcommand per API: flags, enums, and defaults are derived from the live schema at `api.hasdata.com/apis`.
Prerequisites
- `command -v hasdata`: if missing, install with `curl -sSL https://raw.githubusercontent.com/HasData/hasdata-cli/main/install.sh | sh`.
- One-time setup: the user runs `hasdata configure`, pastes their API key, and it's saved to `~/.hasdata/config.yaml` (mode 0600). Every future call picks it up automatically.
- If a call fails with `no API key configured`, the user hasn't run `hasdata configure` yet; tell them to. Never invent a key.
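The two checks above can be combined into one preflight step. The following is a minimal sketch, not part of the CLI: the function name `hasdata_preflight` and its return codes are hypothetical, and it only looks for the config file named above, so a key supplied some other way would not be detected.

```bash
# Preflight sketch (hypothetical helper, not shipped with hasdata).
# Returns 0 when the CLI is on PATH and the config file exists,
# 1 when the CLI is missing, 2 when the config file is missing.
hasdata_preflight() {
  if ! command -v hasdata >/dev/null 2>&1; then
    echo "hasdata not found: run the install.sh one-liner first" >&2
    return 1
  fi
  # Path taken from the setup note above; assumed to be the only key source.
  if [ ! -f "$HOME/.hasdata/config.yaml" ]; then
    echo "no config file: ask the user to run 'hasdata configure'" >&2
    return 2
  fi
  return 0
}
```

Calling `hasdata_preflight || exit 1` at the top of a script stops early with a readable message instead of failing on the first API call.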
Quick start
```bash
hasdata <api> --flag value [--flag value ...] --raw | jq .
```
Always pass `--raw` when piping to `jq` (skips pretty-print and TTY detection). Use `--pretty` only for human-readable terminal output.
Picking the right subcommand
| User intent | Subcommand |
|---|---|
| Web search ("what does Google say about…") | |
| Latest news | |
| AI Mode SERP | |
| Shopping / product prices | |
| Immersive product page | |
| Maps / places / reviews | |
| Yelp / YellowPages local data | |
| Real-estate listings | |
| Real-estate single property deep dive | |
| Jobs | |
| Bing search | |
| Trends | |
| Images | |
| Flights | |
| Short videos | |
| Events | |
| Instagram profile | |
| Amazon seller | |
| Scrape a specific URL | |
For exact flags of a subcommand, run `hasdata <api> --help` or read the matching file in `references/`.
Non-obvious triggers (when to reach for hasdata even if the user doesn't say "scrape")
The user often won't ask for a SERP API or a scraper directly. Map these intents to the skill:
- "Is this still true?" / "What's the latest on X?" / "Has Y happened yet?": LLM training data is stale. Run `google-serp` or `google-news` to ground the answer.
- "Summarize this article" / "TL;DR this URL": Use `web-scraping --output-format markdown` and feed the markdown into the summary prompt. Beats copy-paste because it strips ads, nav, scripts.
- "Verify this link" / "Is this site real?": `web-scraping --url X --no-block-resources` returns status + screenshot. Or `google-serp --q "site:example.com"`.
- "What does X say about itself?": Pull the company's own homepage with `web-scraping --output-format markdown`, then summarize.
- "Find me alternatives to X": `google-serp --q "X alternatives"` or `google-shopping --q "X competitors"`.
- "What's the going rate for X?": `google-shopping` (broad) or `amazon-search` (Amazon-specific) with `jq` to extract the price distribution.
- "Phone number / address for X": `google-maps-place` or `yelp-place`. Don't guess from training data.
- "Are people happy with X service?" / "Is X reputable?": `google-maps-reviews --place-id ... --sort lowest` for negative samples; `glassdoor-job` for employer rep.
- "What's the salary range for Y role?": `indeed-listing` filtered by role + location, then `jq` over `.jobs[].salary`.
- "Find me homes/apartments matching X criteria": `zillow-listing` / `redfin-listing` / `airbnb-listing` with the corresponding filters.
- "Recent sold comps near X": `zillow-listing --type sold --keyword "X" --days-on-zillow 12m`.
- "Track this product's price": Loop `amazon-product --asin X` on a schedule; persist `.price` to a file.
- "What's trending around X?": `google-trends --q "X"` for relative interest; `google-news --q "X"` for headlines.
- "Find businesses near me that do X": `google-maps --q "X" --ll "@LAT,LNG,12z"`, then fan out `google-maps-place` for contacts.
- "How does this look in country Y?": `--gl Y` on SERP commands, `--proxy-country Y` on `web-scraping`. Useful for geo-targeted SEO checks, geo-blocked content.
- "Pull structured data from this page": `web-scraping --ai-extract-rules-json '{"price": {"type": "number"}, ...}'`. Works on arbitrary pages without writing CSS selectors.
- "List of items → per-item details": Pattern: search command produces IDs/URLs, pipe through `xargs` into the matching `*-property` / `*-product` / `*-place` deep-dive command.
- "Find this person's role / employer / LinkedIn / followers": `google-serp --q '"Person Name" linkedin'` first. The organic-result title is typically `Name — Role at Company | LinkedIn` and the snippet carries location, headline, connection count. SERP often answers the whole question without ever opening the profile page.
- "What is company X doing? Where's their HQ? Who works there?": `google-serp --q "$COMPANY"` returns a `.knowledge_graph` block with founder, HQ, founded year, parent, employee range, all pre-extracted. `google-news --q "$COMPANY"` for recent activity. Specific facts via targeted SERP: `--q '"$COMPANY" headquarters'`, `--q '"$COMPANY" funding'`, `--q 'site:linkedin.com/company "$COMPANY"'`.
- "Find emails for company X" / "personal email for person Y": start with SERP: `--q '"@example.com"'` or `--q '"jane@example.com"'` often surfaces actual emails indexed by Google. Pattern-guess + SERP-verify for individuals. Disclose unverified guesses to the user.
- "Enrich this CSV of leads": per row: `google-serp` for LinkedIn, role, employer; another SERP to verify email or pattern. Stay in SERP unless a specific field is missing.
- Reverse-lookup (email / phone / domain → identity): `google-serp` with the literal value in quotes (`--q '"jane@x.com"'`, `--q '"+1 555 123 4567"'`, `--q '"acme corp" site:example.com'`) almost always surfaces the matching person or business.

SERP-first principle: for any data-enrichment intent (people, companies, emails, products, places), reach for `google-serp` / `google-news` / `google-shopping` / `google-maps` first. They return Google's already-extracted structured fields (`.knowledge_graph`, `.organic_results[].snippet`, `.local_results[]`, etc.) and bypass anti-bot. Only escalate to `web-scraping` when SERP doesn't surface the specific field you need; it's the last resort, not the default. See `references/enrichment.md`.

If a user request matches one of the above and you don't invoke hasdata, you're probably hallucinating a stale answer.
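The `$COMPANY` patterns in the list are templates: inside single quotes a POSIX shell does not expand variables, so the name has to be substituted before the query reaches `--q`. One way to do that is sketched below; the helper name `company_queries` is hypothetical and it only builds query strings, it does not call the CLI.

```bash
# Build the targeted company queries listed above, one per line.
# The company name is wrapped in literal double quotes so the search
# engine treats it as an exact phrase. Hypothetical helper, not a CLI feature.
company_queries() {
  printf '"%s" headquarters\n' "$1"
  printf '"%s" funding\n' "$1"
  printf 'site:linkedin.com/company "%s"\n' "$1"
}

# Usage sketch (commented out so sourcing this file makes no API calls):
# company_queries "Acme Corp" | while IFS= read -r q; do
#   hasdata google-serp --q "$q" --raw
# done
```

Keeping query construction separate from the API call makes the exact strings easy to inspect before spending credits on them.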
Universal flag patterns
- Kebab-case flag names. The CLI maps them back to the original camelCase before sending to the API.
- Booleans defaulting to `true` have a paired negation: `--no-block-ads`, `--no-screenshot`, `--no-js-rendering`, `--no-extract-emails`, `--no-block-resources`. Setting both `--block-ads` and `--no-block-ads` errors.
- Anything ending in `-json` accepts:
  - inline JSON: `--extract-rules-json '{"title":"h1"}'`
  - file: `--extract-rules-json @rules.json`
  - stdin: `cat rules.json | hasdata web-scraping ... --extract-rules-json -`
- Repeatable key=value flags split on the first `=` (so values containing `=` survive): `--headers User-Agent=foo --headers Cookie=session=abc`. Pair with `--headers-json` for a JSON base; kv items override per key.
- List flags accept either repeats or comma-joined: `--lr lang_en --lr lang_fr` or `--lr lang_en,lang_fr`. Serialized as `key[]=value` for GET endpoints.
- Enum flags validate client-side. If you guess wrong, the error lists the allowed values; read the message and retry.
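The "split on the first `=`" rule is easy to check by hand. The sketch below reproduces it with shell parameter expansion; it illustrates the rule as described, not the CLI's actual implementation, and `split_kv` is a hypothetical name.

```bash
# Split KEY=VALUE on the first "=" only, so later "=" stay in the value.
# Prints the key and value separated by a tab.
split_kv() {
  kv=$1
  key=${kv%%=*}    # drop the longest suffix starting at an "=": text before the first "="
  value=${kv#*=}   # drop the shortest prefix ending at an "=": text after the first "="
  printf '%s\t%s\n' "$key" "$value"
}
```

With this rule, `split_kv 'Cookie=session=abc'` yields the key `Cookie` and the value `session=abc`, which is why the cookie header in the example above survives intact.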
Global flags (apply to every subcommand)
| Flag | Effect |
|---|---|
| `--raw` | Write response bytes as-is (use this when piping to `jq`) |
| `--pretty` | Pretty-print JSON (default when stdout is a TTY) |
| | Write response to file instead of stdout (works for binary like screenshots) |
| | Log outgoing URL and |
| | Override env var (rarely needed) |
| | Per-request timeout (default 2m) |
| | Max retries on 429/5xx (default 2) |
Output contract
Responses are JSON. Pipe through `jq` for extraction:
```bash
hasdata google-serp --q "espresso machine" --num 10 --raw \
  | jq -c '.organic_results[] | {title, link, snippet}'
```
For real-estate / e-commerce results, the array shape is API-specific: read a single response with `--pretty` first to learn the schema, then write the `jq` filter.
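When the response shape is unknown, listing the top-level keys and their types is a quick first step before writing a filter. The helper below is a sketch with a hypothetical name (`json_shape`); it assumes the response is a JSON object at the top level and that `jq` is installed.

```bash
# Print each top-level key of a JSON object with the type of its value,
# tab-separated, reading the JSON from stdin.
json_shape() {
  jq -r 'to_entries[] | "\(.key)\t\(.value | type)"'
}

# Usage sketch (commented out so sourcing this file makes no API calls):
# hasdata zillow-listing --keyword "X" --raw | json_shape
```

Keys reported as `array` are the ones to iterate with `.[]` in the follow-up filter.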
Exit codes (script-safe)
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | user / CLI-input error (missing required flag, bad enum value, missing API key) |
| 2 | network error |
| 3 | API returned 4xx (auth, quota, validation) |
| 4 | API returned 5xx |
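In a script, the table above can be turned into a small lookup so that failures are logged with their meaning. This is a minimal sketch; the function name `hasdata_exit_meaning` is hypothetical and the messages simply restate the table.

```bash
# Map a hasdata exit code to the meaning given in the table above.
hasdata_exit_meaning() {
  case "$1" in
    0) echo "success" ;;
    1) echo "user / CLI-input error" ;;
    2) echo "network error" ;;
    3) echo "API returned 4xx" ;;
    4) echo "API returned 5xx" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Usage sketch (commented out so sourcing this file makes no API calls):
# hasdata google-serp --q "espresso machine" --raw > out.json
# echo "hasdata: $(hasdata_exit_meaning $?)" >&2
```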
References
- `references/enrichment.md`: person and company enrichment (LinkedIn lookup, emails, HQ/funding/news, CSV-row enrichment, reverse-lookup), the highest-leverage cross-API workflows
- `references/search.md`: Google SERP / Bing / News / Trends flag catalog
- `references/web-scraping.md`: `web-scraping` flags, JS scenarios, AI extraction
- `references/real-estate.md`: Zillow / Redfin / Airbnb filters and bracketed params
- `references/ecommerce.md`: Amazon / Shopify
- `references/local-business.md`: Maps / Yelp / YellowPages
- `references/jobs.md`: Indeed / Glassdoor
- `references/all-commands.md`: full subcommand index with credit costs