healthcare-providers-extract
Healthcare Providers Extract
Structured practitioner extraction from healthcare practice websites, powered by
Nimble's web data APIs.
User request: $ARGUMENTS
Before running any commands, read references/nimble-playbook.md for Claude Code
constraints (no shell state, no &/wait, sub-agent permissions, communication style).
Instructions
Step 0: Preflight + WSA Discovery
Run the preflight pattern from references/nimble-playbook.md (5 simultaneous Bash
calls: date calc, today, CLI check, profile load, index.md load).
Also simultaneously, run WSA discovery and setup:
- mkdir -p ~/.nimble/memory/{reports,healthcare-providers-extract/checkpoints}
- ls ~/.nimble/memory/healthcare-providers-extract/checkpoints/ 2>/dev/null
- Run Layer 1 (vertical) and Layer 3 (general tools) WSA discovery from references/wsa-reference.md. Layer 2 (session-specific) runs after Step 1 when you know the user's specialty.
- Classify discovered agents into phases and validate with nimble agent get per references/wsa-reference.md.
From the preflight results:
- CLI missing or API key unset -> references/profile-and-onboarding.md, stop
- Profile exists -> note it for context. Determine mode using smart date windowing from references/nimble-playbook.md:
  - Full mode: first run OR last run > 14 days ago
  - Quick refresh: last run < 14 days ago (re-extract only new/changed pages)
  - Same-day repeat: if last_runs.healthcare-providers-extract is today, check for an existing report at ~/.nimble/memory/reports/healthcare-providers-extract-*[today].md. If found, ask: "Already ran today. Run again for fresh data?"
- No profile -> that's fine. This skill doesn't require onboarding. Proceed to Step 1.
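The mode selection in Step 0 can be sketched as a small function. This is only an illustration: the function name and return labels are hypothetical, and the text does not say what happens at exactly 14 days, so the sketch assumes that case counts as a full run.

```python
from datetime import date
from typing import Optional


def choose_run_mode(last_run: Optional[date], today: date) -> str:
    """Pick a run mode from the date of the previous run, if there was one."""
    if last_run is None:
        return "full"              # first run
    if last_run == today:
        return "same-day-repeat"   # look for today's report, then ask the user
    days_since = (today - last_run).days
    # Assumption: exactly 14 days is treated as a full run.
    return "full" if days_since >= 14 else "quick-refresh"
```

The same-day branch only signals that the report check and the "Already ran today" prompt should happen; it does not perform them.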
Step 1: Parse Input & Starting Questions
Parse $ARGUMENTS for input type using the Input Parsing Pattern from
references/nimble-playbook.md. Key routing:
- URLs detected -> proceed to Step 3
- Specialty + location (no URLs) -> proceed to Step 2 (practice discovery)
- Unclear -> ask (counts as 1 of max 2 prompts)
If input is clear, confirm and ask the shaping questions in a single message
(plain text, not AskUserQuestion):
"Extracting providers from N practice sites. Quick questions:
- Healthcare vertical? (ophthalmology, dental, dermatology, general, or other)
- Quick scan (names + credentials only) or full extraction (all 5 fields)?"
If input is ambiguous, use AskUserQuestion (counts as 1 of max 2 prompts):
What practice sites should I extract providers from?
- Paste URLs directly (one per line)
- Provide a CSV file path or Google Sheet URL with practice URLs
- Or describe what you're looking for (e.g., "ophthalmologists in Austin, TX") and I'll find practices first
Skip questions the user already answered in their initial message.
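The routing rules in Step 1 could look roughly like the sketch below. The real Input Parsing Pattern lives in references/nimble-playbook.md and is not shown here, so both regular expressions (in particular the "specialty in location" heuristic) and the returned dictionary shape are assumptions made for illustration.

```python
import re

URL_RE = re.compile(r"https?://[^\s,]+", re.IGNORECASE)
# Hypothetical heuristic: "<specialty> in <location>",
# e.g. "ophthalmologists in Austin, TX".
SPECIALTY_LOCATION_RE = re.compile(
    r"^\s*(?P<specialty>.+?)\s+in\s+(?P<location>.+?)\s*$", re.IGNORECASE
)


def route_input(arguments: str) -> dict:
    """Decide which step handles the user's request."""
    urls = URL_RE.findall(arguments)
    if urls:
        return {"next_step": "3", "urls": urls}
    match = SPECIALTY_LOCATION_RE.match(arguments)
    if match:
        return {
            "next_step": "2",
            "specialty": match["specialty"],
            "location": match["location"],
        }
    return {"next_step": "ask"}  # counts as 1 of max 2 prompts
```

CSV paths and Google Sheet URLs are deliberately left out; a Sheet URL would be treated as a plain URL by this sketch.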
Step 2: Practice Discovery (Optional)
Only if the user provided a specialty + location instead of URLs.
Two input paths into discovery:
Path A: Fresh discovery. User gave specialty + location. Run Layer 2 WSA
discovery for session-specific agents:
bash
nimble agent list --limit 50 --search "[specialty]"
nimble agent list --limit 50 --search "[directory-user-mentioned]"
See references/wsa-reference.md for the full discovery strategy, agent evaluation
criteria, and healthcare discovery prioritization.
Run all discovery-phase agents simultaneously. Validate params with
nimble agent get first.
Path B: Market-finder handoff. User ran market-finder first and wants to
extract providers from those results. Read the market-finder output:
bash
cat ~/.nimble/memory/market-finder/{slug}/entities.json 2>/dev/null
Extract practice records. Note: Google Maps results contain place_url (a Maps
link) but not the practice's actual website URL. Proceed to Step 2b to resolve
real website URLs before site mapping.
After either path: Deduplicate by domain. Present discovered practices:
"Found N practices for [specialty] in [location] across [M] data sources. Proceeding to extract providers from these sites..."
Fallback: if no discovery WSAs were found, or results are sparse (< 3):
bash
nimble search --query "[specialty] in [location]" --max-results 20 --search-depth lite
Step 2b: Resolve Practice Website URLs
Discovery sources (Google Maps, Yelp, BBB) return listing URLs, not practice
website URLs. Before site mapping, resolve the actual website for each practice:
- Check structured data first: Google Maps results often include a website field in the structured output. Use it if present.
- Extract from listing page: if there is no website field, extract the Maps listing to find the practice website link:
bash
nimble extract --url "[maps-listing-url]" --format markdown
- Search fallback: if extraction fails:
bash
nimble search --query "[practice-name] [city] official website" --max-results 3 --search-depth lite
Skip practices where no website URL can be resolved, and note them in the "Data
Quality Summary" output section.
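The resolution order above, together with the domain-level deduplication that Step 2 asks for, can be sketched as follows. The two callables stand in for the nimble extract and nimble search calls and are injected so the sketch stays self-contained; their signatures, and the record keys name and city, are assumptions (website and place_url come from the text).

```python
from urllib.parse import urlparse


def resolve_website(practice, extract_from_listing, search_for_site):
    """Try structured data, then the listing page, then a search."""
    website = practice.get("website")                       # 1. structured data
    if not website and practice.get("place_url"):
        website = extract_from_listing(practice["place_url"])   # 2. listing page
    if not website:
        website = search_for_site(                          # 3. search fallback
            practice.get("name", ""), practice.get("city", "")
        )
    return website or None


def domain_of(url: str) -> str:
    """Lower-cased host without a leading "www."."""
    # urlparse only fills in the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def resolve_and_dedupe(practices, extract_from_listing, search_for_site):
    """Return (resolved practices unique by domain, unresolved practices)."""
    resolved, unresolved, seen = [], [], set()
    for practice in practices:
        website = resolve_website(practice, extract_from_listing, search_for_site)
        if not website:
            unresolved.append(practice)  # goes into the Data Quality Summary
            continue
        domain = domain_of(website)
        if domain in seen:
            continue                     # same practice found via another source
        seen.add(domain)
        resolved.append({**practice, "website": website})
    return resolved, unresolved
```

Keeping the unresolved list separate makes it straightforward to report skipped practices later instead of silently dropping them.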
Step 3: Site Mapping
Follow the Site Mapping Pattern from references/nimble-playbook.md for each
practice URL. Skill-specific settings:
- Keyword weight table: references/provider-extraction-patterns.md
- Page cap: 15 per site
- Fallback query: site:[domain] doctors OR providers OR team
For 6+ practices, use sub-agents (see Sub-Agent Strategy below).
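To make the keyword weighting and the page cap concrete, here is a minimal sketch of ranking mapped URLs. The weights are invented for illustration; the actual table is in references/provider-extraction-patterns.md and may differ. Only the cap of 15 comes from the settings above.

```python
from urllib.parse import urlparse

# Hypothetical weights, not the skill's real keyword table.
KEYWORD_WEIGHTS = {"doctors": 3, "providers": 3, "team": 2, "staff": 2, "about": 1}
PAGE_CAP = 15  # per-site cap from the skill settings


def score_url(url: str) -> int:
    """Sum the weights of keywords found in the URL path (not the domain)."""
    path = urlparse(url).path.lower()
    return sum(w for keyword, w in KEYWORD_WEIGHTS.items() if keyword in path)


def select_pages(urls: list, cap: int = PAGE_CAP) -> list:
    """Keep the highest-scoring pages, at most `cap` per site."""
    relevant = [(score_url(u), u) for u in urls if score_url(u) > 0]
    # sorted() is stable, so equally scored URLs keep their input order.
    ranked = sorted(relevant, key=lambda item: item[0], reverse=True)
    return [u for _, u in ranked[:cap]]
```

Scoring the path rather than the full URL avoids every page matching on a domain such as one that contains "doctors".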
Save checkpoint: ~/.nimble/memory/healthcare-providers-extract/checkpoints/{slug}/mapping.json
Step 4: Page Extraction
WSA shortcuts first: If WSA discovery found agents that extract provider data
from healthcare directories, use those for matching practices — structured WSA
output is higher quality than parsed markdown.
For all other practices, follow the Page Extraction with Retry pattern from
references/nimble-playbook.md. Scale using the Scaled Execution pattern from
the same reference.
Save checkpoint: ~/.nimble/memory/healthcare-providers-extract/checkpoints/{slug}/extraction.json
Step 5: Structured Parsing
Parse extracted markdown to identify providers and their fields. Read
references/provider-extraction-patterns.md for the 5 core fields, credential
regex patterns, and specialty keywords.
For each extracted page:
- Scan for provider name patterns (Dr. prefix, heading patterns, bold text near credential suffixes)
- Match credentials using the regex patterns from references/provider-extraction-patterns.md
- Match specialty using keywords for the detected healthcare vertical
- Extract contact info (phone regex, appointment URLs, email)
- Extract education/training mentions
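As an illustration of the name, credential, and phone steps in the list above, the sketch below parses one block of extracted markdown. All three patterns are simplified stand-ins written for this example (US-style phone numbers, a short credential list, "Dr."-prefixed names only); the skill's actual patterns are in references/provider-extraction-patterns.md.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns for illustration only.
NAME_RE = re.compile(r"Dr\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)")
CREDENTIAL_RE = re.compile(r"\b(MD|DO|OD|DDS|DMD|PhD|NP|PA-C|FACS|FAAO)\b")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def parse_provider(block: str, source_url: str) -> Optional[dict]:
    """Build a partial provider record from one markdown block."""
    name = NAME_RE.search(block)
    if name is None:
        return None  # no "Dr." name pattern in this block
    # dict.fromkeys removes repeats while keeping first-seen order.
    credentials = list(dict.fromkeys(CREDENTIAL_RE.findall(block)))
    phone = PHONE_RE.search(block)
    return {
        "name": "Dr. " + name.group(1),
        "credentials": ", ".join(credentials),
        "contact": {"phone": phone.group(0) if phone else None},
        "source_url": source_url,
    }
```

Specialty and education matching are omitted; they would follow the same search-and-fill shape using the keyword lists from the reference file.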
Build structured records:
json
{
"name": "Dr. Jane Smith",
"credentials": "MD, FACS",
"specialty": "Retinal Surgery",
"contact": {"phone": "(555) 123-4567", "scheduling_url": "..."},
"education": "Fellowship: Bascom Palmer Eye Institute",
"source_url": "https://practice.com/our-doctors",
"practice_name": "Shore Center for Eye Care",
"practice_url": "https://practice.com",
"confidence": "High"
}
Step 6: Deduplication & Confidence Scoring
Follow the Entity Deduplication and Entity Confidence Scoring patterns from
references/nimble-playbook.md. Skill-specific dedup rules and the 5-field
confidence criteria are in references/provider-extraction-patterns.md.
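A rough sketch of how deduplication and 5-field confidence scoring might fit together is shown below. The real rules are in the two reference files and are not reproduced here, so the merge key (normalized name plus practice URL) and the Medium/Low cut-off are assumptions; only "High means a complete profile" is taken from the output template.

```python
CORE_FIELDS = ("name", "credentials", "specialty", "contact", "education")


def _has_value(value) -> bool:
    """A contact dict counts only if at least one of its values is set."""
    if isinstance(value, dict):
        return any(value.values())
    return bool(value)


def confidence(record: dict) -> str:
    filled = sum(1 for field in CORE_FIELDS if _has_value(record.get(field)))
    if filled == len(CORE_FIELDS):
        return "High"                            # complete profile
    return "Medium" if filled >= 3 else "Low"    # assumed cut-off


def dedupe_providers(records: list) -> list:
    """Merge records for the same provider, keeping the first non-empty value."""
    merged = {}
    for record in records:
        key = (
            (record.get("name") or "").strip().lower(),
            (record.get("practice_url") or "").lower(),
        )
        if key not in merged:
            merged[key] = dict(record)
            continue
        for field, value in record.items():
            if _has_value(value) and not _has_value(merged[key].get(field)):
                merged[key][field] = value
    return [{**r, "confidence": confidence(r)} for r in merged.values()]
```

Scoring after merging means a provider whose details are spread across a team page and a bio page can still reach High.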
Step 7: Output
Present results grouped by practice, sorted by confidence within each practice.
markdown
Provider Extraction: [N] Providers from [M] Practices
[Date] | [H] High, [M] Medium, [L] Low confidence
TL;DR
Extracted [N] providers from [M] practice websites. [H] with complete profiles,
[L] with partial data. [Key finding: e.g., "12 of 15 providers are board-certified"].
[Practice Name] ([domain])
| # | Name | Credentials | Specialty | Contact | Education | Confidence |
|---|---|---|---|---|---|---|
| 1 | Dr. Jane Smith | MD, FACS | Retinal Surgery | (555) 123-4567 | Fellowship: Bascom Palmer | High |
| 2 | Dr. John Doe | OD | General Ophthalmology | Book | Residency: Wills Eye | Medium |
[Repeat per practice]
Data Quality Summary
- Complete profiles (High): [N] providers
- Partial profiles (Medium): [N] providers — missing: [list common gaps]
- Minimal profiles (Low): [N] providers — missing: [list common gaps]
Sources
[Clickable URL for every page extracted, grouped by practice]
**Source links are mandatory.** Every provider record must trace back to a source URL.
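The per-practice table in the template above could be rendered from the structured records with a helper like this one. It is a sketch: the function name is hypothetical, and showing the phone number first and the scheduling URL otherwise in the Contact column is an assumption about the intended layout.

```python
CONFIDENCE_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def _cell(value) -> str:
    """Render a value as table-cell text, escaping pipes."""
    return str(value or "").replace("|", "\\|")


def render_practice_table(providers: list) -> str:
    """Markdown table for one practice, sorted High to Low confidence."""
    ranked = sorted(
        providers, key=lambda p: CONFIDENCE_ORDER.get(p.get("confidence"), 3)
    )
    lines = [
        "| # | Name | Credentials | Specialty | Contact | Education | Confidence |",
        "|---|---|---|---|---|---|---|",
    ]
    for index, p in enumerate(ranked, start=1):
        contact = p.get("contact") or {}
        contact_cell = contact.get("phone") or contact.get("scheduling_url")
        cells = [
            str(index), _cell(p.get("name")), _cell(p.get("credentials")),
            _cell(p.get("specialty")), _cell(contact_cell),
            _cell(p.get("education")), _cell(p.get("confidence")),
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```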
Step 8: Save to Memory
Make all Write calls simultaneously:
- Report -> ~/.nimble/memory/reports/healthcare-providers-extract-{slug}-{date}.md
- Provider data -> ~/.nimble/memory/healthcare-providers-extract/{slug}/providers.json
- Profile -> update last_runs.healthcare-providers-extract in ~/.nimble/business-profile.json (only if profile exists)
- Follow the wiki update pattern from references/memory-and-distribution.md: update index.md rows for all affected entity files, append a log.md entry for this run.
- Clean up checkpoint (complete run) or keep (partial run)
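The first two writes can be sketched as below. The path layout follows the list above; rendering {date} as an ISO date (YYYY-MM-DD) is an assumption, and the profile, index.md, and log.md updates are left out because their formats are defined in references that are not shown here.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def memory_paths(slug: str, run_date: date, root: Path) -> dict:
    """Report and provider-data paths under the memory root."""
    report_name = f"healthcare-providers-extract-{slug}-{run_date.isoformat()}.md"
    return {
        "report": root / "reports" / report_name,
        "providers": root / "healthcare-providers-extract" / slug / "providers.json",
    }


def save_run(slug: str, run_date: date, report_md: str, providers: list,
             root: Optional[Path] = None) -> dict:
    """Write the report and provider records, creating folders as needed."""
    if root is None:
        root = Path.home() / ".nimble" / "memory"
    paths = memory_paths(slug, run_date, root)
    for path in paths.values():
        path.parent.mkdir(parents=True, exist_ok=True)
    paths["report"].write_text(report_md, encoding="utf-8")
    paths["providers"].write_text(json.dumps(providers, indent=2), encoding="utf-8")
    return paths
```

Passing a different root makes it possible to exercise the function against a temporary directory instead of the real memory folder.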
Step 9: Share & Distribute
Always offer distribution; do not skip. Follow references/memory-and-distribution.md
for connector detection and sharing flow.
Notion: full provider table as a dated subpage.
Slack: TL;DR with provider count and confidence breakdown only.
Step 10: Follow-ups
- "Tell me more about Dr. X" -> show full extracted profile
- "Export as CSV" -> generate CSV from providers.json
- "Run on more sites" -> append new practice URLs, extract and merge
- "What's missing?" -> detail the data gaps per provider
Enrichment from discovered WSAs: If Step 0 found enrichment-phase agents
(reviews, regulatory, practice details), offer them as immediate follow-ups:
"I also found [N] WSAs that could enrich this data: [brief list]. Want me to run reputation checks or regulatory lookups on these providers/practices?"
See references/wsa-reference.md for enrichment phase mapping and fallback chains.
Sibling skill suggestions:
Next steps:
- Run healthcare-providers-enrich to fill data gaps (NPI lookup, board certification verification, additional contact info)
- Run healthcare-providers-verify to validate credentials and license status
- Run market-finder to discover more practice URLs in this area
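For the "Export as CSV" follow-up listed in this step, a minimal sketch of flattening providers.json records into CSV is given below. The column order and the choice to split contact into phone and scheduling_url columns are assumptions; the field names themselves come from the record structure in Step 5.

```python
import csv
import io

CSV_COLUMNS = [
    "practice_name", "name", "credentials", "specialty",
    "phone", "scheduling_url", "education", "confidence", "source_url",
]


def providers_to_csv(providers: list) -> str:
    """Flatten provider records (nested contact included) into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in providers:
        contact = record.get("contact") or {}
        writer.writerow({
            **record,  # keys outside CSV_COLUMNS are ignored by the writer
            "phone": contact.get("phone", ""),
            "scheduling_url": contact.get("scheduling_url", ""),
        })
    return buffer.getvalue()
```

The csv module handles quoting, so values such as "MD, FACS" survive the comma without extra work.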
Sub-Agent Strategy
For batch extraction (6+ practices), use nimble-researcher agents
(agents/nimble-researcher.md) to parallelize site mapping and extraction.
Follow the sub-agent spawning rules from references/nimble-playbook.md
(bypassPermissions, batch max 4, explicit Bash instruction, fallback on failure).
Spawn pattern: One agent per practice (or per batch of 3 practices for large
jobs). Each agent runs Steps 3-5 for its assigned practices and returns structured
provider records.
Single-practice optimization: If only 1-2 practices, run directly from the
main context instead of spawning agents.
Fallback: If any agent fails, run those extractions directly from the main
context. Never leave gaps in the output.
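One way to read the spawning rules above is sketched here as a planning function. Two points are assumptions rather than statements from the text: "batch max 4" is interpreted as at most four agents spawned at a time, and the 3 to 5 practice range, which the text does not cover, is run directly like the 1-2 case.

```python
def plan_execution(practices: list, per_agent: int = 1, max_parallel: int = 4) -> dict:
    """Group practices into agent assignments and spawn waves."""
    if len(practices) < 6:
        # Assumption: anything below the 6+ threshold runs in the main context.
        return {"mode": "direct", "waves": []}
    # One assignment per agent: 1 practice each, or 3 each for large jobs.
    assignments = [
        practices[i:i + per_agent] for i in range(0, len(practices), per_agent)
    ]
    # Assumption: no more than `max_parallel` agents are spawned at once.
    waves = [
        assignments[i:i + max_parallel]
        for i in range(0, len(assignments), max_parallel)
    ]
    return {"mode": "sub-agents", "waves": waves}
```

Any assignment whose agent fails would be rerun from the main context, which this sketch does not model.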
Error Handling
See references/nimble-playbook.md for the standard error table (missing API key,
429, 401, empty results, extraction garbage). Skill-specific errors:
- No provider pages found: "Couldn't find provider/team pages on [domain]. The site may list staff differently. Want me to try extracting from the homepage or search for this practice on healthcare directories?"
- All extractions returned garbage: "The practice sites appear to be heavily JavaScript-rendered. Retrying with browser rendering..." (auto-retry with --render per the shared pattern)
- Ambiguous practice name: If a URL fails and the user provided a name instead, search for the practice: nimble search --query "[practice name] [location] doctors" --max-results 5 --search-depth lite
- CSV/Sheet parse error: "Couldn't parse the input file. Expected a column with practice URLs. Can you paste the URLs directly instead?"