healthcare-providers-extract

Healthcare Providers Extract

Structured practitioner extraction from healthcare practice websites, powered by Nimble's web data APIs.

User request: $ARGUMENTS

Before running any commands, read `references/nimble-playbook.md` for Claude Code constraints (no shell state, no `wait`, sub-agent permissions, communication style).

Instructions

Step 0: Preflight + WSA Discovery

Run the preflight pattern from `references/nimble-playbook.md` (5 simultaneous Bash calls: date calc, today, CLI check, profile load, index.md load).

In the same batch of simultaneous calls, run WSA discovery and setup:

mkdir -p ~/.nimble/memory/{reports,healthcare-providers-extract/checkpoints}

ls ~/.nimble/memory/healthcare-providers-extract/checkpoints/ 2>/dev/null

Run Layer 1 (vertical) and Layer 3 (general tools) WSA discovery from `references/wsa-reference.md`. Layer 2 (session-specific) runs after Step 1, once you know the user's specialty.

Classify discovered agents into phases and validate with `nimble agent get` per `references/wsa-reference.md`.

From the preflight results:

- CLI missing or API key unset -> `references/profile-and-onboarding.md`, stop
- Profile exists -> note it for context. Determine mode using smart date windowing from `references/nimble-playbook.md`:
  - Full mode: first run OR last run > 14 days ago
  - Quick refresh: last run < 14 days ago (re-extract only new/changed pages)
  - Same-day repeat: if `last_runs.healthcare-providers-extract` is today, check for an existing report at `~/.nimble/memory/reports/healthcare-providers-extract-*[today].md`. If found, ask: "Already ran today. Run again for fresh data?"
- No profile -> that's fine. This skill doesn't require onboarding. Proceed to Step 1.
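
The mode rules above can be sketched as a small function. This is only an illustration, not the playbook's actual windowing logic: the function name `choose_mode` and the mode labels are hypothetical, and the handling of a last run exactly 14 days ago is an assumption, since the rules only define "> 14" and "< 14".

```python
from datetime import date
from typing import Optional

REFRESH_WINDOW_DAYS = 14  # threshold taken from the mode rules above


def choose_mode(last_run: Optional[date], today: date) -> str:
    """Return "full", "quick-refresh", or "same-day" for this run."""
    if last_run is None:
        return "full"  # first run
    age_days = (today - last_run).days
    if age_days == 0:
        # Caller should look for today's report and ask before re-running.
        return "same-day"
    if age_days < REFRESH_WINDOW_DAYS:
        return "quick-refresh"
    # Assumption: a last run exactly 14 days ago is treated as full mode.
    return "full"
```

For example, `choose_mode(None, date.today())` falls through to full mode because there is no previous run to compare against.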

Step 1: Parse Input & Starting Questions

Parse `$ARGUMENTS` for input type using the Input Parsing Pattern from `references/nimble-playbook.md`. Key routing:

URLs detected -> proceed to Step 3
Specialty + location (no URLs) -> proceed to Step 2 (practice discovery)
Unclear -> ask (counts as 1 of max 2 prompts)
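
A rough sketch of the routing above, under loud assumptions: the real Input Parsing Pattern lives in `references/nimble-playbook.md` and is not reproduced here. The `route_input` name, the return shape, and the "`<specialty> in <location>`" heuristic are all hypothetical, and CSV paths or Google Sheet URLs are not handled.

```python
import re

URL_RE = re.compile(r"https?://[^\s,]+", re.IGNORECASE)
# Hypothetical heuristic: "<specialty> in <location>" signals a discovery request.
SPECIALTY_LOCATION_RE = re.compile(
    r"^\s*(?P<specialty>.+?)\s+in\s+(?P<location>.+?)\s*$", re.IGNORECASE
)


def route_input(arguments: str) -> dict:
    """Decide which step handles the user's request."""
    urls = URL_RE.findall(arguments)
    if urls:
        return {"next_step": "3", "urls": urls}  # URLs detected -> Step 3
    match = SPECIALTY_LOCATION_RE.match(arguments)
    if match:
        # Specialty + location, no URLs -> Step 2 (practice discovery)
        return {
            "next_step": "2",
            "specialty": match["specialty"],
            "location": match["location"],
        }
    return {"next_step": "ask"}  # unclear -> ask the user
```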

If input is clear, confirm and ask the shaping questions in one message (plain text, not AskUserQuestion):

"Extracting providers from N practice sites. Quick questions:

Healthcare vertical? (ophthalmology, dental, dermatology, general, or other)

Quick scan (names + credentials only) or full extraction (all 5 fields)?"

If input is ambiguous, use AskUserQuestion (counts as 1 of max 2 prompts):

What practice sites should I extract providers from?

Paste URLs directly (one per line)

Provide a CSV file path or Google Sheet URL with practice URLs

Or describe what you're looking for (e.g., "ophthalmologists in Austin, TX") and I'll find practices first

Skip questions the user already answered in their initial message.

Step 2: Practice Discovery (Optional)

Only if the user provided a specialty + location instead of URLs.

Two input paths into discovery:

Path A: Fresh discovery. User gave specialty + location. Run Layer 2 WSA discovery for session-specific agents:

bash

nimble agent list --limit 50 --search "[specialty]"
nimble agent list --limit 50 --search "[directory-user-mentioned]"

See `references/wsa-reference.md` for the full discovery strategy, agent evaluation criteria, and healthcare discovery prioritization.

Run all discovery-phase agents simultaneously. Validate params with `nimble agent get` first.

Path B: Market-finder handoff. User ran `market-finder` first and wants to extract providers from those results. Read the market-finder output:

bash

cat ~/.nimble/memory/market-finder/{slug}/entities.json 2>/dev/null

Extract practice records. Note: Google Maps results contain `place_url` (a Maps link) but not the practice's actual website URL. Proceed to Step 2b to resolve real website URLs before site mapping.

After either path: Deduplicate by domain. Present discovered practices:

"Found N practices for [specialty] in [location] across [M] data sources. Proceeding to extract providers from these sites..."

Fallback — if no discovery WSAs were found, or results are sparse (< 3):

bash

nimble search --query "[specialty] in [location]" --max-results 20 --search-depth lite
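
The "deduplicate by domain" step in this section could look roughly like the sketch below. The record shape is an assumption: a `website` key is used for illustration, and records that have no resolvable domain yet (for example Maps results that only carry `place_url`) are kept so Step 2b can resolve them later.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Normalize a URL to a bare lowercase host without a leading "www."."""
    # Tolerate bare hostnames such as "example.com/team".
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_by_domain(practices: list[dict]) -> list[dict]:
    """Keep the first practice seen for each domain, preserving order."""
    seen: set[str] = set()
    unique = []
    for practice in practices:
        domain = domain_of(practice.get("website", ""))
        if domain:
            if domain in seen:
                continue  # same site already collected from another source
            seen.add(domain)
        unique.append(practice)  # no domain yet: keep for Step 2b
    return unique
```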

Step 2b: Resolve Practice Website URLs

Discovery sources (Google Maps, Yelp, BBB) return listing URLs, not practice website URLs. Before site mapping, resolve the actual website for each practice:

- Check structured data first: Google Maps results often include a `website` field in the structured output. Use it if present.
- Extract from listing page: if there is no `website` field, extract the Maps listing to find the practice website link:

```bash
nimble extract --url "[maps-listing-url]" --format markdown
```

Search fallback — if extraction fails:

bash

nimble search --query "[practice-name] [city] official website" --max-results 3 --search-depth lite

Skip practices where no website URL can be resolved — note them in the "Data Quality Summary" output section.
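
The three-tier resolution order in this step (structured field, then listing extraction, then search) can be expressed as a simple fallback chain. In this sketch the two callables are hypothetical wrappers that you would implement around `nimble extract` and `nimble search`; the `name` and `city` keys are also assumptions about the record shape.

```python
from typing import Callable, Optional


def resolve_website(
    practice: dict,
    extract_from_listing: Callable[[str], Optional[str]],
    search_for_site: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the practice's own website URL, or None if it cannot be resolved."""
    # 1. Structured data first.
    if practice.get("website"):
        return practice["website"]
    # 2. Extract the listing page and look for the website link.
    listing_url = practice.get("place_url")
    if listing_url:
        found = extract_from_listing(listing_url)
        if found:
            return found
    # 3. Search fallback; a None result means the practice is skipped
    #    and noted in the Data Quality Summary.
    return search_for_site(practice.get("name", ""), practice.get("city", ""))
```

Passing the lookups in as functions keeps the ordering logic testable without network access.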

Step 3: Site Mapping

Follow the Site Mapping Pattern from `references/nimble-playbook.md` for each practice URL. Skill-specific settings:

- Keyword weight table: `references/provider-extraction-patterns.md`
- Page cap: 15 per site
- Fallback query: `site:[domain] doctors OR providers OR team`

For 6+ practices, use sub-agents (see Sub-Agent Strategy below).

Save checkpoint: `~/.nimble/memory/healthcare-providers-extract/checkpoints/{slug}/mapping.json`
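
To illustrate how a keyword weight table and the 15-page cap might combine, here is a minimal sketch. The weights below are invented for illustration only; the real table is in `references/provider-extraction-patterns.md`, and `rank_pages` is a hypothetical helper, not part of the Site Mapping Pattern.

```python
from urllib.parse import urlparse

PAGE_CAP = 15  # per-site cap from the settings above

# Hypothetical weights; the real table lives in the reference file.
KEYWORD_WEIGHTS = {"doctors": 3, "providers": 3, "physicians": 3,
                   "team": 2, "staff": 2, "about": 1}


def rank_pages(urls: list[str], cap: int = PAGE_CAP) -> list[str]:
    """Score each URL path by keyword weight and keep the top `cap` pages."""
    def score(url: str) -> int:
        path = urlparse(url).path.lower()  # ignore the domain when matching
        return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in path)

    scored = [(score(u), u) for u in urls]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: -pair[0])  # stable: ties keep input order
    return [u for _, u in relevant[:cap]]
```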

Step 4: Page Extraction

WSA shortcuts first: If WSA discovery found agents that extract provider data from healthcare directories, use those for matching practices — structured WSA output is higher quality than parsed markdown.

For all other practices, follow the Page Extraction with Retry pattern from `references/nimble-playbook.md`. Scale using the Scaled Execution pattern from the same reference.

Save checkpoint: `~/.nimble/memory/healthcare-providers-extract/checkpoints/{slug}/extraction.json`

Step 5: Structured Parsing

Parse extracted markdown to identify providers and their fields. Read `references/provider-extraction-patterns.md` for the 5 core fields, credential regex patterns, and specialty keywords.

For each extracted page:

- Scan for provider name patterns (Dr. prefix, heading patterns, bold text near credential suffixes)
- Match credentials using the regex patterns from `references/provider-extraction-patterns.md`
- Match specialty using keywords for the detected healthcare vertical
- Extract contact info (phone regex, appointment URLs, email)
- Extract education/training mentions

Build structured records:

json

{
  "name": "Dr. Jane Smith",
  "credentials": "MD, FACS",
  "specialty": "Retinal Surgery",
  "contact": {"phone": "(555) 123-4567", "scheduling_url": "..."},
  "education": "Fellowship: Bascom Palmer Eye Institute",
  "source_url": "https://practice.com/our-doctors",
  "practice_name": "Shore Center for Eye Care",
  "practice_url": "https://practice.com",
  "confidence": "High"
}
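
As a rough illustration of the parsing pass that produces a record like the one above, the sketch below pulls a name, credentials, and a phone number out of one markdown block. All three regexes are hypothetical stand-ins; the actual credential and name patterns are defined in `references/provider-extraction-patterns.md` and are not reproduced here. Specialty and education matching are omitted.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
NAME_RE = re.compile(r"Dr\.\s+([A-Z][\w'-]+(?:\s+[A-Z][\w'-]+)+)")
CREDENTIAL_RE = re.compile(r"\b(MD|DO|OD|DDS|DMD|PA-C|NP|FACS|FAAO)\b")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def parse_provider(block: str, source_url: str) -> Optional[dict]:
    """Build a partial provider record from one markdown block."""
    name = NAME_RE.search(block)
    if not name:
        return None  # no "Dr. First Last" pattern in this block
    # De-duplicate credentials while keeping their order of appearance.
    credentials = list(dict.fromkeys(CREDENTIAL_RE.findall(block)))
    phone = PHONE_RE.search(block)
    return {
        "name": f"Dr. {name.group(1)}",
        "credentials": ", ".join(credentials),
        "contact": {"phone": phone.group(0) if phone else None},
        "source_url": source_url,  # every record traces back to its page
    }
```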

Step 6: Deduplication & Confidence Scoring

Follow the Entity Deduplication and Entity Confidence Scoring patterns from `references/nimble-playbook.md`. Skill-specific dedup rules and the 5-field confidence criteria are in `references/provider-extraction-patterns.md`.
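
One way to picture 5-field confidence scoring and deduplication is the sketch below. The thresholds (all five fields for High, three or four for Medium, otherwise Low) and the dedup key (normalized name plus practice URL) are assumptions made for illustration; the authoritative criteria are in the reference files named above.

```python
CORE_FIELDS = ("name", "credentials", "specialty", "contact", "education")


def _is_filled(value) -> bool:
    # A contact dict counts only if at least one of its values is present.
    if isinstance(value, dict):
        return any(value.values())
    return bool(value)


def filled_count(record: dict) -> int:
    return sum(1 for field in CORE_FIELDS if _is_filled(record.get(field)))


def confidence(record: dict) -> str:
    """Hypothetical thresholds: 5 fields -> High, 3-4 -> Medium, else Low."""
    filled = filled_count(record)
    if filled == len(CORE_FIELDS):
        return "High"
    return "Medium" if filled >= 3 else "Low"


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the most complete record per (normalized name, practice URL)."""
    best: dict[tuple, dict] = {}
    for record in records:
        name = record.get("name", "").lower().removeprefix("dr.").strip()
        key = (name, record.get("practice_url", ""))
        if key not in best or filled_count(record) > filled_count(best[key]):
            best[key] = record
    return list(best.values())
```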

Step 7: Output

Present results grouped by practice, sorted by confidence within each practice.
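
A minimal sketch of that grouping and ordering, assuming each record carries the `practice_name` and `confidence` keys shown in the Step 5 record (the helper name `group_for_report` is hypothetical):

```python
from collections import defaultdict

CONFIDENCE_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def group_for_report(records: list[dict]) -> dict[str, list[dict]]:
    """Group providers by practice, then sort each group High -> Low."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        grouped[record["practice_name"]].append(record)
    return {
        practice: sorted(
            rows,
            # Unknown or missing confidence values sort last.
            key=lambda r: CONFIDENCE_ORDER.get(r.get("confidence"), 3),
        )
        for practice, rows in grouped.items()
    }
```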

Provider Extraction: [N] Providers from [M] Practices

[Date] | [H] High, [Med] Medium, [L] Low confidence

TL;DR

Extracted [N] providers from [M] practice websites. [H] with complete profiles, [L] with partial data. [Key finding: e.g., "12 of 15 providers are board-certified"].

[Practice Name] ([domain])

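| # | Name | Credentials | Specialty | Contact | Education | Confidence |
|---|------|-------------|-----------|---------|-----------|------------|
| 1 | Dr. Jane Smith | MD, FACS | Retinal Surgery | (555) 123-4567 | Fellowship: Bascom Palmer | High |
| 2 | Dr. John Doe | OD | General Ophthalmology | Book | Residency: Wills Eye | Medium |

[Repeat per practice]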

Data Quality Summary

Complete profiles (High): [N] providers
Partial profiles (Medium): [N] providers — missing: [list common gaps]
Minimal profiles (Low): [N] providers — missing: [list common gaps]

Sources

[Clickable URL for every page extracted, grouped by practice]


**Source links are mandatory.** Every provider record must trace back to a source URL.

Step 8: Save to Memory

Make all Write calls simultaneously:

- Report -> `~/.nimble/memory/reports/healthcare-providers-extract-{slug}-{date}.md`
- Provider data -> `~/.nimble/memory/healthcare-providers-extract/{slug}/providers.json`
- Profile -> update `last_runs.healthcare-providers-extract` in `~/.nimble/business-profile.json` (only if profile exists)
- Follow the wiki update pattern from `references/memory-and-distribution.md`: update `index.md` rows for all affected entity files, append a `log.md` entry for this run.
- Clean up checkpoint (complete run) or keep (partial run)
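
The "only if profile exists" rule for the profile update can be sketched as follows. This assumes `last_runs` is a JSON object keyed by skill name, which is inferred from the `last_runs.healthcare-providers-extract` notation rather than confirmed; `record_last_run` is a hypothetical helper.

```python
import json
from pathlib import Path

SKILL_NAME = "healthcare-providers-extract"


def record_last_run(profile_path: Path, run_date: str) -> bool:
    """Stamp this skill's last run date; do nothing if there is no profile."""
    if not profile_path.exists():
        return False  # no profile: skip silently, onboarding is not required
    profile = json.loads(profile_path.read_text(encoding="utf-8"))
    # Assumption: last_runs maps skill name -> ISO date string.
    profile.setdefault("last_runs", {})[SKILL_NAME] = run_date
    profile_path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
    return True
```

A typical call would pass `Path.home() / ".nimble" / "business-profile.json"` and today's date.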

Step 9: Share & Distribute

Always offer distribution -- do not skip. Follow `references/memory-and-distribution.md` for connector detection and sharing flow.

Notion: full provider table as a dated subpage. Slack: TL;DR with provider count and confidence breakdown only.

Step 10: Follow-ups

"Tell me more about Dr. X" -> show full extracted profile
"Export as CSV" -> generate CSV from providers.json
"Run on more sites" -> append new practice URLs, extract and merge
"What's missing?" -> detail the data gaps per provider

Enrichment from discovered WSAs: If Step 0 found enrichment-phase agents (reviews, regulatory, practice details), offer them as immediate follow-ups:

"I also found [N] WSAs that could enrich this data: [brief list]. Want me to run reputation checks or regulatory lookups on these providers/practices?"

See `references/wsa-reference.md` for enrichment phase mapping and fallback chains.

Sibling skill suggestions:

Next steps:
- Run `healthcare-providers-enrich` to fill data gaps (NPI lookup, board certification verification, additional contact info)
- Run `healthcare-providers-verify` to validate credentials and license status
- Run `market-finder` to discover more practice URLs in this area
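
For the "Export as CSV" follow-up listed above, a conversion along these lines would work, assuming `providers.json` holds a JSON array of records shaped like the Step 5 example. The column order and the flattening of `contact` into `phone` and `scheduling_url` columns are illustrative choices, not something the skill prescribes.

```python
import csv
import io

COLUMNS = ["practice_name", "name", "credentials", "specialty", "phone",
           "scheduling_url", "education", "confidence", "source_url"]


def providers_to_csv(providers: list[dict]) -> str:
    """Flatten provider records into CSV text with a fixed column order."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in COLUMNS (e.g. "contact").
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for provider in providers:
        contact = provider.get("contact") or {}
        writer.writerow({
            **provider,
            "phone": contact.get("phone") or "",
            "scheduling_url": contact.get("scheduling_url") or "",
        })
    return buffer.getvalue()
```

Load the records with `json.load` and write the returned string to a `.csv` file; the `csv` module handles quoting of values such as `MD, FACS`.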

Sub-Agent Strategy

For batch extraction (6+ practices), use `nimble-researcher` agents (`agents/nimble-researcher.md`) to parallelize site mapping and extraction.

Follow the sub-agent spawning rules from `references/nimble-playbook.md` (bypassPermissions, batch max 4, explicit Bash instruction, fallback on failure).

Spawn pattern: One agent per practice (or per batch of 3 practices for large jobs). Each agent runs Steps 3-5 for its assigned practices and returns structured provider records.

Single-practice optimization: If only 1-2 practices, run directly from the main context instead of spawning agents.

Fallback: If any agent fails, run those extractions directly from the main context. Never leave gaps in the output.
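
The spawn pattern can be pictured as two levels of chunking: practices are assigned to agents, and agents are launched in waves. The sketch below reads "batch max 4" as "at most 4 agents per wave", which is an interpretation of the playbook rule rather than something stated here; `plan_waves` is a hypothetical helper.

```python
SUB_AGENT_THRESHOLD = 6   # "6+ practices" from the strategy above
MAX_AGENTS_PER_WAVE = 4   # assumed reading of "batch max 4"


def plan_waves(practices: list[str], per_agent: int = 1) -> list[list[list[str]]]:
    """Return waves of agent assignments; an empty list means run in main context."""
    if len(practices) < SUB_AGENT_THRESHOLD:
        return []
    # One assignment per agent (use per_agent=3 for large jobs).
    assignments = [practices[i:i + per_agent]
                   for i in range(0, len(practices), per_agent)]
    # Launch at most MAX_AGENTS_PER_WAVE agents at a time.
    return [assignments[i:i + MAX_AGENTS_PER_WAVE]
            for i in range(0, len(assignments), MAX_AGENTS_PER_WAVE)]
```

Each inner list is the set of practices one agent handles for Steps 3-5; any assignment whose agent fails is rerun from the main context.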

Error Handling

See `references/nimble-playbook.md` for the standard error table (missing API key, 429, 401, empty results, extraction garbage). Skill-specific errors:

- No provider pages found: "Couldn't find provider/team pages on [domain]. The site may list staff differently. Want me to try extracting from the homepage or search for this practice on healthcare directories?"
- All extractions returned garbage: "The practice sites appear to be heavily JavaScript-rendered. Retrying with browser rendering..." (auto-retry with `--render` per the shared pattern)
- Ambiguous practice name: If a URL fails and the user provided a name instead, search for the practice:

```
nimble search --query "[practice name] [location] doctors" --max-results 5 --search-depth lite
```

- CSV/Sheet parse error: "Couldn't parse the input file. Expected a column with practice URLs. Can you paste the URLs directly instead?"
