healthcare-providers-extract

Healthcare Providers Extract

Structured practitioner extraction from healthcare practice websites, powered by Nimble's web data APIs.
User request: $ARGUMENTS
Before running any commands, read `references/nimble-playbook.md` for Claude Code constraints (no shell state, no `&` / `wait`, sub-agent permissions, communication style).

Instructions

Step 0: Preflight + WSA Discovery

Run the preflight pattern from `references/nimble-playbook.md` (5 simultaneous Bash calls: date calc, today, CLI check, profile load, index.md load).
Also simultaneously, run WSA discovery and setup:
  • `mkdir -p ~/.nimble/memory/{reports,healthcare-providers-extract/checkpoints}`
  • `ls ~/.nimble/memory/healthcare-providers-extract/checkpoints/ 2>/dev/null`
  • Run Layer 1 (vertical) and Layer 3 (general tools) WSA discovery from `references/wsa-reference.md`. Layer 2 (session-specific) runs after Step 1 when you know the user's specialty.
Classify discovered agents into phases and validate with `nimble agent get` per `references/wsa-reference.md`.
From the preflight results:
  • CLI missing or API key unset -> `references/profile-and-onboarding.md`, stop
  • Profile exists -> note it for context. Determine mode using smart date windowing from `references/nimble-playbook.md`:
    • Full mode: first run OR last run > 14 days ago
    • Quick refresh: last run < 14 days ago (re-extract only new/changed pages)
    • Same-day repeat: if `last_runs.healthcare-providers-extract` is today, check for existing report at `~/.nimble/memory/reports/healthcare-providers-extract-*[today].md`. If found, ask: "Already ran today. Run again for fresh data?"
  • No profile -> that's fine. This skill doesn't require onboarding. Proceed to Step 1.
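The mode selection above can be sketched in Python. This is a minimal illustration rather than the playbook's implementation: the function name `choose_mode` and the mode labels are hypothetical, and treating a run exactly 14 days old as a quick refresh is an assumption, since the rule only covers "> 14" and "< 14".

```python
from datetime import date
from typing import Optional

FULL_MODE_THRESHOLD_DAYS = 14  # from the windowing rule in this step


def choose_mode(last_run: Optional[date], today: date) -> str:
    """Return 'full', 'quick_refresh', or 'same_day' (hypothetical labels)."""
    if last_run is None:
        return "full"  # first run, or no profile to read a last run from
    age_days = (today - last_run).days
    if age_days == 0:
        # Caller should look for today's report and ask before re-running.
        return "same_day"
    if age_days > FULL_MODE_THRESHOLD_DAYS:
        return "full"
    # Assumption: exactly 14 days falls on the quick-refresh side.
    return "quick_refresh"
```

The `same_day` result only signals that the existing-report check and the confirmation prompt should happen; it does not decide whether to re-run.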

Step 1: Parse Input & Starting Questions

Parse `$ARGUMENTS` for input type using the Input Parsing Pattern from `references/nimble-playbook.md`. Key routing:
  • URLs detected -> proceed to Step 3
  • Specialty + location (no URLs) -> proceed to Step 2 (practice discovery)
  • Unclear -> ask (counts as 1 of max 2 prompts)
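The routing rules above might look like the following in Python. This is only a sketch: the real Input Parsing Pattern lives in `references/nimble-playbook.md` and is not reproduced here, so the URL regex, the "specialty in location" heuristic, and the name `route_input` are all assumptions made for illustration.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s,]+")
# Hypothetical heuristic: "<specialty> in <location>" signals a discovery request.
DISCOVERY_PATTERN = re.compile(
    r"\s*(?P<specialty>.+?)\s+in\s+(?P<location>.+?)\s*$", re.IGNORECASE
)


def route_input(arguments: str) -> dict:
    """Decide which step should handle the raw user request."""
    urls = URL_PATTERN.findall(arguments)
    if urls:
        return {"next_step": "step_3", "urls": urls}
    match = DISCOVERY_PATTERN.match(arguments)
    if match:
        return {
            "next_step": "step_2",
            "specialty": match["specialty"],
            "location": match["location"],
        }
    # Anything else is unclear and costs one of the two allowed prompts.
    return {"next_step": "ask_user"}
```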
If input is clear, confirm and ask one shaping question (plain text, not AskUserQuestion):
"Extracting providers from N practice sites. Quick questions:
  1. Healthcare vertical? (ophthalmology, dental, dermatology, general, or other)
  2. Quick scan (names + credentials only) or full extraction (all 5 fields)?"
If input is ambiguous, use AskUserQuestion (counts as 1 of max 2 prompts):
What practice sites should I extract providers from?
  • Paste URLs directly (one per line)
  • Provide a CSV file path or Google Sheet URL with practice URLs
  • Or describe what you're looking for (e.g., "ophthalmologists in Austin, TX") and I'll find practices first
Skip questions the user already answered in their initial message.

Step 2: Practice Discovery (Optional)

Only if the user provided a specialty + location instead of URLs.
Two input paths into discovery:
Path A: Fresh discovery. User gave specialty + location. Run Layer 2 WSA discovery for session-specific agents:
bash
nimble agent list --limit 50 --search "[specialty]"
nimble agent list --limit 50 --search "[directory-user-mentioned]"
See `references/wsa-reference.md` for the full discovery strategy, agent evaluation criteria, and healthcare discovery prioritization.
Validate params with `nimble agent get` first, then run all discovery-phase agents simultaneously.
Path B: Market-finder handoff. The user ran `market-finder` first and wants to extract providers from those results. Read the market-finder output:
bash
cat ~/.nimble/memory/market-finder/{slug}/entities.json 2>/dev/null
Extract practice records. Note: Google Maps results contain `place_url` (a Maps link) but not the practice's actual website URL. Proceed to Step 2b to resolve real website URLs before site mapping.
After either path: Deduplicate by domain. Present discovered practices:
"Found N practices for [specialty] in [location] across [M] data sources. Proceeding to extract providers from these sites..."
Fallback: if no discovery WSAs were found, or results are sparse (< 3):
bash
nimble search --query "[specialty] in [location]" --max-results 20 --search-depth lite
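The "deduplicate by domain" step in this section can be sketched as below. The record shape (a dict with a `url` key) and the function names are assumptions for illustration; the sketch keeps the first practice seen for each domain and treats `www.` as equivalent to the bare host.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Normalize a URL to a lowercase host without a leading 'www.'."""
    host = urlparse(url if "://" in url else f"https://{url}").netloc.lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_by_domain(practices: list) -> list:
    """Keep the first practice per domain, preserving discovery order."""
    seen = set()
    unique = []
    for practice in practices:
        domain = domain_of(practice["url"])  # 'url' is a hypothetical key
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(practice)
    return unique
```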

Step 2b: Resolve Practice Website URLs

Discovery sources (Google Maps, Yelp, BBB) return listing URLs, not practice website URLs. Before site mapping, resolve the actual website for each practice:
  1. Check structured data first: Google Maps results often include a `website` field in the structured output. Use it if present.
  2. Extract from listing page: if there is no `website` field, extract the Maps listing to find the practice website link:
    bash
    nimble extract --url "[maps-listing-url]" --format markdown
  3. Search fallback: if extraction fails:
    bash
    nimble search --query "[practice-name] [city] official website" --max-results 3 --search-depth lite
Skip practices where no website URL can be resolved, and note them in the "Data Quality Summary" output section.
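The three-tier resolution order in this step can be expressed as a small fallback chain. In this sketch the two callables stand in for the `nimble extract` and `nimble search` calls shown in the step; their signatures, the `name` and `city` keys, and the function name `resolve_website` are hypothetical.

```python
from typing import Callable, Optional


def resolve_website(
    practice: dict,
    extract_from_listing: Callable[[str], Optional[str]],
    search_for_site: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the practice website, or None if every tier fails."""
    # 1. Structured data first.
    if practice.get("website"):
        return practice["website"]
    # 2. Extract the listing page and look for a website link.
    listing_url = practice.get("place_url")
    if listing_url:
        found = extract_from_listing(listing_url)
        if found:
            return found
    # 3. Search fallback by practice name and city (hypothetical keys).
    return search_for_site(practice.get("name", ""), practice.get("city", ""))
```

A `None` result corresponds to the "skip and note in the Data Quality Summary" case.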

Step 3: Site Mapping

Follow the Site Mapping Pattern from `references/nimble-playbook.md` for each practice URL. Skill-specific settings:
  • Keyword weight table: `references/provider-extraction-patterns.md`
  • Page cap: 15 per site
  • Fallback query: `site:[domain] doctors OR providers OR team`
For 6+ practices, use sub-agents (see Sub-Agent Strategy below).
Save checkpoint: `~/.nimble/memory/healthcare-providers-extract/checkpoints/{slug}/mapping.json`
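To illustrate how a keyword weight table and the 15-page cap could interact, here is a minimal ranking sketch. The weights below are invented placeholders (the real table is in `references/provider-extraction-patterns.md`), and `rank_pages` is a hypothetical name; only the cap of 15 comes from the settings in this step.

```python
from urllib.parse import urlparse

PAGE_CAP = 15  # per-site cap from the settings in this step

# Placeholder weights for illustration only; not the real table.
EXAMPLE_WEIGHTS = {"doctors": 3, "providers": 3, "team": 2, "about": 1}


def rank_pages(urls, weights=EXAMPLE_WEIGHTS, cap=PAGE_CAP):
    """Score URL paths by keyword weight and keep the top `cap` pages."""
    def score(url):
        path = urlparse(url).path.lower()
        return sum(weight for keyword, weight in weights.items() if keyword in path)

    relevant = [(score(url), url) for url in urls]
    relevant = [pair for pair in relevant if pair[0] > 0]
    # sort() is stable, so pages with equal scores keep their input order.
    relevant.sort(key=lambda pair: -pair[0])
    return [url for _, url in relevant[:cap]]
```

Scoring the path rather than the full URL avoids counting keywords that happen to appear in the domain name.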

Step 4: Page Extraction

WSA shortcuts first: If WSA discovery found agents that extract provider data from healthcare directories, use those for matching practices, because structured WSA output is higher quality than parsed markdown.
For all other practices, follow the Page Extraction with Retry pattern from `references/nimble-playbook.md`. Scale using the Scaled Execution pattern from the same reference.
Save checkpoint: `~/.nimble/memory/healthcare-providers-extract/checkpoints/{slug}/extraction.json`
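The retry pattern itself is defined in `references/nimble-playbook.md` and is not shown here. As a rough sketch of the idea, assuming a single retry with browser rendering when the first result looks unusable: the `extract` callable stands in for a wrapper around `nimble extract`, and the length threshold for "garbage" is a made-up heuristic.

```python
from typing import Callable

MIN_USEFUL_CHARS = 200  # hypothetical threshold for unusable output


def looks_like_garbage(markdown: str) -> bool:
    """Crude stand-in check: treat very short output as unusable."""
    return len(markdown.strip()) < MIN_USEFUL_CHARS


def extract_with_retry(url: str, extract: Callable[[str, bool], str]) -> dict:
    """Try a plain extraction, then retry once with rendering enabled."""
    markdown = ""
    for render in (False, True):
        markdown = extract(url, render)
        if not looks_like_garbage(markdown):
            return {"url": url, "rendered": render, "markdown": markdown, "ok": True}
    # Both attempts failed; the caller decides how to report the gap.
    return {"url": url, "rendered": True, "markdown": markdown, "ok": False}
```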

Step 5: Structured Parsing

Parse extracted markdown to identify providers and their fields. Read `references/provider-extraction-patterns.md` for the 5 core fields, credential regex patterns, and specialty keywords.
For each extracted page:
  1. Scan for provider name patterns (Dr. prefix, heading patterns, bold text near credential suffixes)
  2. Match credentials using the regex patterns from `references/provider-extraction-patterns.md`
  3. Match specialty using keywords for the detected healthcare vertical
  4. Extract contact info (phone regex, appointment URLs, email)
  5. Extract education/training mentions
Build structured records:
json
{
  "name": "Dr. Jane Smith",
  "credentials": "MD, FACS",
  "specialty": "Retinal Surgery",
  "contact": {"phone": "(555) 123-4567", "scheduling_url": "..."},
  "education": "Fellowship: Bascom Palmer Eye Institute",
  "source_url": "https://practice.com/our-doctors",
  "practice_name": "Shore Center for Eye Care",
  "practice_url": "https://practice.com",
  "confidence": "High"
}
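As an illustration of parsing steps 1 and 2, the sketch below pulls names and credential suffixes out of markdown lines and emits partial records in the shape shown above. The credential list and both regexes are assumptions for the example; the actual patterns are in `references/provider-extraction-patterns.md`, and real names will need more forgiving matching than this.

```python
import re

# Hypothetical credential list, for illustration only.
CREDENTIAL = r"(?:MD|DO|OD|DDS|DMD|PhD|NP|PA-C|FACS|FAAO)"
PROVIDER_PATTERN = re.compile(
    r"(?P<name>(?:Dr\.\s+)?[A-Z][a-z]+(?:\s+[A-Z]\.)?\s+[A-Z][a-z]+),\s*"
    rf"(?P<credentials>{CREDENTIAL}(?:,\s*{CREDENTIAL})*)\b"
)


def parse_providers(markdown: str, source_url: str) -> list:
    """Return partial provider records (name, credentials, source_url)."""
    records = []
    # Work line by line so a name never spans two markdown lines.
    for line in markdown.splitlines():
        for match in PROVIDER_PATTERN.finditer(line):
            credentials = re.findall(CREDENTIAL, match["credentials"])
            records.append(
                {
                    "name": match["name"],
                    "credentials": ", ".join(credentials),
                    "source_url": source_url,
                }
            )
    return records
```

Specialty, contact, and education would be filled in by later passes over the same page text.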

Step 6: Deduplication & Confidence Scoring

Follow the Entity Deduplication and Entity Confidence Scoring patterns from `references/nimble-playbook.md`. Skill-specific dedup rules and the 5-field confidence criteria are in `references/provider-extraction-patterns.md`.
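The actual dedup rules and confidence criteria live in the referenced files. Purely as a sketch of how a 5-field score could work, the example below assumes the five fields are the ones in the record shape used by this skill (name, credentials, specialty, contact, education), that "High" means all five are present, and that three or four means "Medium". The thresholds, the merge key, and the function names are all assumptions.

```python
CORE_FIELDS = ("name", "credentials", "specialty", "contact", "education")


def score_confidence(record: dict) -> str:
    """Assumed thresholds: 5 fields High, 3-4 Medium, otherwise Low."""
    filled = sum(1 for field in CORE_FIELDS if record.get(field))
    if filled == len(CORE_FIELDS):
        return "High"
    return "Medium" if filled >= 3 else "Low"


def dedupe_providers(records: list) -> list:
    """Merge records that share a practice and a normalized name."""
    merged = {}
    for record in records:
        name_key = record["name"].lower().replace("dr. ", "").strip()
        key = (record.get("practice_url", ""), name_key)
        target = merged.setdefault(key, {})
        for field, value in record.items():
            # First non-empty value wins; later pages only fill gaps.
            if value and not target.get(field):
                target[field] = value
    return [dict(r, confidence=score_confidence(r)) for r in merged.values()]
```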
Step 7: Output

Present results grouped by practice, sorted by confidence within each practice.

Provider Extraction: [N] Providers from [M] Practices

[Date] | [H] High, [M] Medium, [L] Low confidence

TL;DR

Extracted [N] providers from [M] practice websites. [H] with complete profiles, [L] with partial data. [Key finding: e.g., "12 of 15 providers are board-certified"].

[Practice Name] ([domain])

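| # | Name | Credentials | Specialty | Contact | Education | Confidence |
|---|------|-------------|-----------|---------|-----------|------------|
| 1 | Dr. Jane Smith | MD, FACS | Retinal Surgery | (555) 123-4567 | Fellowship: Bascom Palmer | High |
| 2 | Dr. John Doe | OD | General Ophthalmology | Book | Residency: Wills Eye | Medium |

[Repeat per practice]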

Data Quality Summary

  • Complete profiles (High): [N] providers
  • Partial profiles (Medium): [N] providers — missing: [list common gaps]
  • Minimal profiles (Low): [N] providers — missing: [list common gaps]

Sources

[Clickable URL for every page extracted, grouped by practice]

**Source links are mandatory.** Every provider record must trace back to a source URL.
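The grouping and ordering rule for this report ("grouped by practice, sorted by confidence within each practice") and the header's confidence counts can be sketched as follows. The function names are hypothetical, and the records are assumed to carry the `practice_name` and `confidence` keys from the record shape used by this skill.

```python
from collections import Counter

CONFIDENCE_RANK = {"High": 0, "Medium": 1, "Low": 2}


def group_by_practice(records: list) -> dict:
    """Group records by practice, highest confidence first in each group."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["practice_name"], []).append(record)
    for providers in grouped.values():
        # sort() is stable, so equal-confidence records keep extraction order.
        providers.sort(
            key=lambda r: CONFIDENCE_RANK.get(r.get("confidence"), len(CONFIDENCE_RANK))
        )
    return grouped


def confidence_summary(records: list) -> str:
    """Build the '[H] High, [M] Medium, [L] Low confidence' header fragment."""
    counts = Counter(r.get("confidence") for r in records)
    return f"{counts['High']} High, {counts['Medium']} Medium, {counts['Low']} Low confidence"
```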

Step 8: Save to Memory

Make all Write calls simultaneously:
  • Report -> `~/.nimble/memory/reports/healthcare-providers-extract-{slug}-{date}.md`
  • Provider data -> `~/.nimble/memory/healthcare-providers-extract/{slug}/providers.json`
  • Profile -> update `last_runs.healthcare-providers-extract` in `~/.nimble/business-profile.json` (only if profile exists)
  • Follow the wiki update pattern from `references/memory-and-distribution.md`: update `index.md` rows for all affected entity files, append a `log.md` entry for this run.
  • Clean up checkpoint (complete run) or keep (partial run)

Step 9: Share & Distribute

Always offer distribution; do not skip. Follow `references/memory-and-distribution.md` for connector detection and sharing flow.
Notion: full provider table as a dated subpage. Slack: TL;DR with provider count and confidence breakdown only.

Step 10: Follow-ups

  • "Tell me more about Dr. X" -> show full extracted profile
  • "Export as CSV" -> generate CSV from providers.json
  • "Run on more sites" -> append new practice URLs, extract and merge
  • "What's missing?" -> detail the data gaps per provider
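For the "Export as CSV" follow-up, one possible conversion from the provider records to CSV is sketched below. The column order and the flattening of `contact` into `phone` and `scheduling_url` columns are choices made for this example, not a format the skill prescribes.

```python
import csv
import io

# Assumed column layout; nested contact fields are flattened.
CSV_COLUMNS = [
    "practice_name", "name", "credentials", "specialty",
    "phone", "scheduling_url", "education", "confidence", "source_url",
]


def providers_to_csv(records: list) -> str:
    """Serialize provider records (as stored in providers.json) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        contact = record.get("contact") or {}
        row = dict(record)
        row["phone"] = contact.get("phone", "")
        row["scheduling_url"] = contact.get("scheduling_url", "")
        writer.writerow(row)  # missing columns are written as empty strings
    return buffer.getvalue()
```

Using `csv.DictWriter` means values that contain commas, such as "MD, FACS", are quoted automatically.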
Enrichment from discovered WSAs: If Step 0 found enrichment-phase agents (reviews, regulatory, practice details), offer them as immediate follow-ups:
"I also found [N] WSAs that could enrich this data: [brief list]. Want me to run reputation checks or regulatory lookups on these providers/practices?"
See `references/wsa-reference.md` for enrichment phase mapping and fallback chains.
Sibling skill suggestions:
Next steps:
  • Run `healthcare-providers-enrich` to fill data gaps (NPI lookup, board certification verification, additional contact info)
  • Run `healthcare-providers-verify` to validate credentials and license status
  • Run `market-finder` to discover more practice URLs in this area

Sub-Agent Strategy

For batch extraction (6+ practices), use `nimble-researcher` agents (`agents/nimble-researcher.md`) to parallelize site mapping and extraction.
Follow the sub-agent spawning rules from `references/nimble-playbook.md` (bypassPermissions, batch max 4, explicit Bash instruction, fallback on failure).
Spawn pattern: One agent per practice (or per batch of 3 practices for large jobs). Each agent runs Steps 3-5 for its assigned practices and returns structured provider records.
Single-practice optimization: If only 1-2 practices, run directly from the main context instead of spawning agents.
Fallback: If any agent fails, run those extractions directly from the main context. Never leave gaps in the output.
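One way to turn these rules into a spawn plan is sketched below. It reads "batch max 4" as "at most four agents running at once" and treats 3 to 5 practices as a direct run, since the text only specifies 1-2 (direct) and 6+ (sub-agents). Both readings, and the name `plan_batches`, are assumptions.

```python
MAX_PARALLEL_AGENTS = 4  # "batch max 4", read here as agents per wave
SUB_AGENT_THRESHOLD = 6  # "6+ practices" from this section


def plan_batches(practice_urls: list, practices_per_agent: int = 1) -> dict:
    """Split practices into per-agent assignments, then into waves of agents."""
    if len(practice_urls) < SUB_AGENT_THRESHOLD:
        # Assumption: anything below the threshold runs in the main context.
        return {"mode": "direct", "waves": []}
    assignments = [
        practice_urls[i:i + practices_per_agent]
        for i in range(0, len(practice_urls), practices_per_agent)
    ]
    waves = [
        assignments[i:i + MAX_PARALLEL_AGENTS]
        for i in range(0, len(assignments), MAX_PARALLEL_AGENTS)
    ]
    return {"mode": "sub_agents", "waves": waves}
```

Passing `practices_per_agent=3` gives the "batch of 3 practices per agent" variant for large jobs.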

Error Handling

See `references/nimble-playbook.md` for the standard error table (missing API key, 429, 401, empty results, extraction garbage). Skill-specific errors:
  • No provider pages found: "Couldn't find provider/team pages on [domain]. The site may list staff differently. Want me to try extracting from the homepage or search for this practice on healthcare directories?"
  • All extractions returned garbage: "The practice sites appear to be heavily JavaScript-rendered. Retrying with browser rendering..." (auto-retry with `--render` per the shared pattern)
  • Ambiguous practice name: If a URL fails and the user provided a name instead, search for the practice:
    nimble search --query "[practice name] [location] doctors" --max-results 5 --search-depth lite
  • CSV/Sheet parse error: "Couldn't parse the input file. Expected a column with practice URLs. Can you paste the URLs directly instead?"