digital-health-clinical-asr-setup

<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: Apache-2.0 -->

Clinical ASR Flywheel — Stage 1 (Setup)

Agent: this file is the complete Stage 1 procedure. Do not invoke `find`, `ls`, `rg`, or `grep` looking for an installer or hidden config; there isn't one. The four sections below (outbound-data disclosure, three numbered checks, sibling hand-off) are all required reading; don't skip any. Function IDs, env-var conventions, and the smoke-test gate are inlined further down. Answer from what's actually written here rather than from prior Riva/NVCF familiarity.

Stage 1 has one job: prove the user can reach NVIDIA's hosted speech stack with the `NVIDIA_API_KEY` they currently hold. Once a single clinical sentence round-trips through Magpie TTS → Parakeet/Nemotron ASR successfully, the user is cleared to advance to `/digital-health-clinical-asr-build`.

The four-stage flywheel exists to drive down KER (keyword error rate) on clinical entities — drugs, procedures, anatomy, conditions, labs, roles. WER averages obscure the failures that hurt clinically; KER is what Stage 3 will measure you against.
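
To make the WER-versus-KER contrast concrete, here is a minimal sketch of a keyword error rate. It assumes single-token keywords and a simple presence test; `tokens` and `keyword_error_rate` are hypothetical names, and the scorer used in Stage 3 may define KER differently.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics so punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_error_rate(pairs, keywords) -> float:
    """Fraction of reference keyword occurrences absent from the hypothesis.

    ASSUMPTION: single-token keywords, presence test only (no alignment).
    """
    wanted = {k.lower() for k in keywords}
    total = missed = 0
    for reference, hypothesis in pairs:
        heard = set(tokens(hypothesis))
        for tok in tokens(reference):
            if tok in wanted:
                total += 1
                missed += tok not in heard  # bool counts as 0 or 1
    return missed / total if total else 0.0

# (reference, ASR hypothesis) pairs; the first one garbles the drug name.
pairs = [
    ("The patient was prescribed cefazolin.", "the patient was prescribed cephazolin"),
    ("Start metformin after the echocardiogram.", "start metformin after the echocardiogram"),
]
ker = keyword_error_rate(pairs, ["cefazolin", "metformin", "echocardiogram"])
```

In this toy set only one of ten reference words is wrong, yet one of the three clinical keywords is lost, which is the kind of failure an averaged WER hides.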
There is no installer script anywhere in this skill: not `install.sh`, not `setup.py`, nothing hidden. Stage 1 is the three steps below: verify the key, install Python deps, run the smoke test. Anything past Stage 1 is composed from sibling skills (`/data-designer`, `/riva-tts`, the inlined Stage 3 ASR recipe, `/riva-asr-custom`). If a user asks "what script installs everything?", answer from this paragraph; don't go searching.

Outbound data flows — surface before any text or audio is sent

Two external endpoints receive data during this flywheel. The user has to acknowledge both before Stage 2 begins, against whatever data-governance policy their organization enforces. Render the table below word-for-word in your response; a paraphrase doesn't satisfy the disclosure, because the literal phrasing is what counts.

| Service | What gets sent | When | Hosted by |
|---|---|---|---|
| NVIDIA NVCF (`grpc.nvcf.nvidia.com`) | The clinical sentences you synthesize (text), and the WAV files you transcribe (audio) | Every Stage 2 TTS call and every Stage 3 ASR call | NVIDIA, governed by build.nvidia.com terms |
| Merriam-Webster (`dictionaryapi.com` JSON API or the public `merriam-webster.com` HTML site) | Individual clinical terms (drug names, anatomy, procedures), one HTTP request per term | Stage 2 IPA tagging; see "Two MW paths" below for which endpoint applies | Merriam-Webster, governed by their API or site terms |

The data is synthetic by construction: the flywheel manufactures sentences and audio from a user-curated term list, never from real patient encounters. That said: do not feed real patient transcripts, recorded clinical audio, or any PHI through any stage. If the term list itself contains sensitive material (codename drugs, unreleased product names), the user should consult their organization's external-API policy before proceeding. Either endpoint can be turned off:

- Skip Merriam-Webster entirely: leave `DICTIONARY_API_KEY` unset and don't run a scraper. Stage 2 falls back to Magpie G2P, which still works but with weaker coverage on long-tail clinical terms.
- Skip NVCF: this is a hard stop. Magpie TTS + Parakeet/Nemotron ASR are the workload; without them this skill family is the wrong tool, and a self-hosted ASR/TTS pipeline is what you want instead.

Recommend that a copy of this notice lands in the user's workspace `README.md`; bring it forward on first invocation if it isn't already there.

Purpose

Get a fresh environment ready for Stage 2. Three things to confirm: the key is present, the deps import cleanly, and the hosted stack actually answers. Close by naming which skill to run next.

The four `digital-health-clinical-asr-*` skills are self-contained: every TTS, ASR, IPA-tagging, and scoring recipe lives inside them; no other agent skill needs installing to run the flywheel end-to-end.

This skill takes no opinion on workspace layout. The user decides where their cycle artifacts live; `data/eval_sets/cycle<N>/` is not imposed.

When to use this skill

Activate on user phrases like:

- "Set up the Clinical ASR Flywheel"
- "Initialize the clinical-asr eval"
- "I want to evaluate ASR on clinical terminology. Where do I start?"
- "Bootstrap my environment for the flywheel"
- "What do I need installed before I run the flywheel?"

Do not activate when:

- The user already has a manifest and wants to score it → `/digital-health-clinical-asr-eval`
- The user already has the env set up and wants to curate terms → `/digital-health-clinical-asr-build`
- The user is asking about Stage 4 fine-tune NGC/Docker setup specifically → that's covered inside `/digital-health-clinical-asr-finetune`

Prerequisites

| Requirement | Required? | Why | How |
|---|---|---|---|
| `NVIDIA_API_KEY` (`nvapi-…`) | Required | Hosted Magpie TTS + Parakeet/Nemotron ASR via NVCF | Issue at https://build.nvidia.com; `export NVIDIA_API_KEY=...` in shell |
| Python ≥ 3.10 | Required | NeMo client, scoring, manifest tools | `python3 --version` |
| `nvidia-riva-client`, `pandas`, `soundfile`, `requests` | Required | TTS + ASR clients, manifest I/O, MW lookup | `pip install nvidia-riva-client pandas soundfile requests` |
| `DICTIONARY_API_KEY` | Optional | Merriam-Webster Medical Dictionary lookup via the JSON API (Path A in the build skill, recommended) | Free key at https://dictionaryapi.com. Path B (HTML scrape of `merriam-webster.com`, no key, brittle) is also documented in the build skill if you can't get a key. Without either path, Stage 2 falls through to Magpie G2P with weaker long-tail coverage. |
| `jiwer` | Optional | Reference WER/CER against the inlined Levenshtein implementation | `pip install jiwer`; the eval skill includes a pure-Python fallback |
| (Stage 4 only) `NGC_API_KEY` + CUDA host + NeMo container | Optional, deferred | Fine-tune workload | Set up inside `/digital-health-clinical-asr-finetune`; defer until the eval shows KER > 0.3 |

Instructions

Scope. This skill performs read-only environment checks: confirming a key is exported (length-only), the Python version, that libraries import, and that the hosted NVCF stack responds to a single smoke-test round-trip. It does not install system packages, modify shell rc files, write to disk outside an explicit `.venv/`, or attempt to authenticate with the real key value. Validate; never mutate without explicit user direction.

1a. Verify `NVIDIA_API_KEY` (length-only; never echo the value)

```bash
# Export NVIDIA_API_KEY in your shell; never echo or commit the value
export NVIDIA_API_KEY=nvapi-...   # from https://build.nvidia.com

# Length-only check; the key value never appears in any log
test -n "$NVIDIA_API_KEY" && echo "NVIDIA_API_KEY len=${#NVIDIA_API_KEY}"
```

A length of 70+ is normal. If the output is empty or shows `len=0`, the user must paste a key from <https://build.nvidia.com>. Do **not** print the key, even truncated. To persist across shell sessions, add the `export` line to your shell rc (`~/.bashrc`, `~/.zshrc`), or use a per-directory tool like `direnv`.

1b. Install Python dependencies

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install nvidia-riva-client pandas soundfile requests

# optional
pip install jiwer
```

For Stage 4 (fine-tune) only: `nemo-toolkit` and Docker + NVIDIA Container Toolkit are also required. Defer those to `/digital-health-clinical-asr-finetune`; there is no point installing them up front if the user may never reach Stage 4.
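
After the install, the "deps import cleanly" check can be scripted. The sketch below is one possible read-only check, not part of the skill: `missing_modules` and `python_ok` are hypothetical helpers, and `riva.client` is taken to be the import name of `nvidia-riva-client` because the smoke-test recipe imports it that way. It only locates modules with `importlib.util.find_spec`; it does not execute them.

```python
import importlib.util
import sys

# Import names for the pip packages installed in 1b.
REQUIRED = ["riva.client", "pandas", "soundfile", "requests"]

def missing_modules(names):
    """Return the module names that cannot be located in the active environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names when the parent package (e.g. "riva") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing

def python_ok(minimum=(3, 10)) -> bool:
    """True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum

if __name__ == "__main__":
    print("python >= 3.10:", python_ok())
    print("missing:", missing_modules(REQUIRED) or "none")
```

An empty `missing` list with the venv activated suggests step 1b completed; a non-empty one usually means the venv is not active in this shell.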

1c. Smoke-test the hosted NVCF stack

`NVIDIA_API_KEY` handling (load-bearing, do not deviate):

- The agent harness reads `$NVIDIA_API_KEY` from the shell and passes it as an explicit function argument to `smoke_test(api_key=…)`.
- Auditors can grep the recipe for every wire crossing: every `api_key` use is visible in `auth_for(...)`.
- Do not `echo`, `print`, or log the key value (including truncated). Length-only checks are fine (see §1a).
- Do not let the recipe read `os.environ["NVIDIA_API_KEY"]` itself; the explicit-argument pattern is the auditability guarantee.
- Do not commit the key to any file, including `.env` examples or notebook outputs.

Verify the `NVIDIA_API_KEY` actually works against Magpie TTS and Parakeet/Nemotron ASR before advancing. The four skills inline every recipe needed; this round-trip just confirms the API key + network path are real.

The agent harness loads the `NVIDIA_API_KEY` shell variable and passes it as an explicit function argument to the helpers below. The recipe code itself does not read environment variables, so auditors can see exactly which API keys cross the wire.

```python
import wave, tempfile
import riva.client

NVCF_HOST = "grpc.nvcf.nvidia.com:443"
MAGPIE_FUNCTION_ID    = "877104f7-e885-42b9-8de8-f6e4c6303969"   # Magpie TTS
PARAKEET_FUNCTION_ID  = "d3fe9151-442b-4204-a70d-5fcc597fd610"   # Parakeet TDT 0.6B v2 (offline ASR)

def auth_for(function_id: str, api_key: str) -> riva.client.Auth:
    return riva.client.Auth(
        use_ssl=True, uri=NVCF_HOST,
        metadata_args=[
            ["function-id", function_id],
            ["authorization", f"Bearer {api_key}"],
        ],
    )

def smoke_test(api_key: str) -> str:
    """Caller passes api_key (the harness reads $NVIDIA_API_KEY at the shell;
    this code never touches the environment). Returns the ASR transcript."""

    # 1. TTS: "The patient was prescribed cefazolin."
    tts = riva.client.SpeechSynthesisService(auth_for(MAGPIE_FUNCTION_ID, api_key))
    pcm = b"".join(c.audio for c in tts.synthesize_online(
        text="The patient was prescribed cefazolin.",
        voice_name="Magpie-Multilingual.EN-US.Mia",
        language_code="en-US", sample_rate_hz=16000,
    ))
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        with wave.open(f, "wb") as w:
            w.setnchannels(1); w.setsampwidth(2); w.setframerate(16000); w.writeframes(pcm)
        wav_path = f.name

    # 2. ASR: transcribe the WAV we just synthesized.
    asr = riva.client.ASRService(auth_for(PARAKEET_FUNCTION_ID, api_key))
    with open(wav_path, "rb") as f:
        audio_bytes = f.read()
    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        sample_rate_hertz=16000, language_code="en-US",
        max_alternatives=1, enable_automatic_punctuation=True,
    )
    response = asr.offline_recognize(audio_bytes, config)
    transcript = response.results[0].alternatives[0].transcript if response.results else ""
    print("TTS:  The patient was prescribed cefazolin.")
    print(f"ASR:  {transcript}")
    return transcript
```

```python
# Invoke from the agent (api_key sourced by the harness, not by this code):
smoke_test(api_key="<NVIDIA_API_KEY value>")
```
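
The harness side of the explicit-argument pattern might look like the sketch below. `load_api_key` and `run_gate` are hypothetical names and nothing here is prescribed by the skill; the point is that the environment is read in exactly one place, only the key's length is ever printed, and the recipe receives the key as a plain argument.

```python
import os
from typing import Callable, Mapping

def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the key in the harness; error messages never include the value."""
    key = env.get("NVIDIA_API_KEY", "")
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not exported in this shell")
    return key

def run_gate(smoke_test: Callable[..., str], env: Mapping[str, str] = os.environ) -> str:
    """Hand the key to the recipe as an explicit argument and return its transcript."""
    key = load_api_key(env)
    print(f"NVIDIA_API_KEY len={len(key)}")  # length only, as in 1a
    return smoke_test(api_key=key)
```

Calling `run_gate(smoke_test)` with the recipe above keeps every use of the key visible at a single call site.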


**Run the smoke test — don't defer it.** This is the gate that proves Stages 2–4 can reach the hosted stack with the user's current key. "I can run it later" is not an acceptable completion of Stage 1; either invoke `smoke_test(api_key=…)` now or, if the user has explicitly opted out, log the deferral in your closing summary so they know what they're missing.

If the transcript matches the input within ~1 token, the hosted stack is reachable and the user can advance to Stage 2. If either call fails:

- `401 Unauthorized` / `PERMISSION_DENIED` → `NVIDIA_API_KEY` is wrong, expired, or not exported in this shell. Re-export and re-test.
- `404` / `INVALID_ARGUMENT: function not found` → the function ID is stale. Look up the current ID at <https://build.nvidia.com> and update the constant above.
- `RESOURCE_EXHAUSTED` → NVCF rate limit. Retry after 30 seconds; this is normal under load.
- Network/TLS errors → corporate proxy or DNS issue. Test `curl https://build.nvidia.com` first.
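
The pass criterion ("matches the input within ~1 token") can be checked mechanically. The following is a minimal sketch that reads "~1 token" as a token-level edit distance of at most 1 after lowercasing and stripping punctuation; that reading, and the names `token_edit_distance` and `smoke_test_passed`, are assumptions rather than the skill's definition.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase word tokens; punctuation and casing differences are ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())

def token_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def smoke_test_passed(expected: str, transcript: str, max_token_errors: int = 1) -> bool:
    """An empty transcript always fails; otherwise allow up to max_token_errors."""
    if not transcript.strip():
        return False
    return token_edit_distance(tokens(expected), tokens(transcript)) <= max_token_errors
```

For example, `smoke_test_passed("The patient was prescribed cefazolin.", transcript)` would accept a transcript that differs only in casing, punctuation, or a single word.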

1d. (Optional) Verify Merriam-Webster lookup

Two paths produce a `merriam-webster`-tagged manifest row in Stage 2. Pick one (or neither; Magpie G2P fall-through is a valid posture):

- Path A: JSON API + key. Recommended for standalone use of this skill. Check the key is set:

  ```bash
  test -n "$DICTIONARY_API_KEY" && echo "DICTIONARY_API_KEY len=${#DICTIONARY_API_KEY}" \
    || echo "DICTIONARY_API_KEY not set — Path A is off"
  ```

  Free key issues instantly at https://dictionaryapi.com.

- Path B: HTML scraping. No API key needed; reachability is the only prerequisite. Brittle to MW site HTML changes; recipe inlined in the build skill's `references/pronunciation-pipeline.md`.

  ```bash
  curl -fsS -o /dev/null -w "merriam-webster.com reachable, HTTP %{http_code}\n" \
    https://www.merriam-webster.com/medical/cefazolin
  ```

  If you don't want to maintain a scraper, use Path A instead.

Remember the data-disclosure note at the top: under either path, each clinical term in your seed list goes out as an HTTP request to a Merriam-Webster endpoint.
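
The choice between Path A, Path B, and the G2P fall-through reduces to a small decision, sketched below. The function name, the `allow_scrape` flag, and the returned labels are all hypothetical; the build skill decides how the selection is actually made in Stage 2.

```python
from typing import Mapping

def pick_ipa_source(env: Mapping[str, str], allow_scrape: bool = False) -> str:
    """Choose a pronunciation source in the order described above."""
    if env.get("DICTIONARY_API_KEY"):
        return "merriam-webster-json"  # Path A: JSON API + key
    if allow_scrape:
        return "merriam-webster-html"  # Path B: HTML scraping, no key
    return "magpie-g2p"                # fall-through: nothing is sent to Merriam-Webster

def sends_terms_to_merriam_webster(source: str) -> bool:
    """True when the chosen source triggers the outbound flow named in the disclosure."""
    return source.startswith("merriam-webster")
```

Passing `os.environ` as `env` mirrors the shell check in Path A, and the second helper tells the agent whether the Merriam-Webster row of the disclosure table applies to this run.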

Examples

Fresh shell, never run before. User says something like "I want to start the flywheel." → Quote the disclosure table first, then walk through 1a → 1b → 1c in order. On a green smoke test, point them at `/digital-health-clinical-asr-build` and explicitly name KER as the metric Stage 3 will judge them by.

Returning user, env already up. User says "I already have the env, just confirm I'm good to go." → Skip the venv + `pip install` (1b). Run only the length check (1a) and the smoke test (1c). On green, advance.

Artifacts produced

- `NVIDIA_API_KEY` exported in the user's shell
- An activated virtualenv with `nvidia-riva-client`, `pandas`, `soundfile`, `requests`
- A confirmed TTS→ASR round-trip on a clinical sentence (proof the hosted stack works)

No manifest, audio, or model artifact is produced at this stage; those come at Stages 2–4.

Troubleshooting

- Length check shows nothing or `len=0` → `NVIDIA_API_KEY` isn't exported in this shell. Run `export NVIDIA_API_KEY=nvapi-...` and re-check.
- Variable is set in one shell but not another → exports don't persist across sessions. Add the `export` line to your shell rc (`~/.bashrc`, `~/.zshrc`), or use a per-directory loader like `direnv`.
- `401 Unauthorized` on the smoke test → key value is wrong or expired. Re-issue at https://build.nvidia.com.
- `grpc.RpcError: function not found` → the inlined function IDs need updating against the current NVCF catalog. Check https://build.nvidia.com and edit the constants in 1c. The eval skill (`/digital-health-clinical-asr-eval`) provides a catalog of current function IDs in its Step 3a "Other catalog options" list.
- `StatusCode.INVALID_ARGUMENT` with `CUDA error: an illegal memory access was encountered` → NVCF-side backend fault on this specific function ID (Triton/PyTorch on NVCF, not your env). Either retry later or temporarily point at a different offline ASR NIM: Whisper Large v3 (function-id `b702f636-f60c-4a3d-a6f4-f3568c13bd7d`) is the closest drop-in (also offline; pass `language_code="en"` instead of `"en-US"`). For routine eval cycles, prefer to wait for the Parakeet backend to recover so the Stage 3 baseline and the Stage 4 SFT base stay aligned.
- `TypeError: Auth.__init__() got an unexpected keyword argument 'ssl_cert'` → you're on `nvidia-riva-client >= 2.x`, where the kwarg was renamed to `ssl_root_cert` (and is no longer needed for hosted NVCF). Drop the `ssl_cert=None,` line from your local copy of the recipe.
- `ModuleNotFoundError: riva.client` → step 1b was skipped or the venv isn't activated. Run `source .venv/bin/activate && pip install nvidia-riva-client`.
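
The gRPC-related entries above can be folded into a small triage helper so the agent reports one remediation per failure. This is a sketch under assumptions: `triage` is a hypothetical function, and it expects the status name and detail string that a caught `grpc.RpcError` typically exposes through `err.code().name` and `err.details()`.

```python
def triage(status_name: str, details: str = "") -> str:
    """Map a gRPC status name (plus detail text) to the remediation listed above."""
    text = details.lower()
    if status_name in ("UNAUTHENTICATED", "PERMISSION_DENIED"):
        return "Key is wrong, expired, or not exported: re-export NVIDIA_API_KEY and re-test."
    if status_name == "NOT_FOUND" or "function not found" in text:
        return "Function ID is stale: look up the current ID at build.nvidia.com and update the constants in 1c."
    if status_name == "INVALID_ARGUMENT" and "cuda error" in text:
        return "NVCF-side backend fault: retry later or temporarily switch to another offline ASR function."
    if status_name == "RESOURCE_EXHAUSTED":
        return "NVCF rate limit: retry after 30 seconds."
    if status_name in ("UNAVAILABLE", "DEADLINE_EXCEEDED"):
        return "Network/TLS problem: check proxy and DNS, for example with curl https://build.nvidia.com."
    return "Unrecognized failure: record the status code and details (never the key) and investigate."
```

The helper only inspects the status name and detail text, so the key value never needs to be passed to it or logged.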

Limitations

- Scope is environment readiness only. Whether the user's term list or pronunciation overrides make sense is decided in `/digital-health-clinical-asr-build`, not here.
- Magpie en-US assumption. Downstream IPA validation rides on Magpie's English phoneme inventory; other locales require a different phoneme set entirely.
- Hosted NVCF is the assumed deployment. Running self-hosted Riva NIMs is possible, but the setup for that lives inside `/digital-health-clinical-asr-finetune` Stage 4d.
- Synthetic data only. This skill family is built for benchmarks generated from a curated term list. Real patient transcripts and recorded audio must not flow through any stage.

Next steps

Mandatory close on success: finish the Stage 1 response by pointing the user explicitly to `/digital-health-clinical-asr-build` and naming KER (keyword error rate) as the headline measure they'll see at Stage 3. Both pointers are required, not optional; they place the user inside the four-stage flywheel.

- Default forward route: `/digital-health-clinical-asr-build` (specialty interview, term curation, IPA tagging, NeMo manifest synthesis).
- Direct jump to Stage 3 (only when the user is bringing their own NeMo-format manifest with `term` / `entity_category` / `ipa_source` fields): `/digital-health-clinical-asr-eval`.
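
Before taking the direct jump to Stage 3, it may help to confirm that a user-supplied manifest really carries the three clinical fields. The sketch below assumes the usual NeMo layout of one JSON object per line; `rows_missing_fields` is a hypothetical helper, and the eval skill may apply stricter checks.

```python
import json

CLINICAL_FIELDS = ("term", "entity_category", "ipa_source")

def rows_missing_fields(jsonl_lines, required=CLINICAL_FIELDS):
    """Return (line_number, missing_field_names) for each row lacking a required field."""
    problems = []
    for lineno, line in enumerate(jsonl_lines, 1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = [field for field in required if field not in row]
        if missing:
            problems.append((lineno, missing))
    return problems
```

An empty result suggests the manifest is shaped for `/digital-health-clinical-asr-eval`; anything else points back to the default route through the build skill. Passing an open file handle as `jsonl_lines` works, since the function only iterates over lines.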

References

- `references/dependency-ownership.md`: boundary between skill-owned and companion-owned responsibilities.