digital-health-clinical-asr-setup
<!--
SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
Clinical ASR Flywheel — Stage 1 (Setup)
Agent: this file is the complete Stage 1 procedure. Do not invoke `grep`, `find`, `ls`, or `rg` looking for an installer or hidden config; there isn't one. The four sections below (outbound-data disclosure, three numbered checks, sibling hand-off) are all required reading; don't skip any. Function IDs, env-var conventions, and the smoke-test gate are inlined further down. Answer from what is actually written here rather than from prior Riva/NVCF familiarity.
Stage 1 has one job: prove the user can reach NVIDIA's hosted speech stack with the `NVIDIA_API_KEY` they currently hold. Once a single clinical sentence round-trips successfully through Magpie TTS → Parakeet/Nemotron ASR, the user is cleared to advance to `/digital-health-clinical-asr-build`.
The four-stage flywheel exists to drive down KER (keyword error rate) on clinical entities: drugs, procedures, anatomy, conditions, labs, roles. WER averages obscure the failures that hurt clinically; KER is what Stage 3 will measure you against.
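The following is a minimal sketch of why the distinction matters. It assumes KER is the fraction of reference clinical terms that the ASR output fails to reproduce after lowercasing and stripping punctuation. That definition and the function names are illustrative assumptions, not the Stage 3 scoring recipe. Suppose `cefazolin` comes back as `cephalexin` in the five-word smoke-test sentence. WER is then 1/5 = 0.2, yet the only clinical entity in the sentence is wrong, so KER for that term is 1.0.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_error_rate(terms: list[str], hypothesis: str) -> float:
    """Hypothetical KER: fraction of clinical terms not found in the hypothesis.

    A (possibly multi-word) term counts as recognized only when all of its
    tokens appear contiguously, in order, in the normalized hypothesis.
    """
    hyp = normalize(hypothesis)
    missed = 0
    for term in terms:
        t = normalize(term)
        found = any(hyp[k:k + len(t)] == t for k in range(len(hyp) - len(t) + 1))
        if not found:
            missed += 1
    return missed / max(len(terms), 1)
```

For example, `keyword_error_rate(["cefazolin"], "The patient was prescribed cephalexin.")` returns `1.0`, even though four of the five words are correct.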
There is no installer script anywhere in this skill: not `install.sh`, not `setup.py`, nothing hidden. Stage 1 is the three steps below: verify the key, install Python deps, run the smoke test. Anything past Stage 1 is composed from sibling skills (`/data-designer`, `/riva-tts`, the inlined Stage 3 ASR recipe, `/riva-asr-custom`). If a user asks "what script installs everything?", answer from this paragraph; don't go searching.
Outbound data flows: surface before any text or audio is sent
Two external endpoints receive data during this flywheel. The user has to acknowledge both before Stage 2 begins, against whatever data-governance policy their organization enforces. Render the table below word-for-word in your response — a paraphrase doesn't satisfy the disclosure; the literal phrasing is what counts.
| Service | What gets sent | When | Hosted by |
|---|---|---|---|
| NVIDIA NVCF (`grpc.nvcf.nvidia.com`) | The clinical sentences you synthesize (text), and the WAV files you transcribe (audio) | Every Stage 2 TTS call and every Stage 3 ASR call | NVIDIA, governed by build.nvidia.com terms |
| Merriam-Webster (`dictionaryapi.com` or `merriam-webster.com`) | Individual clinical terms (drug names, anatomy, procedures), one HTTP request per term | Stage 2 IPA tagging; see "Two MW paths" below for which endpoint applies | Merriam-Webster, governed by their API or site terms |
The data is synthetic by construction — the flywheel manufactures sentences and audio from a user-curated term list, never from real patient encounters. That said: do not feed real patient transcripts, recorded clinical audio, or any PHI through any stage. If the term list itself contains sensitive material (codename drugs, unreleased product names), the user should consult their organization's external-API policy before proceeding. Either endpoint can be turned off:
- Skip Merriam-Webster entirely: leave `DICTIONARY_API_KEY` unset and don't run a scraper. Stage 2 falls back to Magpie G2P, which still works but with weaker coverage on long-tail clinical terms.
- Skip NVCF: this is a hard stop. Magpie TTS + Parakeet/Nemotron ASR are the workload. Without them this skill family is the wrong tool, and a self-hosted ASR/TTS pipeline is what you want instead.
Recommend that a copy of this notice land in the user's workspace `README.md`; bring it forward on first invocation if it isn't already there.
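Checking whether the notice has already been brought forward can stay read-only, in line with the Scope note under Instructions. This is a minimal sketch. It assumes the copied notice keeps its `Outbound data flows` heading, so that string can serve as a marker. The marker and both helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumed marker: the disclosure's section heading, as copied into README.md.
NOTICE_MARKER = "Outbound data flows"


def has_notice(readme_text: str) -> bool:
    """True if the README text already contains the disclosure heading."""
    return NOTICE_MARKER in readme_text


def notice_present(workspace: str | Path) -> bool:
    """Read-only check of <workspace>/README.md; never creates or edits it."""
    readme = Path(workspace) / "README.md"
    if not readme.is_file():
        return False
    return has_notice(readme.read_text(encoding="utf-8", errors="replace"))
```

If `notice_present(...)` returns `False`, offer the user the table to paste. Writing the file yourself should wait for their explicit go-ahead.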
Purpose
Get a fresh environment ready for Stage 2. Three things to confirm: key is present, deps import cleanly, hosted stack actually answers. Close by naming which skill to run next.
The four `digital-health-clinical-asr-*` skills are self-contained. Every TTS, ASR, IPA-tagging, and scoring recipe lives inside them, and no other agent skill needs installing to run the flywheel end-to-end.
This skill takes no opinion on workspace layout. The user decides where their cycle artifacts live; `data/eval_sets/cycle<N>/` is not imposed.
When to use this skill
Activate on user phrases like:
- "Set up the Clinical ASR Flywheel"
- "Initialize the clinical-asr eval"
- "I want to evaluate ASR on clinical terminology — where do I start?"
- "Bootstrap my environment for the flywheel"
- "What do I need installed before I run the flywheel?"
Do not activate when:
- The user already has a manifest and wants to score it → `/digital-health-clinical-asr-eval`
- The user already has the env set up and wants to curate terms → `/digital-health-clinical-asr-build`
- The user is asking about Stage 4 fine-tune NGC/Docker setup specifically → that's covered inside `/digital-health-clinical-asr-finetune`
Prerequisites
| Requirement | Required? | Why | How |
|---|---|---|---|
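| `NVIDIA_API_KEY` | Required | Hosted Magpie TTS + Parakeet/Nemotron ASR via NVCF | Issue at https://build.nvidia.com; `export NVIDIA_API_KEY=nvapi-...` in your shell |
| Python ≥ 3.10 | Required | NeMo client, scoring, manifest tools | |
| `nvidia-riva-client`, `pandas`, `soundfile`, `requests` | Required | TTS + ASR clients, manifest I/O, MW lookup | `pip install nvidia-riva-client pandas soundfile requests` |
| `DICTIONARY_API_KEY` | Optional | Merriam-Webster Medical Dictionary lookup via the JSON API (Path A in the build skill; recommended) | Free key at https://dictionaryapi.com. Path B (HTML scrape of |
| `jiwer` | Optional | Reference WER/CER against the inlined Levenshtein implementation | `pip install jiwer` |
| `nemo-toolkit`, Docker + NVIDIA Container Toolkit (Stage 4 only) | Optional, deferred | Fine-tune workload | Set up inside `/digital-health-clinical-asr-finetune` |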
Instructions
Scope. This skill performs read-only environment checks. It confirms that a key is exported (length-only), that the Python version is supported, that libraries import, and that the hosted NVCF stack responds to a single smoke-test round-trip. It does not install system packages, modify shell rc files, write to disk outside an explicit `.venv/`, or attempt to authenticate with the real key value. Validate; never mutate without explicit user direction.
1a. Verify `NVIDIA_API_KEY` (length-only; never echo the value)

```bash
# Export NVIDIA_API_KEY in your shell; never echo or commit the value
export NVIDIA_API_KEY=nvapi-...   # from https://build.nvidia.com

# Length-only check; the key value never appears in any log
test -n "$NVIDIA_API_KEY" && echo "NVIDIA_API_KEY len=${#NVIDIA_API_KEY}"
```

A length of 70+ is normal. If the output is empty or shows `len=0`, the user must paste a key from <https://build.nvidia.com>. Do **not** print the key, even truncated. To persist across shell sessions, add the `export` line to your shell rc (`~/.bashrc`, `~/.zshrc`), or use a per-directory tool like `direnv`.
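The same length-only rule can be kept on the harness side in Python. This is a minimal sketch, and both helper names are hypothetical. Only this harness boundary reads the environment; the recipe code does not, which matches the explicit-argument pattern described in 1c.

```python
from __future__ import annotations

import os


def key_status(value: str | None, name: str = "NVIDIA_API_KEY") -> str:
    """Describe a secret by its length only; the value itself is never returned."""
    if not value:
        return f"{name} not set"
    return f"{name} len={len(value)}"


def load_key_for_harness(name: str = "NVIDIA_API_KEY") -> str:
    """Harness boundary (hypothetical): read the key once, log only its length."""
    value = os.environ.get(name, "")
    print(key_status(value, name))  # length only, never the value
    if not value:
        raise RuntimeError(f"{name} is not exported in this shell")
    return value
```

The returned string is then handed to `smoke_test(api_key=...)` as an explicit argument, so the recipe never touches `os.environ` itself.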
1b. Install Python dependencies
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install nvidia-riva-client pandas soundfile requests
# optional
pip install jiwer
```

For Stage 4 (fine-tune) only: `nemo-toolkit` and Docker + NVIDIA Container Toolkit are also required. Defer those to `/digital-health-clinical-asr-finetune`. There is no point installing them up front if the user may never reach Stage 4.
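To confirm that the deps actually import, and that the interpreter meets the Python ≥ 3.10 requirement, run a read-only check like this one inside the activated venv. It does not install anything. It is a sketch: the module list mirrors the `pip install` lines above, `riva.client` is the import path the smoke test uses, and the function name is hypothetical.

```python
import importlib
import sys

REQUIRED = ["riva.client", "pandas", "soundfile", "requests"]
OPTIONAL = ["jiwer"]


def check_environment(required=REQUIRED, optional=OPTIONAL) -> dict:
    """Read-only: report Python version status and which modules import cleanly."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= (3, 10),
        "missing_required": [],
        "missing_optional": [],
    }
    for names, key in ((required, "missing_required"), (optional, "missing_optional")):
        for name in names:
            try:
                importlib.import_module(name)
            except Exception as exc:  # ImportError, or e.g. OSError from a missing native lib
                report[key].append(f"{name} ({type(exc).__name__})")
    report["ok"] = report["python_ok"] and not report["missing_required"]
    return report


if __name__ == "__main__":
    for k, v in check_environment().items():
        print(f"{k}: {v}")
```

Catching `Exception` rather than only `ImportError` is deliberate. `soundfile`, for instance, can import its Python layer and still fail on a missing system library.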
1c. Smoke-test the hosted NVCF stack
- The agent harness reads `$NVIDIA_API_KEY` from the shell and passes it as an explicit function argument to `smoke_test(api_key=…)`.
- Auditors can grep the recipe for every wire crossing: every `api_key` use is visible in `auth_for(...)`.
- Do not `echo`, `print`, or log the key value (including truncated). Length-only checks are fine (see §1a).
- Do not let the recipe read `os.environ["NVIDIA_API_KEY"]` itself; the explicit-argument pattern is the auditability guarantee.
- Do not commit the key to any file, including `.env` files, examples, or notebook outputs.

Verify that `NVIDIA_API_KEY` actually works against Magpie TTS and Parakeet/Nemotron ASR before advancing. The four skills inline every recipe needed; this round-trip only confirms that the API key and the network path are real.
The agent harness loads the `NVIDIA_API_KEY` shell variable and passes it as an explicit function argument to the helpers below. The recipe code itself does not read environment variables, so auditors can see exactly which API keys cross the wire.
```python
import io
import wave

import riva.client

NVCF_HOST = "grpc.nvcf.nvidia.com:443"
MAGPIE_FUNCTION_ID = "877104f7-e885-42b9-8de8-f6e4c6303969"    # Magpie TTS
PARAKEET_FUNCTION_ID = "d3fe9151-442b-4204-a70d-5fcc597fd610"  # Parakeet TDT 0.6B v2 (offline ASR)

SAMPLE_RATE_HZ = 16000
SMOKE_SENTENCE = "The patient was prescribed cefazolin."


def auth_for(function_id: str, api_key: str) -> riva.client.Auth:
    """Per-function NVCF credentials; the only place api_key is put on the wire."""
    return riva.client.Auth(
        use_ssl=True,
        uri=NVCF_HOST,
        metadata_args=[
            ["function-id", function_id],
            ["authorization", f"Bearer {api_key}"],
        ],
    )


def smoke_test(api_key: str) -> str:
    """Caller passes api_key (the harness reads $NVIDIA_API_KEY at the shell;
    this code never touches the environment). Returns the ASR transcript."""
    # 1. TTS: synthesize the clinical sentence as 16 kHz mono 16-bit PCM.
    tts = riva.client.SpeechSynthesisService(auth_for(MAGPIE_FUNCTION_ID, api_key))
    pcm = b"".join(
        chunk.audio
        for chunk in tts.synthesize_online(
            text=SMOKE_SENTENCE,
            voice_name="Magpie-Multilingual.EN-US.Mia",
            language_code="en-US",
            sample_rate_hz=SAMPLE_RATE_HZ,
        )
    )

    # Wrap the PCM in a WAV container in memory, so nothing is written to disk
    # (Stage 1 is read-only outside .venv/).
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(pcm)
    audio_bytes = buf.getvalue()

    # 2. ASR: transcribe the WAV we just synthesized.
    asr = riva.client.ASRService(auth_for(PARAKEET_FUNCTION_ID, api_key))
    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        sample_rate_hertz=SAMPLE_RATE_HZ,
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    response = asr.offline_recognize(audio_bytes, config)
    # Offline recognition may split audio into several results; join them all.
    transcript = " ".join(
        r.alternatives[0].transcript.strip() for r in response.results if r.alternatives
    )

    print(f"TTS: {SMOKE_SENTENCE}")
    print(f"ASR: {transcript}")
    return transcript
```
Invoke from the agent (`api_key` sourced by the harness, not by this code):

```python
smoke_test(api_key="<NVIDIA_API_KEY value>")
```

**Run the smoke test — don't defer it.** This is the gate that proves Stages 2–4 can reach the hosted stack with the user's current key. "I can run it later" is not an acceptable completion of Stage 1; either invoke `smoke_test(api_key=…)` now or, if the user has explicitly opted out, log the deferral in your closing summary so they know what they're missing.
If the transcript matches the input within ~1 token, the hosted stack is reachable and the user can advance to Stage 2. If either call fails:
- `401 Unauthorized` / `PERMISSION_DENIED` → `NVIDIA_API_KEY` is wrong, expired, or not exported in this shell. Re-export and re-test.
- `404` / `INVALID_ARGUMENT: function not found` → the function ID is stale. Look up the current ID at <https://build.nvidia.com> and update the constant above.
- `RESOURCE_EXHAUSTED` → NVCF rate limit. Retry after 30 seconds; this is normal under load.
- Network/TLS errors → corporate proxy or DNS issue. Test `curl https://build.nvidia.com` first.
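The pass criterion "matches the input within ~1 token" can be made mechanical. This is a minimal sketch under two assumptions. First, a token is a lowercase word with punctuation stripped, so automatic punctuation and casing differences don't count against the round-trip. Second, "~1 token" means a word-level edit distance of at most 1. Both the normalization and the threshold are assumptions, and the function names are hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase word tokens with punctuation removed."""
    return re.findall(r"[a-z0-9']+", text.lower())


def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def smoke_passed(reference: str, transcript: str, max_edits: int = 1) -> bool:
    """Green light for Stage 2 if the transcript is within max_edits tokens."""
    if not transcript.strip():
        return False  # an empty transcript means the ASR leg did not work
    return token_edit_distance(tokens(reference), tokens(transcript)) <= max_edits
```

Under this rule, `"The patient was prescribed cefazoline."` still passes because there is one substitution. Dropping a word *and* mangling the drug name fails.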
1d. (Optional) Verify Merriam-Webster lookup
Two paths produce a `merriam-webster-`-tagged manifest row in Stage 2. Pick one (or neither: Magpie G2P fall-through is a valid posture):
- Path A: JSON API + key. Recommended for standalone use of this skill. Check that the key is set:

  ```bash
  test -n "$DICTIONARY_API_KEY" && echo "DICTIONARY_API_KEY len=${#DICTIONARY_API_KEY}" \
    || echo "DICTIONARY_API_KEY not set; Path A is off"
  ```

  A free key issues instantly at https://dictionaryapi.com.
- Path B: HTML scraping. No API key is needed; reachability is the only prerequisite. This path is brittle to MW site HTML changes. The recipe is inlined in the build skill's `references/pronunciation-pipeline.md`.

  ```bash
  curl -fsS -o /dev/null -w "merriam-webster.com reachable, HTTP %{http_code}\n" \
    https://www.merriam-webster.com/medical/cefazolin
  ```

  If you don't want to maintain a scraper, use Path A instead.
Remember the data-disclosure note at the top: under either path, each clinical term in your seed list goes out as an HTTP request to a Merriam-Webster endpoint.
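Under Path A, each lookup is one HTTP GET per term, which is exactly what the disclosure table describes. The sketch below builds that request URL without sending it. It assumes the v3 layout `https://www.dictionaryapi.com/api/v3/references/medical/json/<term>?key=<key>`; verify it against the dictionaryapi.com documentation for your key before relying on it. Following the 1c pattern, the key is an explicit argument rather than something the code reads from the environment. `mw_lookup_url` and `redact` are hypothetical helpers.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint layout for the MW Medical Dictionary JSON API (v3).
MW_MEDICAL_BASE = "https://www.dictionaryapi.com/api/v3/references/medical/json/"


def mw_lookup_url(term: str, api_key: str) -> str:
    """Build the Path A request URL for one clinical term (nothing is sent)."""
    if not api_key:
        raise ValueError("DICTIONARY_API_KEY not set; Path A is off")
    return MW_MEDICAL_BASE + quote(term.strip(), safe="") + "?" + urlencode({"key": api_key})


def redact(url: str) -> str:
    """Replace the query string before logging, so only the term is visible."""
    return url.split("?", 1)[0] + "?key=<redacted>"
```

To send the request, call `requests.get(mw_lookup_url(term, api_key), timeout=10)`. Log only `redact(url)`, because the key travels as a query parameter.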
Examples
Fresh shell, never run before. User says something like "I want to start the flywheel." → Quote the disclosure table first, then walk through 1a → 1b → 1c in order. On a green smoke test, point them at `/digital-health-clinical-asr-build` and explicitly name KER as the metric Stage 3 will judge them by.
Returning user, env already up. User says "I already have the env, just confirm I'm good to go." → Skip the venv + `pip install` step (1b). Run only the length check (1a) and the smoke test (1c). On green, advance.
Artifacts produced
- `NVIDIA_API_KEY` exported in the user's shell
- An activated virtualenv with `nvidia-riva-client`, `pandas`, `soundfile`, `requests`
- A confirmed TTS→ASR round-trip on a clinical sentence (proof the hosted stack works)

No manifest, audio, or model artifact is produced at this stage; those come at Stages 2–4.
Troubleshooting
- Length check shows nothing or `len=0` → `NVIDIA_API_KEY` isn't exported in this shell. Run `export NVIDIA_API_KEY=nvapi-...` and re-check.
- Variable is set in one shell but not another → exports don't persist across sessions. Add the `export` line to your shell rc (`~/.bashrc`, `~/.zshrc`), or use a per-directory loader like `direnv`.
- `401 Unauthorized` on the smoke test → the key value is wrong or expired. Re-issue it at https://build.nvidia.com.
- `grpc.RpcError: function not found` → the inlined function IDs need updating against the current NVCF catalog. Check https://build.nvidia.com and edit the constants in 1c. The eval skill (`/digital-health-clinical-asr-eval`) lists current function IDs in its Step 3a "Other catalog options" list.
- `StatusCode.INVALID_ARGUMENT` with `CUDA error: an illegal memory access was encountered` → NVCF-side backend fault on this specific function ID (Triton/PyTorch on NVCF, not your env). Either retry later, or temporarily point at a different offline ASR NIM. Whisper Large v3 (function-id `b702f636-f60c-4a3d-a6f4-f3568c13bd7d`) is the closest drop-in; it is also offline, but pass `language_code="en"` instead of `"en-US"`. For routine eval cycles, prefer to wait for the Parakeet backend to recover so that the Stage 3 baseline and the Stage 4 SFT base stay aligned.
- `TypeError: Auth.__init__() got an unexpected keyword argument 'ssl_cert'` → you're on `nvidia-riva-client >= 2.x`, where the kwarg was renamed to `ssl_root_cert` (and is no longer needed for hosted NVCF). Drop the `ssl_cert=None,` line from your local copy of the recipe.
- `ModuleNotFoundError: riva.client` → step 1b was skipped or the venv isn't activated. Run `source .venv/bin/activate && pip install nvidia-riva-client`.
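When the smoke test fails, the symptom-to-fix mapping in this section and in 1c can be applied mechanically to the exception text. This is a minimal sketch that matches on substrings, so it does not need `grpc` to be importable. The needle list is an assumption drawn from the messages quoted above, and `describe` and `triage` are hypothetical names.

```python
# Ordered (needle, advice) pairs. First match wins, so the specific CUDA backend
# fault is checked before the generic "function not found" INVALID_ARGUMENT case.
RULES: list[tuple[str, str]] = [
    ("illegal memory access",
     "NVCF backend fault on this function ID: retry later, or temporarily use "
     "Whisper Large v3 with language_code=\"en\""),
    ("function not found",
     "Stale function ID: look up the current ID at https://build.nvidia.com"),
    ("401 Unauthorized",
     "Key wrong, expired, or not exported: re-export NVIDIA_API_KEY and re-test"),
    ("PERMISSION_DENIED",
     "Key wrong, expired, or not exported: re-export NVIDIA_API_KEY and re-test"),
    ("RESOURCE_EXHAUSTED", "NVCF rate limit: retry after 30 seconds"),
    ("unexpected keyword argument 'ssl_cert'",
     "nvidia-riva-client >= 2.x: drop the ssl_cert=None line from your copy"),
    ("ModuleNotFoundError",
     "Step 1b skipped or venv inactive: "
     "source .venv/bin/activate && pip install nvidia-riva-client"),
]


def describe(exc: BaseException) -> str:
    """Type name plus message, so class-based needles like ModuleNotFoundError match."""
    return f"{type(exc).__name__}: {exc}"


def triage(error_text: str) -> str:
    """Map an error string to the documented remediation."""
    for needle, advice in RULES:
        if needle in error_text:
            return advice
    return "Unrecognized: check network/TLS first with curl https://build.nvidia.com"
```

In the harness, wrap the call as `except Exception as exc: print(triage(describe(exc)))`.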
Limitations
- Scope is environment readiness only. Whether the user's term list or pronunciation overrides make sense is decided in `/digital-health-clinical-asr-build`, not here.
- Magpie en-US assumption. Downstream IPA validation rides on Magpie's English phoneme inventory; other locales require a different phoneme set entirely.
- Hosted NVCF is the assumed deployment. Running self-hosted Riva NIMs is possible, but the setup for that lives inside `/digital-health-clinical-asr-finetune` Stage 4d.
- Synthetic data only. This skill family is built for benchmarks generated from a curated term list. Real patient transcripts and recorded audio must not flow through any stage.
Next steps
Mandatory close on success: finish the Stage 1 response by pointing the user explicitly to `/digital-health-clinical-asr-build` and naming KER (keyword error rate) as the headline measure they'll see at Stage 3. Both pointers are required, not optional; they place the user inside the four-stage flywheel.
- Default forward route: `/digital-health-clinical-asr-build` (specialty interview, term curation, IPA tagging, NeMo manifest synthesis).
- Direct jump to Stage 3 (only when the user is bringing their own NeMo-format manifest with `term`/`entity_category`/`ipa_source` fields): `/digital-health-clinical-asr-eval`.
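The mandatory close can be assembled mechanically so that neither pointer is forgotten. This also ensures that a user-approved deferral of the smoke test (see 1c) is logged rather than silently dropped. This is a minimal sketch; the function name, the `smoke` states, and the wording are hypothetical.

```python
BUILD_SKILL = "/digital-health-clinical-asr-build"
EVAL_SKILL = "/digital-health-clinical-asr-eval"


def closing_summary(smoke: str, own_manifest: bool = False) -> str:
    """Compose the Stage 1 close.

    smoke: "passed", "failed", or "deferred" (the last only when the user
    explicitly opted out of the smoke test).
    own_manifest: True when the user brings a NeMo-format manifest with
    term/entity_category/ipa_source fields.
    """
    if smoke == "failed":
        return "Stage 1 is not complete: the smoke test failed. Fix the error and re-run it."
    if smoke not in ("passed", "deferred"):
        raise ValueError(f"unknown smoke state: {smoke!r}")

    lines = []
    if smoke == "deferred":
        lines.append("Note: the smoke test was deferred at your request, so access "
                     "to the hosted NVCF stack is still unverified.")
    # Both mandatory pointers: the build skill and KER.
    lines.append(f"Next: run {BUILD_SKILL} (specialty interview, term curation, "
                 "IPA tagging, NeMo manifest synthesis).")
    if own_manifest:
        lines.append("Since you already have a NeMo-format manifest, you can jump "
                     f"straight to Stage 3 with {EVAL_SKILL}.")
    lines.append("KER (keyword error rate) is the headline measure you'll see at Stage 3.")
    return "\n".join(lines)
```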
References
- `references/dependency-ownership.md`: boundary between skill-owned and companion-owned responsibilities.