vosk
Vosk Offline STT
Use this skill when the agent must operate without internet connectivity, or when user privacy requirements prohibit sending audio to external APIs. Vosk provides fully offline speech recognition after the model is downloaded once.
Prefer this over cloud STT providers when operating in air-gapped environments, on-premise deployments, or when the user explicitly requests zero-cloud voice processing.
Setup
Download a Vosk model and place it at `~/.agentos/models/vosk/` (default), or set `modelPath` in `providerOptions`.
```sh
# Example: download the small English model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip -d ~/.agentos/models/vosk/
```
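The choice between the default location and a `modelPath` override can be sketched as a small pure function. This is only an illustration: `VoiceConfig`, `expandHome`, and `resolveModelPath` are hypothetical names, and whether the actual provider expands `~` itself is an assumption, not something this document states.

```typescript
// Hypothetical shape of the "voice" config object; not a published API.
interface VoiceConfig {
  stt: string;
  providerOptions?: { modelPath?: string };
}

// Default model directory named in the setup instructions.
const DEFAULT_MODEL_DIR = "~/.agentos/models/vosk/";

// Expand a leading "~" using a caller-supplied home directory, so the
// function stays pure and easy to test.
function expandHome(p: string, homeDir: string): string {
  if (p === "~") return homeDir;
  if (p.startsWith("~/")) return homeDir.replace(/\/+$/, "") + p.slice(1);
  return p;
}

// Prefer an explicit, non-empty modelPath; otherwise fall back to the default.
function resolveModelPath(voice: VoiceConfig, homeDir: string): string {
  const configured = voice.providerOptions?.modelPath?.trim();
  const chosen = configured ? configured : DEFAULT_MODEL_DIR;
  return expandHome(chosen, homeDir);
}
```

In a Node.js process the home directory would typically come from `os.homedir()`; it is passed in here to keep the sketch free of runtime dependencies.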
Configuration
```json
{
  "voice": {
    "stt": "vosk"
  }
}
```
With a custom model path:
```json
{
  "voice": {
    "stt": "vosk",
    "providerOptions": {
      "modelPath": "/opt/models/vosk-model-en"
    }
  }
}
```
Provider Rules
- Audio input must be 16 kHz mono LINEAR16 PCM. Resample other formats before streaming.
- Model quality scales with model size. Use `vosk-model-en-us-0.22` for best English accuracy; use small models on constrained hardware.
- No API key required. The only requirement is a pre-downloaded model directory.
- Streaming recognition is natively supported by the Vosk recognizer.
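The resampling rule above can be illustrated with a minimal sketch that takes interleaved floating-point samples to 16 kHz mono LINEAR16. All function names here are hypothetical, and the resampler is a naive linear interpolator with no low-pass filter, so it may alias when downsampling; a production pipeline would more likely use a dedicated tool or library.

```typescript
const TARGET_RATE = 16000;

// Average interleaved channels into a single mono channel.
function downmixToMono(interleaved: Float32Array, channels: number): Float32Array {
  const frames = Math.floor(interleaved.length / channels);
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) sum += interleaved[i * channels + c];
    mono[i] = sum / channels;
  }
  return mono;
}

// Naive linear-interpolation resampler (no anti-aliasing filter).
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = TARGET_RATE
): Float32Array {
  if (fromRate === toRate) return input;
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

// Clamp to [-1, 1] and scale to signed 16-bit integers (LINEAR16).
function floatToLinear16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return pcm;
}
```

For example, one second of 48 kHz stereo capture could be passed through `downmixToMono(samples, 2)`, then `resampleLinear(mono, 48000)`, then `floatToLinear16(...)` before the bytes are streamed to the recognizer.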
Examples
- "Use offline Vosk STT for this air-gapped deployment."
- "Transcribe my voice locally without sending audio to the cloud."
- "Configure Vosk with the large English model for best accuracy."
Constraints
- Requires the Vosk npm package and a pre-downloaded model directory.
- Accuracy is lower than cloud providers, especially for accented speech and domain-specific vocabulary.
- Audio must be 16 kHz mono LINEAR16 PCM. Other sample rates or formats require conversion.
- Model download size ranges from ~40 MB (small) to ~1.8 GB (large en-us).
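One way to catch the audio-format constraint early is to inspect a WAV header before handing audio to the recognizer. The sketch below assumes a canonical RIFF/WAVE layout with the `fmt ` chunk starting at byte 12; files with extra chunks before `fmt ` would need a real chunk parser. `WavFormat`, `readWavFormat`, and `isVoskReady` are hypothetical names used only for illustration.

```typescript
interface WavFormat {
  audioFormat: number; // 1 means integer PCM
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

// Read format fields from a canonical RIFF/WAVE header.
// Returns null when the expected layout is not found.
function readWavFormat(bytes: Uint8Array): WavFormat | null {
  if (bytes.length < 36) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset: number): string =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(12) !== "fmt ") return null;
  return {
    audioFormat: view.getUint16(20, true),
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}

// True only for 16 kHz, mono, 16-bit integer PCM.
function isVoskReady(fmt: WavFormat): boolean {
  return (
    fmt.audioFormat === 1 &&
    fmt.channels === 1 &&
    fmt.sampleRate === 16000 &&
    fmt.bitsPerSample === 16
  );
}
```

Audio that fails this check would need conversion (resampling, downmixing, or re-encoding) before recognition.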