vosk

Vosk Offline STT

Use this skill when the agent must operate without internet connectivity, or when user privacy requirements prohibit sending audio to external APIs. Vosk provides fully offline speech recognition after the model is downloaded once.

Prefer this over cloud STT providers when operating in air-gapped environments, on-premise deployments, or when the user explicitly requests zero-cloud voice processing.

Setup

Download a Vosk model and place it at `~/.agentos/models/vosk/` (default), or set `modelPath` in `providerOptions`.

Example: download the small English model

wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip -d ~/.agentos/models/vosk/

Configuration

json

{
  "voice": {
    "stt": "vosk"
  }
}

With a custom model path:

json

{
  "voice": {
    "stt": "vosk",
    "providerOptions": {
      "modelPath": "/opt/models/vosk-model-en"
    }
  }
}
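
As a rough illustration of how the two configurations above might be interpreted, the sketch below picks the custom `modelPath` when it is set and otherwise falls back to the default directory from Setup. The `VoiceConfig` type and the `expandHome` and `resolveModelPath` helpers are hypothetical names introduced for this example; they are not part of Vosk or of any documented agent API.

```typescript
// Hypothetical shape mirroring the JSON configuration shown above.
interface VoiceConfig {
  voice: {
    stt: string;
    providerOptions?: { modelPath?: string };
  };
}

// Default model location named in the Setup section.
const DEFAULT_MODEL_DIR = "~/.agentos/models/vosk/";

// Expand a leading "~" using a caller-supplied home directory.
function expandHome(p: string, homeDir: string): string {
  if (p === "~") return homeDir;
  if (p.startsWith("~/")) return homeDir.replace(/\/+$/, "") + p.slice(1);
  return p;
}

// Return the custom modelPath if one is set (and non-empty), else the default.
function resolveModelPath(config: VoiceConfig, homeDir: string): string {
  const custom = config.voice.providerOptions?.modelPath;
  const chosen = custom && custom.length > 0 ? custom : DEFAULT_MODEL_DIR;
  return expandHome(chosen, homeDir);
}
```

In Node.js the home directory would typically come from `os.homedir()`; it is passed in as a parameter here so the sketch stays free of runtime dependencies.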

Provider Rules

Audio input must be 16 kHz mono LINEAR16 PCM. Resample or convert other formats before streaming.
Model quality scales with model size. Use `vosk-model-en-us-0.22` for best English accuracy; use small models on constrained hardware.
No API key required. The only requirement is a pre-downloaded model directory.
Streaming recognition is natively supported by the Vosk recognizer.
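
The resampling rule above can be sketched as follows, assuming the source audio is already mono and available as floating-point samples in the range [-1, 1]. The function name `resampleToPcm16` is hypothetical, and linear interpolation is used only for brevity: it applies no low-pass filter, so downsampling may alias, and a production pipeline would more likely rely on a dedicated resampler.

```typescript
// Sample rate required by the provider rules above.
const TARGET_RATE = 16000;

// Convert mono float samples at `inputRate` Hz to 16 kHz signed 16-bit PCM
// using linear interpolation between neighbouring input samples.
function resampleToPcm16(input: Float32Array, inputRate: number): Int16Array {
  if (inputRate <= 0) throw new Error("inputRate must be positive");
  const outLength = Math.floor((input.length * TARGET_RATE) / inputRate);
  const out = new Int16Array(outLength);
  const step = inputRate / TARGET_RATE; // input samples per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    const sample = input[i0] * (1 - frac) + input[i1] * frac;
    // Clamp to [-1, 1] before scaling to the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, sample));
    out[i] = Math.round(clamped * 32767);
  }
  return out;
}
```

The resulting `Int16Array` holds LINEAR16 samples; its underlying bytes can be chunked and streamed to the recognizer.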

Examples

"Use offline Vosk STT for this air-gapped deployment."
"Transcribe my voice locally without sending audio to the cloud."
"Configure Vosk with the large English model for best accuracy."

Constraints

Requires the `vosk` npm package and a pre-downloaded model directory.
Accuracy is generally lower than that of cloud providers, especially for accented speech and domain-specific vocabulary.
Audio must be 16 kHz mono LINEAR16 PCM. Other sample rates or formats require conversion.
Model download size ranges from ~40 MB (small) to ~1.8 GB (large en-us).
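
One way to enforce the audio-format constraint before recognition is to inspect the input's header. The sketch below assumes the audio arrives as a RIFF/WAVE file, which the text above does not state (it only specifies the sample format). `WavFormat`, `readWavFormat`, and `isVoskReady` are hypothetical names introduced for illustration.

```typescript
// Fields of the WAV "fmt " chunk that matter for the format check.
interface WavFormat {
  audioFormat: number; // 1 means uncompressed linear PCM
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

// Walk the RIFF chunks and return the contents of the "fmt " chunk.
function readWavFormat(bytes: Uint8Array): WavFormat {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number): string =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);
  if (bytes.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      if (size < 16 || offset + 24 > bytes.byteLength) {
        throw new Error("truncated fmt chunk");
      }
      return {
        audioFormat: view.getUint16(offset + 8, true),
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even length
  }
  throw new Error("fmt chunk not found");
}

// True when the file is already 16 kHz mono 16-bit linear PCM.
function isVoskReady(fmt: WavFormat): boolean {
  return (
    fmt.audioFormat === 1 &&
    fmt.channels === 1 &&
    fmt.sampleRate === 16000 &&
    fmt.bitsPerSample === 16
  );
}
```

A file that fails `isVoskReady` would need conversion (downmixing, resampling, or re-encoding) before being passed to the recognizer.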
