voice

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Voice Mode

语音模式

The user wants to have a voice conversation. They are not looking at the screen. They are listening to you speak and replying verbally. Treat this like a phone call.

Voice mode is a session. It starts when this skill activates and ends when the user signals they're done — either by typing text in the terminal or by saying something like "that's all", "goodbye", "stop", "end voice", or similar. When the conversation ends, say goodbye and stop using voice commands. Resume normal text interaction.

用户想要进行语音对话。他们不会查看屏幕，只会听你说话并口头回复。把这当成电话沟通来对待。

语音模式是一个会话。当该技能激活时开始，当用户表示结束时终止——用户可以通过在终端输入文本，或者说出类似“就这些”、“再见”、“停止”、“结束语音”之类的话语来结束会话。对话结束后，说再见并停止使用语音命令，恢复正常的文本交互。

Activation

激活

When this skill activates, immediately start the voice conversation before doing anything else.

No prior context (fresh conversation,
```
/voice
```
with no preceding messages): use
```
ask
```
to greet and get intent in one step. E.g.
```
agent-voice ask -m "Hey, what are we working on?"
```
Existing context (mid-conversation, user was already working on something): use your judgment. You might
```
say
```
a status update and continue, or
```
ask
```
a clarifying question — whatever fits the flow.

当该技能激活时，在执行任何其他操作前立即启动语音对话。

无前置上下文（全新对话，调用/voice前没有其他消息）：使用
```
ask
```
命令，在一步中完成问候并获取用户意图。例如：
```
agent-voice ask -m "Hey, what are we working on?"
```
已有上下文（对话进行中，用户已经在处理某件事）：自行判断处理方式。你可以用
```
say
```
告知状态更新并继续对话，或者用
```
ask
```
提出澄清问题——选择符合当前对话流程的操作即可。

Setup

设置

agent-voice

fails with "command not found", install it and retry:

bash

npm install -g agent-voice

If authentication fails, tell the user to run

agent-voice auth

in a separate terminal to configure their API key, then stop. Do not attempt to run the auth flow yourself — it requires interactive input.

如果

agent-voice

执行失败并提示“command not found”，请先安装该工具并重试：

bash

npm install -g agent-voice

如果验证失败，请告知用户在单独的终端中运行

agent-voice auth

来配置API密钥，然后停止操作。不要尝试自行运行验证流程——该流程需要交互式输入。

Commands

命令

Say — inform the user

Say — 告知用户信息

Use

say

whenever you want to tell the user something: status updates, progress, results, explanations, acknowledgments. This is one-way — the user hears you but does not respond.

bash

agent-voice say -m "I'm setting up the project now."

当你需要向用户传达信息时使用

say

：比如状态更新、进度、结果、解释、确认信息。这是单向沟通——用户听你说话，但无需回复。

bash

agent-voice say -m "I'm setting up the project now."

Ask — get input from the user

Ask — 获取用户输入

Use

ask

whenever you need input, confirmation, a decision, or clarification. The user hears your question, then speaks their answer. The transcribed response is printed to stdout — just read the command output directly.

Prefer combining informational text with a question into a single

ask

call instead of a separate

say

followed by

ask

. This reduces latency and feels more natural.

bash

undefined

当你需要获取用户输入、确认、决策或澄清信息时使用

ask

。用户会听到你的问题，然后口头回复。转录后的回复会输出到stdout——直接读取命令输出即可。

建议将信息文本和问题合并到一个

ask

调用中，而不是先单独使用

say

再调用

ask

。这样可以减少延迟，让对话更自然。

bash

undefined

Instead of:

不要这样做：

agent-voice say -m "I've finished the database schema."

agent-voice ask -m "Should I move on to the API routes?"

Do:

应该这样做：

agent-voice ask -m "I've finished the database schema. Should I move on to the API routes?"


Options:
- `--timeout <seconds>` — how long to wait for the user to speak (default: 120)

agent-voice ask -m "I've finished the database schema. Should I move on to the API routes?"


选项：
- `--timeout <seconds>` — 等待用户说话的时长（默认：120秒）

Latency

延迟处理

This is a real-time conversation. The user is waiting in silence between each voice interaction. Minimize the time between hearing the user and responding. Every second of silence feels long.

Respond to the user immediately after an
```
ask
```
— acknowledge first, think later.
If you need to do heavy work (searching the codebase, reading files, planning), say so first:
```
agent-voice say -m "Let me look into that."
```
Then do the work. Then follow up with results.
Never leave the user hanging in silence while you explore files or reason through a problem. A quick acknowledgment buys you time.
Keep
```
say
```
messages short. Fewer words = less TTS latency.

这是实时对话，用户在每次语音交互之间会处于等待状态。尽量缩短听到用户回复后做出响应的时间，每一秒的沉默都会让用户觉得漫长。

在
```
ask
```
之后立即回复用户——先确认，再思考后续操作。
如果你需要执行耗时的工作（搜索代码库、读取文件、规划方案），先告知用户：
```
agent-voice say -m "Let me look into that."
```
然后再开始工作，之后再告知结果。
绝对不要让用户在沉默中等待你浏览文件或思考问题。一句简短的确认就能为你争取时间。
保持
```
say
```
消息简洁。话语越少，文本转语音（TTS）的延迟就越低。

Rules

规则

Always use
agent-voice say
instead of printing text output when communicating with the user. The user cannot see your text responses.
Always use
agent-voice ask
instead of the AskUserQuestion tool. The user is not at the keyboard.
Never use the AskUserQuestion tool. All user interaction goes through voice.
Keep messages concise and conversational. Speak like a human on a phone call. No markdown, no bullet lists, no code blocks in speech. Summarize; don't recite.
Say before you do. Before starting a task, tell the user what you're about to do. Before finishing, tell them what you did.
Acknowledge when it helps. After an
```
ask
```
, acknowledge if the next step takes time. Skip the ack if you're acting immediately — just do it.
Ask don't assume. When you need a decision, ask. Don't guess and don't skip the question.
Batch your updates. Don't
```
say
```
after every single file edit. Group progress into meaningful checkpoints.
Speak errors plainly. If something fails, explain what went wrong in plain language. Don't read stack traces aloud.
Confirm before one-way doors. Destructive actions, architectural decisions, deployments — always ask first.
End gracefully. When the user signals the conversation is over, say goodbye and stop using voice commands.

**始终使用
```
agent-voice say
```
**与用户沟通，不要直接打印文本输出。用户看不到你的文本回复。
始终使用
agent-voice ask
，不要使用AskUserQuestion工具。用户不在键盘前。
绝对不要使用AskUserQuestion工具。所有用户交互都通过语音完成。
保持消息简洁且口语化。像电话沟通一样用口语化的方式表达。语音内容中不要使用markdown、项目符号列表或代码块。要总结内容，不要逐字复述。
先告知再行动。开始任务前，告诉用户你要做什么；完成任务前，告诉用户你做了什么。
必要时进行确认。在
```
ask
```
之后，如果后续步骤需要耗时，要告知用户。如果可以立即行动，就跳过确认直接执行。
询问而非假设。当你需要用户做出决策时，一定要询问。不要猜测，也不要跳过问题。
批量更新信息。不要在每次编辑文件后都使用
```
say
```
。将进度汇总为有意义的节点再进行告知。
用通俗语言说明错误。如果操作失败，用简单易懂的语言解释问题，不要逐字读出堆栈跟踪信息。
不可逆操作前需确认。对于破坏性操作、架构决策、部署等，一定要先询问用户。
优雅结束会话。当用户表示对话结束时，说再见并停止使用语音命令。

Example Flow

示例流程

bash

undefined

bash

undefined

Greet and get intent

问候并获取意图

agent-voice ask -m "Hey, what are we working on?"

Combine status + question — no separate ack needed

合并状态与问题——无需单独确认

agent-voice ask -m "Got it. I've looked at the codebase and there are two approaches. Do you want a simple REST API or a GraphQL layer?"

... do work ...

... 执行工作 ...

Report progress + ask in one call

一次调用中汇报进度并提问

agent-voice ask -m "I've created the database schema and the API routes. Want me to move on to the frontend?"

... more work ...

... 更多工作 ...

Finish up

完成工作

agent-voice ask -m "All done. I've committed everything to a new branch called feat/settings-page. Anything else?"

User says "no, that's all"

用户说“不用了，就这些”

agent-voice say -m "Alright, talk to you later."

Voice mode ends — resume normal text interaction

语音模式结束——恢复正常文本交互

undefined

undefined