cekura-onboarding

Cekura Platform Onboarding

Purpose

Walk a new user through the complete Cekura setup — from account creation to their first useful artifact.

Two onboarding paths share the same Phases 1–2 (account, project, agent) and diverge after that:

Testing (default) — build evaluators (test scenarios), run them against the agent in simulation, review results. Use this for pre-deploy regression testing and "is my prompt change safe to ship?".
Observability — ingest production call logs into Cekura, attach metrics, run evaluation, and review/vote on results. Use this for "what's actually happening on live calls?".

This is an interactive, step-by-step walkthrough. At each phase, confirm with the user before proceeding and help them with the actual API calls or UI steps.

Performing Platform Actions

When this skill suggests creating, listing, updating, or evaluating something on Cekura, prefer using available platform tools over describing API calls or dashboard steps. In Claude Code with the Cekura plugin installed, these tools are auto-configured and handle authentication, parameter validation, and error handling for you. Fall back to direct API endpoints or dashboard guidance only when no tools are available in the current session.

Each phase below names the primary tool for that step. Actually call the tool rather than telling the user to do it in the dashboard — that's what makes the onboarding hands-on instead of a tutorial. If a call fails (validation error, missing field, auth), fix the cause or ask the user for the missing input, then retry; don't claim a step is done until the call succeeds.

Never invent IDs

Every agent ID, scenario ID, call log ID, metric ID, and run ID comes from a real tool response. If you don't have an ID you need, call the relevant list/retrieve tool and pull it from the response; do not fabricate one to keep the flow moving. This holds even when the user gives you a name ("the Booking Bot agent"): look it up and use the returned `id`. Provider-side identifiers the user must supply (VAPI assistant IDs, Retell agent IDs, API keys, webhook URLs) follow the same rule: ask the user, never guess.
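
A minimal Python sketch of the lookup rule. It assumes a list tool returns records shaped like `{"id": ..., "name": ...}`; those keys are an assumption, not a documented schema. The helper refuses to guess: zero or multiple matches raise, so the next step is to ask the user rather than invent an ID.

```python
from typing import Any


def resolve_id(records: list[dict[str, Any]], name: str) -> Any:
    """Return the `id` of the one record whose `name` matches exactly.

    `records` stands in for the parsed response of a list tool; the
    `id` and `name` keys are assumptions about its shape.
    """
    matches = [r for r in records if r.get("name") == name]
    if not matches:
        # Never fabricate an ID: stop and ask the user instead.
        raise LookupError(f"No record named {name!r}; ask the user.")
    if len(matches) > 1:
        ids = [r["id"] for r in matches]
        raise LookupError(f"Name {name!r} is ambiguous (ids {ids}); ask the user.")
    return matches[0]["id"]
```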

How to Use This Skill

This is an interactive walkthrough, not a reference doc. Guide the user through each phase conversationally:

Confirm which path applies (Phase 0; usually already known from how you were invoked).
Survey what already exists, so you skip completed work (State Assessment).
Use platform tools to perform actions on the user's behalf.
Validate each step before moving to the next.
Hand off to specialized skills (`cekura-create-agent`, `cekura-metric-design`, `cekura-eval-design`, `cekura-metric-improvement`) when appropriate.

Phase 0: Choose the Path

If the caller already specified a path (via the `/cekura-onboarding` command argument or the invoking context), honour it without asking.

Otherwise, ask once:

Two onboarding paths — which fits your goal?

Testing (default) — build evaluators and run simulated calls against your agent.

Observability — ingest your production call logs and evaluate them.

Default to testing when ambiguous. Phases 1–2 are identical for both; the flow forks at Phase 3.

State Assessment (do this once, before Phase 1)

Survey what already exists in the user's project before walking them through any phase. This prevents asking "Resume where?" on an empty project (redundant) and prevents skipping past existing work (risky).

Gathering state:

If you were handed an inventory (e.g. the `/cekura-onboarding` command pre-detected project state and passed it in context), trust it; don't re-run the same lookups.
Otherwise, list the path-relevant resources yourself: agents and metrics for both paths; plus scenarios and results for testing; plus call logs for observability.

Decision:

| State of the path's relevant resources | Action |
| --- | --- |
| Clean slate: none exist (testing: 0 agents + 0 scenarios + 0 results; observability: 0 agents + 0 call logs + 0 metrics) | Proceed straight to Phase 1 (or Phase 2 if account/project already set up). Don't ask "Resume where?" / "Ready to continue?"; there's nothing to resume. |
| Mid-onboarding: some relevant resources exist but the flow is incomplete | Surface ONE concrete clarification, e.g. "Found existing agent Booking Bot with 12 scenarios and 1 result. Continue with it, or create a new agent?". Never a generic "Ready to continue?". |
| Obvious from the user's message: they said "create a new agent" / "start fresh" / named a specific agent | Honour that intent without an extra confirm. |

After deciding, move into the appropriate phase. Confirm at phase boundaries and before destructive operations, but never re-ask the state you just surveyed.
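
The decision table can be expressed as a small function. This is a Python sketch under stated assumptions: resource counts arrive as a dict keyed by hypothetical names (`agents`, `scenarios`, `results`, `call_logs`, `metrics`), and an explicit user intent short-circuits the survey, as in the table's last row. It does not try to detect a fully completed flow.

```python
# Resources that define a "clean slate" for each path (per the table).
RELEVANT = {
    "testing": ("agents", "scenarios", "results"),
    "observability": ("agents", "call_logs", "metrics"),
}


def classify_state(path: str, counts: dict[str, int],
                   explicit_intent: bool = False) -> str:
    """Return 'follow_user_intent', 'clean_slate', or 'mid_onboarding'."""
    if explicit_intent:
        # e.g. "start fresh" or a named agent: honour it, no extra confirm.
        return "follow_user_intent"
    if all(counts.get(key, 0) == 0 for key in RELEVANT[path]):
        # Nothing to resume: go straight to Phase 1 or 2 without asking.
        return "clean_slate"
    # Some resources exist: ask ONE concrete question about them.
    return "mid_onboarding"
```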

Phase 1: Account & Project Setup (shared)

Skip this phase entirely if the user is already signed in with a project selected (or state was handed to you showing an existing project) — go straight to Phase 2. Phase 1 is only for users starting from nothing; don't re-ask account or project facts you already have.

1.1 Verify Account Access

Ask the user:

"Do you already have a Cekura account?"
"Do you have an API key, or do you sign in via OAuth?"

If they have an API key, verify it works by listing metrics. A successful response (even empty) confirms the key is valid.

If they don't have an account, direct them to sign up at https://dashboard.cekura.ai/sign-up and create a project.

For Claude Code plugin users: if platform operations aren't working, run `/setup-mcp` to configure API access.
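
The "even empty" rule is easy to get backwards, so here is a Python sketch. `list_metrics` is a hypothetical stand-in for whatever lists metrics in your session (a platform tool or an HTTP client). The sketch assumes it returns a list on success and raises `PermissionError` on an auth failure; mapping 401/403 to that exception is an assumption you would make in your own wrapper.

```python
from typing import Callable


def api_key_works(list_metrics: Callable[[], list]) -> bool:
    """True if listing metrics succeeds, even when the list is empty."""
    try:
        list_metrics()
    except PermissionError:
        # Assumption: the wrapper raises PermissionError for 401/403.
        return False
    # An empty list still means the key authenticated.
    return True
```

Other exceptions (network errors, timeouts) are deliberately not caught, so a connectivity problem is not misreported as a bad key.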

1.2 Project Setup

Ask: "Do you already have a project, or do we need to create one?"

If creating: create the project with `projects_create`, or point the user to the dashboard.

Project organization guidance:

Small teams: single project for multiple agents.
Enterprises: separate projects by team and environment (staging vs production).
Each project gets its own metrics, evaluators, and observability data.

Phase 2: Agent Configuration (shared, framing differs by path)

Both paths register the agent with `aiagents_create`. What differs is the framing:

Testing: "Let's create your test agent — pick the provider you'll simulate against."
Observability: "Let's connect your production agent — Cekura needs to know about it so we can attribute uploaded calls to it."

2.1 Create or Connect an Agent

Ask:

"Do you already have a voice AI agent deployed?"
"What provider — VAPI, Retell, LiveKit, ElevenLabs, Pipecat, or custom?"

Create the agent on Cekura with `aiagents_create`, supplying the agent name, project ID, and description. For detailed agent setup (provider integration, mock tools, KB, dynamic variables), hand off to the cekura-create-agent skill.

Critical: the agent description is essential. It enables automatic evaluator generation (testing) and powers metrics that reference `{{agent.description}}` (both paths). Ask the user to paste their agent's full system prompt; it is the single highest-leverage field on the agent record.

2.2 Provider Integration

Based on their provider, guide them through connecting:

VAPI:

Need: VAPI API Key + Assistant ID.
In Cekura: Agent Settings → Provider → VAPI → enter credentials.
Observability tip: if you only need call-log ingestion, provider credentials are optional; ingestion works with the external `assistant_id` alone.

Retell:

Need: Retell API Key + Assistant ID.
In Cekura: Agent Settings → Provider → Retell → enter credentials.
Optionally enable auto-sync of prompts.

LiveKit:

Need: LiveKit agent deployment details.
Calls include `metadata.raw_metrics` for latency tracking.

ElevenLabs:

Need: ElevenLabs API Key + Agent ID.

Pipecat:

Set `transcript_provider: "pipecat"`, a `pipecat_api_key`, and `pipecat_data: {"pipecat_agent_name": "<name>"}`. The agent name goes inside `pipecat_data`; it is NOT a top-level field.
`assistant_provider` is not `pipecat` (leave it at the default or `self_hosted`).
Run tests over WebRTC with `scenarios_run_pipecat_v2`.
See https://docs.cekura.ai/documentation/integrations/pipecat for the webhook contract.
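
The nesting rule is the part most often gotten wrong, so here is a Python sketch of just the Pipecat-specific fields as a dict, plus a check for the mistakes called out above. Only the field names quoted in this section come from the guide; the rest of the agent record is omitted, and both helper names are hypothetical.

```python
def pipecat_fields(api_key: str, agent_name: str) -> dict:
    """Pipecat-specific agent fields, following the rules above."""
    return {
        "transcript_provider": "pipecat",
        "pipecat_api_key": api_key,
        # The agent name is nested, NOT a top-level field.
        "pipecat_data": {"pipecat_agent_name": agent_name},
        # assistant_provider is left out on purpose: keep the default
        # (or self_hosted); it must never be "pipecat".
    }


def validate_pipecat_fields(fields: dict) -> list[str]:
    """Return the problems found; an empty list means the fields look right."""
    problems = []
    if fields.get("transcript_provider") != "pipecat":
        problems.append('transcript_provider must be "pipecat"')
    if "pipecat_agent_name" in fields:
        problems.append("pipecat_agent_name must live inside pipecat_data")
    if not (fields.get("pipecat_data") or {}).get("pipecat_agent_name"):
        problems.append("pipecat_data.pipecat_agent_name is missing")
    if fields.get("assistant_provider") == "pipecat":
        problems.append('assistant_provider must not be "pipecat"')
    return problems
```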

Self-hosted / Custom (reached via SIP, WebSocket, or chat):

These are `assistant_provider: "self_hosted"` agents; SIP / WebSocket / chat are connection modes, not providers.
Guide based on their specific setup.
Refer to https://docs.cekura.ai/documentation/integrations/ for provider-specific docs.

2.3 Dynamic Variables (if applicable)

Ask: "Does your agent use dynamic variables — per-call data like customer names, account IDs, or configuration flags?"

If yes:

Cekura auto-detects `{{variableName}}` patterns in the agent description.
These become available in metrics as `{{dynamic_variables.keyName}}`.
Useful for multi-agent flows where each node has its own system prompt.
Observability path: `dynamic_variables` is also a field on ingestion payloads; values appear alongside the transcript in the UI.
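
To preview which variables are likely to be picked up, you can scan the description yourself. This Python sketch assumes a variable is a bare identifier inside double braces. Cekura's actual detection rules are not documented here, so treat the regex as an approximation. Dotted references such as `{{agent.description}}` deliberately do not match.

```python
import re

# Assumption: a variable is a bare identifier inside double braces,
# optionally padded with spaces. Dotted names do not match.
VAR_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def detect_dynamic_variables(description: str) -> list[str]:
    """Unique variable names, in first-seen order."""
    return list(dict.fromkeys(VAR_PATTERN.findall(description)))


def metric_reference(name: str) -> str:
    """How a detected variable is referenced inside a metric prompt."""
    return "{{dynamic_variables." + name + "}}"
```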

2.4 Mock Tools (testing) / Real Tool Calls (observability)

Testing path — ask: "Does your agent call external APIs or tools during calls?" If yes:

Auto-fetch from provider (recommended): Cekura pulls tool definitions automatically.
Manual setup: Add tool names, descriptions, and input/output mappings.
Mock tools let you test without hitting real backends.
See the cekura-eval-design skill for detailed mock tool configuration.

Observability path: tool calls in production are real. They surface in the call log as `tool_calls` (alongside the transcript), and the Tool Call Success metric scores them automatically once enabled.

After Phase 2, the flow diverges. Follow ONLY your path's Phase 3+ sections below.

──────── Testing path (Phases 3–6) ────────

Use this branch when the path is testing (default).

Phase 3 (testing): Metrics Setup

3.1 Enable Pre-defined Metrics

Always recommend selecting ALL pre-defined metrics for comprehensive analysis:

| Category | Metrics |
| --- | --- |
| Accuracy | Expected Outcome, Hallucination, Relevancy, Response Consistency, Tool Call Success, Transcription Accuracy, Voicemail Detection |
| Quality | Interruption counts, Response latency, Silence detection, Call termination appropriateness |
| Customer Experience | CSAT, Sentiment, Dropoff nodes, Topic categorization |
| Speech Quality | Pitch, Speaking rate, Gibberish detection, Pronunciation verification |

Guide: "Go to your project's Metrics section and enable all pre-defined metrics. This gives you a comprehensive baseline."

Two-step activation: Metrics must be (1) toggled on at the project level AND (2) attached to individual evaluators.

3.2 Custom Metrics (optional, defer to later)

For first-time users, skip custom metrics initially. Once they have test results, they can use the cekura-metric-design skill to create targeted custom metrics.

Phase 4 (testing): First Evaluators

4.1 Auto-Generate Evaluators (Recommended)

The fastest path to first tests is to generate scenarios with `scenarios_agent_create`:

```json
{
  "agent_id": <agent_id>,
  "num_scenarios": 10,
  "personalities": [<personality_id>],
  "generate_expected_outcomes": true,
  "tool_ids": ["TOOL_END_CALL", "TOOL_END_CALL_ONLY_ON_TRANSFER"]
}
```

Generation runs in the background: poll `scenarios_generate_progress` until it completes, then review the generated scenarios.
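
Polling `scenarios_generate_progress` is a plain poll-with-timeout loop. This Python sketch assumes you wrap the tool in a zero-argument callable `fetch_progress` and supply your own `is_done` predicate, because the shape of the progress response is not specified in this guide. The default interval and timeout are arbitrary choices, not Cekura recommendations.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch_progress: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> Any:
    """Call fetch_progress until is_done(result) is true, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_progress()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Generation still running; check back later.")
        time.sleep(interval_s)
```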

After generation, check:

Are instructions specific and behavioral?
Are expected outcomes concise and achievable?
Are the right tools enabled?
For non-English agents: PATCH `scenario_language` to the correct language code.

4.2 Review and Supplement

Common gaps in auto-generated evals:

Red-team / adversarial scenarios.
Edge cases specific to the client's domain.
Multi-language coverage.
Tool failure scenarios.

Hand off to the cekura-eval-design skill for designing more targeted evaluators.

4.3 Attach Metrics

Every evaluator needs metrics attached. At minimum:

Expected Outcome — Did the agent achieve the scenario's goal?
Infrastructure Issues — Connection drops, silence, non-response.

Use bulk-add via `actions → modify scenarios` in the UI.
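
This combines the two-step activation rule from 3.1 with the minimum set above. A metric only counts for a scenario if it is both enabled on the project and attached to that scenario. The Python sketch below flags scenarios that are missing a required metric. The scenario dict shape (`id`, `metrics`) is hypothetical, and metric names stand in for the IDs your list tools actually return.

```python
REQUIRED = {"Expected Outcome", "Infrastructure Issues"}


def missing_required_metrics(
    scenarios: list[dict],
    project_enabled: set[str],
    required: set[str] = REQUIRED,
) -> dict:
    """Map scenario id -> required metrics that will not actually score it."""
    gaps = {}
    for scenario in scenarios:
        attached = set(scenario.get("metrics", []))
        effective = attached & project_enabled  # both steps must hold
        missing = required - effective
        if missing:
            gaps[scenario["id"]] = sorted(missing)
    return gaps
```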

Phase 5 (testing): First Test Run

Run the scenarios with one of the `scenarios_run_*` tools. The exact tool depends on the agent's provider/transport:

`scenarios_run_pipecat_v2`: Pipecat Cloud, WebRTC (uses the agent's `pipecat_api_key`)
`scenarios_run_livekit_v2`: LiveKit, WebRTC
`scenarios_run_vapi_webrtc` / `scenarios_run_retell_webrtc`: VAPI / Retell WebRTC
`scenarios_run_elevenlabs`: ElevenLabs
`scenarios_run_websocket`: custom/self-hosted WebSocket agents
`scenarios_run_sip`: SIP endpoints
`scenarios_run_voice` / `scenarios_run_text`: phone (PSTN) / text
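
The same mapping can be written as a lookup table, sketched here in Python. The tool names are copied from the list above. The `(provider, transport)` keys are an illustrative encoding, not a Cekura field, and `pick_run_tool` is a hypothetical helper. Lookup falls back from an exact match to a provider-only key and then to a transport-only key.

```python
from typing import Optional

# (provider, transport) -> scenarios_run_* tool; None means "any".
RUN_TOOLS = {
    ("pipecat", "webrtc"): "scenarios_run_pipecat_v2",
    ("livekit", "webrtc"): "scenarios_run_livekit_v2",
    ("vapi", "webrtc"): "scenarios_run_vapi_webrtc",
    ("retell", "webrtc"): "scenarios_run_retell_webrtc",
    ("elevenlabs", None): "scenarios_run_elevenlabs",
    (None, "websocket"): "scenarios_run_websocket",
    (None, "sip"): "scenarios_run_sip",
    (None, "phone"): "scenarios_run_voice",
    (None, "text"): "scenarios_run_text",
}


def pick_run_tool(provider: Optional[str], transport: Optional[str]) -> str:
    """Exact match first, then provider-only, then transport-only."""
    for key in ((provider, transport), (provider, None), (None, transport)):
        if key in RUN_TOOLS:
            return RUN_TOOLS[key]
    raise KeyError(f"No run tool for provider={provider!r}, transport={transport!r}")
```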

5.1 Execute

```json
{
  "agent_id": <agent_id>,
  "scenarios": [<scenario_ids>],
  "frequency": 1
}
```

Start with 5–10 scenarios for the first run. Voice calls take 1–3 minutes each.
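
This Python sketch builds the payload above and caps a first run at 10 scenarios. It also gives a rough estimate from the 1 to 3 minutes per voice call quoted here. It assumes `frequency` is the number of times each scenario runs. The estimate is summed call time, not wall-clock time, because this guide does not say whether calls run concurrently. Both helper names are hypothetical.

```python
def first_run_payload(agent_id: int, scenario_ids: list[int], cap: int = 10) -> dict:
    """Payload for a first test run, limited to `cap` scenarios."""
    if not scenario_ids:
        raise ValueError("List scenarios first; never invent IDs.")
    return {
        "agent_id": agent_id,
        "scenarios": scenario_ids[:cap],
        "frequency": 1,  # assumed: runs per scenario
    }


def estimated_call_minutes(payload: dict) -> tuple[int, int]:
    """(low, high) summed call minutes at 1-3 minutes per voice call."""
    calls = len(payload["scenarios"]) * payload["frequency"]
    return (calls * 1, calls * 3)
```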

5.2 Monitor

Check results via the results endpoint. Each run includes:

Full transcript.
Audio recording.
Metric scores.
Expected outcome pass/fail.

5.3 Review Results

Guide the user through interpreting results:

70–80% pass rate is realistic for a first iteration.
Review failures to identify: misunderstandings, missing info, technical issues.
90–95% after refinement is the target.
Don't aim for 100% — real conversations are unpredictable.
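
The pass-rate bands above translate into a simple check. This Python sketch assumes each result carries a boolean expected-outcome verdict; the `passed` key is a hypothetical name. The thresholds are the ones stated in this section and are not hard limits.

```python
def pass_rate(results: list[dict]) -> float:
    """Fraction of runs whose expected outcome passed."""
    if not results:
        raise ValueError("No results yet.")
    return sum(1 for r in results if r["passed"]) / len(results)


def interpret(rate: float, first_iteration: bool) -> str:
    """Compare a pass rate with the bands quoted in this guide."""
    # 70-80% is realistic for a first iteration; 90-95% is the target after refinement.
    floor = 0.70 if first_iteration else 0.90
    if rate >= floor:
        return "on track"
    return "below range: review failures (misunderstandings, missing info, technical issues)"
```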

Phase 6 (testing): What's Next

| Need | Next step | Description |
| --- | --- | --- |
| Better metrics | cekura-metric-design | Design custom metrics for specific workflows. |
| More evaluators | cekura-eval-design | Design targeted test scenarios. |
| Improve metric quality | cekura-metric-improvement | Iterate metric quality through feedback. |
| Monitor production | Re-run onboarding on the observability path | Ingest live calls and score them. |
| CI/CD integration | GitHub Actions | Auto-test on code changes. |
| Scheduled tests | Cron jobs | Recurring test suites. |

──────── Observability path (Phases 3–7) ────────

Use this branch when the path is observability.

The observability path does not generate scenarios or run simulations. Instead, you ingest the user's actual production calls, attach metrics, evaluate, and review. The agent registered in Phase 2 is the production agent that owns those calls.

Phase 3 (observability): Ingest Call Logs

Get the user's production calls into Cekura with `observe_create`.

3.1 Pick an ingestion mode

Ask: "Do you want to (a) upload a sample call to see how Cekura processes it, or (b) configure continuous webhook ingestion from your provider?"

(a) One-shot upload — fastest start

Call `observe_create` with the user's transcript. Identify the agent by either:

`agent`: the Cekura agent ID from Phase 2 (preferred), or
`assistant_id`: the external provider-side ID (Cekura resolves it to your agent).

Minimum payload:

```json
{
  "call_id": "<unique call id>",
  "agent": <agent_id>,
  "transcript_type": "cekura",
  "transcript_json": [
    {"role": "Testing Agent", "content": "Hi, can I book a room?", "start_time": 0.0, "end_time": 2.1},
    {"role": "Main Agent", "content": "Of course — for what date?", "start_time": 2.3, "end_time": 4.0}
  ],
  "call_ended_reason": "completed"
}
```

For `transcript_type: "cekura"`, the only valid roles are `"Testing Agent"` (caller) and `"Main Agent"` (the agent under test). `"agent"` and `"user"` are NOT valid for this format.
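
If the user's transcript uses generic `user`/`agent` roles, those roles have to be mapped before upload. This Python sketch converts such turns into the `cekura` format and rejects any other role. The input shape (`role`, `text`, `start`, `end`) is a hypothetical generic export, not a provider format. For a real provider export, set that provider's `transcript_type` instead. Both helper names are hypothetical.

```python
VALID_ROLES = {"Testing Agent", "Main Agent"}
# Hypothetical generic roles -> cekura roles (caller / agent under test).
ROLE_MAP = {"user": "Testing Agent", "agent": "Main Agent"}


def to_cekura_transcript(turns: list[dict]) -> list[dict]:
    """Convert generic turns into transcript_type "cekura" entries."""
    converted = []
    for turn in turns:
        role = ROLE_MAP.get(turn["role"], turn["role"])
        if role not in VALID_ROLES:
            raise ValueError(f"Unsupported role {turn['role']!r}")
        converted.append({
            "role": role,
            "content": turn["text"],
            "start_time": float(turn["start"]),
            "end_time": float(turn["end"]),
        })
    return converted


def upload_payload(call_id: str, agent_id: int, turns: list[dict],
                   call_ended_reason: str = "completed") -> dict:
    """Minimum observe_create payload, mirroring the example above."""
    return {
        "call_id": call_id,
        "agent": agent_id,  # Cekura agent ID from Phase 2, never invented
        "transcript_type": "cekura",
        "transcript_json": to_cekura_transcript(turns),
        "call_ended_reason": call_ended_reason,
    }
```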

If the user has a provider-native transcript (VAPI, Retell, ElevenLabs, Bland, LiveKit, Pipecat, KoreAI, Trillet), set `transcript_type` to that provider and pass `transcript_json` exactly as the provider emits it; Cekura normalises it internally.

Useful optional fields:

`voice_recording_url`: enables audio-based metrics (pitch, speaking rate, gibberish detection).
`metadata`: freeform tags for Observability filtering (`{"customer_id": "...", "campaign_id": "..."}`).
`dynamic_variables`: values injected into the agent at runtime; shown alongside the transcript.
`customer_number`: the caller's number in E.164 format.
`metric_ids`: comma-separated metric IDs to evaluate immediately (skips Phase 5's separate kickoff).
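
Two of these fields have format rules worth checking client-side: `customer_number` must be E.164, and `metric_ids` is described as a comma-separated string. This Python sketch adds the optional fields to a payload. The regex encodes the general ITU-T E.164 shape ("+", a non-zero first digit, at most 15 digits); it is not a Cekura-specific validator. `with_optional_fields` is a hypothetical helper.

```python
import re
from typing import Any, Optional

# General E.164 shape: "+", a non-zero first digit, at most 15 digits total.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def with_optional_fields(
    payload: dict,
    *,
    voice_recording_url: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None,
    dynamic_variables: Optional[dict[str, Any]] = None,
    customer_number: Optional[str] = None,
    metric_ids: Optional[list[int]] = None,
) -> dict:
    """Return a copy of `payload` with the optional observe_create fields set."""
    out = dict(payload)
    if voice_recording_url:
        out["voice_recording_url"] = voice_recording_url  # enables audio metrics
    if metadata:
        out["metadata"] = metadata  # freeform filter tags
    if dynamic_variables:
        out["dynamic_variables"] = dynamic_variables
    if customer_number:
        if not E164.fullmatch(customer_number):
            raise ValueError(f"customer_number is not E.164: {customer_number!r}")
        out["customer_number"] = customer_number
    if metric_ids:
        # Comma-separated string, as this guide describes the field.
        out["metric_ids"] = ",".join(str(m) for m in metric_ids)
    return out
```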

(b) Continuous webhook ingestion

Cekura ships provider-specific webhook endpoints that accept the provider's raw post-call shape — no transformation on the user's side:

| Provider | Webhook endpoint |
| --- | --- |
| VAPI | `POST /observability/v1/vapi/observe/` |
| Retell | `POST /observability/v1/retell/observe/` |
| ElevenLabs | `POST /observability/v1/elevenlabs/observe/` |
| LiveKit | `POST /observability/v1/livekit/observe/` |
| Pipecat | `POST /observability/v1/pipecat/observe/` |
| Other | Use the generic `observe_create` |

Guide the user to configure their provider's webhook to POST every completed call to the relevant URL, with their Cekura API key in the `Authorization: Bearer ...` header. Then trigger one test call so a real ingestion lands.
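
You can replay a saved provider payload by hand, for example to confirm the endpoint and key before wiring up the provider. To do that, build the same request the provider would send. This Python sketch uses only the standard library. `base_url` is a placeholder because this guide gives the paths but not the API host. The `Content-Type` header is an assumption, and the request is built but not sent.

```python
import json
import urllib.request

WEBHOOK_PATHS = {
    "vapi": "/observability/v1/vapi/observe/",
    "retell": "/observability/v1/retell/observe/",
    "elevenlabs": "/observability/v1/elevenlabs/observe/",
    "livekit": "/observability/v1/livekit/observe/",
    "pipecat": "/observability/v1/pipecat/observe/",
}


def build_webhook_request(base_url: str, provider: str, api_key: str,
                          raw_payload: dict) -> urllib.request.Request:
    """POST request carrying the provider's raw post-call payload."""
    path = WEBHOOK_PATHS[provider]  # other providers: use observe_create
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(raw_payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed JSON body
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Check the observability webhook docs for the real host first.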

3.2 Verify ingestion

After the first call lands:

List call logs (`call_logs_list`) to confirm it's visible.
Show the user the resulting call log ID and explain that metric evaluation is async: the initial `status` is `evaluating`, and full results appear shortly after.

3.3 Iterate (optional)

If the user has more than one provider, repeat with a sample from each to build the call inventory for Phase 5 evaluation.

Phase 4 (observability): Configure Metrics

Metrics in observability mode score real calls. The starter set should cover correctness, customer experience, and safety.

4.1 Survey existing metrics

List metrics (`metrics_list`) to see what's already configured. If the project already has metrics from a prior testing onboarding, reuse them; they apply to call logs as well as test runs.

4.2 Recommend a starter set

For first-time observability onboarding, recommend three metrics that cover the high-value bases:

| Metric | Why it matters in observability |
| --- | --- |
| Hallucination | Catches the agent inventing facts on live calls; the failure mode with the highest blast radius. |
| Expected Outcome adherence | Did the agent accomplish the call's purpose (booking, transfer, info-gathering)? |
| Sentiment | Surfaces customer frustration trends; a leading indicator for churn. |

List the full catalog with `predefined_metrics_list`. For each chosen metric, create it with `metrics_create` (single) or `metrics_bulk_create` (multiple), passing the `project_id` and metric specifics.
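
Before creating anything, compare the starter set against what `metrics_list` already returned in 4.1, so you only create what is missing. This Python sketch assumes both list responses are lists of dicts with a `name` key; that is an assumption about the response shape. The exact catalog names may also differ from the labels in the table. An unknown name raises an error rather than being guessed.

```python
STARTER = ["Hallucination", "Expected Outcome adherence", "Sentiment"]


def metrics_to_create(existing: list[dict], catalog: list[dict],
                      wanted: list[str] = STARTER) -> list[dict]:
    """Catalog entries for wanted metrics the project does not have yet."""
    have = {m["name"] for m in existing}
    by_name = {m["name"]: m for m in catalog}
    missing = [name for name in wanted if name not in have]
    unknown = [name for name in missing if name not in by_name]
    if unknown:
        raise LookupError(f"Not in the predefined catalog: {unknown}")
    return [by_name[name] for name in missing]
```

If one metric is missing, use `metrics_create`. If several are missing, use `metrics_bulk_create`.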

4.3 LLM-generated metrics from agent description (optional)

If the user wants metrics auto-tailored to their agent (e.g. workflow-specific outcome metrics), use `metrics_generate`: Cekura generates metric definitions from the agent's description. For careful custom metric design, defer to the cekura-metric-design skill.

Phase 5 (observability): Run Metric Evaluation

If you passed `metric_ids` during ingestion, auto-evaluation has already started. This phase evaluates additional metrics on existing call logs.

5.1 Kick off evaluation

Call `call_logs_evaluate_metrics_create`:

```json
{
  "call_log_ids": [<id1>, <id2>],
  "metric_ids": [<metric_id1>, <metric_id2>]
}
```

Evaluation runs asynchronously: the response shows `status: "evaluating"`, and the call log's `metrics` array is initially empty. Re-retrieve the call log shortly after to see scores.
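
The two signals above (a `status` of `"evaluating"` and an empty `metrics` array) give a readiness check to apply after each re-retrieve. This Python sketch also builds the evaluation body shown above. The guide does not name the status that follows `evaluating`. The sketch therefore treats any other status that has at least one score as done, which is an assumption.

```python
def evaluation_payload(call_log_ids: list[int], metric_ids: list[int]) -> dict:
    """Body for call_logs_evaluate_metrics_create, as in the example above."""
    if not call_log_ids or not metric_ids:
        raise ValueError("Need real call log IDs and metric IDs from list tools.")
    return {"call_log_ids": list(call_log_ids), "metric_ids": list(metric_ids)}


def scores_ready(call_log: dict) -> bool:
    """True once the call log has left "evaluating" and has scores."""
    return call_log.get("status") != "evaluating" and bool(call_log.get("metrics"))
```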

5.2 Rerun (when needed)

If a metric prompt was updated and the user wants existing call logs re-scored, use `call_logs_rerun_evaluation_create`.

Phase 6 (observability): Review Results & Vote

The point of observability is closing the loop: humans review scores, mark ones that disagree with their judgment, and that feedback improves future metric quality.

6.1 Show results

Retrieve the call log with its metric results (`call_logs_retrieve`). Walk the user through:

The transcript.
Each metric's score + reasoning.
Any flagged segments (low-confidence, edge cases).

If results still show `status: "evaluating"`, wait a moment and re-retrieve.

6.2 Collect votes

Ask the user to pick at least one metric result they disagree with and explain why. Then record it with `call_logs_mark_metric_vote_create`:

```json
{
  "call_log_id": <id>,
  "metric_id": <id>,
  "vote": "incorrect",
  "reasoning": "<user's reason>"
}
```

Encourage 3–5 votes for a meaningful feedback signal.
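
This Python sketch builds vote bodies and enforces the two habits this step asks for: every vote carries the user's reasoning, and the batch is large enough to be a meaningful signal. Only `"incorrect"` appears in this guide, so the sketch hard-codes that value. Other vote values may exist, but the sketch does not assume them. Both helper names are hypothetical.

```python
def build_vote(call_log_id: int, metric_id: int, reasoning: str) -> dict:
    """Body for call_logs_mark_metric_vote_create."""
    if not reasoning.strip():
        raise ValueError("Record the user's reason, not an empty string.")
    return {
        "call_log_id": call_log_id,
        "metric_id": metric_id,
        "vote": "incorrect",  # the only value shown in this guide
        "reasoning": reasoning.strip(),
    }


def enough_votes(votes: list[dict], minimum: int = 3) -> bool:
    """3-5 votes is the suggested batch for a meaningful feedback signal."""
    return len(votes) >= minimum
```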

6.3 Iterate

Hand off to cekura-metric-improvement to use the collected votes to actually refine metric prompts. That skill loops: rebuild prompt → preview on the voted call logs → ship.

Phase 7 (observability): What's Next

| Need | Next step | Description |
| --- | --- | --- |
| Improve metrics with votes | cekura-metric-improvement | Use Phase 6's votes to refine metric prompts. |
| Design custom metrics | cekura-metric-design | New metrics for workflow-specific behaviour. |
| Add pre-deploy tests | Re-run onboarding on the testing path | Use real production calls as the basis for new scenarios. |
| Scheduled re-evaluation | Cron jobs | Re-score live calls as metrics evolve. |
| Multi-project rollups | Observability dashboards | Aggregate metric scores across agents/projects. |

Documentation

Public docs: https://docs.cekura.ai
LLM-friendly docs: https://docs.cekura.ai/llms.txt
Concepts: https://docs.cekura.ai/documentation/key-concepts/
Integrations: https://docs.cekura.ai/documentation/integrations/
Observability webhook setup: https://docs.cekura.ai/documentation/observability/

See `references/api-quickstart.md` for the essential endpoints used during onboarding.
