optimize-trace-observability

Optimize Coval Trace Observability

Turn a working but thin Coval trace into a useful debugging artifact. Prefer to inspect a proven Coval trace first. If `setup-tracing` has launched an initial asynchronous validation run that is still pending, use the waiting time to make safe code-visible enrichment, then re-check the finished run before declaring the optimization complete.

Read First

Load these references as needed:
  • `../references/span-schema.md` for canonical spans, attributes, aliases, and guardrails
  • `../references/agent-type-routing.md` for framework-specific trace boundaries
  • `../references/coval-tracing-reference.md` for viewer/search behavior and ingestion limits

Phase 1: Inspect Current Trace Quality

Start from evidence, not assumptions.
  1. Find one recent traced simulation or conversation in Coval. Use the Coval CLI/API or Trace Search instead of asking the user for a screenshot when credentials are available.
  2. If the only candidate is an in-flight validation run from `setup-tracing`, start or continue a bounded CLI/API poll loop and do not block idly. While waiting, inspect the code path and add only enrichment that is clearly safe from the implementation.
  3. Inspect the trace viewer or exported trace dump once trace data exists.
  4. Classify the current trace:
    • no trace
    • trace exists but only root/provider spans
    • STT/LLM/TTS spans exist but lack attributes
    • tool spans missing
    • parent/child structure is flat or misleading
    • attributes are unsafe, oversized, or high-cardinality
  5. Check whether existing framework instrumentation should be enriched instead of duplicated.
Do not add manual duplicate spans for operations already emitted by Pipecat, LiveKit, Vapi, or an existing OTel integration unless the existing span cannot be enriched.
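
The classification in step 4 can be approximated mechanically once a trace dump exists. The sketch below assumes a hypothetical export shape (a list of dicts with `span_id`, `parent_id`, `name`, and `attributes` keys); the real Coval dump may use different field names, and the flatness check is only a heuristic. It does not cover the unsafe/oversized attribute class, which still needs a manual read.

```python
from typing import Any

PIPELINE_SPANS = {"stt", "llm", "tts"}


def classify_trace(spans: list[dict[str, Any]]) -> list[str]:
    """Return the Phase 1 findings that apply to one exported trace.

    Assumed (hypothetical) span shape:
    {"span_id": str, "parent_id": str | None, "name": str, "attributes": dict}
    """
    if not spans:
        return ["no trace"]

    findings = []
    names = {s["name"] for s in spans}
    pipeline = [s for s in spans if s["name"] in PIPELINE_SPANS]

    if not pipeline:
        findings.append("only root/provider spans")
    elif any(not s.get("attributes") for s in pipeline):
        findings.append("STT/LLM/TTS spans lack attributes")

    if "llm_tool_call" not in names:
        findings.append("tool spans missing")

    # Heuristic: the trace is flat when nothing is nested below a root's children.
    roots = {s["span_id"] for s in spans if s.get("parent_id") is None}
    nested = [
        s for s in spans
        if s.get("parent_id") is not None and s["parent_id"] not in roots
    ]
    if len(spans) > 2 and not nested:
        findings.append("flat parent/child structure")

    return findings
```

An empty result means none of the automated checks fired, not that the trace is complete.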

Phase 1b: Discover Business Events Already In The Code

Before deciding what to add, grep the agent code for business-event surface area the customer already implements but isn't tracing. Search for terms like `cart`, `order`, `intent`, `tool_call`, `function_call`, `handoff`, `escalat`, `payment`, `checkout`, `confirm`, `cancel`, `transfer`, `submit`, `dispatch`, `end_of_call`, `conversation_end`, `system_notify`, `webhook`, `event_name`. Each match is a candidate for a `business.<event>` span or `llm_tool_call` span with a numeric attribute that can later become a customer-signal metric. This is the most common missed coverage layer: the customer's protocol already defines meaningful business events that never make it into traces because the first pass only instrumented the audio/LLM pipeline.
Propose one span per distinct business event with at least one numeric attribute (cart total, item count, payment amount, escalation level, etc.). Do not wait for the customer to ask "is that all?" before adding these.
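
The search above can be scripted when a plain `grep` is inconvenient. This is a minimal sketch using only the standard library; the function name `find_business_events` and the default file suffixes are assumptions, not part of any Coval tooling.

```python
import re
from pathlib import Path

TERMS = [
    "cart", "order", "intent", "tool_call", "function_call", "handoff",
    "escalat", "payment", "checkout", "confirm", "cancel", "transfer",
    "submit", "dispatch", "end_of_call", "conversation_end",
    "system_notify", "webhook", "event_name",
]
PATTERN = re.compile("|".join(map(re.escape, TERMS)), re.IGNORECASE)


def find_business_events(
    root: str, suffixes: tuple[str, ...] = (".py", ".ts", ".js")
) -> dict[str, list[tuple[str, int]]]:
    """Map each matched term to the (file, line number) places it appears."""
    hits: dict[str, list[tuple[str, int]]] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in PATTERN.finditer(line):
                term = match.group(0).lower()
                hits.setdefault(term, []).append((str(path), lineno))
    return hits
```

Each key in the result is a starting point for a proposed span; the matches still need a human read to decide which ones are real business events and which numeric attribute each should carry.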

Phase 2: Add Coval-Native Span Coverage

Prioritize spans that make Coval trace UI, built-in trace metrics, and custom trace metrics more useful:
  • `conversation`: full call/session root
  • `turn`: one user/assistant exchange when the framework exposes a turn boundary
  • `stt`: final speech recognition result
  • `stt.provider.<name>`: each STT provider attempt or fallback
  • `llm`: model call
  • `tts`: synthesis call
  • `llm_tool_call`: tool/function execution
  • `vad`: speech activity decisions when relevant
  • `pipeline` or `transport`: only when they help diagnose routing/audio issues
Keep span names stable and low-cardinality. Put IDs, provider names, endpoint names, and dynamic details in attributes. Match the public span naming convention before adding custom business spans: canonical names get semantic colors, labels, and built-in trace metric support.
Prefer OTel span events over new spans for moment-in-time annotations. A span event (`span.add_event("simulation_id_received", {...})`) is a cheap timestamped marker on an existing span: it does not bloat the span count, does not affect trace metrics, and gives the trace viewer a visible flag on the parent timeline. Use events for milestones like `simulation_id_received`, `first_inbound_audio`, `first_speech_detected`, `cart_sent`, `tool_dispatched`, `websocket_disconnect`, and `conversation_end` on the `conversation` root or the relevant `turn`. Reserve full spans for things with non-trivial duration or parent/child structure.
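
One way to keep milestone events consistent is to route them through a single helper with a fixed allowlist. The sketch below is duck-typed against the `add_event(name, attributes)` method that OpenTelemetry spans expose, so it runs without the OTel packages installed; `mark_milestone` and `SpanLike` are hypothetical names introduced here for illustration.

```python
from typing import Any, Optional, Protocol


class SpanLike(Protocol):
    """The one span method this sketch relies on."""

    def add_event(
        self, name: str, attributes: Optional[dict[str, Any]] = None
    ) -> None: ...


MILESTONES = frozenset({
    "simulation_id_received", "first_inbound_audio", "first_speech_detected",
    "cart_sent", "tool_dispatched", "websocket_disconnect", "conversation_end",
})


def mark_milestone(
    span: SpanLike, name: str, details: Optional[dict[str, Any]] = None
) -> None:
    """Record a moment-in-time milestone as an event on an existing span."""
    if name not in MILESTONES:
        # A fixed allowlist keeps event names stable and low-cardinality.
        raise ValueError(f"unknown milestone: {name}")
    span.add_event(name, attributes=dict(details or {}))
```

With the OpenTelemetry Python API, the `span` argument would typically be the `conversation` or `turn` span, for example the one returned by `opentelemetry.trace.get_current_span()`. Dynamic values such as IDs go in `details`, never in the event name.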

Phase 3: Add High-Value Attributes

Use the canonical attributes from `../references/span-schema.md`.
Minimum valuable set:
  • `stt`: `transcript`, `metrics.ttfb`, `stt.confidence` when available
  • `stt.provider.<name>`: `stt.providerName`, `metrics.ttfb`, `stt.confidence`, error status when a provider fails
  • `llm`: `metrics.ttfb`, `llm.finish_reason`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, model/provider metadata when available
  • `tts`: `metrics.ttfb`, provider/voice metadata when safe
  • `llm_tool_call`: `function.name`, `tool_call_id`, `function.arguments` if safe and bounded, `tool.latency_ms`, numeric `tool.error`, numeric `tool.dependency_unavailable`, `tool.result.count` when applicable
  • `conversation`: `tool.call.count`, `tool.failure.count`, numeric `workflow.completed`, numeric `workflow.dependency_blocked`, numeric `workflow.fallback_used` when the session boundary is available
  • custom spans: one numerical attribute that can become a custom trace metric, such as `duration_ms`, `retry_count`, `confidence_score`, `queue_wait_ms`, or `external_api_latency_ms`
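
As an illustration of the `llm_tool_call` row, the sketch below builds the attribute dict for one tool execution. The helper name, the `arguments_are_safe` switch, and the 512-character bound are assumptions made for this example; the attribute keys are the ones listed above.

```python
import json
from typing import Any, Optional

MAX_ARG_CHARS = 512  # assumed bound; tune to the ingestion limits you observe


def tool_call_attributes(
    name: str,
    call_id: str,
    arguments: dict[str, Any],
    latency_ms: float,
    error: Optional[BaseException] = None,
    result_count: Optional[int] = None,
    arguments_are_safe: bool = False,
) -> dict[str, Any]:
    """Build the minimum valuable attribute set for one llm_tool_call span."""
    attrs: dict[str, Any] = {
        "function.name": name,
        "tool_call_id": call_id,
        "tool.latency_ms": float(latency_ms),
        "tool.error": 1 if error is not None else 0,  # numeric flag, not a bool
    }
    if arguments_are_safe:
        # Only include arguments when the caller vouches for them, and bound the size.
        serialized = json.dumps(arguments, sort_keys=True)
        attrs["function.arguments"] = serialized[:MAX_ARG_CHARS]
    if result_count is not None:
        attrs["tool.result.count"] = result_count
    return attrs
```

The resulting dict can be applied to a span one key at a time with `set_attribute`. Arguments are excluded by default so that an unreviewed tool never leaks its inputs into traces.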
Set OTel status to `ERROR` on failing provider/tool/API spans. Coval custom trace metrics can calculate `error_rate` and `success_rate` from span status.
Also emit numeric `0`/`1` flags for important rates. Some public metric creation APIs require a numeric `metric_attribute`; `average` over these flags preserves the rate while still working in those environments.
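
The reason averaging works is plain arithmetic: the mean of a list of 0/1 values is the fraction of entries equal to 1. A small sketch, with `flag_rate` as a hypothetical helper name:

```python
from statistics import fmean
from typing import Iterable


def flag_rate(flags: Iterable[int]) -> float:
    """Average a series of 0/1 flags; the result is the rate of 1s."""
    values = list(flags)
    if any(v not in (0, 1) for v in values):
        raise ValueError("flags must be numeric 0 or 1")
    return fmean(values) if values else 0.0


# Four tool calls, one of which set tool.error = 1.
tool_error_flags = [0, 0, 1, 0]
print(flag_rate(tool_error_flags))  # 0.25
```

So an `average` metric over `tool.error` reports the same number an error-rate metric would, which is why the numeric flag is worth emitting alongside the span status.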

Phase 4: Protect Customers

Before committing enrichment, remove or bound:
  • API keys, tokens, passwords, session cookies, account secrets, and credentials
  • raw audio blobs or base64 audio
  • full prompts/responses when they may contain PII
  • unbounded transcripts in custom attributes unrelated to `stt.transcript`
  • high-cardinality span names
  • duplicated successful export retries
Prefer summaries and counts:
  • tool.result.count
  • tool.result.success
  • tool.error
  • tool.call.count
  • tool.failure.count
  • workflow.completed
  • workflow.dependency_blocked
  • workflow.fallback_used
  • prompt.message_count
  • response.length
  • http.status_code
  • retry_count
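
A last-line filter applied just before attributes are set can enforce several of these rules at once. The sketch below is an assumption-heavy illustration: the key patterns, the 256-character bound, and the `<key>.length` summary convention are all choices made here, not Coval requirements. It does not detect PII inside short values, so prompts and responses still need a deliberate decision.

```python
import re
from typing import Any

# Match on the final key segment so gen_ai.usage.input_tokens is not dropped.
SECRET_KEY = re.compile(
    r"(?:^|[._-])(?:api[_-]?key|token|password|secret|cookie|credentials?|authorization)$",
    re.IGNORECASE,
)
BLOB_KEY = re.compile(r"audio|base64", re.IGNORECASE)
MAX_VALUE_CHARS = 256  # assumed bound


def sanitize_attributes(attrs: dict[str, Any]) -> dict[str, Any]:
    """Drop secrets and binary payloads; bound long strings and keep their length."""
    clean: dict[str, Any] = {}
    for key, value in attrs.items():
        if SECRET_KEY.search(key):
            continue  # never export credentials
        if isinstance(value, (bytes, bytearray)):
            continue  # raw audio or other binary payloads
        if isinstance(value, str) and len(value) > MAX_VALUE_CHARS:
            if BLOB_KEY.search(key):
                continue  # long base64 audio: nothing worth keeping
            clean[f"{key}.length"] = len(value)  # numeric summary of the original
            value = value[:MAX_VALUE_CHARS]
        clean[key] = value
    return clean
```

Keeping the length as a number follows the "summaries and counts" preference: the trace still shows that a large payload existed without carrying it.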

Phase 5: Verify Improved Value

After changes:
  1. Run one representative simulation or conversation through the Coval CLI/API, or reuse the in-flight validation run if it exercises the changed code.
  2. Poll through the CLI/API until the run finishes and trace data appears. While it is pending, prepare candidate metrics or documentation notes instead of waiting idle.
  3. Open the Coval trace viewer.
  4. Confirm the trace has meaningful hierarchy and expected span colors.
  5. Check Trace Search can filter by span name, status, duration, provider, or attributes.
  6. If new numerical attributes were added, run `configure-trace-metrics` to create at least one metric against them after the span/attribute is visible in real trace data.
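
The polling in step 2 should always be bounded. A minimal sketch follows, in which `fetch_status` stands in for whatever CLI or API call returns the run state; the status strings `COMPLETED` and `FAILED` are placeholders, since the real values are not given here.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], str],
    *,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    done: tuple[str, ...] = ("COMPLETED", "FAILED"),  # placeholder status names
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a terminal status appears, or give up after max_attempts."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in done:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # no sleep after the final attempt
    return None  # still pending: report that rather than looping forever
```

Returning `None` on timeout makes the "still pending" case explicit, so the caller can continue preparing candidate metrics and check again later instead of treating the run as finished.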
Report before/after differences in concrete terms, such as span count, new span names, new attributes, and the specific debugging question the trace can now answer.
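
The before/after report can be computed from two trace dumps rather than written from memory. This sketch assumes the same hypothetical export shape as a list of dicts with `name` and `attributes` keys; adjust the field names to the actual dump.

```python
from typing import Any


def trace_diff(
    before: list[dict[str, Any]], after: list[dict[str, Any]]
) -> dict[str, Any]:
    """Summarize what an enrichment pass added to a trace."""

    def names(spans: list[dict[str, Any]]) -> set[str]:
        return {s["name"] for s in spans}

    def attribute_keys(spans: list[dict[str, Any]]) -> set[str]:
        return {key for s in spans for key in s.get("attributes", {})}

    return {
        "span_count": (len(before), len(after)),
        "new_span_names": sorted(names(after) - names(before)),
        "new_attributes": sorted(attribute_keys(after) - attribute_keys(before)),
    }
```

The diff supplies the concrete numbers; the debugging question the trace can now answer still has to be stated in words.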