iii-observability

Observability

Comparable to: Datadog, Grafana, Honeycomb, Jaeger

Key Concepts

Use the concepts below when they fit the task. Not every worker needs custom spans or metrics.
  • Built-in OpenTelemetry support across all SDKs — every function invocation is automatically traced
  • The engine exports traces, metrics, and logs via OTLP to any compatible collector
  • Workers propagate W3C trace context automatically across function invocations
  • Prometheus metrics are exposed on port 9464
  • registerWorker() with an otel config enables telemetry per worker
  • Custom spans via withSpan(name, opts, fn) wrap async work with trace context
  • Custom metrics via getMeter() create counters and histograms
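
To make the custom-span concept concrete, here is a minimal sketch of what a withSpan-style wrapper typically does: open a span, run the async callback, record success or failure, and close the span on every path. The names withSpanSketch, finishedSpans, and validateOrder are hypothetical stand-ins written for illustration; this is not the iii SDK's implementation, and the real span object may expose more than setAttribute.

```javascript
// Hypothetical stand-in for a withSpan-style helper (NOT the iii SDK code).
// Finished spans are collected in memory so the behavior can be inspected.
const finishedSpans = [];

async function withSpanSketch(name, opts, fn) {
  const span = {
    name,
    attributes: { ...((opts && opts.attributes) || {}) },
    status: 'unset',
    startedAt: Date.now(),
    endedAt: null,
    setAttribute(key, value) {
      this.attributes[key] = value;
    },
  };
  try {
    const result = await fn(span); // run the wrapped async work
    span.status = 'ok';
    return result;
  } catch (err) {
    span.status = 'error'; // failures are recorded, then rethrown
    span.attributes['error.message'] = err instanceof Error ? err.message : String(err);
    throw err;
  } finally {
    span.endedAt = Date.now(); // the span is closed on every path
    finishedSpans.push(span);
  }
}

// Usage: the callback receives the span, and its return value is passed through.
async function validateOrder(id) {
  return withSpanSketch('validate-order', {}, async (span) => {
    span.setAttribute('order.id', id);
    return { id, valid: true };
  });
}
```

The try/finally shape is the point of the sketch: the caller still sees the original result or error, while the span always ends with a status.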

Architecture

The worker SDK generates spans, metrics, and logs during function execution. These flow to the engine, which exports them via OTLP to a collector (Jaeger, Grafana, Datadog). The engine also exposes a Prometheus endpoint on port 9464 for scraping.
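
Scraping a Prometheus endpoint returns plain text in the Prometheus exposition format. The sketch below parses a few sample lines of that format so the scraped data can be inspected in a script. The helper parseMetricLine and the metric name in the sample are made up for illustration; the source only states port 9464, so the /metrics path mentioned in the comment is an assumption.

```javascript
// Parse one line of the Prometheus text exposition format:
//   metric_name{label="value",...} 123.4 [timestamp]
// Minimal sketch: comment lines (# HELP / # TYPE) and blank lines yield null,
// and escape sequences inside label values are kept as-is.
function parseMetricLine(line) {
  const trimmed = line.trim();
  if (trimmed === '' || trimmed.startsWith('#')) return null;
  const m = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$/.exec(trimmed);
  if (!m) return null;
  const [, name, rawLabels = '', rawValue] = m;
  const labels = {};
  for (const [, key, value] of rawLabels.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g)) {
    labels[key] = value;
  }
  let value;
  if (rawValue === '+Inf') value = Infinity;
  else if (rawValue === '-Inf') value = -Infinity;
  else value = Number(rawValue);
  return { name, labels, value };
}

// In practice the text would come from the engine's endpoint, for example
// http://localhost:9464/metrics (the /metrics path is an assumption).
const sample = [
  '# HELP orders_processed_total Orders handled by the worker',
  '# TYPE orders_processed_total counter',
  'orders_processed_total{service="my-svc"} 42',
].join('\n');

const samples = sample.split('\n').map(parseMetricLine).filter(Boolean);
```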

iii Primitives Used

| Primitive | Purpose |
| --- | --- |
| registerWorker(url, { otel }) | Connect worker with telemetry config |
| withSpan(name, opts, fn) | Create a custom trace span |
| getTracer() | Access OpenTelemetry Tracer directly |
| getMeter() | Access OpenTelemetry Meter for custom metrics |
| currentTraceId() | Get active trace ID for correlation |
| injectTraceparent() | Inject W3C trace context into outbound calls |
| onLog(callback, { level }) | Subscribe to log events |
| shutdown_otel() | Graceful shutdown of telemetry pipeline |
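
For readers unfamiliar with W3C trace context, the sketch below shows the shape of the traceparent header that trace propagation carries: version, trace ID, parent span ID, and flags. The helpers parseTraceparent, formatTraceparent, and withTraceHeaders are hypothetical and written only for illustration; how injectTraceparent() works internally is not described in the source.

```javascript
// W3C traceparent: version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const m = TRACEPARENT_RE.exec(String(header).trim());
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  if (version === 'ff') return null; // version ff is invalid
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null; // all-zero IDs are invalid
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

function formatTraceparent({ traceId, parentId, sampled }) {
  return `00-${traceId}-${parentId}-${sampled ? '01' : '00'}`;
}

// Return a copy of the headers with the trace context attached.
function withTraceHeaders(headers, ctx) {
  return { ...headers, traceparent: formatTraceparent(ctx) };
}

// A real propagator would put the current span's ID in the parent-id field;
// this sketch simply reuses the incoming one.
const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const ctx = parseTraceparent(incoming);
const outbound = withTraceHeaders({ accept: 'application/json' }, ctx);
```

The traceId field is the value a function like currentTraceId() would be expected to surface for correlation with external logs.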

Reference Implementation

See ../references/observability.js for the full working example: a worker with custom spans, metric counters, trace propagation, and log subscriptions connected to an OTel collector.
Also available in Python: ../references/observability.py
Also available in Rust: ../references/observability.rs

Common Patterns

Code using this pattern commonly includes, when relevant:
  • registerWorker('ws://localhost:49134', { otel: { enabled: true, serviceName: 'my-svc' } }) - enable telemetry
  • withSpan('validate-order', {}, async (span) => { span.setAttribute('order.id', id); ... }) - custom span
  • getMeter().createCounter('orders.processed') - custom counter metric
  • getMeter().createHistogram('request.duration') - custom histogram metric
  • onLog((log) => { ... }, { level: 'warn' }) - subscribe to warnings and above
  • currentTraceId() - get active trace ID for correlation with external systems
  • injectTraceparent() - propagate trace context to outbound HTTP calls
  • Disable telemetry: registerWorker(url, { otel: { enabled: false } }) or OTEL_ENABLED=false
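
The "warnings and above" behavior of a level-filtered subscription can be sketched with a small in-memory log bus. Everything here is a hypothetical stand-in rather than the iii SDK: createLogBus and emit are invented for the example, and the severity order debug < info < warn < error is an assumption.

```javascript
// Hypothetical stand-in for a level-filtered log subscription (NOT the iii SDK code).
// Assumed severity order: debug < info < warn < error.
const LEVELS = ['debug', 'info', 'warn', 'error'];

function createLogBus() {
  const subscribers = [];
  return {
    // Register a callback that only receives logs at or above `level`.
    onLog(callback, { level = 'debug' } = {}) {
      const threshold = LEVELS.indexOf(level);
      if (threshold === -1) throw new RangeError(`unknown level: ${level}`);
      subscribers.push({ callback, threshold });
    },
    // Deliver a log record to every subscriber whose threshold it meets.
    emit(log) {
      const severity = LEVELS.indexOf(log.level);
      for (const { callback, threshold } of subscribers) {
        if (severity >= threshold) callback(log);
      }
    },
  };
}

const bus = createLogBus();
const received = [];
bus.onLog((log) => received.push(log.message), { level: 'warn' });

bus.emit({ level: 'info', message: 'order received' });    // below threshold, dropped
bus.emit({ level: 'warn', message: 'retrying payment' });  // delivered
bus.emit({ level: 'error', message: 'payment failed' });   // delivered
```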

Adapting This Pattern

Use the adaptations below when they apply to the task.
  • Enable otel in the registerWorker() config to start collecting traces automatically
  • Add custom spans around expensive operations (DB queries, LLM calls, external APIs)
  • Create domain-specific metrics (orders processed, payment failures, queue depth)
  • Use currentTraceId() to correlate iii traces with external system logs
  • Configure OtelModule in iii-config.yaml for engine-side exporter, sampling ratio, and alerts
  • Point the OTLP endpoint at your collector (Jaeger, Grafana Tempo, Datadog Agent)
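
As an illustration of domain-specific metrics, the sketch below counts processed orders and payment failures and records a duration per request. It uses a hypothetical in-memory meter (createSketchMeter) that only mirrors the counter.add / histogram.record call shape; it is not the iii SDK or OpenTelemetry implementation. The metric name payments.failed and the processOrder helper are also invented for the example.

```javascript
// Hypothetical in-memory meter (NOT the iii SDK or OpenTelemetry implementation).
function createSketchMeter() {
  const counters = new Map();
  const histograms = new Map();
  return {
    createCounter(name) {
      counters.set(name, 0);
      return {
        add(value = 1) {
          if (value < 0) throw new RangeError('a counter can only increase');
          counters.set(name, counters.get(name) + value);
        },
      };
    },
    createHistogram(name) {
      const recorded = [];
      histograms.set(name, recorded);
      return { record(value) { recorded.push(value); } };
    },
    // Read back a counter total or the list of recorded histogram values.
    read(name) {
      return counters.has(name) ? counters.get(name) : histograms.get(name);
    },
  };
}

const meter = createSketchMeter();
const ordersProcessed = meter.createCounter('orders.processed');
const paymentFailures = meter.createCounter('payments.failed');
const requestDuration = meter.createHistogram('request.duration');

// Count the outcome of each order and record how long it took (milliseconds).
function processOrder(order, charge) {
  const startedAt = Date.now();
  try {
    charge(order);
    ordersProcessed.add(1);
    return true;
  } catch {
    paymentFailures.add(1);
    return false;
  } finally {
    requestDuration.record(Date.now() - startedAt);
  }
}
```

Recording the duration in a finally block means failed requests show up in the histogram too, which is usually what you want when investigating slow failures.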

Engine Configuration

OtelModule must be enabled in iii-config.yaml for engine-side traces, metrics, and logs. See ../references/iii-config.yaml for the full annotated config reference.

Pattern Boundaries

  • For engine-side OtelModule YAML configuration, prefer iii-engine-config.
  • For SDK init options and function registration, prefer iii-functions-and-triggers.
  • Stay with iii-observability when the primary problem is SDK-level telemetry: spans, metrics, logs, and trace propagation.

When to Use

  • Use this skill when the task is primarily about iii-observability in the iii engine.
  • Triggers when the request directly asks for this pattern or an equivalent implementation.

Boundaries

  • Never use this skill as a generic fallback for unrelated tasks.
  • You must not apply this skill when a more specific iii skill is a better fit.
  • Always verify environment and safety constraints before applying examples from this skill.