superlog-onboard


Superlog onboarding


Wire OpenTelemetry traces, logs, and metrics into the user's project so telemetry streams to Superlog. Cover every app and service in the repo — not just the one the user is currently sitting in.
Prefer native OpenTelemetry APIs and the framework's documented bootstrap over custom helper layers. If a specific stack stumps you, search the OTel docs for that language; don't guess.
Before editing, read the applicable companion skills:
  • otel-onboarding-style for general OTel taste.
  • otel-python-style for Python services.
  • otel-fastapi-style for FastAPI services.
  • otel-livekit-style for LiveKit agents.
  • otel-nextjs-style for Next.js/Vercel apps.
  • otel-expo-style for Expo / React Native apps.
  • otel-supabase-edge-style for Supabase Edge Functions.
  • otel-generic-style for any other language (Go, Java/Kotlin, Ruby, Rust, .NET/C#, PHP, Elixir, plain Node, …) — use this as the fallback when none of the above match.

Step 0 — Endpoint and key handling


The OTLP endpoint is always `https://intake.superlog.sh` and goes inline in the bootstrap code — it's not a secret, no env-var indirection needed.
The ingest API key starts with `superlog_live_` and is project-scoped + write-only — it can only ingest events into one project, can't read anything, can't change settings. Treat it like a Sentry DSN, a PostHog public key, or a Datadog RUM client token: inline it directly in the OTel bootstrap source alongside the endpoint. No `.env` files, no deploy-target wiring, no `process.env.OTEL_EXPORTER_OTLP_HEADERS`. The user deploys their code and events flow.
Two paths, no questions asked:

Key in the prompt


If the invoking prompt already contains a `superlog_live_…` key, validate the prefix and inline it in every bootstrap file you write. Done — move on to Step 1.

No key


Kick off the device flow immediately, then keep working in parallel — don't block install on signup.
  1. `POST https://api.superlog.sh/oauth/device` with `Content-Type: application/json` and body `{"flow":"skill"}`. Response includes `device_code`, `user_code`, `verification_uri_complete` (a `https://superlog.sh/activate?code=…&flow=skill` URL), `expires_in` (seconds), and `interval` (poll interval seconds).
  2. Open `verification_uri_complete` in the user's default browser (`open` / `xdg-open` / `start ""`). Print the URL too so they can copy it if the open command silently fails. Tell the user briefly what's happening: signup is open in their browser, the key flows back here automatically, you're going to keep working.
  3. While the user signs up, do not block. Keep going with Steps 1–4. Inline the literal sentinel `SUPERLOG_TEST` in the bootstrap source as a placeholder — Superlog's ingest accepts it from anyone (returns 200 without forwarding anywhere), so the user's app can boot and exercise the OTel bootstrap path while signup is in flight.
  4. At Step 5, poll `POST https://api.superlog.sh/oauth/token` with `{"device_code":"…"}` every `interval` seconds. `428` = `authorization_pending`, keep waiting. `200` returns `{ingest_key, project_id, user, org, flow:"skill"}`. `410` = expired.
  5. On 200: walk the source files you wrote and replace the `SUPERLOG_TEST` literal with the real `ingest_key`. Never print the key back to chat (transcripts get logged). The web `/activate` page already confirms hand-off to the user.
  6. On 410 / user closed the tab: leave `SUPERLOG_TEST` in place and tell the user to sign up at https://superlog.sh/ and swap it later.
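The no-key path above can be sketched as a small poll loop. This is a hedged illustration: the endpoint, body shape, and status codes are the ones listed in the steps above, while the function name and the injectable `post` parameter (used here so the loop can be exercised without network access) are assumptions of this sketch.

```python
import json
import time
import urllib.error
import urllib.request

TOKEN_URL = "https://api.superlog.sh/oauth/token"

def poll_for_key(device_code, interval, expires_in, post=None):
    """Poll the token endpoint until signup finishes.

    Returns the real ingest_key on 200, or None on 410/expiry,
    in which case the SUPERLOG_TEST sentinel stays in the source.
    """
    if post is None:
        def post(url, body):
            req = urllib.request.Request(
                url,
                data=json.dumps(body).encode(),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
            try:
                with urllib.request.urlopen(req) as resp:
                    return resp.status, json.loads(resp.read())
            except urllib.error.HTTPError as err:
                return err.code, None  # 428 / 410 arrive as HTTPError

    deadline = time.monotonic() + expires_in
    while time.monotonic() < deadline:
        status, body = post(TOKEN_URL, {"device_code": device_code})
        if status == 200:
            return body["ingest_key"]  # swap into source; never print to chat
        if status == 410:
            return None                # code expired; leave the sentinel
        time.sleep(interval)           # 428 = authorization_pending
    return None
```

The `post` hook doubles as the seam for dry-running the loop before the real device flow starts.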

Step 1 — Map every app/service in the repo


Before instrumenting anything, enumerate what's here. Check workspace manifests (`pnpm-workspace.yaml`, root `package.json` `workspaces`, `go.work`, Cargo workspace, Python `pyproject.toml` workspace setups, `apps/*` and `services/*` conventions). Identify each service: web frontend, API, workers, background jobs, CLIs, sample/demo apps, mobile apps, Supabase and/or server functions. Mobile and serverless/edge functions are in scope; do not skip them merely because they are client-side or short-lived. Skip pure type/config packages with no runtime entry point. Do not skip any runnable services or leave them "out of scope": instrument absolutely everything in this run; there may be no follow-up.
Show the user the list before you start, so they can correct it.

Step 2 — For each service, install native OTel and bootstrap


Use the language's native OpenTelemetry SDK. Don't reach for vendor wrappers or hand-rolled helpers when an official package exists. Examples of what "native" means here: `@opentelemetry/sdk-node` for Node servers, `@vercel/otel` for normal Next.js/Vercel apps (sdk-node breaks Next's webpack and misses the framework bootstrap), `@opentelemetry/sdk-trace-web` + browser/mobile-compatible exporters for Vite/SPA/Expo, `opentelemetry-instrumentation-*` + `opentelemetry-sdk` for Python, `go.opentelemetry.io/otel` for Go.
No broad wrapper APIs. Avoid reusable helpers like `sendSuperlogSpan`, `recordCounter`, `recordLog`, `startTelemetrySpan`, or `withTelemetry`. Acquire native tracers/meters/loggers at module scope and use the SDK's own APIs directly. In TypeScript/JavaScript, use the published `@superlog/otel-helpers` `withSpan` helper for bounded business spans and add `@superlog/otel-helpers` to `package.json`; this is required when the package can be installed because it avoids expanding a whole function into `startActiveSpan` plus `try`/`catch`/`finally`. Do not use helpers around provider SDK calls that OpenInference/provider instrumentation can observe directly. If an edge runtime genuinely cannot load an upstream OTel SDK, keep the shim tiny, provider-neutral, and OTel-shaped: `tracer.startActiveSpan`, `span.setAttributes`, `SpanStatusCode`, `meter.createCounter`, `histogram.record`.
Wire all three signals — traces, logs, metrics. Logs go through OTLP, not just stdout — set up the OTel log bridge for the language so app logs (with their existing log levels and structured fields) carry the active `trace_id`/`span_id` automatically. The user's existing logger keeps working; you're just adding an OTLP handler/processor underneath.
The log bridge is not optional and is not "covered" by the SDK init alone — most language SDKs and framework wrappers wire traces (and sometimes metrics) by default but require an explicit `LoggerProvider` + OTLP log exporter + log-record processor + a bridge to the existing logger. Examples: Python stdlib needs `LoggerProvider` + `OTLPLogExporter` + `LoggingHandler` attached to the root logger (and `LoggingInstrumentor` for trace correlation on existing records); Node needs `@opentelemetry/sdk-logs` + an instrumentation for the project's logger (`pino`/`winston`/`bunyan`); `@vercel/otel` requires the `logRecordProcessor(s)` option — without it, no logs leave the process. The companion style skills spell out the exact pieces per stack — read them.
Common log-bridge mistakes to actively check for:
  • Handler attached to a named logger when the app uses the root logger (or vice versa) — nothing flows.
  • Default level filter (e.g. WARNING) swallowing the INFO/DEBUG lines the user actually wants in Superlog.
  • `BatchLogRecordProcessor` not flushed on shutdown → short-lived CLIs, serverless, and edge functions drop the last batch. Wire `shutdown()`/`forceFlush()` into the runtime's exit hook.
  • An existing vendor transport (Pino → Logtail, Winston → Datadog, etc.) left in place is fine and expected — but make sure you're bridging from the logger, not adding a second transport that double-emits the same line through a different formatter.
  • Logs emitted outside any span will arrive without `trace_id`/`span_id`. That is correct and expected; do not "fix" it by starting throwaway spans around log calls.
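The first two pitfalls can be reproduced with nothing but the stdlib. In this sketch a plain capturing handler stands in for the OTel `LoggingHandler`, and the logger names are made up:

```python
import logging

class CapturingHandler(logging.Handler):
    """Stand-in for the OTel LoggingHandler in this demo."""
    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.records = []

    def emit(self, record):
        self.records.append(record)

# Pitfall 1: handler attached to a named logger while the app
# logs through the root logger; nothing flows.
misplaced = CapturingHandler()
logging.getLogger("myapp").addHandler(misplaced)
logging.getLogger().warning("payment failed")
assert len(misplaced.records) == 0

# Fix: attach to the root logger; named loggers propagate up to it.
bridged = CapturingHandler()
root = logging.getLogger()
root.addHandler(bridged)
logging.getLogger("checkout").warning("payment failed")
assert len(bridged.records) == 1

# Pitfall 2: the default WARNING level swallows INFO lines.
logging.getLogger("checkout").info("starting batch reconcile")
assert len(bridged.records) == 1  # INFO never reached the handler

root.setLevel(logging.INFO)       # the fix: lower the level filter
logging.getLogger("checkout").info("starting batch reconcile")
assert len(bridged.records) == 2
```

The same two checks apply unchanged once the real OTel handler replaces the stand-in.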
Bootstrap rules:
  • The bootstrap file must run before any framework imports. Use the language/framework's documented hook (`--import` flag, `instrumentation.ts`, top-of-`main.py` import, etc.).
  • Inline the endpoint (`https://intake.superlog.sh`) and the project's ingest key directly in the bootstrap source. Don't read from `process.env.OTEL_EXPORTER_OTLP_*` or write any `.env` files — the key is write-only, and inline configuration removes a whole class of "OTel didn't start because env vars weren't set" deploy failures. (See the framework-specific style skills for the exact shape per stack.)
  • Use HTTP OTLP exporters, not gRPC. gRPC pulls in native bindings that break bundlers and complicate containers.
  • Use the project's existing package manager (detect via lockfile).
  • Prefer idempotent edits. If a config file already exists, edit it, don't overwrite it.
  • Set resource attributes on the OTel resource for every service: `service.name`, `service.version`, `deployment.environment.name`, and `vcs.repository.url.full` — the canonical https URL of the repo (e.g. https://github.com/acme/api). The repo URL is the important one and is fine to hardcode alongside `service.name` in the SDK init; if the build platform exposes the slug (Vercel `VERCEL_GIT_REPO_OWNER`/`VERCEL_GIT_REPO_SLUG`, Railway `RAILWAY_GIT_REPO_OWNER`/`RAILWAY_GIT_REPO_NAME`), prefer reading from env. Also set `vcs.ref.head.revision` (commit SHA) on a best-effort basis from whatever env the runtime already injects (`VERCEL_GIT_COMMIT_SHA`, `RAILWAY_GIT_COMMIT_SHA`, `GITHUB_SHA`, `SOURCE_COMMIT`, `GIT_COMMIT`, `HEROKU_SLUG_COMMIT`, …). Do not shell out to `git` from the running process. Skipping the SHA is fine, skipping the URL is not. Use the OTel semantic-convention keys exactly — do not invent `git.repo`/`app.repo_url`.
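The resource-attribute rule can be sketched for a Python service as a small helper. The semantic-convention keys and the env-var priority come from the bullet above; the helper names and the `env` parameter are assumptions of this sketch — feed the resulting dict into the SDK's `Resource` for the stack at hand.

```python
import os

# Platform env vars to try for vcs.ref.head.revision; first match wins.
_SHA_ENV_VARS = (
    "VERCEL_GIT_COMMIT_SHA",
    "RAILWAY_GIT_COMMIT_SHA",
    "GITHUB_SHA",
    "SOURCE_COMMIT",
    "GIT_COMMIT",
    "HEROKU_SLUG_COMMIT",
)

def resource_attributes(service_name, service_version, environment, repo_url, env=None):
    """Best-effort resource attributes; no shelling out to git."""
    env = os.environ if env is None else env
    attrs = {
        "service.name": service_name,
        "service.version": service_version,
        "deployment.environment.name": environment,
        # The canonical https repo URL; fine to hardcode per service.
        "vcs.repository.url.full": repo_url,
    }
    sha = next((env[v] for v in _SHA_ENV_VARS if env.get(v)), None)
    if sha is not None:
        attrs["vcs.ref.head.revision"] = sha  # skipping the SHA is fine
    return attrs
```

If none of the env vars are present, the SHA is simply omitted, which the rule above explicitly allows.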
Framework rules:
  • Next.js/Vercel: use `instrumentation.ts` with `@vercel/otel` `registerOTel(...)` as the bootstrap. Do not substitute a raw `@opentelemetry/sdk-node`/`NodeSDK` bootstrap unless the repo already uses that architecture and you are extending it. Use `@opentelemetry/api` tracers/meters inside route handlers only where auto-instrumentation is blind. `registerOTel` does not export logs by default — pass `logRecordProcessor` (v1) / `logRecordProcessors` (v2) with an `OTLPLogExporter` from `@opentelemetry/exporter-logs-otlp-http`, or no logs will leave the process. Match the option name to the installed `@vercel/otel` major version.
  • Expo/React Native: preserve existing Expo Go / unsupported-runtime guards. In supported builds, call `initObservability()` before Sentry and before app registration/user code. Inline the endpoint + ingest key in the observability module — no `EXPO_PUBLIC_OTEL_*` env vars. The bootstrap reads them straight from constants.
  • Supabase Edge Functions: native Deno OpenTelemetry does not work in hosted Supabase Edge today. Use the tiny OTel-shaped shim pattern above; keep the exporter endpoint/headers in one setup area and avoid Superlog-specific function/file names.
  • Python/FastAPI: use native instrumentation such as `FastAPIInstrumentor.instrument_app(app)` rather than replacing request handling with manual middleware.
  • Python/LiveKit: lifecycle spans that cross shutdown callbacks may use `start_span` + `trace.use_span(..., end_on_exit=False)` and end in the shutdown callback. Bounded work should still use decorators or context managers.

Step 3 — Add custom spans, metrics, and logs around business operations


Auto-instrumentation gets you HTTP in/out, DB queries, framework lifecycle. That's the floor, not the ceiling. Read the project to find the operations a human operator would actually want to see when something looks wrong.

Traces


Wrap every critical business operation with an active span. Auto-instrumented spans are fine where they exist — but if an operation isn't already getting a span, add one.
  • Naming: `domain.verb` (`order.process`, `payment.charge`, `email.send`, `agent.run`, `interview.create`, `job.<type>`).
  • Attributes: entity IDs (order.id, user.id, workspace.id, tenant.id), counts, key boolean branch outcomes, model name / provider for LLM calls.
  • Record exceptions: `span.recordException(err)` + `span.setStatus({ code: ERROR })` on failure paths.
  • For Python functions with clear boundaries, prefer `@tracer.start_as_current_span("operation.name")` — the same call works as a decorator and as a context manager, and the decorator form is usually what you want for a whole function:

    ```python
    @tracer.start_as_current_span("do_work")
    def do_work():
        print("doing some work...")
    ```

    Use a context manager when a decorator does not fit (partial scope, dynamic span name, etc.). Do not use detached `start_span()` + manual `end()` for bounded work.
  • Skip trivial getters, pure transforms, internal helpers — anything with no real latency or failure mode.
  • Never put PII in attributes (emails, passwords, tokens, full request bodies).

Logs


Make sure logs are structured and carry operation context. Concretely: every log line emitted inside a span should arrive at Superlog with `trace_id`/`span_id` populated and any structured fields (orderId, userId, etc.) preserved as attributes. Trace/span context may be added natively by the log bridge or integration, or may require additional work.
Use logs for narrative ("starting batch reconcile", "retrying after 3xx") and exceptional events. An error log must only be emitted if the operation cannot recover and manual intervention is required.
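The retry-versus-error discipline can be made concrete with stdlib logging. The function and exception names here are hypothetical:

```python
import logging

logger = logging.getLogger("payments")

class TransientError(Exception):
    """Hypothetical retryable failure."""

def charge_with_retry(charge, attempts=3):
    """Retries are narrative (warning with structured fields);
    only the unrecoverable outcome is logged at error level."""
    last = None
    for attempt in range(1, attempts + 1):
        try:
            return charge()
        except TransientError as err:
            last = err
            logger.warning("charge failed, retrying", extra={"attempt": attempt})
    # Cannot recover; manual intervention is required, so this is
    # the only place an error-level line is emitted.
    logger.error("charge failed after %d attempts", attempts)
    raise last
```

Note the structured `attempt` field rides on `extra=` rather than being interpolated into the message, so the log bridge can carry it as an attribute.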

Metrics


Cover business + performance + cost. Three categories to look for:
  • Business logic counters. Every meaningful state transition: created, started, completed, failed, retried. Per-tenant, per-channel, per-status — low-cardinality dimensions only (never user/order IDs).
  • Performance histograms. Latency of operations the user cares about, queue depth, batch sizes, payload sizes. Reuse existing timing instrumentation if the project already has any (`time.perf_counter` blocks, custom `LatencyTracker`s, "[TIMING]" log lines) — emit a histogram from those measurements rather than measuring twice.
  • Costs — especially LLM costs. If the project calls OpenAI / Anthropic / Google / any LLM provider, prefer provider instrumentation such as OpenInference where available so native SDK calls stay readable. Do not add pricing constants or LLM cost math in product handlers; Superlog computes estimated cost centrally in the UI/query layer from captured provider/model/token attributes. Avoid duplicating token counters already captured by provider instrumentation.
Get the meter once at module level, create instruments at module level, increment in the hot path. Don't create a fresh meter per call.
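A sketch of reusing an existing `time.perf_counter` block rather than timing twice. The `Histogram` class is a stand-in so the sketch runs without the SDK installed; with real OTel you would create the instrument once at module level from a module-level meter and call `.record()` the same way. The instrument and function names are made up:

```python
import time

class Histogram:
    """Stand-in for an OTel histogram instrument (illustration only)."""
    def __init__(self, name, unit):
        self.name, self.unit, self.values = name, unit, []

    def record(self, value, attributes=None):
        self.values.append((value, dict(attributes or {})))

# Module level: instrument created once, not per call.
reconcile_seconds = Histogram("reconcile.duration", unit="s")

def reconcile(batch):
    start = time.perf_counter()   # the project's existing timing block
    processed = len(batch)        # ... real reconcile work goes here ...
    elapsed = time.perf_counter() - start
    # Emit a histogram from the measurement the code already takes.
    # Low-cardinality dimensions only: status, never order IDs.
    reconcile_seconds.record(elapsed, {"status": "completed"})
    return processed
```

The point is the shape: one module-level instrument, one measurement, recorded with low-cardinality attributes in the hot path.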

Step 4 — Verify the app still works


Per service:
  1. Run the project's own dev or build command (whatever its `package.json`/`pyproject`/`Makefile` already wires up). Confirm it starts cleanly with no errors that trace back to your OTel install. Also run a telemetry bootstrap smoke that imports or starts the app, so provider setup, exporter construction, log bridging, and framework instrumentation all initialize. For a Python server this can be an import/startup command such as `uv run python -c 'from app.main import app; print(app.title)'`; for Node/Next use the repo's build/start path. For a server, hit at least one route with curl so traffic flows through the instrumentation; choose a route that exercises an instrumented operation when practical, not only a static health route. For a CLI, invoke a real command. Don't ship if the app's own startup is now broken — that's a regression.
  2. Confirm telemetry leaves the process — for all three signals. With the inline `SUPERLOG_TEST` (or real key) in the bootstrap, OTLP POSTs from the dev server should return 2xx for each of `/v1/traces`, `/v1/logs`, and `/v1/metrics` — that proves the full bootstrap is reaching the network, not just the trace pipeline. The signal is the running app's own POSTs to all three paths succeeding by the time the dev server shuts down. To force the logs path specifically, hit a route (or invoke a CLI command) that you know calls the project's logger inside an instrumented operation, then watch the dev server's outbound traffic / debug exporter output for a `/v1/logs` POST. If only `/v1/traces` shows up, the log bridge isn't wired (most common causes: `LoggerProvider` never set, handler attached to the wrong logger, level filter too strict, `@vercel/otel` missing `logRecordProcessor(s)`, or shutdown not flushing the batch processor).
A bootstrap that loads but never POSTs — or POSTs traces but no logs/metrics — is not a partial success. Fix it before moving on.
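One way to keep the three-signal check honest is to diff the OTLP paths you actually saw 2xx POSTs for against the required set. How you collect the paths depends on the stack (debug exporter output, a local proxy, dev-server logs); the helper itself is a hypothetical convenience:

```python
REQUIRED_PATHS = ("/v1/traces", "/v1/logs", "/v1/metrics")

def missing_signals(posted_paths):
    """Return the OTLP paths that never got a successful POST.
    An empty list is the only acceptable result before hand-off."""
    seen = set(posted_paths)
    return [path for path in REQUIRED_PATHS if path not in seen]
```

A non-empty result names exactly which pipeline (almost always logs) still needs wiring.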

Step 5 — Hand-off (final message to the user)


If you started a device flow in Step 0, collect the key first. Print one line that you're waiting for sign-up to finish (so the user knows the terminal isn't frozen), then poll `POST https://api.superlog.sh/oauth/token` with `{"device_code":"…"}` at the `interval` returned earlier. Cap the wait at `expires_in` (default 600s).
  • On 200: walk every source file where you inlined `SUPERLOG_TEST` and replace it with the real `ingest_key`. Never print the key back to chat (transcripts get logged); the web `/activate` page already confirms hand-off to the user.
  • On 410 / poll timeout: leave `SUPERLOG_TEST` inline, tell the user "sign-up didn't finish in time — sign up at https://superlog.sh/ when you're ready and swap the literal in the bootstrap files I wrote." Continue with the rest of the closing message.
If the key was already supplied in the prompt, no polling needed — it's been inline from the start.

What changed


3–7 short factual bullets covering: packages installed, files created/modified, business spans/metrics added. Per service if changes differed, grouped if uniform. Mention any existing observability vendor (Sentry, Datadog, Logtail, Pino transports, etc.) you intentionally left in place so the coexistence is explicit.

Deploy


Tell the user to deploy as they normally would — push to their hosting platform, run their existing CI, or run locally. There are no env vars to wire and nothing platform-specific to configure: the endpoint and key are inline in the bootstrap, so events start flowing the moment the instrumented code runs.
If the user asks "where do I put the key in production?" — the answer is "you already did, it's in the source you just deployed."

Step 6 — Drive GitHub, Slack, and MCP install


Only run this step if the device flow completed successfully (you have a real `ingest_key` and the `user_code` from the `/oauth/token` response). Skip it on poll timeout / sentinel — the user can install integrations from the dashboard later.
Walk the user through three short steps in this order. Pause for confirmation between each so they can keep up.

GitHub


"Opening the GitHub install page — pick the repos you want Superlog to read and approve. Press enter when you're back."
Open
https://api.superlog.sh/github/install?user_code=<USER_CODE>
in their default browser (
open
/
xdg-open
/
start ""
). The browser walks them through GitHub's app install; the page bounces back to
/activate?…&gh=done
with a "GitHub connected" confirmation.
When they hit enter (or say done in chat), move on. If they say "skip" or close the tab, move on without complaint.

Slack


"Opening Slack OAuth — pick the workspace and approve. Press enter when you're back."
Open
https://api.superlog.sh/slack/install?user_code=<USER_CODE>
the same way. Slack returns to
/activate?…&slack=done
.
Same skip semantics.

Superlog MCP


Suggest installing the Superlog MCP server so the agent (Claude Code, Codex, Cursor, etc.) can query telemetry directly next time they're debugging — search logs, pull traces, check error rates from inside the chat without context-switching to the dashboard.
For Claude Code (most common — that's where this skill is running), offer to run it for them:
`claude mcp add --transport http superlog https://api.superlog.sh/mcp`
This edits the user's Claude Code config. Confirm before running (the user may have a custom MCP scope or want to install elsewhere). If they decline, print the command so they can run it themselves later.
For other agents the user might also use, mention but do not run:
  • Codex: `codex mcp add superlog --url https://api.superlog.sh/mcp && codex mcp login superlog`
  • Cursor / others: copy the `mcpServers` snippet from https://superlog.sh/ → Connect.
When all three are done (or skipped), close out with a single line directing the user to deploy their app — they're ready to ship.

Hard rules


  • Never modify files outside the project root.
  • Never commit, push, or open PRs.
  • Inline the ingest key in source. It's a project-scoped, write-only token (think Sentry DSN); env-var indirection just adds deploy-time failure modes for no gain.
  • Never remove an existing observability vendor unless the user asks for it.
  • Use the project's existing package manager and existing logger.
  • Prefer native OTel packages for the language; don't reinvent telemetry plumbing the SDK already provides.
  • If the dev/build command errors out because of your instrumentation, that's a failure — fix it or report it, don't paper over it.