LaunchDarkly Metric Choose


You're using a skill that helps users select the right metrics before setting up an experiment, guarded rollout, or release policy. Your job is to understand the feature context, surface what will auto-attach from existing project policies, inventory what's available and healthy, and produce a clear typed recommendation.
This skill is advisory. It does not create metrics, attach them to experiments, or configure rollouts. For those tasks, see the related skills at the end of this document.

Prerequisites

This skill requires the remotely hosted LaunchDarkly MCP server to be configured in your environment.
Required MCP tools:
  • list-metrics
    — inventory available metrics with their types and event keys
  • list-metric-events
    — check which event keys have recent activity
Optional MCP tools (enhance workflow):
  • list-release-policies
    — fetch project-level policies that configure which metrics auto-attach to guarded rollouts. Use this for the guarded rollout and release policy paths.
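The exact response shapes these tools return depend on the MCP server. As a minimal sketch, assuming illustrative field names (eventKey, kind, and successCriteria are placeholders, not the server's actual schema), the cross-reference between the two required tools works like this:

```python
# Hypothetical shapes for the two required MCP tool responses.
# Field names below are illustrative assumptions, not the server's contract.

# list-metrics: one entry per metric in the project
metrics = [
    {"key": "checkout-conversion", "kind": "occurrence",
     "eventKey": "checkout-complete", "successCriteria": "HigherThanBaseline"},
    {"key": "p95-latency", "kind": "value",
     "eventKey": "page-timing", "successCriteria": "LowerThanBaseline"},
]

# list-metric-events: event keys with recent activity
recent_event_keys = {"checkout-complete"}

# A metric is healthy when its event key shows recent activity
healthy = [m["key"] for m in metrics if m["eventKey"] in recent_event_keys]
print(healthy)  # → ['checkout-conversion']
```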

Workflow

Step 1: Identify the Context

Ask two questions upfront:
  1. What is this for?
    • (a) Experiment — testing a hypothesis with a flag variant
    • (b) Guarded rollout — progressively rolling out a change with automatic regression detection
    • (c) Release policy — creating or editing a project-wide policy that configures default metrics for all guarded rollouts matching certain conditions
  2. What is the change?
    • Flag key (if applicable)
    • Plain-language description: "Rolling out a new checkout flow" / "Testing a new recommendation algorithm"

Step 2: Fetch Existing Configuration (Guarded Rollout and Release Policy only)

For experiments — skip this step. There is no pre-existing configuration to surface.
For guarded rollouts and release policy work, call list-release-policies first:
list-release-policies(projectKey)
Surface the results before making any recommendations:
Your project has 2 release policies:

Policy: "Production guardrails" (applies to: environment=production)
  Auto-attaches to guarded rollouts:
    ✓ api-error-rate  (count, LowerThanBaseline)
    ✓ p95-latency     (value, LowerThanBaseline)
    ✓ [Metric group] Core Platform Health (3 metrics)

Policy: "Default" (applies to: all environments)
  No metrics configured.
This tells the user what's already covered before they choose anything additional. For a guarded rollout, these metrics will appear automatically — the recommendation is about what to add on top, not rebuild from scratch.
If no policies exist or none have metrics configured, note that all metrics must be selected manually.
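The surfacing step above can be sketched as a small renderer over a hypothetical list-release-policies response. The name, scope, and metrics fields are assumptions for illustration, not the MCP server's actual schema:

```python
def summarize_policies(policies):
    """Render release policies and their auto-attached metrics.
    Assumes each policy dict has 'name', 'scope', and 'metrics' keys
    (a hypothetical schema, not the real MCP response shape)."""
    lines = [f"Your project has {len(policies)} release policies:"]
    for p in policies:
        lines.append(f'\nPolicy: "{p["name"]}" (applies to: {p["scope"]})')
        if p["metrics"]:
            lines.append("  Auto-attaches to guarded rollouts:")
            lines += [f"    ✓ {m}" for m in p["metrics"]]
        else:
            lines.append("  No metrics configured.")
    return "\n".join(lines)

policies = [
    {"name": "Production guardrails", "scope": "environment=production",
     "metrics": ["api-error-rate  (count, LowerThanBaseline)",
                 "p95-latency     (value, LowerThanBaseline)"]},
    {"name": "Default", "scope": "all environments", "metrics": []},
]
print(summarize_policies(policies))
```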

Step 3: Inventory Available Metrics with Event Health

Call list-metrics to see all metrics in the project, then cross-reference with list-metric-events.
Organize into two groups:
  • Healthy — event key appears in list-metric-events. Safe to recommend.
  • At-risk — event key absent from list-metric-events. Warn: may not produce data.
Show this inventory before recommending — it may reveal that a metric the user has in mind has no events flowing.
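The grouping above amounts to a simple partition. A minimal sketch, assuming the same illustrative field names as before (eventKey is not a guaranteed schema field):

```python
def partition_by_health(metrics, recent_event_keys):
    """Split metrics into healthy vs. at-risk based on whether their
    event key appears in list-metric-events output. Input shapes are
    assumptions for illustration, not the real MCP schema."""
    healthy, at_risk = [], []
    for m in metrics:
        target = healthy if m["eventKey"] in recent_event_keys else at_risk
        target.append(m["key"])
    return healthy, at_risk

metrics = [
    {"key": "checkout-conversion", "eventKey": "checkout-complete"},
    {"key": "page-load-time", "eventKey": "page-timing"},
]
healthy, at_risk = partition_by_health(metrics, {"checkout-complete"})
print(healthy, at_risk)  # → ['checkout-conversion'] ['page-load-time']
```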

Step 4: Recommend

The reasoning differs meaningfully by context.

(a) Experiment

Start with the hypothesis, not the metric list.
Ask the user to complete this sentence before looking at available metrics:
"If this change succeeds, [metric] will [increase / decrease]."
The primary metric must directly measure that hypothesis — not a proxy, not a correlation. If the user can't complete the sentence, help them get there first.
Propose one primary metric. It must:
  • Directly measure the hypothesis
  • Have events actively flowing
  • Have an unambiguous success direction (HigherThanBaseline or LowerThanBaseline)
Propose typed secondary metrics. Suggest at least one of each type that applies:
  • Guardrail — Did the change break anything? Example: error rate, crash rate, latency p95
  • Counter-metric — Did A improve at the cost of B? Example: if primary is conversion, add support tickets or session length
  • Supporting signal — Does correlated behavior confirm the hypothesis? Example: if primary is signup, add onboarding step 2 completion
One of each type is usually the right amount. More secondary metrics add noise and interpretation burden.
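The three primary-metric requirements above can be sketched as a single check. Note that whether a metric measures the hypothesis cannot be inferred from metadata; it must be asserted by the user, so it appears here as an explicit flag. Field names are illustrative assumptions:

```python
def qualifies_as_primary(metric, recent_event_keys):
    """Apply the three primary-metric requirements: directly measures
    the hypothesis (user-asserted flag), events actively flowing, and
    an unambiguous success direction. Schema is hypothetical."""
    return (
        metric.get("measures_hypothesis", False)
        and metric["eventKey"] in recent_event_keys
        and metric.get("successCriteria") in ("HigherThanBaseline",
                                             "LowerThanBaseline")
    )

candidate = {"key": "signup-rate", "eventKey": "signup",
             "successCriteria": "HigherThanBaseline",
             "measures_hypothesis": True}
print(qualifies_as_primary(candidate, {"signup"}))  # → True
```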


(b) Guarded Rollout

Guarded rollouts are safety mechanisms, not experiments. Each metric you add is a potential automatic rollback trigger — if it regresses beyond its threshold before the rollout completes, LaunchDarkly can stop and revert the release.
Start from what auto-attaches. After surfacing the release policy results in Step 2, ask: "Are the auto-attached metrics enough, or do you want to add more for this specific rollout?"
When recommending additional metrics:
  • Bias toward reliability — engineering metrics (error rate, latency, crash rate) with stable, predictable baselines
  • Avoid exploratory product metrics that are noisy or hard to interpret under regression analysis
  • Fewer is better. Two or three high-signal metrics is the right size. More than five creates false positive rollback risk.
  • Only recommend metrics with events actively flowing. An at-risk metric in a guarded rollout either produces no signal or, worse, triggers a false rollback due to data quality issues, not a real regression.
Suggested starting point for any guarded rollout (if not already covered by a policy):
  1. Error rate — are we seeing more errors in the new variation?
  2. Latency / response time — is the new variation slower?
  3. One domain-specific metric tied to the core user action the change affects
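The selection rules for additional guarded rollout metrics — skip what a policy already auto-attaches, drop at-risk metrics, and keep the list small — can be sketched as a filter. The cap of three reflects this section's "fewer is better" guidance, and all input shapes are illustrative:

```python
def rollout_additions(candidates, auto_attached, healthy_keys, cap=3):
    """Filter candidate metric keys for a guarded rollout: drop anything
    already auto-attached by a release policy, drop at-risk metrics
    (false-rollback risk), and cap the total count."""
    picks = [k for k in candidates
             if k not in auto_attached and k in healthy_keys]
    return picks[:cap]

picks = rollout_additions(
    candidates=["api-error-rate", "checkout-conversion", "page-load-time"],
    auto_attached={"api-error-rate", "p95-latency"},
    healthy_keys={"api-error-rate", "checkout-conversion"},
)
print(picks)  # → ['checkout-conversion']
```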


(c) Release Policy

Release policies apply to every rollout in the project that matches their conditions. This is the highest bar.
Start from the current state. After surfacing existing policies in Step 2, ask: "Which policy are you editing, or do you want to create a new one? What environments or flag conditions will it apply to?"
When recommending metrics for a policy:
  • 2–3 metrics maximum. More than that turns the policy into a burden on every rollout, including ones where the metrics don't apply well.
  • Only recommend metrics with a long, stable event history. If an event has been flowing reliably for months, it's a safe project-wide default. Occasional gaps will create problems at scale.
  • Push back on additions. If the user proposes more than 3, ask which ones they'd remove. The discipline of choosing is the point.
  • Explain scope conditions. A policy scoped to environment=production only applies to production rollouts. Help the user think through whether they want the same metrics in staging (where baselines may differ) or a separate policy.
Typical strong policy candidates: error rate, a core conversion or engagement metric, latency.
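The policy-review guidance above can be sketched as a pre-flight check. The months_of_history input is a stand-in for real event-history data, and both thresholds are illustrative, not LaunchDarkly limits:

```python
def review_policy_metrics(proposed, months_of_history,
                          max_metrics=3, min_months=3):
    """Flag issues before metrics become a project-wide release policy
    default: too many metrics, or too little reliable event history.
    months_of_history maps metric key -> months of stable event flow
    (hypothetical input; thresholds are illustrative)."""
    issues = []
    if len(proposed) > max_metrics:
        issues.append(f"{len(proposed)} metrics proposed; "
                      f"ask which to drop (max {max_metrics})")
    for key in proposed:
        if months_of_history.get(key, 0) < min_months:
            issues.append(f"{key}: event history too short "
                          "for a project-wide default")
    return issues

issues = review_policy_metrics(
    ["error-rate", "conversion", "latency", "session-length"],
    {"error-rate": 12, "conversion": 8, "latency": 10, "session-length": 1},
)
print(issues)
```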

Step 5: Deliver the Recommendation

Output a clear, named list. Be explicit about what each metric is for and what's already covered:
Recommended metrics for: new checkout flow guarded rollout (environment: production)

AUTO-ATTACHED (from "Production guardrails" policy):
  ✓ api-error-rate    (count, LowerThanBaseline)
  ✓ p95-latency       (value, LowerThanBaseline)

ADDITIONAL — recommended for this rollout:
  ✓ checkout-conversion  (occurrence, HigherThanBaseline)
    → Confirms the rollout isn't degrading the core conversion the feature targets

⚠ page-load-time — no recent events. Instrument the event before including it,
  or remove it from the list to avoid a false rollback trigger.
Then close with next steps:
  • If a metric the user needs doesn't exist → use the metric-create skill
  • If an event isn't flowing → use the metric-instrument skill
  • Once the list is confirmed → configure the guarded rollout or experiment (via the LaunchDarkly UI or API)

Important Context

  • Mid-experiment metric changes require a restart. LaunchDarkly snapshots the metric configuration when an experiment starts. Adding, removing, or changing metrics after launch requires stopping the experiment and restarting it — historical data from before the change is not comparable. Raise this immediately if the user mentions they're mid-experiment.
  • A primary metric with no events is worse than no primary metric. The experiment produces no statistical output. Event health is a hard requirement for the primary metric.
  • CUPED and percentile analysis are incompatible. If the experiment uses CUPED variance reduction, percentile-based metrics (e.g. p95 latency) silently degrade to mean-based analysis. Flag this if the user selects a percentile metric in a CUPED-enabled experiment.
  • Context kind mismatches cause missing data. If the metric event is tracked with a device context but the experiment randomizes on user, the event won't be attributed correctly. Confirm that the context kind in track() calls matches the experiment's randomization unit.
  • Release policy metrics must share the same context kind. All metrics in a guarded rollout release policy must use the same randomization unit. If the user proposes metrics with mismatched context kinds, flag it before they try to configure the policy.
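The two context-kind checks above reduce to comparing each metric's tracked context kind against a single randomization unit. A minimal sketch, assuming the kinds come from an instrumentation review rather than any MCP tool:

```python
def context_kind_mismatches(metric_context_kinds, randomization_unit):
    """Return metric keys whose tracked context kind differs from the
    experiment's (or policy's) randomization unit; their events won't be
    attributed correctly. Input mapping is hypothetical."""
    return [key for key, kind in metric_context_kinds.items()
            if kind != randomization_unit]

mismatched = context_kind_mismatches(
    {"checkout-conversion": "user", "app-crash-rate": "device"},
    randomization_unit="user",
)
print(mismatched)  # → ['app-crash-rate']
```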

Related Skills

  • launchdarkly-metric-create
    — create a metric that doesn't exist yet
  • launchdarkly-metric-instrument
    — add a track() call so events start flowing