aiconfig-online-evals

AI Config Online Evaluations
Attach judges to AI Config variations for automatic quality scoring using LLM-as-a-judge methodology. Judges evaluate responses and return scores between 0.0 and 1.0.
Prerequisites
- LaunchDarkly account with AI Configs enabled
- API access token with write permissions
- Existing AI Config with variations (use the `aiconfig-create` skill)
- For automatic metric recording: Python AI SDK v0.14.0+ or Node.js AI SDK v0.16.1+
API Key Detection
- Check environment variables - `LAUNCHDARKLY_API_KEY`, `LAUNCHDARKLY_API_TOKEN`, `LD_API_KEY`
- Check MCP config - Claude: `~/.claude/config.json` -> `mcpServers.launchdarkly.env.LAUNCHDARKLY_API_KEY`
- Prompt user - Only if detection fails
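The detection order above can be sketched as follows. This is a minimal illustration: the config path and key names come from the list, and the prompting fallback is left as a stub returning `None`.

```python
import json
import os
from pathlib import Path
from typing import Optional


def detect_api_key() -> Optional[str]:
    """Resolve a LaunchDarkly API token using the detection order above."""
    # 1. Environment variables, checked in priority order
    for var in ("LAUNCHDARKLY_API_KEY", "LAUNCHDARKLY_API_TOKEN", "LD_API_KEY"):
        if os.environ.get(var):
            return os.environ[var]

    # 2. Claude MCP config file
    config_path = Path.home() / ".claude" / "config.json"
    if config_path.exists():
        try:
            config = json.loads(config_path.read_text())
        except (json.JSONDecodeError, OSError):
            config = {}
        key = (
            config.get("mcpServers", {})
            .get("launchdarkly", {})
            .get("env", {})
            .get("LAUNCHDARKLY_API_KEY")
        )
        if key:
            return key

    # 3. Fall back to prompting the user (stub: detection failed)
    return None
```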
Core Concepts
What Are Judges?
Judges are specialized AI Configs in judge mode that evaluate responses from other AI Configs. They use an LLM to score outputs and return structured results:
```json
{
  "score": 0.85,
  "reasoning": "Answered correctly with one minor omission"
}
```

Built-in Judges
LaunchDarkly provides three pre-configured judges:
| Judge | Metric Key | Measures |
|---|---|---|
| Accuracy | | How correct and grounded the response is |
| Relevance | | How well it addresses the user request |
| Toxicity | | Harmful or unsafe phrasing (lower = safer) |
Completion Mode Only
Judges can only be attached to completion mode AI Configs in the UI. For agent mode or custom pipelines, use programmatic evaluation via the SDK.
Restrictions
- Cannot attach judges to judges (no recursion)
- Cannot attach multiple judges with the same metric key to a single variation
- Cannot view/edit model parameters or tools on judge variations
Workflow
Step 1: Create Custom Judges (Optional)
For domain-specific evaluation, create judge AI Configs:
```bash
# Create judge config
curl -X POST "https://app.launchdarkly.com/api/v2/projects/{projectKey}/ai-configs" \
  -H "Authorization: {api_token}" \
  -H "Content-Type: application/json" \
  -H "LD-API-Version: beta" \
  -d '{
    "key": "security-judge",
    "name": "Security Judge",
    "mode": "judge",
    "evaluationMetricKey": "security",
    "isInverted": false
  }'
```
> **Note:** Set `isInverted: true` for metrics like toxicity where 0.0 is better.
Then add a variation with the evaluation prompt:
```bash
curl -X POST "https://app.launchdarkly.com/api/v2/projects/{projectKey}/ai-configs/security-judge/variations" \
-H "Authorization: {api_token}" \
-H "Content-Type: application/json" \
-H "LD-API-Version: beta" \
-d '{
"key": "default",
"name": "Default",
"messages": [
{
"role": "system",
"content": "You are a security auditor. Score from 0.0 to 1.0:\n- 1.0: No security issues\n- 0.7-0.9: Minor issues\n- 0.4-0.6: Moderate issues\n- 0.1-0.3: Serious vulnerabilities\n- 0.0: Critical vulnerabilities\n\nCheck for: SQL injection, XSS, hardcoded secrets, command injection."
}
],
"modelConfigKey": "OpenAI.gpt-4o-mini",
"model": {
"parameters": {
"temperature": 0.3
}
}
}'
```

Step 2: Attach Judges to Variations
Use the variation PATCH endpoint:
```bash
curl -X PATCH "https://app.launchdarkly.com/api/v2/projects/{projectKey}/ai-configs/{configKey}/variations/{variationKey}" \
  -H "Authorization: {api_token}" \
  -H "Content-Type: application/json" \
  -H "LD-API-Version: beta" \
  -d '{
    "judgeConfiguration": {
      "judges": [
        {"judgeConfigKey": "security-judge", "samplingRate": 1.0},
        {"judgeConfigKey": "api-contract-judge", "samplingRate": 0.5}
      ]
    }
  }'
```

> **Important:** The `judges` array replaces all existing judge attachments. An empty array removes all judges.
Step 3: Set Fallthrough on Judges
Each judge AI Config needs its fallthrough set to the enabled variation. AI Configs default to the "disabled" variation (index 0).
> **Note:** `turnTargetingOn` does not work for AI Configs. Use `updateFallthroughVariationOrRollout` instead.
```bash
# First get the variation ID for "Default" from the GET targeting response
curl -X PATCH "https://app.launchdarkly.com/api/v2/projects/{projectKey}/ai-configs/security-judge/targeting" \
  -H "Authorization: {api_token}" \
  -H "Content-Type: application/json; domain-model=launchdarkly.semanticpatch" \
  -H "LD-API-Version: beta" \
  -d '{
    "environmentKey": "production",
    "instructions": [
      {
        "kind": "updateFallthroughVariationOrRollout",
        "variationId": "your-default-variation-uuid"
      }
    ]
  }'
```
Python Implementation
```python
import requests


class AIConfigJudges:
    """Manager for AI Config judge attachments"""

    def __init__(self, api_token: str, project_key: str):
        self.api_token = api_token
        self.project_key = project_key
        self.base_url = "https://app.launchdarkly.com/api/v2"
        self.headers = {
            "Authorization": api_token,
            "Content-Type": "application/json",
            "LD-API-Version": "beta"
        }

    def attach_judges(self, config_key: str, variation_key: str,
                      judges: list[dict]) -> dict:
        """
        Attach judges to a variation.

        Args:
            config_key: AI Config key
            variation_key: Variation key
            judges: List of {"judgeConfigKey": str, "samplingRate": float}
        """
        url = f"{self.base_url}/projects/{self.project_key}/ai-configs/{config_key}/variations/{variation_key}"
        response = requests.patch(url, headers=self.headers, json={
            "judgeConfiguration": {"judges": judges}
        })
        if response.status_code == 200:
            print(f"[OK] Attached {len(judges)} judges to {config_key}/{variation_key}")
            return response.json()
        print(f"[ERROR] {response.status_code}: {response.text}")
        return {}

    def create_judge(self, key: str, name: str, metric_key: str,
                     system_prompt: str, model: str = "OpenAI.gpt-4o-mini",
                     is_inverted: bool = False) -> dict:
        """
        Create a judge AI Config.

        Args:
            key: Judge config key
            name: Display name
            metric_key: Metric key for scoring (appears as $ld:ai:judge:{metric_key})
            system_prompt: Evaluation instructions
            is_inverted: True if lower scores are better (e.g., toxicity)
        """
        # Create config
        config_url = f"{self.base_url}/projects/{self.project_key}/ai-configs"
        response = requests.post(config_url, headers=self.headers, json={
            "key": key,
            "name": name,
            "mode": "judge",
            "evaluationMetricKey": metric_key,
            "isInverted": is_inverted
        })
        if response.status_code not in [200, 201]:
            print(f"[ERROR] Creating config: {response.text}")
            return {}

        # Create variation
        var_url = f"{self.base_url}/projects/{self.project_key}/ai-configs/{key}/variations"
        response = requests.post(var_url, headers=self.headers, json={
            "key": "default",
            "name": "Default",
            "messages": [{"role": "system", "content": system_prompt}],
            "modelConfigKey": model,
            "model": {"parameters": {"temperature": 0.3}}
        })
        if response.status_code in [200, 201]:
            print(f"[OK] Created judge: {key}")
            return response.json()
        print(f"[ERROR] Creating variation: {response.text}")
        return {}

    def set_fallthrough(self, config_key: str, environment: str,
                        variation_key: str = "default") -> bool:
        """
        Set fallthrough to enable a judge config.

        Note: turnTargetingOn doesn't work for AI Configs. Instead, set the
        fallthrough from disabled (index 0) to the enabled variation.
        """
        # Get variation ID
        url = f"{self.base_url}/projects/{self.project_key}/ai-configs/{config_key}/targeting"
        response = requests.get(url, headers=self.headers)
        if response.status_code != 200:
            print(f"[ERROR] {response.status_code}: {response.text}")
            return False
        targeting = response.json()
        variation_id = None
        for var in targeting.get("variations", []):
            if var.get("key") == variation_key or var.get("name") == variation_key:
                variation_id = var.get("_id")
                break
        if not variation_id:
            print(f"[ERROR] Variation '{variation_key}' not found")
            return False

        # Set fallthrough
        response = requests.patch(url, headers={
            **self.headers,
            "Content-Type": "application/json; domain-model=launchdarkly.semanticpatch"
        }, json={
            "environmentKey": environment,
            "instructions": [{
                "kind": "updateFallthroughVariationOrRollout",
                "variationId": variation_id
            }]
        })
        if response.status_code == 200:
            print(f"[OK] Fallthrough set for {config_key}")
            return True
        print(f"[ERROR] {response.status_code}: {response.text}")
        return False
```

SDK: Automatic Evaluation
When using `create_chat()` + `invoke()`, attached judges evaluate automatically:

```python
import os
import json
import asyncio
import ldclient
from ldclient import Context
from ldclient.config import Config
from ldai import LDAIClient, AICompletionConfigDefault

sdk_key = os.getenv('LAUNCHDARKLY_SDK_KEY')
ai_config_key = os.getenv('LAUNCHDARKLY_AI_CONFIG_KEY', 'sample-ai-config')


async def async_main():
    ldclient.set_config(Config(sdk_key))
    aiclient = LDAIClient(ldclient.get())
    context = (
        Context.builder('example-user-key')
        .kind('user')
        .name('Sandy')
        .build()
    )
    default_value = AICompletionConfigDefault(enabled=False)

    # create_chat() initializes with judges from AI Config
    chat = await aiclient.create_chat(ai_config_key, context, default_value, {})
    if not chat:
        print(f"AI chat configuration not enabled for: {ai_config_key}")
        return

    user_input = 'How can LaunchDarkly help me?'
    # invoke() automatically evaluates with attached judges
    chat_response = await chat.invoke(user_input)
    print("Response:", chat_response.message.content)

    # Await evaluation results
    if chat_response.evaluations and len(chat_response.evaluations) > 0:
        eval_results = await asyncio.gather(*chat_response.evaluations)
        results_to_display = [
            result.to_dict() if result is not None else "not evaluated"
            for result in eval_results
        ]
        print("Judge results:")
        print(json.dumps(results_to_display, indent=2, default=str))

    ldclient.get().close()


asyncio.run(async_main())
```

SDK: Direct Judge Evaluation
For agent mode or custom pipelines, evaluate input/output pairs directly:
```python
import os
import json
import asyncio
import ldclient
from ldclient import Context
from ldclient.config import Config
from ldai import LDAIClient, AICompletionConfigDefault

sdk_key = os.getenv('LAUNCHDARKLY_SDK_KEY')
judge_key = os.getenv('LAUNCHDARKLY_AI_JUDGE_KEY', 'sample-ai-judge-accuracy')


async def async_main():
    ldclient.set_config(Config(sdk_key))
    aiclient = LDAIClient(ldclient.get())
    context = (
        Context.builder('example-user-key')
        .kind('user')
        .name('Sandy')
        .build()
    )
    judge_default_value = AICompletionConfigDefault(enabled=False)

    # Get judge configuration from LaunchDarkly
    judge = await aiclient.create_judge(judge_key, context, judge_default_value)
    if not judge:
        print(f"AI judge configuration not enabled for key: {judge_key}")
        return

    input_text = 'You are a helpful assistant. How can you help me?'
    output_text = 'I can answer any question you have.'

    # Evaluate the input/output pair
    judge_response = await judge.evaluate(input_text, output_text)
    if judge_response is None:
        print("Judge evaluation was skipped (sample rate or configuration issue)")
        return

    # Track scores on the AI Config tracker if needed:
    # aiConfig.tracker.track_eval_scores(judge_response.evals)
    print("Judge Response:")
    print(json.dumps(judge_response.to_dict(), indent=2, default=str))

    ldclient.get().close()


asyncio.run(async_main())
```

> **Note:** Direct evaluation does not automatically record metrics. Use `tracker.track_eval_scores()` to record scores for the AI Config you're evaluating.
Sampling Rates
Each evaluated response sends an additional request to your model provider, increasing token usage and costs. Start with a lower sampling percentage and increase only if you need more evaluation coverage.
You can adjust sampling rates at any time from the Judges section of a variation, or disable a judge by setting its sampling to 0%.
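To reason about the cost trade-off before picking a sampling rate, a rough back-of-envelope sketch may help. The function name and the numbers below are hypothetical, not part of any SDK:

```python
def estimate_judge_overhead(requests_per_day: int, sampling_rate: float,
                            avg_tokens_per_eval: int) -> dict:
    """Estimate the extra model usage a judge adds at a given sampling rate."""
    # Each sampled response triggers one additional judge call
    evaluated = int(requests_per_day * sampling_rate)
    return {
        "evaluated_responses": evaluated,
        "extra_tokens_per_day": evaluated * avg_tokens_per_eval,
    }


# Hypothetical load: 10,000 requests/day, 10% sampling, ~500 tokens per evaluation
overhead = estimate_judge_overhead(10_000, 0.10, 500)
```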
Viewing Results
- Navigate to AI Configs > select your config
- Click Monitoring tab
- Select Evaluator metrics from dropdown
- View scores by variation and time range
Results appear within 1-2 minutes of evaluation.
Use in Guardrails and Experiments
Evaluation metrics integrate with:
- Guarded rollouts: Pause/revert when scores fall below threshold
- Experiments: Compare variations using evaluation metrics as goals
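LaunchDarkly's guarded rollouts perform this check natively; purely as an illustration of the pause/revert decision, a threshold check over judge scores might look like this (the function name and the 20-sample minimum are hypothetical):

```python
def should_pause_rollout(scores: list[float], threshold: float,
                         min_samples: int = 20) -> bool:
    """Illustrative guardrail: signal a pause when the mean judge score
    drops below the threshold, once enough samples have accumulated."""
    if len(scores) < min_samples:
        return False  # not enough data to act on yet
    mean = sum(scores) / len(scores)
    return mean < threshold
```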
Error Handling
| Status | Cause | Solution |
|---|---|---|
| 404 | Config/variation not found | Verify keys exist |
| 400 | Invalid judge config | Check judgeConfigKey exists |
| 403 | Insufficient permissions | Check API token permissions |
| 422 | Duplicate metric key | Cannot attach multiple judges with same metric key |
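When scripting against the API, the table above can be folded into a small lookup helper. The names and messages here are illustrative, derived from the table rather than from any SDK:

```python
# Troubleshooting hints keyed by HTTP status, per the error table above
ERROR_HINTS = {
    404: "Config/variation not found - verify the keys exist",
    400: "Invalid judge config - check that judgeConfigKey exists",
    403: "Insufficient permissions - check API token permissions",
    422: "Duplicate metric key - a variation cannot have two judges with the same metric key",
}


def explain_error(status_code: int) -> str:
    """Map an API status code to the matching troubleshooting hint."""
    return ERROR_HINTS.get(
        status_code,
        f"Unexpected status {status_code} - see the API response body"
    )
```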
Next Steps
After attaching judges:
- Set fallthrough on judge configs to an enabled variation (required)
- Monitor results in Monitoring tab
- Adjust sampling based on cost/coverage needs
- Set up guarded rollouts for automatic regression detection
Related Skills
- `aiconfig-create` - Create AI Configs and judges
- `aiconfig-targeting` - Configure targeting rules
- `aiconfig-variations` - Manage variations
References
Python SDK examples:
- direct_judge_example.py - Evaluate input/output pairs directly
- chat_judge_example.py - Automatic evaluation with create_chat/invoke
Node.js SDK examples:
- judge-evaluation - Both direct evaluation and automatic chat-based evaluation