Reason about failure modes, then propose signals across three layers — propose all that apply:
1. Explicit signals (highest value — direct user expression)
Look at the UX from 0b. Find every place the user interacts with AI-generated content.
- Feedback already exists (thumbs up/down, rating, feedback text)? Wire to it — don't replace it.
- No feedback mechanism? Suggest adding VoteFeedback and explain what it unlocks for RCA.
- Edit tracking: if the user can modify AI-generated content, tracking those edits is highly valuable (accepted but
corrected = "close but wrong"). Implement appropriately for the stack.
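The "accepted but corrected" bucket can be derived mechanically by comparing the AI draft to what the user kept. A minimal sketch, assuming text content; the function name and similarity thresholds are illustrative, not part of any platform API:

```python
from difflib import SequenceMatcher

def classify_edit(ai_draft: str, final_text: str) -> str:
    """Map a user edit of AI-generated text into a coarse signal bucket."""
    similarity = SequenceMatcher(None, ai_draft, final_text).ratio()
    if similarity >= 0.98:
        return "accepted"            # used essentially as-is
    if similarity >= 0.6:
        return "accepted_corrected"  # "close but wrong"
    return "rewritten"               # largely discarded
```

Tune the thresholds to the content type; short answers tolerate far less edit distance than long drafts.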
2. Coded signals (implicit behavioral events in the app)
Find events that imply the AI got it right or wrong — dismiss, accept, retry, undo, escalate, rephrase, skip. Wire
to the exact locations. When proposing a signal, verify with the developer that the event is specific
to AI content (not a general UI action).
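The "specific to AI content" guard can be enforced at the emitter. A sketch under stated assumptions: `record_signal` stands in for whatever event sink the stack provides, and the `source` metadata field is hypothetical:

```python
def emit_coded_signal(event: str, content_meta: dict, record_signal=print) -> bool:
    """Record a coded signal only for AI-generated content; skip generic UI actions."""
    if content_meta.get("source") != "ai":
        return False  # human-authored content: not a signal about the AI
    record_signal({"kind": "coded", "event": event, "content_id": content_meta.get("id")})
    return True
```

The point of the guard is that a "retry" on a human-authored item says nothing about the AI and must never be wired in.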
3. Synthetic signals (platform-run, no app code)
Based on failure modes from 0b, propose LLM-as-judge synthetic signal evaluators (semantic/quality) and heuristic
synthetic signal evaluators (structural/metric). Delivered LATER (after user approval) via deeplink — developer clicks
once to activate.
Ground every synthetic signal evaluator in observed behavior. Only propose synthetic signal evaluators for things
the agent actually does — don't invent features. If you're unsure whether the agent produces a certain output (e.g.
citations, confidence scores, structured data), ask the developer before proposing a synthetic signal evaluator that
depends on it. For the `code` type: the check must be fully deterministic from the raw output (e.g. response length, JSON
validity, presence of a known token). If you're reaching for any natural language understanding, it's `llm`, not `code`.
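To illustrate the boundary, a `code`-type (heuristic) evaluator may only use fully deterministic checks like these; the exact evaluator interface is an assumption, so treat this as a sketch:

```python
import json

def json_validity_check(raw_output: str) -> bool:
    # Deterministic: the output either parses as JSON or it doesn't.
    try:
        json.loads(raw_output)
        return True
    except ValueError:
        return False

def length_check(raw_output: str, max_chars: int = 2000) -> bool:
    # Deterministic: a pure measurement of the raw output.
    return len(raw_output) <= max_chars
```

A check like "is the answer polite?" cannot be computed this way and belongs in an `llm` evaluator instead.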
STOP — this is a REQUIRED interactive checkpoint. Ask the developer two questions, each with selectable options:
- One for explicit + coded signals (options = each proposed signal)
- One for synthetic evaluators (options = each proposed evaluator)
Ask if any coded signals need steering (e.g., "does this event apply only to AI content?") and wait for their response.
You don't need to implement synthetics on your own — let Kelet do that for you. After the developer has selected
which synthetic evaluators they want, generate the deeplink scoped to exactly those evaluators and present it as a bold
standalone action item:
Action required → click this link to activate your synthetic evaluators:
https://console.kelet.ai/synthetics/setup?deeplink=<encoded>
This will generate evaluators for: [list selected names]. Click "Activate All" once you've reviewed them.
Generate the deeplink like this — include only the evaluators the developer selected:
```python
python3 -c "
import base64, json
payload = {
    'use_case': '<agent use case>',
    'ideas': [
        {'name': '<name>', 'evaluator_type': 'llm', 'description': '<description>'},
        {'name': '<name>', 'evaluator_type': 'code', 'description': '<description>'},
    ]
}
encoded = base64.urlsafe_b64encode(json.dumps(payload, separators=(',', ':')).encode()).rstrip(b'=').decode()
print(f'https://console.kelet.ai/synthetics/setup?deeplink={encoded}')
"
ONLY create and send the link AFTER the developer has selected which evaluators they want. Do NOT generate or present
the link before they make their selection — that would be confusing and overwhelming. The link should reflect their
choices, not all possible ideas!
For each idea, decide the type:
Is this check deterministic/measurable? → `code`. Is it semantic/qualitative? → `llm`. Add a `description`
only when you need to steer the evaluator toward something specific.
After presenting the link, confirm with the developer that they have clicked it and activated the evaluators
before proceeding to Phase 0d. Do NOT proceed until confirmed.
Only write synthetic signal code if the developer explicitly asks AND the platform cannot implement it (explain
why + ask to confirm).
See references/signals.md for signal kinds, sources, and when to use each.