ai-native-product
AI-Native Product Development
"AI products aren't deterministic. They require continuous calibration, not just A/B tests."
This skill covers AI-Native Product Development — the overlay that modifies discovery, architecture, and delivery when AI is at the core. It addresses the unique challenges of building products where AI agents perform tasks autonomously.
Part of: Modern Product Operating Model — a collection of composable product skills.
Related skills: product-strategy, product-discovery, product-architecture, product-delivery, product-leadership

When to Use This Skill
Use this skill when:
- Building AI agents that act on behalf of users
- Adding LLM-powered features to existing products
- Designing human-AI interaction patterns
- Deciding how much autonomy to give AI
- Setting up eval strategies and calibration loops
- Managing the "agency-control tradeoff"
Not needed for: Traditional software products, ML models used only for backend optimization (no user-facing autonomy)
What Makes AI Products Different
Traditional Software vs. AI Products
| Dimension | Traditional Software | AI-Native Products |
|---|---|---|
| Behavior | Deterministic | Probabilistic |
| Testing | Unit tests, QA | Evals, calibration |
| Correctness | Binary (works or doesn't) | Spectrum (good enough?) |
| User role | Operator | Delegator + Reviewer |
| Failure mode | Error messages | Plausible but wrong outputs |
| Iteration | Ship → Measure → Iterate | Ship → Observe → Calibrate |
| Trust building | Feature completeness | Demonstrated reliability |
The Core Challenge
AI products must navigate a fundamental tension:
More autonomy = More value (fewer steps, faster outcomes)
More autonomy = More risk (errors affect real work)
This is the Agency-Control Tradeoff.
Framework: The CCCD Loop
Credit: Aishwarya Goel & Kiriti Gavini
AI products require a Continuous Calibration and Confidence Development (CCCD) loop:
┌─────────────────────────────────────────────────────────────────┐
│ CCCD LOOP │
│ │
│ CALIBRATE → CONFIDENCE → CONTINUOUS DISCOVERY → CALIBRATE │
│ ↓ ↓ ↓ ↓ │
│ Eval and Build user Observe AI Update evals │
│ adjust AI trust over interactions and models │
│ behavior time at scale │
└─────────────────────────────────────────────────────────────────┘

CCCD Components:
| Component | Purpose | Activities |
|---|---|---|
| Calibrate | Tune AI behavior to match user expectations | Run evals, adjust prompts/models, set guardrails |
| Confidence | Build appropriate user trust | Show AI reasoning, enable verification, demonstrate reliability |
| Continuous Discovery | Observe AI-user interactions at scale | Log interactions, identify failure patterns, surface edge cases |
| → Back to Calibrate | Update based on learnings | Improve evals, retrain, adjust prompts |
The Agency-Control Progression
Five Levels of AI Agency
| Level | Description | AI Does | User Does | Example |
|---|---|---|---|---|
| 1. Assist | AI suggests, user executes | Generates options | Chooses and acts | Autocomplete, suggestions |
| 2. Recommend | AI ranks, user approves | Analyzes and recommends | Reviews and approves | "AI recommends these 3 actions" |
| 3. Execute with confirmation | AI acts after approval | Prepares action | Confirms before execution | "Send this email?" → Yes/No |
| 4. Execute with notification | AI acts, notifies after | Acts autonomously | Reviews outcomes | "I scheduled the meeting and sent invites" |
| 5. Fully autonomous | AI acts without notification | Handles end-to-end | Sets goals, reviews exceptions | AI handles routine tasks silently |
Progression Strategy
Start lower, earn higher:
Level 1 → Build trust → Level 2 → Demonstrate reliability → Level 3 → ...

Graduation Criteria:
| From Level | To Level | Requires |
|---|---|---|
| 1 → 2 | Assist → Recommend | User accepts suggestions > 70% |
| 2 → 3 | Recommend → Execute with confirm | User approves recommendations > 80% |
| 3 → 4 | Execute+confirm → Execute+notify | User confirms without edit > 90% |
| 4 → 5 | Execute+notify → Autonomous | User overrides < 5%, high-stakes scenarios excluded |
Never fully autonomous for:
- Irreversible actions (delete, send, purchase)
- High-stakes decisions (financial, legal, health)
- Novel situations outside training distribution
- Actions affecting third parties
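The graduation criteria and the hard exclusions above can be sketched as a single gate check. This is a minimal illustration, not a prescribed implementation; the enum values mirror the five-level table, and the metric key names are assumptions.

```python
from enum import IntEnum

class AgencyLevel(IntEnum):
    ASSIST = 1
    RECOMMEND = 2
    EXECUTE_WITH_CONFIRMATION = 3
    EXECUTE_WITH_NOTIFICATION = 4
    FULLY_AUTONOMOUS = 5

# Thresholds from the graduation table: the metric that must be met
# before moving from level N to level N + 1. Key names are hypothetical.
GRADUATION_THRESHOLDS = {
    AgencyLevel.ASSIST: ("suggestion_accept_rate", 0.70),
    AgencyLevel.RECOMMEND: ("recommendation_approve_rate", 0.80),
    AgencyLevel.EXECUTE_WITH_CONFIRMATION: ("confirm_without_edit_rate", 0.90),
    AgencyLevel.EXECUTE_WITH_NOTIFICATION: ("override_rate", 0.05),
}

def can_graduate(level: AgencyLevel, metrics: dict,
                 high_stakes: bool = False) -> bool:
    """Return True if the feature may move up one agency level."""
    if level == AgencyLevel.FULLY_AUTONOMOUS:
        return False  # already at the top
    # High-stakes scenarios are excluded from full autonomy ("never fully
    # autonomous for" list above).
    if level == AgencyLevel.EXECUTE_WITH_NOTIFICATION and high_stakes:
        return False
    metric_name, threshold = GRADUATION_THRESHOLDS[level]
    value = metrics.get(metric_name)
    if value is None:
        return False  # no signal, no graduation
    if metric_name == "override_rate":
        return value < threshold  # lower is better for overrides
    return value > threshold
```

A feature at Level 1 with a 75% suggestion-accept rate would graduate; a Level 4 feature touching a high-stakes scenario never reaches Level 5, regardless of its override rate.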
AI-Native Discovery
Standard discovery practices need adaptation for AI products.
Modified Discovery Focus
| Standard Discovery | AI-Native Adaptation |
|---|---|
| "What job are you trying to do?" | + "How much do you want to delegate?" |
| "What's your current workflow?" | + "Which steps are you comfortable AI handling?" |
| "What would success look like?" | + "What errors would be unacceptable?" |
| "Show me how you do this today" | + "Show me how you verify AI work today" |
AI-Specific Discovery Questions
Delegation appetite:
- "Which parts of this task feel tedious vs. require your judgment?"
- "If AI made an error here, what would the consequences be?"
- "How would you want to verify AI's work?"
Trust calibration:
- "What would AI need to demonstrate before you'd trust it to [action]?"
- "Have you used AI tools before? What built or broke your trust?"
- "Would you prefer AI to do more but occasionally err, or do less perfectly?"
Failure tolerance:
- "What kinds of errors are annoying vs. damaging?"
- "How quickly do you need to catch and fix AI mistakes?"
- "What's your 'undo' option if AI gets it wrong?"
Observing AI Interactions
In addition to interviews, AI discovery includes:
| Method | What to Look For |
|---|---|
| Session recordings | Where do users override AI? Where do they accept blindly? |
| Interaction logs | Patterns in edits, rejections, corrections |
| Feedback analysis | Explicit signals (thumbs down, ratings) |
| Support tickets | AI-related complaints and confusion |
AI-Native Architecture
Solution Brief Additions
For AI features, add the following to the standard solution brief:
AI-SPECIFIC SECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AGENCY LEVEL
Target: [Level 1-5]
Graduation path: [How might this evolve?]
FAILURE MODES
• [Failure mode 1]: [Consequence] → [Mitigation]
• [Failure mode 2]: [Consequence] → [Mitigation]
EVAL STRATEGY
• [Eval type 1]: [What we measure, how often]
• [Eval type 2]: [What we measure, how often]
CALIBRATION PLAN
• Initial calibration: [Approach]
• Ongoing calibration: [Cadence, triggers]
CONFIDENCE BUILDING
• How AI explains itself: [Approach]
• How users verify: [Mechanisms]
• Trust-building milestones: [Progression]
AI Bet Categories
In addition to standard bet categories:
| Category | Description | Example |
|---|---|---|
| Capability expansion | AI can handle new task types | "AI can now summarize documents" |
| Agency graduation | Move to higher autonomy level | "AI sends emails without confirmation" |
| Calibration improvement | Better accuracy/reliability | "Reduce hallucination rate from 5% to 2%" |
| Confidence building | Better user trust | "Show AI reasoning before action" |
| Guardrail strengthening | Prevent harmful outputs | "Add content policy enforcement" |
AI-Native Delivery
Eval Strategy (Replaces Traditional Testing)
Eval Types:
| Eval Type | Purpose | When to Run |
|---|---|---|
| Unit evals | Test specific capabilities | Every code change |
| Behavioral evals | Test end-to-end flows | Daily/weekly |
| Adversarial evals | Test edge cases and attacks | Before major releases |
| Human evals | Test subjective quality | Weekly sample |
| Production evals | Test on real traffic | Continuous |
Eval Metrics:
| Metric | What It Measures | Target |
|---|---|---|
| Task success rate | Does AI complete the intended task? | > 95% |
| Factual accuracy | Is output factually correct? | > 98% |
| Hallucination rate | Does AI make things up? | < 2% |
| Harmful output rate | Does AI produce unsafe content? | < 0.1% |
| User acceptance rate | Do users accept AI output? | > 80% |
| Override rate | How often do users correct AI? | < 15% |
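The targets in the metrics table can be turned into an automated release gate with a small check. A minimal sketch under assumptions: the metric keys and the dict shape are invented for illustration, and the direction flag records whether a metric must stay above or below its target.

```python
# Targets from the eval metrics table above. "min" means the measured
# value must exceed the threshold; "max" means it must stay below it.
EVAL_TARGETS = {
    "task_success_rate": (0.95, "min"),
    "factual_accuracy": (0.98, "min"),
    "hallucination_rate": (0.02, "max"),
    "harmful_output_rate": (0.001, "max"),
    "user_acceptance_rate": (0.80, "min"),
    "override_rate": (0.15, "max"),
}

def check_eval_run(measured: dict) -> list:
    """Return the names of metrics that miss their target."""
    failures = []
    for name, (threshold, direction) in EVAL_TARGETS.items():
        value = measured.get(name)
        if value is None:
            failures.append(name)  # an unmeasured metric counts as failing
        elif direction == "min" and value < threshold:
            failures.append(name)
        elif direction == "max" and value > threshold:
            failures.append(name)
    return failures
```

An empty return value means the run clears every target; anything else blocks the release and names what needs calibration.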
Eval Cadence:
Code change → Unit evals (automated)
Daily → Behavioral evals (automated)
Weekly → Human evals (sample)
Release → Adversarial evals (red team)
Continuous → Production evals (monitoring)

Staged Rollout for AI Features
AI features require more cautious rollout:
| Stage | Audience | Focus | Duration |
|---|---|---|---|
| Internal | Team | Find obvious failures | 1 week |
| Alpha | 5-10 trusted users | Qualitative feedback on AI behavior | 2 weeks |
| Beta | 5% of users | Quantitative eval metrics | 2-4 weeks |
| Gradual GA | 5% → 25% → 50% → 100% | Monitor at each stage | 4+ weeks |
AI-Specific Rollout Gates:
| Gate | Criteria to Proceed |
|---|---|
| Alpha → Beta | Eval metrics above threshold, no harmful outputs |
| Beta → Gradual GA | User acceptance > 80%, override rate < 15% |
| Each GA increment | Metrics stable, no new failure modes |
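The rollout gates above can be sketched as a stage controller that either advances one step or holds. Illustrative only: the stage names and metric keys are assumptions, and since the gates table does not specify the Internal → Alpha criterion, the sketch reuses the "metrics stable, no new failure modes" check there.

```python
# Stages follow the staged-rollout table: internal, alpha, beta,
# then gradual GA at 5% -> 25% -> 50% -> 100%.
STAGES = ["internal", "alpha", "beta", "ga_5", "ga_25", "ga_50", "ga_100"]

def next_stage(current: str, metrics: dict) -> str:
    """Advance the rollout one stage if its gate passes, else hold."""
    i = STAGES.index(current)
    if i == len(STAGES) - 1:
        return current  # fully rolled out
    if current == "alpha":
        # Alpha -> Beta: eval metrics above threshold, no harmful outputs.
        passed = (metrics.get("evals_above_threshold", False)
                  and metrics.get("harmful_outputs", 1) == 0)
    elif current == "beta":
        # Beta -> Gradual GA: acceptance > 80%, override rate < 15%.
        passed = (metrics.get("user_acceptance_rate", 0) > 0.80
                  and metrics.get("override_rate", 1) < 0.15)
    else:
        # Internal and each GA increment: metrics stable, no new failures.
        passed = (metrics.get("metrics_stable", False)
                  and not metrics.get("new_failure_modes", False))
    return STAGES[i + 1] if passed else current
```

Defaults are deliberately pessimistic: a missing signal reads as a failing gate, so the rollout holds rather than advances.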
Calibration Loop
Continuous calibration process:
OBSERVE → IDENTIFY → CALIBRATE → VALIDATE → DEPLOY
↑ │
└───────────────────────────────────────────┘

| Step | Activities | Cadence |
|---|---|---|
| Observe | Monitor production interactions, logs, feedback | Continuous |
| Identify | Surface failure patterns, edge cases, drift | Daily/weekly |
| Calibrate | Adjust prompts, fine-tune, add guardrails | As needed |
| Validate | Run evals on calibrated version | Before deploy |
| Deploy | Ship updates, continue observing | Staged |
Calibration Triggers:
- Eval metrics below threshold
- New failure pattern identified
- User feedback trend (negative)
- Model update available
- New use case discovered
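The trigger list above maps naturally onto a monitoring check that decides when to kick off a calibration pass. A minimal sketch: the `signals` dict is a hypothetical snapshot of monitoring state, and its keys are assumptions rather than a real API.

```python
def calibration_triggers(signals: dict) -> list:
    """Map monitoring signals to the calibration triggers listed above."""
    triggers = []
    if signals.get("failing_eval_metrics"):
        triggers.append("eval metrics below threshold")
    if signals.get("new_failure_patterns"):
        triggers.append("new failure pattern identified")
    # A negative trend value stands in for "user feedback trending negative".
    if signals.get("feedback_sentiment_trend", 0.0) < 0:
        triggers.append("negative user feedback trend")
    if signals.get("model_update_available"):
        triggers.append("model update available")
    if signals.get("new_use_cases"):
        triggers.append("new use case discovered")
    return triggers
```

A non-empty result would start the Identify → Calibrate → Validate → Deploy path; an empty one keeps the loop in Observe.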
AI Metrics Hierarchy
LAGGING
├── User retention (AI users vs. non-AI users)
├── Task completion rate (with AI assist)
└── Revenue from AI features
CORE
├── User acceptance rate
├── Override rate
├── Time-to-completion (with AI)
└── User-reported satisfaction
LEADING
├── Eval metrics (accuracy, hallucination, etc.)
├── Interaction volume
├── Feature discovery rate
└── Feedback sentiment
GUARDRAILS
├── Harmful output rate
├── Latency P95
├── Error rate
└── Cost per interaction

AI-Specific Anti-Patterns
| Anti-Pattern | Why It Fails | Instead |
|---|---|---|
| Ship and hope | AI behavior drifts without monitoring | Continuous calibration |
| Autonomous by default | Users don't trust, don't adopt | Earn autonomy progressively |
| Black box AI | Users can't verify, won't trust | Show reasoning, enable verification |
| No evals | Quality degrades silently | Comprehensive eval strategy |
| Ignore overrides | Miss calibration signals | Override patterns inform calibration |
| One-size-fits-all agency | Different tasks need different levels | Task-specific agency levels |
Templates
This skill includes templates in the templates/ directory:

- agency-assessment.md — Determine appropriate agency level
- eval-strategy.md — Design eval suite for AI feature
- calibration-plan.md — Set up continuous calibration
Using This Skill with Claude
Ask Claude to:
- Assess agency level: "What agency level should [AI feature] have?"
- Design agency progression: "Create a graduation path from assist to autonomous for [feature]"
- Identify failure modes: "What could go wrong with [AI feature]? How do we mitigate?"
- Design eval strategy: "Design an eval suite for [AI feature]"
- Plan calibration: "Create a calibration plan for [AI feature]"
- Adapt discovery: "What AI-specific questions should I ask in discovery for [use case]?"
- Design confidence building: "How should [AI feature] show its reasoning?"
- Plan AI rollout: "Create a staged rollout plan for [AI feature]"
- Set AI metrics: "What metrics should we track for [AI feature]?"
- Review AI brief: "Critique this solution brief for AI considerations"
Connection to Other Skills
| When you need to... | Use skill |
|---|---|
| Define overall product strategy | product-strategy |
| Run discovery (with AI adaptations) | product-discovery |
| Structure bets and roadmap | product-architecture |
| Plan rollout and metrics | product-delivery |
| Scale AI products across teams | product-leadership |
Quick Reference: AI Product Checklist
Before shipping AI features:
- Agency level defined — Clear level for this feature
- Graduation criteria set — How we'll earn higher autonomy
- Failure modes mapped — Know what can go wrong
- Evals in place — Automated quality checks
- Human evals scheduled — Subjective quality review
- Calibration loop running — Continuous improvement process
- Confidence mechanisms built — Users can verify AI work
- Guardrails active — Prevent harmful outputs
- Rollout staged — More cautious than traditional features
- Override tracking — Learning from user corrections
Sources & Influences
- Aishwarya Goel & Kiriti Gavini — CCCD Loop, Agency-Control Trade-off
- Anthropic — Constitutional AI, RLHF approaches
- OpenAI — Eval best practices
- Google DeepMind — AI safety frameworks
Part of the Modern Product Operating Model by Yannick Maurice