ai-native-product


AI-Native Product Development

"AI products aren't deterministic. They require continuous calibration, not just A/B tests."
This skill covers AI-Native Product Development — the overlay that modifies discovery, architecture, and delivery when AI is at the core. It addresses the unique challenges of building products where AI agents perform tasks autonomously.
Part of: Modern Product Operating Model — a collection of composable product skills.
Related skills: product-strategy, product-discovery, product-architecture, product-delivery, product-leadership

When to Use This Skill

Use this skill when:
  • Building AI agents that act on behalf of users
  • Adding LLM-powered features to existing products
  • Designing human-AI interaction patterns
  • Deciding how much autonomy to give AI
  • Setting up eval strategies and calibration loops
  • Managing the "agency-control tradeoff"
Not needed for: Traditional software products, ML models used only for backend optimization (no user-facing autonomy)

What Makes AI Products Different

Traditional Software vs. AI Products

| Dimension | Traditional Software | AI-Native Products |
|---|---|---|
| Behavior | Deterministic | Probabilistic |
| Testing | Unit tests, QA | Evals, calibration |
| Correctness | Binary (works or doesn't) | Spectrum (good enough?) |
| User role | Operator | Delegator + Reviewer |
| Failure mode | Error messages | Plausible but wrong outputs |
| Iteration | Ship → Measure → Iterate | Ship → Observe → Calibrate |
| Trust building | Feature completeness | Demonstrated reliability |

The Core Challenge

AI products must navigate a fundamental tension:
More autonomy = More value (fewer steps, faster outcomes)
More autonomy = More risk (errors affect real work)
This is the Agency-Control Tradeoff.

Framework: The CCCD Loop

Credit: Aishwarya Goel & Kiriti Gavini
AI products require a Continuous Calibration and Confidence Development (CCCD) loop:
┌─────────────────────────────────────────────────────────────────┐
│                        CCCD LOOP                                │
│                                                                 │
│    CALIBRATE → CONFIDENCE → CONTINUOUS DISCOVERY → CALIBRATE   │
│         ↓           ↓              ↓                 ↓         │
│     Eval and    Build user    Observe AI       Update evals    │
│     adjust AI    trust over   interactions     and models      │
│     behavior     time         at scale                         │
└─────────────────────────────────────────────────────────────────┘
CCCD Components:

| Component | Purpose | Activities |
|---|---|---|
| Calibrate | Tune AI behavior to match user expectations | Run evals, adjust prompts/models, set guardrails |
| Confidence | Build appropriate user trust | Show AI reasoning, enable verification, demonstrate reliability |
| Continuous Discovery | Observe AI-user interactions at scale | Log interactions, identify failure patterns, surface edge cases |
| → Back to Calibrate | Update based on learnings | Improve evals, retrain, adjust prompts |
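One pass through the loop can be sketched as a minimal control cycle. Everything below (function names, the log schema, the 95% threshold) is an illustrative assumption, not part of the CCCD framework itself:

```python
# Minimal sketch of one pass through the CCCD loop.
# Function names, log schema, and the 95% threshold are hypothetical.

def cccd_cycle(interactions: list[dict], acceptance_target: float = 0.95) -> str:
    """Decide whether observed interactions warrant a calibration pass."""
    if not interactions:
        return "hold"
    # CONTINUOUS DISCOVERY: observe AI-user interactions at scale
    accepted = sum(1 for i in interactions if i["accepted"])
    acceptance_rate = accepted / len(interactions)
    # CALIBRATE: observed quality below target -> adjust prompts/models/evals
    if acceptance_rate < acceptance_target:
        return "calibrate"
    # CONFIDENCE: quality holds -> keep demonstrating reliability to users
    return "hold"

logs = [{"accepted": True}, {"accepted": True}, {"accepted": False}]
print(cccd_cycle(logs))  # acceptance ≈ 0.67 < 0.95 -> "calibrate"
```

In a real system the "calibrate" branch would kick off the eval-and-adjust work described in the table, then the loop repeats.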

The Agency-Control Progression

Five Levels of AI Agency

| Level | Description | AI Does | User Does | Example |
|---|---|---|---|---|
| 1. Assist | AI suggests, user executes | Generates options | Chooses and acts | Autocomplete, suggestions |
| 2. Recommend | AI ranks, user approves | Analyzes and recommends | Reviews and approves | "AI recommends these 3 actions" |
| 3. Execute with confirmation | AI acts after approval | Prepares action | Confirms before execution | "Send this email?" → Yes/No |
| 4. Execute with notification | AI acts, notifies after | Acts autonomously | Reviews outcomes | "I scheduled the meeting and sent invites" |
| 5. Fully autonomous | AI acts without notification | Handles end-to-end | Sets goals, reviews exceptions | AI handles routine tasks silently |

Progression Strategy

Start lower, earn higher:
Level 1 → Build trust → Level 2 → Demonstrate reliability → Level 3 → ...
Graduation Criteria:
| Levels | Transition | Requires |
|---|---|---|
| 1 → 2 | Assist → Recommend | User accepts suggestions > 70% |
| 2 → 3 | Recommend → Execute with confirm | User approves recommendations > 80% |
| 3 → 4 | Execute+confirm → Execute+notify | User confirms without edit > 90% |
| 4 → 5 | Execute+notify → Autonomous | User overrides < 5%, high-stakes scenarios excluded |
Never fully autonomous for:
  • Irreversible actions (delete, send, purchase)
  • High-stakes decisions (financial, legal, health)
  • Novel situations outside training distribution
  • Actions affecting third parties
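The graduation criteria above can be read as a simple gate function. This is a sketch under assumed metric names and task tags; only the thresholds come from the table:

```python
# Sketch of the graduation criteria as a gate check.
# Metric names and task tags are illustrative assumptions.

GRADUATION = {
    (1, 2): ("acceptance_rate", 0.70),       # accepts suggestions > 70%
    (2, 3): ("approval_rate", 0.80),         # approves recommendations > 80%
    (3, 4): ("confirm_no_edit_rate", 0.90),  # confirms without edit > 90%
}

# Task types that must never reach Level 5 (fully autonomous)
NEVER_AUTONOMOUS = {"irreversible", "high_stakes", "novel", "third_party"}

def can_graduate(level: int, metrics: dict, task_tags: set[str]) -> bool:
    if level == 4:
        # 4 -> 5: overrides < 5% AND no excluded scenario applies
        return (metrics.get("override_rate", 1.0) < 0.05
                and not (task_tags & NEVER_AUTONOMOUS))
    metric, threshold = GRADUATION[(level, level + 1)]
    return metrics.get(metric, 0.0) > threshold

print(can_graduate(1, {"acceptance_rate": 0.75}, set()))          # True
print(can_graduate(4, {"override_rate": 0.03}, {"high_stakes"}))  # False
```

Note that the Level 4 gate defaults to blocked: missing metrics count against graduation, never for it.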

AI-Native Discovery

Standard discovery practices need adaptation for AI products.

Modified Discovery Focus

| Standard Discovery | AI-Native Adaptation |
|---|---|
| "What job are you trying to do?" | + "How much do you want to delegate?" |
| "What's your current workflow?" | + "Which steps are you comfortable AI handling?" |
| "What would success look like?" | + "What errors would be unacceptable?" |
| "Show me how you do this today" | + "Show me how you verify AI work today" |

AI-Specific Discovery Questions

Delegation appetite:
  • "Which parts of this task feel tedious vs. require your judgment?"
  • "If AI made an error here, what would the consequences be?"
  • "How would you want to verify AI's work?"
Trust calibration:
  • "What would AI need to demonstrate before you'd trust it to [action]?"
  • "Have you used AI tools before? What built or broke your trust?"
  • "Would you prefer AI to do more but occasionally err, or do less perfectly?"
Failure tolerance:
  • "What kinds of errors are annoying vs. damaging?"
  • "How quickly do you need to catch and fix AI mistakes?"
  • "What's your 'undo' option if AI gets it wrong?"

Observing AI Interactions

In addition to interviews, AI discovery includes:
| Method | What to Look For |
|---|---|
| Session recordings | Where do users override AI? Where do they accept blindly? |
| Interaction logs | Patterns in edits, rejections, corrections |
| Feedback analysis | Explicit signals (thumbs down, ratings) |
| Support tickets | AI-related complaints and confusion |
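A first pass over interaction logs might look like the sketch below. The log schema ("feature", "action") is an assumption for illustration:

```python
# Illustrative pass over interaction logs to surface override patterns.
# The log schema ("feature", "action") is an assumption for this sketch.
from collections import Counter

def override_patterns(logs: list[dict]) -> dict[str, int]:
    """Count edits and rejections per feature (calibration candidates)."""
    return dict(Counter(
        log["feature"] for log in logs if log["action"] in ("edited", "rejected")
    ))

logs = [
    {"feature": "summarize", "action": "accepted"},
    {"feature": "summarize", "action": "edited"},
    {"feature": "draft_email", "action": "rejected"},
    {"feature": "draft_email", "action": "rejected"},
]
print(override_patterns(logs))  # {'summarize': 1, 'draft_email': 2}
```

Features with high override counts become inputs to the calibration loop described later.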

AI-Native Architecture

Solution Brief Additions

For AI features, add the following section to the standard solution brief:
AI-SPECIFIC SECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

AGENCY LEVEL
Target: [Level 1-5]
Graduation path: [How might this evolve?]

FAILURE MODES
• [Failure mode 1]: [Consequence] → [Mitigation]
• [Failure mode 2]: [Consequence] → [Mitigation]

EVAL STRATEGY
• [Eval type 1]: [What we measure, how often]
• [Eval type 2]: [What we measure, how often]

CALIBRATION PLAN
• Initial calibration: [Approach]
• Ongoing calibration: [Cadence, triggers]

CONFIDENCE BUILDING
• How AI explains itself: [Approach]
• How users verify: [Mechanisms]
• Trust-building milestones: [Progression]

AI Bet Categories

In addition to standard bet categories:
| Category | Description | Example |
|---|---|---|
| Capability expansion | AI can handle new task types | "AI can now summarize documents" |
| Agency graduation | Move to higher autonomy level | "AI sends emails without confirmation" |
| Calibration improvement | Better accuracy/reliability | "Reduce hallucination rate from 5% to 2%" |
| Confidence building | Better user trust | "Show AI reasoning before action" |
| Guardrail strengthening | Prevent harmful outputs | "Add content policy enforcement" |

AI-Native Delivery

Eval Strategy (Replaces Traditional Testing)

Eval Types:
| Eval Type | Purpose | When to Run |
|---|---|---|
| Unit evals | Test specific capabilities | Every code change |
| Behavioral evals | Test end-to-end flows | Daily/weekly |
| Adversarial evals | Test edge cases and attacks | Before major releases |
| Human evals | Test subjective quality | Weekly sample |
| Production evals | Test on real traffic | Continuous |
Eval Metrics:
| Metric | What It Measures | Target |
|---|---|---|
| Task success rate | Does AI complete the intended task? | > 95% |
| Factual accuracy | Is output factually correct? | > 98% |
| Hallucination rate | Does AI make things up? | < 2% |
| Harmful output rate | Does AI produce unsafe content? | < 0.1% |
| User acceptance rate | Do users accept AI output? | > 80% |
| Override rate | How often do users correct AI? | < 15% |
Eval Cadence:
Code change → Unit evals (automated)
Daily → Behavioral evals (automated)
Weekly → Human evals (sample)
Release → Adversarial evals (red team)
Continuous → Production evals (monitoring)
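As a sketch, the metric targets above can be encoded and checked against observed production numbers. The metric keys and the sample `observed` values here are illustrative; the thresholds are the ones from the table:

```python
# Hypothetical check of observed eval metrics against the targets above.

TARGETS = {
    "task_success_rate":    (">", 0.95),
    "factual_accuracy":     (">", 0.98),
    "hallucination_rate":   ("<", 0.02),
    "harmful_output_rate":  ("<", 0.001),
    "user_acceptance_rate": (">", 0.80),
    "override_rate":        ("<", 0.15),
}

def check_metrics(observed: dict[str, float]) -> dict[str, bool]:
    """Map each metric to pass/fail against its target."""
    results = {}
    for name, (op, target) in TARGETS.items():
        value = observed[name]
        results[name] = value > target if op == ">" else value < target
    return results

observed = {
    "task_success_rate": 0.97,
    "factual_accuracy": 0.99,
    "hallucination_rate": 0.03,   # above the 2% target, so this one fails
    "harmful_output_rate": 0.0005,
    "user_acceptance_rate": 0.85,
    "override_rate": 0.10,
}
failing = [m for m, ok in check_metrics(observed).items() if not ok]
print(failing)  # ['hallucination_rate']
```

Any failing metric is a calibration trigger (see the Calibration Loop section).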

Staged Rollout for AI Features

AI features require a more cautious rollout than traditional features:
| Stage | Audience | Focus | Duration |
|---|---|---|---|
| Internal | Team | Find obvious failures | 1 week |
| Alpha | 5-10 trusted users | Qualitative feedback on AI behavior | 2 weeks |
| Beta | 5% of users | Quantitative eval metrics | 2-4 weeks |
| Gradual GA | 5% → 25% → 50% → 100% | Monitor at each stage | 4+ weeks |
AI-Specific Rollout Gates:
| Gate | Criteria to Proceed |
|---|---|
| Alpha → Beta | Eval metrics above threshold, no harmful outputs |
| Beta → Gradual GA | User acceptance > 80%, override rate < 15% |
| Each GA increment | Metrics stable, no new failure modes |
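The Beta → Gradual GA gate can be sketched as a predicate over rollout metrics. The thresholds come from the table; the metric names are assumptions:

```python
# Sketch of the Beta -> Gradual GA gate from the table above.
# Thresholds are from the table; metric names are assumed.

def beta_to_ga_gate(metrics: dict[str, float]) -> bool:
    """True when Beta metrics clear the bar for starting Gradual GA."""
    return (
        metrics.get("user_acceptance_rate", 0.0) > 0.80
        and metrics.get("override_rate", 1.0) < 0.15
    )

print(beta_to_ga_gate({"user_acceptance_rate": 0.85, "override_rate": 0.12}))  # True
print(beta_to_ga_gate({"user_acceptance_rate": 0.85, "override_rate": 0.20}))  # False
```

The defaults are deliberately pessimistic: a missing metric fails the gate rather than passing it.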

Calibration Loop

Continuous calibration process:
OBSERVE → IDENTIFY → CALIBRATE → VALIDATE → DEPLOY
   ↑                                           │
   └───────────────────────────────────────────┘
| Step | Activities | Cadence |
|---|---|---|
| Observe | Monitor production interactions, logs, feedback | Continuous |
| Identify | Surface failure patterns, edge cases, drift | Daily/weekly |
| Calibrate | Adjust prompts, fine-tune, add guardrails | As needed |
| Validate | Run evals on calibrated version | Before deploy |
| Deploy | Ship updates, continue observing | Staged |
Calibration Triggers:
  • Eval metrics below threshold
  • New failure pattern identified
  • User feedback trend (negative)
  • Model update available
  • New use case discovered
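Trigger detection can be sketched as a check over monitoring signals. The signal names and default thresholds below are placeholders; the trigger list mirrors the bullets above:

```python
# Illustrative detection of the calibration triggers listed above.
# Signal names and default thresholds are placeholders for the sketch.

def calibration_triggers(signals: dict) -> list[str]:
    """Return the calibration triggers currently firing."""
    triggers = []
    if signals.get("eval_score", 1.0) < signals.get("eval_threshold", 0.95):
        triggers.append("eval metrics below threshold")
    if signals.get("new_failure_patterns", 0) > 0:
        triggers.append("new failure pattern identified")
    if signals.get("feedback_sentiment", 0.0) < 0:
        triggers.append("negative user feedback trend")
    if signals.get("model_update_available", False):
        triggers.append("model update available")
    if signals.get("new_use_cases", 0) > 0:
        triggers.append("new use case discovered")
    return triggers

print(calibration_triggers({"eval_score": 0.91, "feedback_sentiment": -0.2}))
# ['eval metrics below threshold', 'negative user feedback trend']
```

Any non-empty result would start a new pass through Observe → Identify → Calibrate → Validate → Deploy.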

AI Metrics Hierarchy

LAGGING
├── User retention (AI users vs. non-AI users)
├── Task completion rate (with AI assist)
└── Revenue from AI features

CORE
├── User acceptance rate
├── Override rate
├── Time-to-completion (with AI)
└── User-reported satisfaction

LEADING
├── Eval metrics (accuracy, hallucination, etc.)
├── Interaction volume
├── Feature discovery rate
└── Feedback sentiment

GUARDRAILS
├── Harmful output rate
├── Latency P95
├── Error rate
└── Cost per interaction

AI-Specific Anti-Patterns

| Anti-Pattern | Why It Fails | Instead |
|---|---|---|
| Ship and hope | AI behavior drifts without monitoring | Continuous calibration |
| Autonomous by default | Users don't trust, don't adopt | Earn autonomy progressively |
| Black box AI | Users can't verify, won't trust | Show reasoning, enable verification |
| No evals | Quality degrades silently | Comprehensive eval strategy |
| Ignore overrides | Miss calibration signals | Override patterns inform calibration |
| One-size-fits-all agency | Different tasks need different levels | Task-specific agency levels |

Templates

This skill includes templates in the templates/ directory:
  • agency-assessment.md — Determine appropriate agency level
  • eval-strategy.md — Design eval suite for AI feature
  • calibration-plan.md — Set up continuous calibration

Using This Skill with Claude

Ask Claude to:
  1. Assess agency level: "What agency level should [AI feature] have?"
  2. Design agency progression: "Create a graduation path from assist to autonomous for [feature]"
  3. Identify failure modes: "What could go wrong with [AI feature]? How do we mitigate?"
  4. Design eval strategy: "Design an eval suite for [AI feature]"
  5. Plan calibration: "Create a calibration plan for [AI feature]"
  6. Adapt discovery: "What AI-specific questions should I ask in discovery for [use case]?"
  7. Design confidence building: "How should [AI feature] show its reasoning?"
  8. Plan AI rollout: "Create a staged rollout plan for [AI feature]"
  9. Set AI metrics: "What metrics should we track for [AI feature]?"
  10. Review AI brief: "Critique this solution brief for AI considerations"

Connection to Other Skills

| When you need to... | Use skill |
|---|---|
| Define overall product strategy | product-strategy |
| Run discovery (with AI adaptations) | product-discovery |
| Structure bets and roadmap | product-architecture |
| Plan rollout and metrics | product-delivery |
| Scale AI products across teams | product-leadership |

Quick Reference: AI Product Checklist

Before shipping AI features:
  • Agency level defined — Clear level for this feature
  • Graduation criteria set — How we'll earn higher autonomy
  • Failure modes mapped — Know what can go wrong
  • Evals in place — Automated quality checks
  • Human evals scheduled — Subjective quality review
  • Calibration loop running — Continuous improvement process
  • Confidence mechanisms built — Users can verify AI work
  • Guardrails active — Prevent harmful outputs
  • Rollout staged — More cautious than traditional features
  • Override tracking — Learning from user corrections

Sources & Influences

  • Aishwarya Goel & Kiriti Gavini — CCCD Loop, Agency-Control Trade-off
  • Anthropic — Constitutional AI, RLHF approaches
  • OpenAI — Eval best practices
  • Google DeepMind — AI safety frameworks

Part of the Modern Product Operating Model by Yannick Maurice