ai-native-product


AI-Native Product Development

"AI products aren't deterministic. They require continuous calibration, not just A/B tests."
This skill covers AI-Native Product Development — the overlay that modifies discovery, architecture, and delivery when AI is at the core. It addresses the unique challenges of building products where AI agents perform tasks autonomously.
Part of: Modern Product Operating Model — a collection of composable product skills.
Related skills: product-strategy, product-discovery, product-architecture, product-delivery, product-leadership

When to Use This Skill

Use this skill when:
  • Building AI agents that act on behalf of users
  • Adding LLM-powered features to existing products
  • Designing human-AI interaction patterns
  • Deciding how much autonomy to give AI
  • Setting up eval strategies and calibration loops
  • Managing the "agency-control tradeoff"
Not needed for: Traditional software products, ML models used only for backend optimization (no user-facing autonomy)

What Makes AI Products Different

Traditional Software vs. AI Products

| Dimension | Traditional Software | AI-Native Products |
|---|---|---|
| Behavior | Deterministic | Probabilistic |
| Testing | Unit tests, QA | Evals, calibration |
| Correctness | Binary (works or doesn't) | Spectrum (good enough?) |
| User role | Operator | Delegator + Reviewer |
| Failure mode | Error messages | Plausible but wrong outputs |
| Iteration | Ship → Measure → Iterate | Ship → Observe → Calibrate |
| Trust building | Feature completeness | Demonstrated reliability |

The Core Challenge

AI products must navigate a fundamental tension:
More autonomy = More value (fewer steps, faster outcomes)
More autonomy = More risk (errors affect real work)
This is the Agency-Control Tradeoff.

Framework: The CCCD Loop

Credit: Aishwarya Goel & Kiriti Gavini
AI products require a Continuous Calibration and Confidence Development (CCCD) loop:
┌─────────────────────────────────────────────────────────────────┐
│                        CCCD LOOP                                │
│                                                                 │
│    CALIBRATE → CONFIDENCE → CONTINUOUS DISCOVERY → CALIBRATE   │
│         ↓           ↓              ↓                 ↓         │
│     Eval and    Build user    Observe AI       Update evals    │
│     adjust AI    trust over   interactions     and models      │
│     behavior     time         at scale                         │
└─────────────────────────────────────────────────────────────────┘
CCCD Components:

| Component | Purpose | Activities |
|---|---|---|
| Calibrate | Tune AI behavior to match user expectations | Run evals, adjust prompts/models, set guardrails |
| Confidence | Build appropriate user trust | Show AI reasoning, enable verification, demonstrate reliability |
| Continuous Discovery | Observe AI-user interactions at scale | Log interactions, identify failure patterns, surface edge cases |
| → Back to Calibrate | Update based on learnings | Improve evals, retrain, adjust prompts |
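One pass through the loop can be sketched as a minimal control cycle. Everything below (function names, the log schema, the 95% threshold) is an illustrative assumption, not part of the CCCD framework itself:

```python
# Minimal sketch of one pass through the CCCD loop.
# Function names, log schema, and the 95% threshold are hypothetical.

def cccd_cycle(interactions: list[dict], acceptance_target: float = 0.95) -> str:
    """Decide whether observed interactions warrant a calibration pass."""
    if not interactions:
        return "hold"
    # CONTINUOUS DISCOVERY: observe AI-user interactions at scale
    accepted = sum(1 for i in interactions if i["accepted"])
    acceptance_rate = accepted / len(interactions)
    # CALIBRATE: observed quality below target -> adjust prompts/models/evals
    if acceptance_rate < acceptance_target:
        return "calibrate"
    # CONFIDENCE: quality holds -> keep demonstrating reliability to users
    return "hold"

logs = [{"accepted": True}, {"accepted": True}, {"accepted": False}]
print(cccd_cycle(logs))  # acceptance ≈ 0.67 < 0.95 -> "calibrate"
```

In a real system the "calibrate" branch would kick off the eval-and-adjust work described in the table, then the loop repeats.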

The Agency-Control Progression

Five Levels of AI Agency

| Level | Description | AI Does | User Does | Example |
|---|---|---|---|---|
| 1. Assist | AI suggests, user executes | Generates options | Chooses and acts | Autocomplete, suggestions |
| 2. Recommend | AI ranks, user approves | Analyzes and recommends | Reviews and approves | "AI recommends these 3 actions" |
| 3. Execute with confirmation | AI acts after approval | Prepares action | Confirms before execution | "Send this email?" → Yes/No |
| 4. Execute with notification | AI acts, notifies after | Acts autonomously | Reviews outcomes | "I scheduled the meeting and sent invites" |
| 5. Fully autonomous | AI acts without notification | Handles end-to-end | Sets goals, reviews exceptions | AI handles routine tasks silently |

Progression Strategy

Start lower, earn higher:
Level 1 → Build trust → Level 2 → Demonstrate reliability → Level 3 → ...
Graduation Criteria:
| Levels | Transition | Requires |
|---|---|---|
| 1 → 2 | Assist → Recommend | User accepts suggestions > 70% |
| 2 → 3 | Recommend → Execute with confirm | User approves recommendations > 80% |
| 3 → 4 | Execute+confirm → Execute+notify | User confirms without edit > 90% |
| 4 → 5 | Execute+notify → Autonomous | User overrides < 5%, high-stakes scenarios excluded |
Never fully autonomous for:
  • Irreversible actions (delete, send, purchase)
  • High-stakes decisions (financial, legal, health)
  • Novel situations outside training distribution
  • Actions affecting third parties
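The graduation criteria above can be read as a simple gate function. This is a sketch under assumed metric names and task tags; only the thresholds come from the table:

```python
# Sketch of the graduation criteria as a gate check.
# Metric names and task tags are illustrative assumptions.

GRADUATION = {
    (1, 2): ("acceptance_rate", 0.70),       # accepts suggestions > 70%
    (2, 3): ("approval_rate", 0.80),         # approves recommendations > 80%
    (3, 4): ("confirm_no_edit_rate", 0.90),  # confirms without edit > 90%
}

# Task types that must never reach Level 5 (fully autonomous)
NEVER_AUTONOMOUS = {"irreversible", "high_stakes", "novel", "third_party"}

def can_graduate(level: int, metrics: dict, task_tags: set[str]) -> bool:
    if level == 4:
        # 4 -> 5: overrides < 5% AND no excluded scenario applies
        return (metrics.get("override_rate", 1.0) < 0.05
                and not (task_tags & NEVER_AUTONOMOUS))
    metric, threshold = GRADUATION[(level, level + 1)]
    return metrics.get(metric, 0.0) > threshold

print(can_graduate(1, {"acceptance_rate": 0.75}, set()))          # True
print(can_graduate(4, {"override_rate": 0.03}, {"high_stakes"}))  # False
```

Note that the Level 4 gate defaults to blocked: missing metrics count against graduation, never for it.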

AI-Native Discovery

Standard discovery practices need adaptation for AI products.

Modified Discovery Focus

| Standard Discovery | AI-Native Adaptation |
|---|---|
| "What job are you trying to do?" | + "How much do you want to delegate?" |
| "What's your current workflow?" | + "Which steps are you comfortable AI handling?" |
| "What would success look like?" | + "What errors would be unacceptable?" |
| "Show me how you do this today" | + "Show me how you verify AI work today" |

AI-Specific Discovery Questions

Delegation appetite:
  • "Which parts of this task feel tedious vs. require your judgment?"
  • "If AI made an error here, what would the consequences be?"
  • "How would you want to verify AI's work?"
Trust calibration:
  • "What would AI need to demonstrate before you'd trust it to [action]?"
  • "Have you used AI tools before? What built or broke your trust?"
  • "Would you prefer AI to do more but occasionally err, or do less perfectly?"
Failure tolerance:
  • "What kinds of errors are annoying vs. damaging?"
  • "How quickly do you need to catch and fix AI mistakes?"
  • "What's your 'undo' option if AI gets it wrong?"

Observing AI Interactions

In addition to interviews, AI discovery includes:
| Method | What to Look For |
|---|---|
| Session recordings | Where do users override AI? Where do they accept blindly? |
| Interaction logs | Patterns in edits, rejections, corrections |
| Feedback analysis | Explicit signals (thumbs down, ratings) |
| Support tickets | AI-related complaints and confusion |
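A first pass over interaction logs might look like the sketch below. The log schema ("feature", "action") is an assumption for illustration:

```python
# Illustrative pass over interaction logs to surface override patterns.
# The log schema ("feature", "action") is an assumption for this sketch.
from collections import Counter

def override_patterns(logs: list[dict]) -> dict[str, int]:
    """Count edits and rejections per feature (calibration candidates)."""
    return dict(Counter(
        log["feature"] for log in logs if log["action"] in ("edited", "rejected")
    ))

logs = [
    {"feature": "summarize", "action": "accepted"},
    {"feature": "summarize", "action": "edited"},
    {"feature": "draft_email", "action": "rejected"},
    {"feature": "draft_email", "action": "rejected"},
]
print(override_patterns(logs))  # {'summarize': 1, 'draft_email': 2}
```

Features with high override counts become inputs to the calibration loop described later.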

AI-Native Architecture

Solution Brief Additions

For AI features, add the following section to the standard solution brief:
AI-SPECIFIC SECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

AGENCY LEVEL
Target: [Level 1-5]
Graduation path: [How might this evolve?]

FAILURE MODES
• [Failure mode 1]: [Consequence] → [Mitigation]
• [Failure mode 2]: [Consequence] → [Mitigation]

EVAL STRATEGY
• [Eval type 1]: [What we measure, how often]
• [Eval type 2]: [What we measure, how often]

CALIBRATION PLAN
• Initial calibration: [Approach]
• Ongoing calibration: [Cadence, triggers]

CONFIDENCE BUILDING
• How AI explains itself: [Approach]
• How users verify: [Mechanisms]
• Trust-building milestones: [Progression]

AI Bet Categories

In addition to standard bet categories:
| Category | Description | Example |
|---|---|---|
| Capability expansion | AI can handle new task types | "AI can now summarize documents" |
| Agency graduation | Move to higher autonomy level | "AI sends emails without confirmation" |
| Calibration improvement | Better accuracy/reliability | "Reduce hallucination rate from 5% to 2%" |
| Confidence building | Better user trust | "Show AI reasoning before action" |
| Guardrail strengthening | Prevent harmful outputs | "Add content policy enforcement" |

AI-Native Delivery

Eval Strategy (Replaces Traditional Testing)

Eval Types:
| Eval Type | Purpose | When to Run |
|---|---|---|
| Unit evals | Test specific capabilities | Every code change |
| Behavioral evals | Test end-to-end flows | Daily/weekly |
| Adversarial evals | Test edge cases and attacks | Before major releases |
| Human evals | Test subjective quality | Weekly sample |
| Production evals | Test on real traffic | Continuous |
Eval Metrics:
| Metric | What It Measures | Target |
|---|---|---|
| Task success rate | Does AI complete the intended task? | > 95% |
| Factual accuracy | Is output factually correct? | > 98% |
| Hallucination rate | Does AI make things up? | < 2% |
| Harmful output rate | Does AI produce unsafe content? | < 0.1% |
| User acceptance rate | Do users accept AI output? | > 80% |
| Override rate | How often do users correct AI? | < 15% |
Eval Cadence:
Code change → Unit evals (automated)
Daily → Behavioral evals (automated)
Weekly → Human evals (sample)
Release → Adversarial evals (red team)
Continuous → Production evals (monitoring)
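As a sketch, the metric targets above can be encoded and checked against observed production numbers. The metric keys and the sample `observed` values here are illustrative; the thresholds are the ones from the table:

```python
# Hypothetical check of observed eval metrics against the targets above.

TARGETS = {
    "task_success_rate":    (">", 0.95),
    "factual_accuracy":     (">", 0.98),
    "hallucination_rate":   ("<", 0.02),
    "harmful_output_rate":  ("<", 0.001),
    "user_acceptance_rate": (">", 0.80),
    "override_rate":        ("<", 0.15),
}

def check_metrics(observed: dict[str, float]) -> dict[str, bool]:
    """Map each metric to pass/fail against its target."""
    results = {}
    for name, (op, target) in TARGETS.items():
        value = observed[name]
        results[name] = value > target if op == ">" else value < target
    return results

observed = {
    "task_success_rate": 0.97,
    "factual_accuracy": 0.99,
    "hallucination_rate": 0.03,   # above the 2% target, so this one fails
    "harmful_output_rate": 0.0005,
    "user_acceptance_rate": 0.85,
    "override_rate": 0.10,
}
failing = [m for m, ok in check_metrics(observed).items() if not ok]
print(failing)  # ['hallucination_rate']
```

Any failing metric is a calibration trigger (see the Calibration Loop section).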

Staged Rollout for AI Features

AI features require a more cautious rollout than traditional features:
| Stage | Audience | Focus | Duration |
|---|---|---|---|
| Internal | Team | Find obvious failures | 1 week |
| Alpha | 5-10 trusted users | Qualitative feedback on AI behavior | 2 weeks |
| Beta | 5% of users | Quantitative eval metrics | 2-4 weeks |
| Gradual GA | 5% → 25% → 50% → 100% | Monitor at each stage | 4+ weeks |
AI-Specific Rollout Gates:
| Gate | Criteria to Proceed |
|---|---|
| Alpha → Beta | Eval metrics above threshold, no harmful outputs |
| Beta → Gradual GA | User acceptance > 80%, override rate < 15% |
| Each GA increment | Metrics stable, no new failure modes |
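The Beta → Gradual GA gate can be sketched as a predicate over rollout metrics. The thresholds come from the table; the metric names are assumptions:

```python
# Sketch of the Beta -> Gradual GA gate from the table above.
# Thresholds are from the table; metric names are assumed.

def beta_to_ga_gate(metrics: dict[str, float]) -> bool:
    """True when Beta metrics clear the bar for starting Gradual GA."""
    return (
        metrics.get("user_acceptance_rate", 0.0) > 0.80
        and metrics.get("override_rate", 1.0) < 0.15
    )

print(beta_to_ga_gate({"user_acceptance_rate": 0.85, "override_rate": 0.12}))  # True
print(beta_to_ga_gate({"user_acceptance_rate": 0.85, "override_rate": 0.20}))  # False
```

The defaults are deliberately pessimistic: a missing metric fails the gate rather than passing it.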

Calibration Loop

Continuous calibration process:
OBSERVE → IDENTIFY → CALIBRATE → VALIDATE → DEPLOY
   ↑                                           │
   └───────────────────────────────────────────┘
| Step | Activities | Cadence |
|---|---|---|
| Observe | Monitor production interactions, logs, feedback | Continuous |
| Identify | Surface failure patterns, edge cases, drift | Daily/weekly |
| Calibrate | Adjust prompts, fine-tune, add guardrails | As needed |
| Validate | Run evals on calibrated version | Before deploy |
| Deploy | Ship updates, continue observing | Staged |
Calibration Triggers:
  • Eval metrics below threshold
  • New failure pattern identified
  • User feedback trend (negative)
  • Model update available
  • New use case discovered
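Trigger detection can be sketched as a check over monitoring signals. The signal names and default thresholds below are placeholders; the trigger list mirrors the bullets above:

```python
# Illustrative detection of the calibration triggers listed above.
# Signal names and default thresholds are placeholders for the sketch.

def calibration_triggers(signals: dict) -> list[str]:
    """Return the calibration triggers currently firing."""
    triggers = []
    if signals.get("eval_score", 1.0) < signals.get("eval_threshold", 0.95):
        triggers.append("eval metrics below threshold")
    if signals.get("new_failure_patterns", 0) > 0:
        triggers.append("new failure pattern identified")
    if signals.get("feedback_sentiment", 0.0) < 0:
        triggers.append("negative user feedback trend")
    if signals.get("model_update_available", False):
        triggers.append("model update available")
    if signals.get("new_use_cases", 0) > 0:
        triggers.append("new use case discovered")
    return triggers

print(calibration_triggers({"eval_score": 0.91, "feedback_sentiment": -0.2}))
# ['eval metrics below threshold', 'negative user feedback trend']
```

Any non-empty result would start a new pass through Observe → Identify → Calibrate → Validate → Deploy.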

AI Metrics Hierarchy

LAGGING
├── User retention (AI users vs. non-AI users)
├── Task completion rate (with AI assist)
└── Revenue from AI features

CORE
├── User acceptance rate
├── Override rate
├── Time-to-completion (with AI)
└── User-reported satisfaction

LEADING
├── Eval metrics (accuracy, hallucination, etc.)
├── Interaction volume
├── Feature discovery rate
└── Feedback sentiment

GUARDRAILS
├── Harmful output rate
├── Latency P95
├── Error rate
└── Cost per interaction

AI-Specific Anti-Patterns

| Anti-Pattern | Why It Fails | Instead |
|---|---|---|
| Ship and hope | AI behavior drifts without monitoring | Continuous calibration |
| Autonomous by default | Users don't trust, don't adopt | Earn autonomy progressively |
| Black box AI | Users can't verify, won't trust | Show reasoning, enable verification |
| No evals | Quality degrades silently | Comprehensive eval strategy |
| Ignore overrides | Miss calibration signals | Override patterns inform calibration |
| One-size-fits-all agency | Different tasks need different levels | Task-specific agency levels |

Templates

This skill includes templates in the templates/ directory:
  • agency-assessment.md — Determine appropriate agency level
  • eval-strategy.md — Design eval suite for AI feature
  • calibration-plan.md — Set up continuous calibration

Using This Skill with Claude

Ask Claude to:
  1. Assess agency level: "What agency level should [AI feature] have?"
  2. Design agency progression: "Create a graduation path from assist to autonomous for [feature]"
  3. Identify failure modes: "What could go wrong with [AI feature]? How do we mitigate?"
  4. Design eval strategy: "Design an eval suite for [AI feature]"
  5. Plan calibration: "Create a calibration plan for [AI feature]"
  6. Adapt discovery: "What AI-specific questions should I ask in discovery for [use case]?"
  7. Design confidence building: "How should [AI feature] show its reasoning?"
  8. Plan AI rollout: "Create a staged rollout plan for [AI feature]"
  9. Set AI metrics: "What metrics should we track for [AI feature]?"
  10. Review AI brief: "Critique this solution brief for AI considerations"

Connection to Other Skills

| When you need to... | Use skill |
|---|---|
| Define overall product strategy | product-strategy |
| Run discovery (with AI adaptations) | product-discovery |
| Structure bets and roadmap | product-architecture |
| Plan rollout and metrics | product-delivery |
| Scale AI products across teams | product-leadership |

Quick Reference: AI Product Checklist

Before shipping AI features:
  • Agency level defined — Clear level for this feature
  • Graduation criteria set — How we'll earn higher autonomy
  • Failure modes mapped — Know what can go wrong
  • Evals in place — Automated quality checks
  • Human evals scheduled — Subjective quality review
  • Calibration loop running — Continuous improvement process
  • Confidence mechanisms built — Users can verify AI work
  • Guardrails active — Prevent harmful outputs
  • Rollout staged — More cautious than traditional features
  • Override tracking — Learning from user corrections

Sources & Influences

  • Aishwarya Goel & Kiriti Gavini — CCCD Loop, Agency-Control Trade-off
  • Anthropic — Constitutional AI, RLHF approaches
  • OpenAI — Eval best practices
  • Google DeepMind — AI safety frameworks

Part of the Modern Product Operating Model by Yannick Maurice