"The AI model you're using today is the worst AI model you will ever use for the rest of your life. What computers can do changes every two months."
DON'T:
- "AI can't do X, so we won't support it"
- Build fallbacks that limit model capabilities
- Design UI that assumes current limitations
DO:
- Build interfaces that scale with model improvements
- Design for the capability you want, not current reality
- Test with future models in mind
- Make it easy to swap/upgrade models

Feature: "AI code review"
❌ Current-Model Thinking:
- "Models can't catch logic bugs, only style"
- Limit to linting and formatting
- Don't even try complex reasoning
✅ Future-Model Thinking:
- Design for full logic review capability
- Start with style, but UI supports deeper analysis
- As models improve, feature gets better automatically
- Progressive: Basic → Advanced → Expert review
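
One way to let the code review feature grow with the model is to keep the review tiers in configuration rather than in UI logic. The sketch below is a minimal illustration: the tier names follow the Basic → Advanced → Expert progression above, while `CHECKS_BY_TIER`, `checksFor`, the check names, and the model id are hypothetical.

```javascript
// Review tiers in order of increasing capability (from the progression above).
const REVIEW_TIERS = ["basic", "advanced", "expert"];

// Hypothetical mapping of each tier to the checks it unlocks.
const CHECKS_BY_TIER = {
  basic: ["style", "formatting"],
  advanced: ["logic-bugs"],
  expert: ["architecture"],
};

// Return every check enabled at or below the given tier.
function checksFor(maxTier) {
  const limit = REVIEW_TIERS.indexOf(maxTier);
  if (limit === -1) throw new Error(`Unknown review tier: ${maxTier}`);
  return REVIEW_TIERS.slice(0, limit + 1).flatMap((tier) => CHECKS_BY_TIER[tier]);
}

// Upgrading the model becomes a config change (the id is a placeholder).
const activeModel = { id: "model-a", maxTier: "basic" };
console.log(checksFor(activeModel.maxTier));
```

With this shape, moving to a stronger model means changing `maxTier`; the UI that lists checks does not need to change.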

"At OpenAI, evals are the product spec. If you can define what good looks like in test cases, you've defined the product."
Requirement: "Search should return relevant results"

// Eval as Product Spec
const searchEvals = [
  {
    query: "best PM frameworks",
    expectedResults: ["RICE", "LNO", "Jobs-to-be-Done"],
    quality: "all3InTop5",
  },
  {
    query: "how to prioritize features",
    expectedResults: ["Shreyas Doshi", "Marty Cagan"],
    quality: "relevantInTop3",
  },
  {
    query: "shiip prodcut", // typo
    correctAs: "ship product",
    quality: "handleTypos",
  },
];

1. Define Success Cases:
- Input: [specific user query/action]
- Expected: [what good output looks like]
- Quality bar: [how to measure success]
2. Define Failure Cases:
- Input: [edge case, adversarial, error]
- Expected: [graceful handling]
- Quality bar: [minimum acceptable]
3. Make Evals Runnable:
- Automated tests
- Run on every model change
- Track quality over time

// Product Requirement as Eval
// toIncludePopularItems, toMatchInterests, and toHaveDiversity are custom
// matchers, assumed to be registered with expect.extend().
describe("AI Recommendations", () => {
  test("cold start: new user gets popular items", async () => {
    const newUser = { signupDate: new Date(), interactions: [] };
    const recs = await getRecommendations(newUser);
    expect(recs).toIncludePopularItems();
    expect(recs.length).toBeGreaterThan(5);
  });

  test("personalized: returning user gets relevant items", async () => {
    const user = { interests: ["PM", "AI", "startups"] };
    const recs = await getRecommendations(user);
    expect(recs).toMatchInterests(user.interests);
    expect(recs).toHaveDiversity(); // Not all same topic
  });

  test("quality bar: recommendations >70% click rate", async () => {
    const users = await getTestUsers(100);
    const clickRate = await measureClickRate(users);
    expect(clickRate).toBeGreaterThan(0.7);
  });
});
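
To make a spec like `searchEvals` runnable, each `quality` label needs a check behind it. The following is a minimal sketch, not the author's harness: `qualityChecks`, `runEvals`, `sampleEvals`, and `stubSearch` are hypothetical names, the interpretation of each label is an assumption, and the stub stands in for a real search backend.

```javascript
// Assumed meaning of each quality label; adjust to your own definition.
const qualityChecks = {
  // Every expected result appears in the top 5.
  all3InTop5: (results, ev) =>
    ev.expectedResults.every((r) => results.slice(0, 5).includes(r)),
  // At least one expected result appears in the top 3.
  relevantInTop3: (results, ev) =>
    ev.expectedResults.some((r) => results.slice(0, 3).includes(r)),
};

// Run every eval against a search function and report the pass rate.
function runEvals(evals, searchFn) {
  const outcomes = evals.map((ev) => {
    const check = qualityChecks[ev.quality];
    // An eval with no registered check counts as a failure, not a skip.
    const passed = check ? check(searchFn(ev.query), ev) : false;
    return { query: ev.query, passed };
  });
  const passedCount = outcomes.filter((o) => o.passed).length;
  const passRate = outcomes.length ? passedCount / outcomes.length : 0;
  return { outcomes, passRate };
}

// Two cases copied from the spec above, plus a stub search for illustration.
const sampleEvals = [
  { query: "best PM frameworks", expectedResults: ["RICE", "LNO", "Jobs-to-be-Done"], quality: "all3InTop5" },
  { query: "how to prioritize features", expectedResults: ["Shreyas Doshi", "Marty Cagan"], quality: "relevantInTop3" },
];
const stubSearch = (query) =>
  query === "best PM frameworks" ? ["RICE", "Kano", "LNO", "Jobs-to-be-Done"] : ["Unrelated"];

console.log(runEvals(sampleEvals, stubSearch).passRate);
```

Logging `passRate` on every model change gives the "track quality over time" signal from step 3.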

"Don't make everything AI. Use AI where it shines, traditional code where it's reliable."
// Hybrid: AI understands, code executes
async function processUserQuery(query) {
  // AI: Understand intent
  const intent = await ai.classify(query, {
    types: ["search", "create", "update", "delete"]
  });
  // Traditional: Execute deterministically
  switch (intent.type) {
    case "search": return search(intent.params);
    case "create": return create(intent.params);
    // ... reliable code paths
  }
}

// Hybrid: AI primary, rules backup
async function moderateContent(content) {
  // Fast rules-based check first
  if (containsProfanity(content)) return "reject";
  if (content.length > 10000) return "reject";
  // AI for nuanced cases
  const aiModeration = await ai.moderate(content);
  // Hybrid decision
  if (aiModeration.confidence > 0.9) {
    return aiModeration.decision;
  } else {
    return "human_review"; // Uncertain → human
  }
}

// Hybrid: AI generates, code filters
async function generateRecommendations(user) {
  // AI: Generate candidates
  const candidates = await ai.recommend(user, { count: 50 });
  // Code: Apply business rules
  const filtered = candidates
    .filter(item => item.inStock)
    .filter(item => item.price <= user.budget)
    .filter(item => !user.previouslyPurchased(item));
  // Code: Apply ranking logic
  return filtered
    .sort((a, b) => scoringFunction(a, b))
    .slice(0, 10);
}
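
In the "AI understands, code executes" pattern, the deterministic side can also validate what the model returned before acting on it. The sketch below assumes the classifier returns an object shaped like `{ type, params }`; the `handlers` table, `dispatch`, and the `needs_clarification` status are hypothetical.

```javascript
// Hypothetical deterministic handlers; real ones would call your own services.
const handlers = {
  search: (params) => ({ status: "ok", action: "search", params }),
  create: (params) => ({ status: "ok", action: "create", params }),
};

// Check the model's classification against an allowlist before executing.
function dispatch(intent) {
  const type = intent && intent.type;
  const known =
    typeof type === "string" &&
    Object.prototype.hasOwnProperty.call(handlers, type);
  if (!known) {
    // Unknown or malformed intent: ask the user instead of guessing.
    return { status: "needs_clarification" };
  }
  return handlers[type](intent.params);
}

console.log(dispatch({ type: "search", params: { q: "PM articles" } }));
console.log(dispatch({ type: "drop_everything" }));
```

The allowlist keeps an unexpected classification from reaching any code path, which is the reliability the quote above asks of traditional code.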

// Show results as they arrive
for await (const chunk of ai.stream(prompt)) {
  updateUI(chunk); // Immediate feedback
}

[AI working...] → [Preview...] → [Full results]

User: "Find PM articles"
AI: [shows results]
User: "More about prioritization"
AI: [refines results]

if (result.confidence > 0.9) {
  show(result); // High confidence
} else if (result.confidence > 0.5) {
  show(result, { disclaimer: "AI-generated, verify" });
} else {
  show("I'm not confident. Try rephrasing?");
}

// Progressive cost
if (simpleQuery) {
  return await smallModel(query); // Fast, cheap
} else {
  return await largeModel(query); // Slow, expensive
}
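
The progressive-cost snippet leaves `simpleQuery` undefined. One possible heuristic is sketched below; the word-count threshold, the list of reasoning keywords, and the names `isSimpleQuery` and `pickModel` are all assumptions for illustration, and a real router would be tuned against your own evals.

```javascript
// Words that hint the query needs multi-step reasoning (illustrative list).
const REASONING_WORDS = /\b(why|compare|explain|analyze)\b/i;

// Treat short queries without reasoning keywords as simple.
function isSimpleQuery(query, maxWords = 6) {
  const words = query.trim().split(/\s+/).filter(Boolean);
  return words.length <= maxWords && !REASONING_WORDS.test(query);
}

// Route to the cheap model when possible, the large one otherwise.
function pickModel(query) {
  return isSimpleQuery(query) ? "small" : "large";
}

console.log(pickModel("PM articles"));
console.log(pickModel("compare RICE and LNO for a B2B roadmap"));
```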

FEATURE DECISION
│
├─ Deterministic logic needed? ────YES──→ TRADITIONAL CODE
│ (math, validation, access)
│ NO ↓
│
├─ Pattern matching / NLP? ────────YES──→ AI (with fallbacks)
│ (understanding intent, ambiguity)
│ NO ↓
│
├─ Creative generation? ───────────YES──→ AI (with human oversight)
│ (writing, images, ideas)
│ NO ↓
│
├─ Improves with more data? ───────YES──→ AI + ML
│ (recommendations, personalization)
│ NO ↓
│
└─ Use TRADITIONAL CODE ←──────────────────┘
(More reliable for this use case)
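
The decision tree above can also be written as a small function, which makes the order of the questions explicit. This is a sketch only: `chooseApproach` and the boolean field names on `feature` are hypothetical, and the returned strings mirror the labels in the tree.

```javascript
// Walk the questions in the same order as the tree; the first "yes" wins.
function chooseApproach(feature) {
  if (feature.needsDeterministicLogic) return "traditional code";
  if (feature.involvesPatternMatchingOrNLP) return "AI (with fallbacks)";
  if (feature.isCreativeGeneration) return "AI (with human oversight)";
  if (feature.improvesWithMoreData) return "AI + ML";
  // No question answered "yes": default to the more reliable option.
  return "traditional code";
}

console.log(chooseApproach({ involvesPatternMatchingOrNLP: true }));
console.log(chooseApproach({ needsDeterministicLogic: true, isCreativeGeneration: true }));
```

Because deterministic logic is checked first, a feature that needs both validation and generation is routed to traditional code for the validation part.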

test("handles typical user query", async () => {
  const input = "[example]";
  const output = await aiFeature(input);
  expect(output).toMatch("[expected]");
});

test("handles edge case", async () => {
  // Define edge cases as tests
});

// Show results as they arrive
for await (const chunk of stream) {
  appendToUI(chunk);
}
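
The streaming loop can be tried without a model by swapping in an async generator. In this sketch `fakeStream` and `renderStream` are hypothetical helpers, and the fixed-size chunking only imitates how a real model stream might deliver text.

```javascript
// Stand-in for a model stream: yields the text in fixed-size chunks.
async function* fakeStream(text, chunkSize = 5) {
  for (let i = 0; i < text.length; i += chunkSize) {
    yield text.slice(i, i + chunkSize);
  }
}

// Forward each chunk to the UI as it arrives and return the full text.
async function renderStream(stream, appendToUI) {
  let full = "";
  for await (const chunk of stream) {
    full += chunk;
    appendToUI(chunk);
  }
  return full;
}

// Usage: log each chunk as soon as it is produced.
renderStream(fakeStream("Results are on the way"), (chunk) => console.log(chunk));
```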

async function search(query) {
  // Traditional: Exact matches (fast, cheap)
  const exactMatches = await traditionalSearch(query);
  if (exactMatches.length > 10) return exactMatches;
  // AI: Semantic search (smart, expensive)
  const semanticResults = await aiSearch(query);
  // Hybrid: Combine and rank
  return dedupe([...exactMatches, ...semanticResults]);
}
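
The hybrid search relies on a `dedupe` helper that is not shown. A minimal sketch follows, assuming each result carries a unique `id` field (an assumption; pass a different `keyOf` if your results are keyed another way). It keeps the first occurrence, so exact matches listed first stay ahead of semantic ones.

```javascript
// Remove later duplicates while preserving the original order.
function dedupe(results, keyOf = (r) => r.id) {
  const seen = new Set();
  return results.filter((r) => {
    const key = keyOf(r);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const exact = [{ id: 1, title: "RICE" }, { id: 2, title: "LNO" }];
const semantic = [{ id: 2, title: "LNO" }, { id: 3, title: "Jobs-to-be-Done" }];
console.log(dedupe([...exact, ...semantic]).map((r) => r.title));
```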

"If you're building and the product is right on the edge of what's possible, keep going. In two months, there's going to be a better model."
"At OpenAI, we write evals as product specs. If you can define good output in test cases, you've defined the product."
"The AI model you're using today is the worst AI model you will ever use for the rest of your life."