axiom-foundation-models
Foundation Models — On-Device AI for Apple Platforms
When to Use This Skill
Use when:
- Implementing on-device AI features with Foundation Models
- Adding text summarization, classification, or extraction capabilities
- Creating structured output from LLM responses
- Building tool-calling patterns for external data integration
- Streaming generated content for better UX
- Debugging Foundation Models issues (context overflow, slow generation, wrong output)
- Deciding between Foundation Models vs server LLMs (ChatGPT, Claude, etc.)
Related Skills
- `axiom-foundation-models-diag` — Use for systematic troubleshooting (context exceeded, guardrail violations, availability problems)
- `axiom-foundation-models-ref` — Use for complete API reference with all WWDC code examples
Red Flags — Anti-Patterns That Will Fail
❌ Using for World Knowledge
Why it fails: The on-device model is 3 billion parameters, optimized for summarization, extraction, classification — NOT world knowledge or complex reasoning.
Example of wrong use:
```swift
// ❌ BAD - Asking for world knowledge
let session = LanguageModelSession()
let response = try await session.respond(to: "What's the capital of France?")
```
Why: Model will hallucinate or give low-quality answers. It's trained for content generation, not encyclopedic knowledge.
Correct approach: Use server LLMs (ChatGPT, Claude) for world knowledge, or provide factual data through Tool calling.
❌ Blocking Main Thread
Why it fails: `session.respond()` is `async`, but if called synchronously on the main thread, it freezes the UI for seconds.
Example of wrong use:
```swift
// ❌ BAD - Blocking main thread
Button("Generate") {
    let response = try await session.respond(to: prompt) // UI frozen!
}
```
Why: Generation takes 1-5 seconds. User sees frozen app, bad reviews follow.
Correct approach:
```swift
// ✅ GOOD - Async on background
Button("Generate") {
    Task {
        let response = try await session.respond(to: prompt)
        // Update UI with response
    }
}
```
❌ Manual JSON Parsing
Why it fails: Prompting for JSON and parsing with JSONDecoder leads to hallucinated keys, invalid JSON, no type safety.
Example of wrong use:
```swift
// ❌ BAD - Manual JSON parsing
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
let data = response.content.data(using: .utf8)!
let person = try JSONDecoder().decode(Person.self, from: data) // CRASHES!
```
Why: Model might output `{firstName: "John"}` when you expect `{name: "John"}`. Or invalid JSON entirely.
Correct approach:
```swift
// ✅ GOOD - @Generable guarantees structure
@Generable
struct Person {
    let name: String
    let age: Int
}
let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)
// response.content is type-safe Person instance
```
❌ Ignoring Availability Check
Why it fails: Foundation Models only runs on Apple Intelligence devices in supported regions. App crashes or shows errors without check.
Example of wrong use:
```swift
// ❌ BAD - No availability check
let session = LanguageModelSession() // Might fail!
```
Correct approach:
```swift
// ✅ GOOD - Check first
switch SystemLanguageModel.default.availability {
case .available:
    let session = LanguageModelSession()
    // proceed
case .unavailable(let reason):
    // Show graceful UI: "AI features require Apple Intelligence"
}
```
❌ Single Huge Prompt
Why it fails: 4096 token context window (input + output). One massive prompt hits limit, gives poor results.
Example of wrong use:
```swift
// ❌ BAD - Everything in one prompt
let prompt = """
Generate a 7-day itinerary for Tokyo including hotels, restaurants,
activities for each day, transportation details, budget breakdown...
"""
// Exceeds context, poor quality
```
Correct approach: Break into smaller tasks, use tools for external data, multi-turn conversation.
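That decomposition can be sketched as a series of small turns on one session, assuming the `@Generable` `DayPlan` type from Pattern 2; the prompt text and loop are illustrative, not from the original:

```swift
// Sketch: one small request per day instead of one giant prompt.
// The session's transcript carries earlier days as context.
func planTrip(days: Int, session: LanguageModelSession) async throws -> [DayPlan] {
    var plans: [DayPlan] = []
    for day in 1...days {
        let response = try await session.respond(
            to: "Plan day \(day) of a Tokyo trip, briefly",
            generating: DayPlan.self
        )
        plans.append(response.content)
    }
    return plans
}
```

Each turn stays well under the 4096-token window, though a long trip still grows the transcript, so overflow handling is still needed.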
❌ Not Handling Context Overflow
Why it fails: Multi-turn conversations grow transcript. Eventually exceeds 4096 tokens, throws error, conversation ends.
Must handle:
```swift
// ✅ GOOD - Handle overflow
do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Condense transcript and create new session
    session = condensedSession(from: session)
}
```
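The snippet above leaves `condensedSession(from:)` undefined. One possible sketch, assuming the framework's `Transcript(entries:)` and `LanguageModelSession(transcript:)` initializers; keeping only the first entry plus the last exchange is an illustrative strategy, not the only one:

```swift
// Sketch: start a fresh session seeded with a condensed history.
func condensedSession(from old: LanguageModelSession) -> LanguageModelSession {
    let entries = old.transcript
    // Keep the first entry (often instructions) plus the last exchange.
    let kept = Array(entries.prefix(1)) + Array(entries.suffix(2))
    return LanguageModelSession(transcript: Transcript(entries: kept))
}
```

Alternatively, ask the model to summarize the old conversation and seed the new session's instructions with that summary.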
❌ Not Handling Guardrail Violations
Why it fails: Model has content policy. Certain prompts trigger guardrails, throw error.
Must handle:
```swift
// ✅ GOOD - Handle guardrails
do {
    let response = try await session.respond(to: userInput)
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // Show message: "I can't help with that request"
}
```
❌ Not Handling Unsupported Language
Why it fails: Model supports specific languages. User input might be unsupported, throws error.
Must check:
```swift
// ✅ GOOD - Check supported languages
let supported = SystemLanguageModel.default.supportedLanguages
guard supported.contains(Locale.current.language) else {
    // Show disclaimer
    return
}
```
Mandatory First Steps
Before writing any Foundation Models code, complete these steps:
1. Check Availability
```swift
switch SystemLanguageModel.default.availability {
case .available:
    // Proceed with implementation
    print("✅ Foundation Models available")
case .unavailable(let reason):
    // Handle gracefully - show UI message
    print("❌ Unavailable: \(reason)")
}
```
Why: Foundation Models requires:
- Apple Intelligence-enabled device
- Supported region
- User opted in to Apple Intelligence
Failure mode: App crashes or shows confusing errors without check.
2. Identify Use Case
Ask yourself: What is my primary goal?
| Use Case | Foundation Models? | Alternative |
|---|---|---|
| Summarization | ✅ YES | |
| Extraction (key info from text) | ✅ YES | |
| Classification (categorize content) | ✅ YES | |
| Content tagging | ✅ YES (built-in adapter!) | |
| World knowledge | ❌ NO | ChatGPT, Claude, Gemini |
| Complex reasoning | ❌ NO | Server LLMs |
| Mathematical computation | ❌ NO | Calculator, symbolic math |
Critical: If your use case requires world knowledge or advanced reasoning, stop. Foundation Models is the wrong tool.
3. Design @Generable Schema
If you need structured output (not just plain text):
Bad approach: Prompt for "JSON" and parse manually
Good approach: Define @Generable type
```swift
@Generable
struct SearchSuggestions {
    @Guide(description: "Suggested search terms", .count(4))
    var searchTerms: [String]
}
```
Why: Constrained decoding guarantees structure. No parsing errors, no hallucinated keys.
4. Consider Tools for External Data
If your feature needs external information:
- Weather → WeatherKit tool
- Locations → MapKit tool
- Contacts → Contacts API tool
- Calendar → EventKit tool
Don't try to get this information from the model (it will hallucinate).
Do define Tool protocol implementations.
5. Plan Streaming for Long Generations
If generation takes >1 second, use streaming:
```swift
let stream = session.streamResponse(
    to: prompt,
    generating: Itinerary.self
)
for try await partial in stream {
    // Update UI incrementally
    self.itinerary = partial
}
```
Why: Users see progress immediately, perceived latency drops dramatically.
Decision Tree
Need on-device AI?
│
├─ World knowledge/reasoning?
│ └─ ❌ NOT Foundation Models
│ → Use ChatGPT, Claude, Gemini, etc.
│ → Reason: 3B parameter model, not trained for encyclopedic knowledge
│
├─ Summarization?
│ └─ ✅ YES → Pattern 1 (Basic Session)
│ → Example: Summarize article, condense email
│ → Time: 10-15 minutes
│
├─ Structured extraction?
│ └─ ✅ YES → Pattern 2 (@Generable)
│ → Example: Extract name, date, amount from invoice
│ → Time: 15-20 minutes
│
├─ Content tagging?
│ └─ ✅ YES → Pattern 3 (contentTagging use case)
│ → Example: Tag article topics, extract entities
│ → Time: 10 minutes
│
├─ Need external data?
│ └─ ✅ YES → Pattern 4 (Tool calling)
│ → Example: Fetch weather, query contacts, get locations
│ → Time: 20-30 minutes
│
├─ Long generation?
│ └─ ✅ YES → Pattern 5 (Streaming)
│ → Example: Generate itinerary, create story
│ → Time: 15-20 minutes
│
└─ Dynamic schemas (runtime-defined structure)?
└─ ✅ YES → Pattern 6 (DynamicGenerationSchema)
→ Example: Level creator, user-defined forms
      → Time: 30-40 minutes
Pattern 1: Basic Session (~1500 words)
Use when: Simple text generation, summarization, or content analysis.
Core Concepts
LanguageModelSession:
- Stateful — retains transcript of all interactions
- Instructions vs prompts:
- Instructions (from developer): Define model's role, static guidance
- Prompts (from user): Dynamic input for generation
- Model trained to obey instructions over prompts (security feature)
Implementation
```swift
import FoundationModels

func respond(userInput: String) async throws -> String {
    let session = LanguageModelSession(instructions: """
        You are a friendly barista in a pixel art coffee shop.
        Respond to the player's question concisely.
        """
    )
    let response = try await session.respond(to: userInput)
    return response.content
}
// WWDC 301:1:05
```
Key Points
- Instructions are optional — Reasonable defaults if omitted
- Never interpolate user input into instructions — Security risk (prompt injection)
- Keep instructions concise — Each token adds latency
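The second bullet can be made concrete with a minimal sketch; the instruction and prompt strings here are illustrative:

```swift
// ❌ BAD - User input interpolated into instructions (prompt injection risk)
let risky = LanguageModelSession(
    instructions: "Summarize the following: \(userInput)"
)

// ✅ GOOD - Instructions stay static; user input arrives as the prompt
let session = LanguageModelSession(
    instructions: "You summarize text concisely."
)
let response = try await session.respond(to: userInput)
```

Because the model is trained to obey instructions over prompts, keeping user input on the prompt side limits what an adversarial input can override.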
Multi-Turn Interactions
```swift
let session = LanguageModelSession()

// First turn
let first = try await session.respond(to: "Write a haiku about fishing")
print(first.content)
// "Silent waters gleam,
// Casting lines in morning mist—
// Hope in every cast."

// Second turn - model remembers context
let second = try await session.respond(to: "Do another one about golf")
print(second.content)
// "Silent morning dew,
// Caddies guide with gentle words—
// Paths of patience tread."

// Inspect full transcript
print(session.transcript)
// WWDC 286:17:46
```
Why this works: Session retains transcript automatically. Model uses context from previous turns.
Transcript Inspection
```swift
let transcript = session.transcript
// Use for:
// - Debugging generation issues
// - Showing conversation history in UI
// - Exporting chat logs
```
Error Handling (Basic)
```swift
do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // Content policy triggered
    print("Cannot generate that content")
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
    // Language not supported
    print("Please use English or another supported language")
}
```
When to Use This Pattern
✅ Good for:
- Simple Q&A
- Text summarization
- Content analysis
- Single-turn generation
❌ Not good for:
- Structured output (use Pattern 2)
- Long conversations (will hit context limit)
- External data needs (use Pattern 4)
Time Cost
Implementation: 10-15 minutes for basic usage
Debugging: +5-10 minutes if hitting errors
Pattern 2: @Generable Structured Output (~2000 words)
Use when: You need structured data from model, not just plain text.
The Problem
Without @Generable:
```swift
// ❌ BAD - Unreliable
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
// Might get: {"firstName": "John"} when you expect {"name": "John"}
// Might get invalid JSON entirely
// Must parse manually, prone to crashes
```
The Solution: @Generable
```swift
@Generable
struct Person {
    let name: String
    let age: Int
}
let session = LanguageModelSession()
let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)
let person = response.content // Type-safe Person instance!
// WWDC 301:8:14
```
How It Works (Constrained Decoding)
- `@Generable` macro generates schema at compile-time
- Schema passed to model automatically
- Model generates tokens constrained by schema
- Framework parses output into Swift type
- Guaranteed structural correctness — No hallucinated keys, no parsing errors

"Constrained decoding masks out invalid tokens. Model can only pick tokens valid according to schema."
Supported Types
Primitives:
- `String`, `Int`, `Float`, `Double`, `Bool`
Arrays:
```swift
@Generable
struct SearchSuggestions {
    var searchTerms: [String]
}
```
Nested/Composed:
```swift
@Generable
struct Itinerary {
    var destination: String
    var days: [DayPlan] // Composed type
}

@Generable
struct DayPlan {
    var activities: [String]
}
// WWDC 286:6:18
```
Enums with Associated Values:
```swift
@Generable
struct NPC {
    let name: String
    let encounter: Encounter

    @Generable
    enum Encounter {
        case orderCoffee(String)
        case wantToTalkToManager(complaint: String)
    }
}
// WWDC 301:10:49
```
Recursive Types:
```swift
@Generable
struct Itinerary {
    var destination: String
    var relatedItineraries: [Itinerary] // Recursive!
}
```
@Guide Constraints
Control generated values with @Guide:
Natural Language Description:
```swift
@Generable
struct NPC {
    @Guide(description: "A full name with first and last")
    let name: String
}
```
Numeric Ranges:
```swift
@Generable
struct Character {
    @Guide(.range(1...10))
    let level: Int
}
// WWDC 301:11:20
```
Array Count:
```swift
@Generable
struct Suggestions {
    @Guide(description: "Suggested search terms", .count(4))
    var searchTerms: [String]
}
// WWDC 286:5:32
```
Maximum Count:
```swift
@Generable
struct Result {
    @Guide(.maximumCount(3))
    let topics: [String]
}
```
Regex Patterns:
```swift
@Generable
struct NPC {
    @Guide(Regex {
        Capture {
            ChoiceOf {
                "Mr"
                "Mrs"
            }
        }
        ". "
        OneOrMore(.word)
    })
    let name: String
}
// Output: {name: "Mrs. Brewster"}
// WWDC 301:13:40
```
Property Order Matters
Properties generated in declaration order:
```swift
@Generable
struct Itinerary {
    var destination: String // Generated first
    var days: [DayPlan] // Generated second
    var summary: String // Generated last
}
```
"You may find model produces best summaries when they're last property."
Why: Later properties can reference earlier ones. Put most important properties first for streaming.
Pattern 3: Streaming with PartiallyGenerated (~1500 words)
Use when: Generation takes >1 second and you want progressive UI updates.
The Problem
存在的问题
Without streaming:
```swift
// User waits 3-5 seconds seeing nothing
let response = try await session.respond(to: prompt, generating: Itinerary.self)
// Then entire result appears at once
```
User experience: Feels slow, frozen UI.
不使用流式传输的情况:
swift
// User waits 3-5 seconds seeing nothing
let response = try await session.respond(to: prompt, generating: Itinerary.self)
// Then entire result appears at once用户体验:感觉缓慢,UI冻结。
The Solution: Streaming
```swift
@Generable
struct Itinerary {
    var name: String
    var days: [DayPlan]
}

let stream = session.streamResponse(
    to: "Generate a 3-day itinerary to Mt. Fuji",
    generating: Itinerary.self
)
for try await partial in stream {
    print(partial) // Incrementally updated
}
// WWDC 286:9:40
```
PartiallyGenerated Type
Every `@Generable` type gets a compiler-generated `PartiallyGenerated` counterpart:
```swift
// Compiler generates:
extension Itinerary {
    struct PartiallyGenerated {
        var name: String? // All properties optional!
        var days: [DayPlan]?
    }
}
```
Why optional: Properties fill in as model generates them.
SwiftUI Integration
```swift
struct ItineraryView: View {
    let session: LanguageModelSession
    @State private var itinerary: Itinerary.PartiallyGenerated?

    var body: some View {
        VStack {
            if let name = itinerary?.name {
                Text(name)
                    .font(.title)
            }
            if let days = itinerary?.days {
                ForEach(days, id: \.self) { day in
                    DayView(day: day)
                }
            }
            Button("Generate") {
                Task {
                    let stream = session.streamResponse(
                        to: "Generate 3-day itinerary to Tokyo",
                        generating: Itinerary.self
                    )
                    for try await partial in stream {
                        self.itinerary = partial
                    }
                }
            }
        }
    }
}
// WWDC 286:10:05
```
Animations & Transitions
Add polish:
```swift
if let name = itinerary?.name {
    Text(name)
        .transition(.opacity)
}
if let days = itinerary?.days {
    ForEach(days, id: \.self) { day in
        DayView(day: day)
            .transition(.slide)
    }
}
```
"Get creative with SwiftUI animations to hide latency. Turn waiting into delight."
View Identity
Critical for arrays:
```swift
// ✅ GOOD - Stable identity
ForEach(days, id: \.id) { day in
    DayView(day: day)
}

// ❌ BAD - Identity changes, animations break
ForEach(days.indices, id: \.self) { index in
    DayView(day: days[index])
}
```
Property Order for Streaming UX
```swift
// ✅ GOOD - Title appears first, summary last
@Generable
struct Itinerary {
    var name: String // Shows first
    var days: [DayPlan] // Shows second
    var summary: String // Shows last (can reference days)
}

// ❌ BAD - Summary before content
@Generable
struct Itinerary {
    var summary: String // Doesn't make sense before days!
    var days: [DayPlan]
}
// WWDC 286:11:00
```
When to Use Streaming
✅ Use for:
- Itineraries
- Stories
- Long descriptions
- Multi-section content
❌ Skip for:
- Simple Q&A (< 1 sentence)
- Quick classification
- Content tagging
Time Cost
Implementation: 15-20 minutes with SwiftUI
Polish (animations): +5-10 minutes
Pattern 4: Tool Calling (~2000 words)
Use when: Model needs external data (weather, locations, contacts) to generate response.
The Problem
```swift
// ❌ BAD - Model will hallucinate
let response = try await session.respond(
    to: "What's the temperature in Cupertino?"
)
// Output: "It's about 72°F" (completely made up!)
```
Why: 3B parameter model doesn't have real-time weather data.
The Solution: Tool Calling
Let model autonomously call your code to fetch external data.
```swift
import FoundationModels
import WeatherKit
import CoreLocation

struct GetWeatherTool: Tool {
    let name = "getWeather"
    let description = "Retrieve latest weather for a city"

    @Generable
    struct Arguments {
        @Guide(description: "The city to fetch weather for")
        var city: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        let places = try await CLGeocoder().geocodeAddressString(arguments.city)
        let weather = try await WeatherService.shared.weather(for: places.first!.location!)
        let temp = weather.currentWeather.temperature.value
        return ToolOutput("\(arguments.city)'s temperature is \(temp) degrees.")
    }
}
// WWDC 286:13:42
```
Attaching Tool to Session
```swift
let session = LanguageModelSession(
    tools: [GetWeatherTool()],
    instructions: "Help user with weather forecasts."
)
let response = try await session.respond(
    to: "What's the temperature in Cupertino?"
)
print(response.content)
// "It's 71°F in Cupertino!"
// WWDC 286:15:03
```
Model autonomously:
- Recognizes it needs weather data
- Calls `GetWeatherTool`
- Receives real temperature
- Incorporates into natural response
Tool Protocol Requirements
```swift
protocol Tool {
    var name: String { get }
    var description: String { get }
    associatedtype Arguments: Generable
    func call(arguments: Arguments) async throws -> ToolOutput
}
```
Name: Short, verb-based (e.g. `getWeather`, `findContact`)
Description: One sentence explaining purpose
Arguments: Must be `@Generable` (guarantees valid input)
call: Your code — fetch data, process, return
protocol Tool {
var name: String { get }
var description: String { get }
associatedtype Arguments: Generable
func call(arguments: Arguments) async throws -> ToolOutput
}Name:简短,动词开头(例如、)
Description:一句话说明用途
Arguments:必须是类型(保证输入有效)
call:你的代码——获取数据、处理、返回结果
getWeatherfindContact@GenerableToolOutput
ToolOutput
Two forms:
- Natural language (String):
```swift
return ToolOutput("Temperature is 71°F")
```
- Structured (GeneratedContent):
```swift
let content = GeneratedContent(properties: ["temperature": 71])
return ToolOutput(content)
```
Multiple Tools Example
swift
let session = LanguageModelSession(
tools: [
GetWeatherTool(),
FindRestaurantTool(),
FindHotelTool()
],
instructions: "Plan travel itineraries."
)
let response = try await session.respond(
to: "Create a 2-day plan for Tokyo"
)
// Model autonomously decides:
// - Calls FindRestaurantTool for dining
// - Calls FindHotelTool for accommodation
// - Calls GetWeatherTool to suggest activities
swift
let session = LanguageModelSession(
tools: [
GetWeatherTool(),
FindRestaurantTool(),
FindHotelTool()
],
instructions: "Plan travel itineraries."
)
let response = try await session.respond(
to: "Create a 2-day plan for Tokyo"
)
// Model autonomously decides:
// - Calls FindRestaurantTool for dining
// - Calls FindHotelTool for accommodation
// - Calls GetWeatherTool to suggest activities
Stateful Tools
有状态工具
Tools can maintain state across calls:
swift
class FindContactTool: Tool {
let name = "findContact"
let description = "Find contact from age generation"
var pickedContacts = Set<String>() // State!
@Generable
struct Arguments {
let generation: Generation
@Generable
enum Generation {
case babyBoomers
case genX
case millennial
case genZ
}
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Use Contacts API
var contacts = fetchContacts(for: arguments.generation)
// Remove already picked
contacts.removeAll(where: { pickedContacts.contains($0.name) })
guard let picked = contacts.randomElement() else {
return ToolOutput("No more contacts")
}
pickedContacts.insert(picked.name) // Update state
return ToolOutput(picked.name)
}
}// WWDC 301:21:55
Why class, not struct: Need to mutate state from call method.
工具可以在多次调用之间保持状态:
swift
class FindContactTool: Tool {
let name = "findContact"
let description = "Find contact from age generation"
var pickedContacts = Set<String>() // State!
@Generable
struct Arguments {
let generation: Generation
@Generable
enum Generation {
case babyBoomers
case genX
case millennial
case genZ
}
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Use Contacts API
var contacts = fetchContacts(for: arguments.generation)
// Remove already picked
contacts.removeAll(where: { pickedContacts.contains($0.name) })
guard let picked = contacts.randomElement() else {
return ToolOutput("No more contacts")
}
pickedContacts.insert(picked.name) // Update state
return ToolOutput(picked.name)
}
}// WWDC 301:21:55
为什么用class而不是struct:需要在call方法中修改状态。
Tool Calling Flow
工具调用流程
1. Session initialized with tools
2. User prompt: "What's Tokyo's weather?"
3. Model analyzes: "Need weather data"
4. Model generates tool call: getWeather(city: "Tokyo")
5. Framework calls your tool's call() method
6. Your tool fetches real data from API
7. Tool output inserted into transcript
8. Model generates final response using tool output
"Model decides autonomously when and how often to call tools. Can call multiple tools per request, even in parallel."
1. 使用工具初始化会话
2. 用户提示:"东京的天气如何?"
3. 模型分析:"需要天气数据"
4. 模型生成工具调用:getWeather(city: "Tokyo")
5. 框架调用你的工具的call()方法
6. 你的工具从API获取真实数据
7. 工具输出被插入到对话记录中
8. 模型使用工具输出生成最终响应
"模型会自主决定何时调用工具以及调用频率。可以在一次请求中调用多个工具,甚至并行调用。"
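Step 7 of this flow can be verified during development by dumping the session transcript. A sketch only; the Transcript.Entry case names below are assumptions, so check them against the framework before relying on this:

```swift
// Inspect what the model actually did during a tool-calling request.
for entry in session.transcript.entries {
    switch entry {
    case .toolCalls(let calls):
        print("Model requested tools: \(calls)")
    case .toolOutput(let output):
        print("Tool returned: \(output)")
    default:
        break // instructions, prompts, and responses
    }
}
```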
Tool Calling Guarantees
工具调用的保证
✅ Guaranteed:
- Valid tool names (no hallucinated tools)
- Valid arguments (via @Generable)
- Structural correctness
❌ Not guaranteed:
- Tool will be called (model might not need it)
- Specific argument values (model decides based on context)
✅ 保证:
- 有效的工具名称(无幻觉工具)
- 有效的参数(通过@Generable)
- 结构正确性
❌ 不保证:
- 工具会被调用(模型可能不需要)
- 特定的参数值(模型会根据上下文决定)
Real-World Example: Itinerary Planner
真实世界示例:行程规划器
swift
struct FindPointsOfInterestTool: Tool {
let name = "findPointsOfInterest"
let description = "Find restaurants, museums, parks near a landmark"
let landmark: String
@Generable
struct Arguments {
let category: Category
@Generable
enum Category {
case restaurant
case museum
case park
case marina
}
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Use MapKit
let request = MKLocalSearch.Request()
request.naturalLanguageQuery = "\(arguments.category) near \(landmark)"
let search = MKLocalSearch(request: request)
let response = try await search.start()
let names = response.mapItems.prefix(5).map { $0.name ?? "" }
return ToolOutput(names.joined(separator: ", "))
}
}
From WWDC 259 summary: "Tool fetches points of interest from MapKit. Model uses world knowledge to determine promising categories."
swift
struct FindPointsOfInterestTool: Tool {
let name = "findPointsOfInterest"
let description = "Find restaurants, museums, parks near a landmark"
let landmark: String
@Generable
struct Arguments {
let category: Category
@Generable
enum Category {
case restaurant
case museum
case park
case marina
}
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Use MapKit
let request = MKLocalSearch.Request()
request.naturalLanguageQuery = "\(arguments.category) near \(landmark)"
let search = MKLocalSearch(request: request)
let response = try await search.start()
let names = response.mapItems.prefix(5).map { $0.name ?? "" }
return ToolOutput(names.joined(separator: ", "))
}
}
来自WWDC 259摘要:"工具从MapKit获取兴趣点。模型利用通用知识确定有前景的类别。"
When to Use Tools
何时使用工具
✅ Use for:
- Weather data
- Map/location queries
- Contact information
- Calendar events
- External APIs
❌ Don't use for:
- Information your data model already has
- Information in prompt/instructions
- Simple calculations (model can do these)
✅ 适合:
- 天气数据
- 地图/地点查询
- 联系人信息
- 日历事件
- 外部API
❌ 不适合:
- 数据模型已包含的信息
- 提示词/指令中已有的信息
- 简单计算(模型可以完成)
Time Cost
时间成本
Simple tool: 20-25 minutes
Complex tool with state: 30-40 minutes
简单工具:20-25分钟
带状态的复杂工具:30-40分钟
Pattern 5: Context Management (~1500 words)
模式5:上下文管理(约1500词)
Use when: Multi-turn conversations that might exceed 4096 token limit.
适用场景:可能超过4096token限制的多轮对话。
The Problem
存在的问题
swift
// Long conversation...
for i in 1...100 {
let response = try await session.respond(to: "Question \(i)")
// Eventually...
// Error: exceededContextWindowSize
}
Context window: 4096 tokens (input + output combined)
Average: ~3 characters per token in English
Rough calculation:
- 4096 tokens ≈ 12,000 characters
- ≈ 2,000-3,000 words total
Long conversation or verbose prompts/responses → Exceed limit
swift
// Long conversation...
for i in 1...100 {
let response = try await session.respond(to: "Question \(i)")
// Eventually...
// Error: exceededContextWindowSize
}
上下文窗口:4096个token(输入+输出总和)
平均:英文中每个token约3个字符
粗略计算:
- 4096个token ≈ 12,000个字符
- ≈ 2,000-3,000个单词
长对话或冗长的提示词/响应 → 超出限制
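The arithmetic above can be wrapped into a cheap pre-flight check. This heuristic (roughly 3 characters per token for English) is an estimate only; real counts come from the framework and vary with language and content:

```swift
// Rough pre-check using the ~3 characters/token estimate above.
func roughTokenEstimate(_ text: String) -> Int {
    max(1, text.count / 3)
}

// Leave headroom for the model's output, since the 4096-token
// window covers input and output combined.
func likelyFitsContext(prompt: String, reservedForOutput: Int = 1024) -> Bool {
    roughTokenEstimate(prompt) + reservedForOutput <= 4096
}
```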
Handling Context Overflow
处理上下文溢出
Basic: Start fresh session
基础方案:启动新会话
swift
var session = LanguageModelSession()
do {
let response = try await session.respond(to: prompt)
print(response.content)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// New session, no history
session = LanguageModelSession()
}// WWDC 301:3:37
Problem: Loses entire conversation history.
swift
var session = LanguageModelSession()
do {
let response = try await session.respond(to: prompt)
print(response.content)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// New session, no history
session = LanguageModelSession()
}// WWDC 301:3:37
问题:丢失整个对话历史。
Better: Condense Transcript
更好的方案:压缩对话记录
swift
var session = LanguageModelSession()
do {
let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// New session with condensed history
session = condensedSession(from: session)
}
func condensedSession(from previous: LanguageModelSession) -> LanguageModelSession {
let allEntries = previous.transcript.entries
var condensedEntries = [Transcript.Entry]()
// Always include first entry (instructions)
if let first = allEntries.first {
condensedEntries.append(first)
// Include last entry (most recent context)
if allEntries.count > 1, let last = allEntries.last {
condensedEntries.append(last)
}
}
let condensedTranscript = Transcript(entries: condensedEntries)
return LanguageModelSession(transcript: condensedTranscript)
}// WWDC 301:3:55
Why this works:
- Instructions always preserved
- Recent context retained
- Total tokens drastically reduced
swift
var session = LanguageModelSession()
do {
let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// New session with condensed history
session = condensedSession(from: session)
}
func condensedSession(from previous: LanguageModelSession) -> LanguageModelSession {
let allEntries = previous.transcript.entries
var condensedEntries = [Transcript.Entry]()
// Always include first entry (instructions)
if let first = allEntries.first {
condensedEntries.append(first)
// Include last entry (most recent context)
if allEntries.count > 1, let last = allEntries.last {
condensedEntries.append(last)
}
}
let condensedTranscript = Transcript(entries: condensedEntries)
return LanguageModelSession(transcript: condensedTranscript)
}// WWDC 301:3:55
为什么有效:
- 始终保留指令
- 保留最近的上下文
- 大幅减少总token数
Advanced: Summarize Middle Entries
高级方案:总结中间对话记录
For long conversations where recent context isn't enough:
swift
func condensedSession(from previous: LanguageModelSession) async throws -> LanguageModelSession {
let entries = previous.transcript.entries
guard entries.count > 3 else {
return LanguageModelSession(transcript: previous.transcript)
}
// Keep first (instructions) and last (recent)
var condensedEntries = [entries.first!]
// Summarize middle entries
let middleEntries = Array(entries[1..<entries.count-1])
let summaryPrompt = """
Summarize this conversation in 2-3 sentences:
\(middleEntries.map { $0.content }.joined(separator: "\n"))
"""
// Use Foundation Models itself to summarize!
let summarySession = LanguageModelSession()
let summary = try await summarySession.respond(to: summaryPrompt)
condensedEntries.append(Transcript.Entry(content: summary.content))
condensedEntries.append(entries.last!)
return LanguageModelSession(transcript: Transcript(entries: condensedEntries))
}
"You could summarize parts of transcript with Foundation Models itself."
对于需要更多上下文的长对话:
swift
func condensedSession(from previous: LanguageModelSession) async throws -> LanguageModelSession {
let entries = previous.transcript.entries
guard entries.count > 3 else {
return LanguageModelSession(transcript: previous.transcript)
}
// Keep first (instructions) and last (recent)
var condensedEntries = [entries.first!]
// Summarize middle entries
let middleEntries = Array(entries[1..<entries.count-1])
let summaryPrompt = """
Summarize this conversation in 2-3 sentences:
\(middleEntries.map { $0.content }.joined(separator: "\n"))
"""
// Use Foundation Models itself to summarize!
let summarySession = LanguageModelSession()
let summary = try await summarySession.respond(to: summaryPrompt)
condensedEntries.append(Transcript.Entry(content: summary.content))
condensedEntries.append(entries.last!)
return LanguageModelSession(transcript: Transcript(entries: condensedEntries))
}
"你可以使用Foundation Models本身来总结对话记录的部分内容。"
Preventing Context Overflow
防止上下文溢出
1. Keep prompts concise:
swift
// ❌ BAD
let prompt = """
I want you to generate a comprehensive detailed analysis of this article
with multiple sections including summary, key points, sentiment analysis,
main arguments, counter arguments, logical fallacies, and conclusions...
"""
// ✅ GOOD
let prompt = "Summarize this article's key points"
2. Use tools for data:
Instead of putting entire dataset in prompt, use tools to fetch on-demand.
3. Break complex tasks into steps:
swift
// ❌ BAD - One massive generation
let response = try await session.respond(
to: "Create 7-day itinerary with hotels, restaurants, activities..."
)
// ✅ GOOD - Multiple smaller generations
let overview = try await session.respond(to: "Create high-level 7-day plan")
for day in 1...7 {
let details = try await session.respond(to: "Detail activities for day \(day)")
}
1. 保持提示词简洁:
swift
// ❌ BAD
let prompt = """
I want you to generate a comprehensive detailed analysis of this article
with multiple sections including summary, key points, sentiment analysis,
main arguments, counter arguments, logical fallacies, and conclusions...
"""
// ✅ GOOD
let prompt = "Summarize this article's key points"
2. 使用工具获取数据:
不要将整个数据集放入提示词中,使用工具按需获取。
3. 将复杂任务拆分为步骤:
swift
// ❌ BAD - One massive generation
let response = try await session.respond(
to: "Create 7-day itinerary with hotels, restaurants, activities..."
)
// ✅ GOOD - Multiple smaller generations
let overview = try await session.respond(to: "Create high-level 7-day plan")
for day in 1...7 {
let details = try await session.respond(to: "Detail activities for day \(day)")
}
Monitoring Context Usage
监控上下文使用情况
"Each token in instructions and prompt adds latency. Longer outputs take longer."
Use Instruments (Foundation Models template) to:
- See token counts
- Identify verbose prompts
- Optimize context usage
- Quantify improvements
"指令和提示词中的每个token都会增加延迟。更长的输出需要更多时间。"
使用Instruments(Foundation Models模板)来:
- 查看token计数
- 识别冗长的提示词
- 优化上下文使用
- 量化改进效果
Time Cost
时间成本
Basic overflow handling: 5-10 minutes
Condensing strategy: 15-20 minutes
Advanced summarization: 30-40 minutes
基础溢出处理:5-10分钟
压缩策略:15-20分钟
高级总结:30-40分钟
Pattern 6: Sampling & Generation Options (~1000 words)
模式6:采样与生成选项(约1000词)
Use when: You need control over output randomness/determinism.
适用场景:你需要控制输出的随机性/确定性。
Understanding Sampling
理解采样
Model generates output one token at a time:
- Creates probability distribution for next token
- Samples from distribution
- Picks token
- Repeats
Default: Random sampling → Different output each time
模型逐个token生成输出:
- 为下一个token创建概率分布
- 从分布中采样
- 选择token
- 重复
默认:随机采样 → 每次输出不同
Deterministic Output (Greedy)
确定性输出(贪婪采样)
swift
let response = try await session.respond(
to: prompt,
options: GenerationOptions(sampling: .greedy)
)// WWDC 301:6:14
Use cases:
- Repeatable demos
- Testing/debugging
- Consistent results required
Caveat: Only holds for same model version. OS updates may change output.
swift
let response = try await session.respond(
to: prompt,
options: GenerationOptions(sampling: .greedy)
)// WWDC 301:6:14
适用场景:
- 可重复的演示
- 测试/调试
- 需要一致的结果
注意事项:仅在相同模型版本下有效。系统更新可能会改变输出。
Temperature Control
温度控制
Low variance (conservative, focused):
swift
let response = try await session.respond(
to: prompt,
options: GenerationOptions(temperature: 0.5)
)
High variance (creative, diverse):
swift
let response = try await session.respond(
to: prompt,
options: GenerationOptions(temperature: 2.0)
)// WWDC 301:6:14
Temperature scale:
- 0.1-0.5: Very focused, predictable
- 1.0 (default): Balanced
- 1.5-2.0: Creative, varied
Example use cases:
- Low temp: Fact extraction, classification
- High temp: Creative writing, brainstorming
低方差(保守、聚焦):
swift
let response = try await session.respond(
to: prompt,
options: GenerationOptions(temperature: 0.5)
)
高方差(创意、多样):
swift
let response = try await session.respond(
to: prompt,
options: GenerationOptions(temperature: 2.0)
)// WWDC 301:6:14
温度范围:
- 0.1-0.5:非常聚焦、可预测
- 1.0(默认):平衡
- 1.5-2.0:创意、多样
示例场景:
- 低温度:事实提取、分类
- 高温度:创意写作、头脑风暴
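One way to keep these choices consistent across a codebase is a small mapping from task type to options. The task categories and exact temperature values here are illustrative choices, not framework API:

```swift
enum GenerationTask {
    case extraction, classification   // focused, repeatable output
    case creativeWriting, brainstorm  // varied, diverse output
}

func generationOptions(for task: GenerationTask) -> GenerationOptions {
    switch task {
    case .extraction, .classification:
        return GenerationOptions(temperature: 0.3)
    case .creativeWriting, .brainstorm:
        return GenerationOptions(temperature: 1.8)
    }
}
```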
When to Adjust Sampling
何时调整采样方式
✅ Greedy for:
- Unit tests
- Demos
- Consistency critical
✅ Low temperature for:
- Factual tasks
- Classification
- Extraction
✅ High temperature for:
- Creative content
- Story generation
- Varied NPC dialog
✅ 贪婪采样适合:
- 单元测试
- 演示
- 对一致性要求高的场景
✅ 低温度适合:
- 事实性任务
- 分类
- 提取
✅ 高温度适合:
- 创意内容
- 故事生成
- 多样的NPC对话
Time Cost
时间成本
Implementation: 2-3 minutes (one line change)
实现:2-3分钟(只需修改一行代码)
Pressure Scenarios
压力场景
Scenario 1: "Just Use ChatGPT API" (~1000 words)
场景1:"直接用ChatGPT API"(约1000词)
Context: You're implementing a new AI feature. PM suggests using ChatGPT API for "better results."
Pressure signals:
- 👔 Authority: PM outranks you
- 💸 Existing integration: Team already uses OpenAI for other features
- ⏰ Speed: "ChatGPT is proven, Foundation Models is new"
Rationalization traps:
- "PM knows best"
- "ChatGPT gives better answers"
- "Faster to implement with existing code"
Why this fails:
- Privacy violation: User data sent to external server
- Medical notes, financial docs, personal messages
- Violates user expectation of on-device privacy
- Potential GDPR/privacy law issues
- Cost: Every API call costs money
- Foundation Models is free
- Scale to millions of users = massive costs
- Offline unavailable: Requires internet
- Airplane mode, poor signal → feature broken
- Foundation Models works offline
- Latency: Network round-trip adds 500-2000ms
- Foundation Models: On-device, <100ms startup
When ChatGPT IS appropriate:
- World knowledge required (e.g. "Who is the president of France?")
- Complex reasoning (multi-step logic, math proofs)
- Very long context (>4096 tokens)
Mandatory response:
"I understand ChatGPT delivers great results for certain tasks. However,
for this feature, Foundation Models is the right choice for three critical reasons:
1. **Privacy**: This feature processes [medical notes/financial data/personal content].
Users expect this data stays on-device. Sending to external API violates that trust
and may have compliance issues.
2. **Cost**: At scale, ChatGPT API calls cost $X per 1000 requests. Foundation Models
is free. For Y million users, that's $Z annually we can avoid.
3. **Offline capability**: Foundation Models works without internet. Users in airplane
mode or with poor signal still get full functionality.
**When to use ChatGPT**: If this feature required world knowledge or complex reasoning,
ChatGPT would be the right choice. But this is [summarization/extraction/classification],
which is exactly what Foundation Models is optimized for.
**Time estimate**: Foundation Models implementation: 15-20 minutes.
Privacy compliance review for ChatGPT: 2-4 weeks."
Time saved: Privacy compliance review vs correct implementation: 2-4 weeks vs 20 minutes
背景:你正在实现一个新的AI功能。产品经理建议使用ChatGPT API以获得“更好的结果”。
压力信号:
- 👔 权威:产品经理的职级比你高
- 💸 现有集成:团队已经在其他功能中使用OpenAI
- ⏰ 速度:"ChatGPT已经成熟,Foundation Models是新技术"
合理化陷阱:
- "产品经理最清楚"
- "ChatGPT的答案更好"
- "用现有代码实现更快"
为什么这种做法会失败:
- 隐私违规:用户数据被发送到外部服务器
- 医疗记录、财务文档、个人消息
- 违反用户对端侧隐私的期望
- 可能违反GDPR或其他隐私法规
- 成本:每次API调用都需要付费
- Foundation Models是免费的
- 扩展到数百万用户会产生巨额成本
- 离线不可用:需要互联网
- 飞行模式、信号差 → 功能失效
- Foundation Models可离线工作
- 延迟:网络往返会增加500-2000ms延迟
- Foundation Models:端侧运行,启动延迟<100ms
何时ChatGPT是合适的:
- 需要通用知识(例如"法国总统是谁?")
- 需要复杂推理(多步骤逻辑、数学证明)
- 需要非常长的上下文(>4096token)
必选回应:
"我理解ChatGPT在某些任务上表现出色。然而,对于这个功能,Foundation Models是更合适的选择,主要有三个关键原因:
1. **隐私**:该功能处理[医疗记录/财务数据/个人内容]。用户期望这些数据保留在设备上。发送到外部API会违背这种信任,还可能引发合规问题。
2. **成本**:大规模使用时,ChatGPT API每1000次请求需要花费$X。而Foundation Models是免费的。对于Y百万用户,我们可以每年节省$Z的成本。
3. **离线能力**:Foundation Models无需互联网即可工作。处于飞行模式或信号差的用户仍然可以使用完整功能。
**何时使用ChatGPT**:如果该功能需要通用知识或复杂推理,ChatGPT会是合适的选择。但当前功能是[摘要/提取/分类],这正是Foundation Models优化的场景。
**时间估算**:Foundation Models实现需要15-20分钟。而ChatGPT的隐私合规审查需要2-4周。"
节省的时间:隐私合规审查(2-4周) vs 正确实现(20分钟)
Scenario 2: "Parse JSON Manually" (~1000 words)
场景2:"手动解析JSON"(约1000词)
Context: Teammate suggests prompting for JSON, parsing with JSONDecoder. Claims it's "simple and familiar."
Pressure signals:
- ⏰ Deadline: Ship in 2 days
- 📚 Familiarity: "Everyone knows JSON"
- 🔧 Existing code: Already have JSON parsing utilities
Rationalization traps:
- "JSON is standard"
- "We parse JSON everywhere already"
- "Faster than learning new API"
Why this fails:
- Hallucinated keys: Model outputs {firstName: "John"} when you expect {name: "John"}
- JSONDecoder crashes: keyNotFound
- No compile-time safety
- Invalid JSON: Model might output:
Here's the person: {name: "John", age: 30}
- Not valid JSON (preamble text)
- Parsing fails
- No type safety: Manual string parsing, prone to errors
Real-world example:
swift
// ❌ BAD - Will fail
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
// Model outputs: {"firstName": "John Smith", "years": 30}
// Your code expects: {"name": ..., "age": ...}
// CRASH: keyNotFound(name)
Debugging time: 2-4 hours finding edge cases, writing parsing hacks
Correct approach:
swift
// ✅ GOOD - 15 minutes, guaranteed to work
@Generable
struct Person {
let name: String
let age: Int
}
let response = try await session.respond(
to: "Generate a person",
generating: Person.self
)
// response.content is type-safe Person, always valid
Mandatory response:
"I understand JSON parsing feels familiar, but for LLM output, @Generable is objectively
better for three technical reasons:
1. **Constrained decoding guarantees structure**: Model can ONLY generate valid Person
instances. Impossible to get wrong keys, invalid JSON, or missing fields.
2. **No parsing code needed**: Framework handles parsing automatically. Zero chance of
parsing bugs.
3. **Compile-time safety**: If we change Person struct, compiler catches all issues.
Manual JSON parsing = runtime crashes.
**Real cost**: Manual JSON approach will hit edge cases. Debugging 'keyNotFound' crashes
takes 2-4 hours. @Generable implementation takes 15 minutes and has zero parsing bugs.
**Analogy**: This is like choosing Swift over Objective-C for new code. Both work, but
Swift's type safety prevents entire categories of bugs."
Time saved: 4-8 hours debugging vs 15 minutes correct implementation
背景:同事建议提示生成JSON,然后用JSONDecoder解析。声称这“简单且熟悉”。
压力信号:
- ⏰ 截止日期:2天内上线
- 📚 熟悉度:"每个人都懂JSON"
- 🔧 现有代码:已经有JSON解析工具
合理化陷阱:
- "JSON是标准"
- "我们已经在各处解析JSON"
- "比学习新API更快"
为什么这种做法会失败:
- 键幻觉:模型输出{firstName: "John"},而你期望的是{name: "John"}
- JSONDecoder会崩溃:keyNotFound
- 无编译时安全
- 无效JSON:模型可能输出:
Here's the person: {name: "John", age: 30}
- 不是有效的JSON(有前缀文本)
- 解析失败
- 无类型安全:手动字符串解析,容易出错
真实世界示例:
swift
// ❌ BAD - Will fail
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
// Model outputs: {"firstName": "John Smith", "years": 30}
// Your code expects: {"name": ..., "age": ...}
// CRASH: keyNotFound(name)
调试时间:2-4小时寻找边缘案例,编写临时解析补丁
正确做法:
swift
// ✅ GOOD - 15 minutes, guaranteed to work
@Generable
struct Person {
let name: String
let age: Int
}
let response = try await session.respond(
to: "Generate a person",
generating: Person.self
)
// response.content is type-safe Person, always valid
必选回应:
"我理解JSON解析感觉很熟悉,但对于大语言模型的输出,@Generable在技术上更优,主要有三个原因:
1. **约束解码保证结构**:模型只能生成有效的Person实例。不可能出现错误的键、无效JSON或缺失字段。
2. **无需解析代码**:框架会自动处理解析。完全没有解析错误的可能。
3. **编译时安全**:如果我们修改Person结构体,编译器会捕获所有问题。手动JSON解析会导致运行时崩溃。
**真实成本**:手动JSON方法会遇到边缘案例。调试'keyNotFound'崩溃需要2-4小时。而@Generable实现只需15分钟,且没有解析错误。
**类比**:这就像为新代码选择Swift而非Objective-C。两者都能工作,但Swift的类型安全可以避免一整类错误。"
节省的时间:4-8小时调试 vs 15分钟正确实现
Scenario 3: "One Big Prompt" (~1000 words)
场景3:"一个大提示词"(约1000词)
Context: Feature requires extracting name, date, amount, category from invoice. Teammate suggests one prompt: "Extract all information."
Pressure signals:
- 🏗️ Architecture: "Simpler with one API call"
- ⏰ Speed: "Why make it complicated?"
- 📉 Complexity: "More prompts = more code"
Rationalization traps:
- "Simpler is better"
- "One prompt means less code"
- "Model is smart enough"
Why this fails:
- Context overflow: Complex prompt + large invoice → Exceeds 4096 tokens
- Poor results: Model tries to do too much at once, quality suffers
- Slow generation: One massive response takes 5-8 seconds
- All-or-nothing: If one field fails, entire generation fails
Better approach: Break into tasks + use tools
swift
// ❌ BAD - One massive prompt
let prompt = """
Extract from this invoice:
- Vendor name
- Invoice date
- Total amount
- Line items (description, quantity, price each)
- Payment terms
- Due date
- Tax amount
...
"""
// 4 seconds, poor quality, might exceed context
// ✅ GOOD - Structured extraction with focused prompts
@Generable
struct InvoiceBasics {
let vendor: String
let date: String
let amount: Double
}
let basics = try await session.respond(
to: "Extract vendor, date, and amount",
generating: InvoiceBasics.self
) // 0.5 seconds, high quality
@Generable
struct LineItem {
let description: String
let quantity: Int
let price: Double
}
let items = try await session.respond(
to: "Extract line items",
generating: [LineItem].self
) // 1 second, high quality
// Total: 1.5 seconds, better quality, graceful partial failures
Mandatory response:
"I understand the appeal of one simple API call. However, this specific task requires
a different approach:
1. **Context limits**: Invoice + complex extraction prompt will likely exceed 4096 token
limit. Multiple focused prompts stay well under limit.
2. **Better quality**: Model performs better with focused tasks. 'Extract vendor name'
gets 95%+ accuracy. 'Extract everything' gets 60-70%.
3. **Faster perceived performance**: Multiple prompts with streaming show progressive
results. Users see vendor name in 0.5s, not waiting 5s for everything.
4. **Graceful degradation**: If line items fail, we still have basics. All-or-nothing
approach means total failure.
**Implementation**: Breaking into 3-4 focused extractions takes 30 minutes. One big
prompt takes 2-3 hours debugging why it hits context limit and produces poor results."
Time saved: 2-3 hours debugging vs 30 minutes proper design
背景:功能需要从发票中提取姓名、日期、金额、类别。同事建议使用一个提示词:"提取所有信息。"
压力信号:
- 🏗️ 架构:"一个API调用更简单"
- ⏰ 速度:"为什么要复杂化?"
- 📉 复杂度:"更多提示词意味着更多代码"
合理化陷阱:
- "越简单越好"
- "一个提示词意味着更少的代码"
- "模型足够智能"
为什么这种做法会失败:
- 上下文溢出:复杂提示词 + 大发票 → 超过4096token限制
- 结果质量差:模型试图一次完成太多任务,质量下降
- 生成缓慢:一个庞大的响应需要5-8秒
- 全有或全无:如果一个字段失败,整个生成都会失败
更好的做法:拆分为任务 + 使用工具
swift
// ❌ BAD - One massive prompt
let prompt = """
Extract from this invoice:
- Vendor name
- Invoice date
- Total amount
- Line items (description, quantity, price each)
- Payment terms
- Due date
- Tax amount
...
"""
// 4 seconds, poor quality, might exceed context
// ✅ GOOD - Structured extraction with focused prompts
@Generable
struct InvoiceBasics {
let vendor: String
let date: String
let amount: Double
}
let basics = try await session.respond(
to: "Extract vendor, date, and amount",
generating: InvoiceBasics.self
) // 0.5 seconds, 高质量
@Generable
struct LineItem {
let description: String
let quantity: Int
let price: Double
}
let items = try await session.respond(
to: "Extract line items",
generating: [LineItem].self
) // 1 second, 高质量
// Total: 1.5 seconds, better quality, graceful partial failures
必选回应:
"我理解一个简单API调用的吸引力。然而,这个特定任务需要不同的方法:
1. **上下文限制**:发票 + 复杂提取提示词很可能会超过4096token限制。多个聚焦的提示词会保持在限制内。
2. **更好的质量**:模型在处理聚焦任务时表现更好。'提取供应商名称'的准确率可达95%以上。而'提取所有信息'的准确率只有60-70%。
3. **更快的感知性能**:多个提示词结合流式传输可以逐步显示结果。用户会在0.5秒内看到供应商名称,而不是等待5秒才能看到所有内容。
4. **优雅降级**:如果行项目提取失败,我们仍然可以获取基础信息。全有或全无的方法会导致完全失败。
**实现**:拆分为3-4个聚焦的提取任务需要30分钟。而一个大提示词需要2-3小时调试,解决上下文限制和结果质量差的问题。"
节省的时间:2-3小时调试 vs 30分钟合理设计
Performance Optimization
性能优化
1. Prewarm Session (~200 words)
1. 预启动会话(约200词)
Problem: First generation takes 1-2 seconds just to load model.
Solution: Create session before user interaction.
swift
class ViewModel: ObservableObject {
private var session: LanguageModelSession?
init() {
// Prewarm on init, not when user taps button
Task {
self.session = LanguageModelSession(instructions: "...")
}
}
func generate(prompt: String) async throws -> String {
// Fall back to a fresh session if prewarming hasn't finished yet
let session = self.session ?? LanguageModelSession(instructions: "...")
let response = try await session.respond(to: prompt)
return response.content
}
}
"Prewarming session before user interaction reduces initial latency."
Time saved: 1-2 seconds off first generation
问题:首次生成需要1-2秒加载模型。
解决方案:在用户交互之前创建会话。
swift
class ViewModel: ObservableObject {
private var session: LanguageModelSession?
init() {
// 在初始化时预启动,而不是用户点击按钮时
Task {
self.session = LanguageModelSession(instructions: "...")
}
}
func generate(prompt: String) async throws -> String {
// 如果预启动尚未完成,则回退为新建会话
let session = self.session ?? LanguageModelSession(instructions: "...")
let response = try await session.respond(to: prompt)
return response.content
}
}
"在用户交互之前预启动会话可以减少初始延迟。"
节省的时间:首次生成减少1-2秒延迟
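A related option, assuming the prewarm() method on LanguageModelSession (verify against the current SDK), is to ask the framework explicitly to load model assets before the first request:

```swift
let session = LanguageModelSession(instructions: "...")
// Hint that a request is coming soon so model assets load early.
session.prewarm()
```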
2. includeSchemaInPrompt: false (~200 words)
2. includeSchemaInPrompt: false(约200词)
Problem: @Generable schemas inserted into prompt, increases token count.
Solution: For subsequent requests with same schema, skip insertion.
swift
let firstResponse = try await session.respond(
to: "Generate first person",
generating: Person.self
// Schema inserted automatically
)
// Subsequent requests with SAME schema
let secondResponse = try await session.respond(
to: "Generate another person",
generating: Person.self,
includeSchemaInPrompt: false
)
"Setting includeSchemaInPrompt to false decreases token count and latency for subsequent requests."
When to use: Multi-turn with same @Generable type
Time saved: 10-20% latency reduction per request
问题:@Generable架构会被插入到提示词中,增加token计数。
解决方案:对于后续使用相同架构的请求,跳过插入。
swift
let firstResponse = try await session.respond(
to: "Generate first person",
generating: Person.self
// 架构会自动插入
)
// 后续使用相同架构的请求
let secondResponse = try await session.respond(
to: "Generate another person",
generating: Person.self,
includeSchemaInPrompt: false
)
"将includeSchemaInPrompt设置为false可以减少后续请求的token计数和延迟。"
何时使用:使用相同@Generable类型的多轮对话
节省的时间:每个请求减少10-20%延迟
3. Property Order for Streaming UX (~200 words)
3. 为流式传输UX优化属性顺序(约200词)
Problem: User waits for entire generation.
Solution: Put important properties first, stream to show early.
swift
// ✅ GOOD - Title shows immediately
@Generable
struct Article {
var title: String // Shows in 0.2s
var summary: String // Shows in 0.8s
var fullText: String // Shows in 2.5s
}
// ❌ BAD - Wait for everything
@Generable
struct Article {
var fullText: String // User waits 2.5s
var title: String
var summary: String
}
UX impact: Perceived latency drops from 2.5s to 0.2s
问题:用户需要等待整个生成过程。
解决方案:将重要属性放在前面,通过流式传输提前显示。
swift
// ✅ GOOD - Title shows immediately
@Generable
struct Article {
var title: String // 0.2秒内显示
var summary: String // 0.8秒内显示
var fullText: String // 2.5秒内显示
}
// ❌ BAD - Wait for everything
@Generable
struct Article {
var fullText: String // 用户需要等待2.5秒
var title: String
var summary: String
}
UX影响:感知延迟从2.5秒降至0.2秒
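Property order only pays off when the UI consumes partial snapshots as they stream. A sketch of that loop; it assumes streamResponse(to:generating:) yields partially generated Article values whose properties are optional until filled in:

```swift
let stream = session.streamResponse(
    to: "Write an article about on-device AI",
    generating: Article.self
)
for try await partial in stream {
    // Earlier properties fill in first: title, then summary, then fullText.
    if let title = partial.title {
        print("Title ready: \(title)")
    }
}
```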
4. Foundation Models Instrument (~100 words)
4. Foundation Models性能分析工具(约100词)
Use Instruments app with Foundation Models template to:
- Profile latency of each request
- See token counts (input/output)
- Identify optimization opportunities
- Quantify improvements
"New Instruments profiling template lets you observe areas of optimization and quantify improvements."
Access: Instruments → Create → Foundation Models template
使用Instruments应用的Foundation Models模板来:
- 分析每个请求的延迟
- 查看token计数(输入/输出)
- 识别优化机会
- 量化改进效果
"新的Instruments性能分析模板可以让你观察优化空间并量化改进效果。"
访问方式:Instruments → 创建 → Foundation Models模板
Checklist
检查清单
Before shipping Foundation Models features:
在发布Foundation Models功能之前:
Required Checks
必选检查
- Availability checked before creating session
- Using @Generable for structured output (not manual JSON)
- Handling context overflow (exceededContextWindowSize)
- Handling guardrail violations (guardrailViolation)
- Handling unsupported language (unsupportedLanguageOrLocale)
- Streaming for long generations (>1 second)
- Not blocking UI (using Task {} for async)
- Tools for external data (not prompting for weather/locations)
- Prewarmed session if latency-sensitive
- 已检查可用性,然后再创建会话
- 使用@Generable处理结构化输出(而非手动JSON)
- 已处理上下文溢出(exceededContextWindowSize)
- 已处理内容规范违规(guardrailViolation)
- 已处理不支持的语言(unsupportedLanguageOrLocale)
- 长生成任务使用流式传输(>1秒)
- 未阻塞UI(使用Task {}处理异步)
- 使用工具获取外部数据(而非提示获取天气/地点)
- 已预启动会话(如果对延迟敏感)
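The first required check has a standard shape. A sketch of the availability gate, assuming the SystemLanguageModel availability API (case payloads may differ; verify against the SDK):

```swift
import FoundationModels

let model = SystemLanguageModel.default
switch model.availability {
case .available:
    let session = LanguageModelSession()
    // proceed with generation
case .unavailable(let reason):
    // e.g. device not eligible, Apple Intelligence disabled, model downloading
    showUnavailableMessage(for: reason) // hypothetical fallback UI helper
}
```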
Best Practices
最佳实践
- Instructions are concise (not verbose)
- Never interpolating user input into instructions
- Property order optimized for streaming UX
- Using appropriate temperature/sampling
- Tested on real device (not just simulator)
- Profiled with Instruments (Foundation Models template)
- Error handling shows graceful UI messages
- Tested offline (airplane mode)
- Tested with long conversations (context handling)
- 指令简洁(不冗长)
- 从未将用户输入插入到指令中
- 属性顺序针对流式传输UX进行了优化
- 使用了合适的温度/采样方式
- 在真实设备上测试(而非仅模拟器)
- 使用Instruments进行性能分析(Foundation Models模板)
- 错误处理显示优雅的UI提示
- 已离线测试(飞行模式)
- 已测试长对话(上下文处理)
Model Capability
模型能力
- Not using for world knowledge
- Not using for complex reasoning
- Use case is: summarization, extraction, classification, or generation
- Have fallback if unavailable (show message, disable feature)
- 未用于通用知识
- 未用于复杂推理
- 使用场景为:摘要、提取、分类或生成
- 有不可用时的回退方案(显示提示、禁用功能)
Resources
资源
WWDC: 286, 259, 301
Skills: axiom-foundation-models-diag, axiom-foundation-models-ref
Last Updated: 2025-12-03
Version: 1.0.0
Target: iOS 26+, macOS 26+, iPadOS 26+, visionOS 26+
WWDC:286, 259, 301
技能:axiom-foundation-models-diag, axiom-foundation-models-ref
最后更新:2025-12-03
版本:1.0.0
支持平台:iOS 26+, macOS 26+, iPadOS 26+, visionOS 26+