foundation-models-on-device

FoundationModels: On-Device LLM (iOS 26)

Patterns for integrating Apple's on-device language model into apps using the FoundationModels framework. Covers text generation, structured output with `@Generable`, custom tool calling, and snapshot streaming, all running on-device for privacy and offline support.

When to Activate

  • Building AI-powered features using Apple Intelligence on-device
  • Generating or summarizing text without cloud dependency
  • Extracting structured data from natural language input
  • Implementing custom tool calling for domain-specific AI actions
  • Streaming structured responses for real-time UI updates
  • Requiring privacy-preserving AI (no data leaves the device)

Core Pattern — Availability Check

Always check model availability before creating a session:
```swift
import FoundationModels
import SwiftUI

struct GenerativeView: View {
    // Handle to the system's on-device language model
    private var model = SystemLanguageModel.default

    var body: some View {
        switch model.availability {
        case .available:
            ContentView()  // app-defined UI that uses the model
        case .unavailable(.deviceNotEligible):
            Text("Device not eligible for Apple Intelligence")
        case .unavailable(.appleIntelligenceNotEnabled):
            Text("Please enable Apple Intelligence in Settings")
        case .unavailable(.modelNotReady):
            Text("Model is downloading or not ready")
        case .unavailable(let other):
            // Any other reason the model can't be used
            Text("Model unavailable: \(other)")
        }
    }
}
```
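
Outside SwiftUI, the same check can gate session creation so that no session is created on an ineligible device. This is a minimal sketch. It assumes the type of `model.availability` is `SystemLanguageModel.Availability`. `unavailabilityMessage(for:)` and `makeSessionIfAvailable()` are hypothetical helper names, not framework API.

```swift
import FoundationModels

/// Hypothetical helper: returns nil when the on-device model can be used,
/// otherwise a user-facing reason why it cannot.
func unavailabilityMessage(for availability: SystemLanguageModel.Availability) -> String? {
    switch availability {
    case .available:
        return nil
    case .unavailable(.deviceNotEligible):
        return "Device not eligible for Apple Intelligence"
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Please enable Apple Intelligence in Settings"
    case .unavailable(.modelNotReady):
        return "Model is downloading or not ready"
    case .unavailable(let other):
        return "Model unavailable: \(other)"
    }
}

/// Hypothetical helper: creates a session only after the availability check passes.
func makeSessionIfAvailable() -> LanguageModelSession? {
    guard unavailabilityMessage(for: SystemLanguageModel.default.availability) == nil else {
        return nil
    }
    return LanguageModelSession()
}
```

Returning an optional message keeps the check usable from view models, intents, or background code, where there is no `View` to switch over.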

Core Pattern — Basic Session

```swift
import FoundationModels

// Single-turn: create a new session for each independent request
let session = LanguageModelSession()
let response = try await session.respond(to: "What's a good month to visit Paris?")
print(response.content)

// Multi-turn: reuse one session so its transcript carries conversation context
let chatSession = LanguageModelSession(instructions: """
    You are a cooking assistant.
    Provide recipe suggestions based on ingredients.
    Keep suggestions brief and practical.
    """)

let first = try await chatSession.respond(to: "I have chicken and rice")
print(first.content)
let followUp = try await chatSession.respond(to: "What about a vegetarian option?")
print(followUp.content)
```
Key points for instructions:
  • Define the model's role ("You are a mentor")
  • Specify what to do ("Help extract calendar events")
  • Set style preferences ("Respond as briefly as possible")
  • Add safety measures ("Respond with 'I can't help with that' for dangerous requests")
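
One way to keep these four ingredients explicit is to assemble them in code. This is a sketch, and `InstructionSpec` is a hypothetical helper rather than part of FoundationModels. The resulting string is what you would pass to `LanguageModelSession(instructions:)`.

```swift
import Foundation

/// Hypothetical helper (not FoundationModels API): assembles session
/// instructions from a role, a task, a style, and a safety rule.
struct InstructionSpec {
    var role: String    // who the model is
    var task: String    // what it should do
    var style: String   // how it should respond
    var safety: String  // how to handle requests it should refuse

    /// One directive per line; empty parts are skipped.
    var text: String {
        [role, task, style, safety]
            .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
            .filter { !$0.isEmpty }
            .joined(separator: "\n")
    }
}

let spec = InstructionSpec(
    role: "You are a cooking assistant.",
    task: "Suggest recipes based on the ingredients the user lists.",
    style: "Respond as briefly as possible.",
    safety: "Respond with 'I can't help with that' for dangerous requests."
)

// With FoundationModels imported:
// let session = LanguageModelSession(instructions: spec.text)
```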

Core Pattern — Guided Generation with @Generable

Generate structured Swift types instead of raw strings:

1. Define a Generable Type

```swift
import FoundationModels

@Generable(description: "Basic profile information about a cat")
struct CatProfile {
    var name: String

    // @Guide adds a description plus a constraint the generated value must satisfy
    @Guide(description: "The age of the cat", .range(0...20))
    var age: Int

    @Guide(description: "A one sentence profile about the cat's personality")
    var profile: String
}
```

2. Request Structured Output

```swift
let session = LanguageModelSession()

// Ask for a CatProfile instead of free-form text
let response = try await session.respond(
    to: "Generate a cute rescue cat",
    generating: CatProfile.self
)

// response.content is a CatProfile; access structured fields directly
print("Name: \(response.content.name)")
print("Age: \(response.content.age)")
print("Profile: \(response.content.profile)")
```

Supported @Guide Constraints

  • `.range(0...20)` for a numeric range
  • `.count(3)` for an exact array element count
  • `description:` for semantic guidance on what to generate
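
Both constraints can be combined in one type. This is a minimal sketch. It assumes `.count(_:)` applies to array properties and `.range(_:)` to integers, as listed above. `ReadingList` and `suggestReadingList(about:)` are hypothetical names. Guided generation constrains the model's output to the declared schema, so the returned values should satisfy both guides.

```swift
import FoundationModels

// Hypothetical type using both constraints from the list above.
@Generable(description: "A short reading list on one topic")
struct ReadingList {
    @Guide(description: "Book titles on the requested topic", .count(3))
    var titles: [String]

    @Guide(description: "Overall difficulty from 1 (easy) to 5 (hard)", .range(1...5))
    var difficulty: Int
}

/// Hypothetical helper: asks the on-device model for a ReadingList.
func suggestReadingList(about topic: String) async throws -> ReadingList {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Suggest a reading list about \(topic)",
        generating: ReadingList.self
    )
    return response.content
}
```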

Core Pattern — Tool Calling

Let the model invoke custom code for domain-specific tasks:

1. Define a Tool

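```swift
import FoundationModels

struct RecipeSearchTool: Tool {
    let name = "recipe_search"
    let description = "Search for recipes matching a given term and return a list of results."

    // The model fills in these arguments; @Generable gives it their schema
    @Generable
    struct Arguments {
        var searchTerm: String
        var numberOfResults: Int
    }

    // The output can be any PromptRepresentable type, such as String
    func call(arguments: Arguments) async throws -> String {
        // searchRecipes(term:limit:) is a hypothetical app-defined function that
        // returns values with `name` and `description` properties
        let recipes = await searchRecipes(
            term: arguments.searchTerm,
            limit: arguments.numberOfResults
        )
        return recipes.map { "- \($0.name): \($0.description)" }.joined(separator: "\n")
    }
}
```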

2. Create Session with Tools

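```swift
// Register the tool; the model decides during generation whether to call it
let session = LanguageModelSession(tools: [RecipeSearchTool()])
let response = try await session.respond(to: "Find me some pasta recipes")
print(response.content)
```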

3. Handle Tool Errors

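```swift
// Hypothetical error type that RecipeSearchTool.call(arguments:) might throw
enum RecipeSearchToolError: Error {
    case databaseIsEmpty
}

do {
    let answer = try await session.respond(to: "Find a recipe for tomato soup.")
    print(answer.content)
} catch let error as LanguageModelSession.ToolCallError {
    print(error.tool.name)  // which tool failed
    if case .databaseIsEmpty = error.underlyingError as? RecipeSearchToolError {
        // Handle specific tool error
    }
} catch {
    print("Request failed: \(error)")  // non-tool failures
}
```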

Core Pattern — Snapshot Streaming

Stream structured responses for real-time UI with `PartiallyGenerated` types:
```swift
import FoundationModels

@Generable
struct TripIdeas {
    @Guide(description: "Ideas for upcoming trips")
    var ideas: [String]
}

let session = LanguageModelSession()
let stream = session.streamResponse(
    to: "What are some exciting trip ideas?",
    generating: TripIdeas.self
)

for try await snapshot in stream {
    // snapshot.content: TripIdeas.PartiallyGenerated (all properties Optional)
    print(snapshot.content)
}
```
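
Each snapshot is the complete partial state rather than a delta. A UI that wants only the newly added text, for example for a typing effect, therefore has to diff consecutive snapshots itself. This is a minimal sketch in plain Swift. It assumes each text snapshot extends the previous one, and `deltas(from:)` is a hypothetical helper.

```swift
/// Hypothetical helper: converts cumulative text snapshots (each one the full
/// response so far) into the text each snapshot added.
func deltas(from snapshots: [String]) -> [String] {
    var previous = ""
    var result: [String] = []
    for snapshot in snapshots {
        if snapshot.hasPrefix(previous) {
            // Usual case: the new snapshot extends the previous one.
            result.append(String(snapshot.dropFirst(previous.count)))
        } else {
            // Defensive fallback: emit the whole snapshot if it is not an extension.
            result.append(snapshot)
        }
        previous = snapshot
    }
    return result
}
```

In an app, the input would be the successive text values collected from a plain-text stream. Diffing on the client keeps the framework's snapshot model intact.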

SwiftUI Integration

```swift
import FoundationModels
import SwiftUI

struct TripIdeasView: View {
    let prompt: String

    @State private var session = LanguageModelSession()
    @State private var partialResult: TripIdeas.PartiallyGenerated?
    @State private var errorMessage: String?

    var body: some View {
        List {
            ForEach(partialResult?.ideas ?? [], id: \.self) { idea in
                Text(idea)
            }
        }
        .overlay {
            if let errorMessage { Text(errorMessage).foregroundStyle(.red) }
        }
        .task {
            do {
                let stream = session.streamResponse(to: prompt, generating: TripIdeas.self)
                // Each snapshot replaces the last; SwiftUI re-renders as ideas arrive
                for try await snapshot in stream {
                    partialResult = snapshot.content
                }
            } catch {
                errorMessage = error.localizedDescription
            }
        }
    }
}
```

Key Design Decisions

| Decision | Rationale |
|---|---|
| On-device execution | Privacy: no data leaves the device; works offline |
| 4,096-token limit | On-device model constraint; chunk large data across sessions |
| Snapshot streaming (not deltas) | Friendly to structured output; each snapshot is a complete partial state |
| `@Generable` macro | Compile-time safety for structured generation; auto-generates the `PartiallyGenerated` type |
| Single request per session | A session processes one request at a time (`isResponding` is `true` while it does); create multiple sessions for concurrent work |
| `response.content` (not `.output`) | Correct API: always access results via the `.content` property |
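
The 4,096-token row implies a chunking step. This is a rough sketch under stated assumptions:

  • Character count stands in for token count; it is not Apple's tokenizer.
  • Chunk boundaries prefer blank-line paragraph breaks.
  • `summarize` is an injected closure that, in an app, would create a fresh `LanguageModelSession` per chunk.

`chunk(_:maxCharacters:)` and `summarizeInChunks(_:maxCharacters:summarize:)` are hypothetical helpers.

```swift
import Foundation

/// Hypothetical helper: splits `text` into chunks of at most `maxCharacters`
/// characters, preferring blank-line paragraph boundaries. Characters are only
/// a rough proxy for tokens, so choose a budget well under the context window.
func chunk(_ text: String, maxCharacters: Int) -> [String] {
    precondition(maxCharacters > 0, "maxCharacters must be positive")
    var chunks: [String] = []
    var current = ""

    for paragraph in text.components(separatedBy: "\n\n") where !paragraph.isEmpty {
        // Hard-split any single paragraph that is over budget on its own.
        var pieces: [String] = []
        var rest = Substring(paragraph)
        while rest.count > maxCharacters {
            pieces.append(String(rest.prefix(maxCharacters)))
            rest = rest.dropFirst(maxCharacters)
        }
        pieces.append(String(rest))

        // Pack pieces into the current chunk until the next one would overflow it.
        for piece in pieces {
            let candidate = current.isEmpty ? piece : current + "\n\n" + piece
            if candidate.count <= maxCharacters {
                current = candidate
            } else {
                chunks.append(current)
                current = piece
            }
        }
    }
    if !current.isEmpty {
        chunks.append(current)
    }
    return chunks
}

/// Hypothetical helper: processes each chunk independently with `summarize`.
/// In an app, `summarize` would create a fresh LanguageModelSession per chunk
/// so no single transcript outgrows the context window, for example:
///   { piece in try await LanguageModelSession().respond(to: "Summarize:\n\(piece)").content }
func summarizeInChunks(
    _ text: String,
    maxCharacters: Int,
    summarize: (String) async throws -> String
) async throws -> [String] {
    var summaries: [String] = []
    for piece in chunk(text, maxCharacters: maxCharacters) {
        summaries.append(try await summarize(piece))
    }
    return summaries
}
```

Injecting `summarize` keeps the splitting logic testable without the model and leaves the per-chunk prompt up to the caller.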

Best Practices

  • Always check `model.availability` before creating a session, and handle every unavailability case
  • Use `instructions` to guide model behavior; they take priority over prompts
  • Check `isResponding` before sending a new request; a session handles one request at a time
  • Access `response.content` for results, not `.output`
  • Break large inputs into chunks; the 4,096-token limit covers instructions, prompts, and outputs combined, including earlier turns in the session's transcript
  • Use `@Generable` for structured output; it gives stronger guarantees than parsing raw strings
  • Use `GenerationOptions(temperature:)` to tune creativity (higher = more creative)
  • Monitor with Instruments: profile request performance in Xcode Instruments
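
The `isResponding` and `GenerationOptions(temperature:)` practices can be combined in a small wrapper. This is a sketch that assumes `respond(to:options:)` accepts a `GenerationOptions` value. `respondIfIdle(_:to:temperature:)` is a hypothetical helper, and its idle check is best-effort rather than a lock.

```swift
import FoundationModels

/// Hypothetical helper: sends `prompt` only when `session` is idle, with a
/// caller-chosen temperature. Returns nil instead of overlapping a request.
@MainActor
func respondIfIdle(
    _ session: LanguageModelSession,
    to prompt: String,
    temperature: Double
) async throws -> String? {
    // Best-effort check: a session handles one request at a time.
    guard !session.isResponding else { return nil }

    // Higher temperature gives more varied output; lower is more predictable.
    let options = GenerationOptions(temperature: temperature)
    let response = try await session.respond(to: prompt, options: options)
    return response.content
}
```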

Anti-Patterns to Avoid

  • Creating sessions without checking `model.availability` first
  • Sending inputs that exceed the 4,096-token context window
  • Attempting concurrent requests on a single session
  • Using `.output` instead of `.content` to access response data
  • Parsing raw string responses when `@Generable` structured output would work
  • Building complex multi-step logic in a single prompt; break it into multiple focused prompts instead
  • Assuming the model is always available; device eligibility and settings vary
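
The alternative to one complex prompt is a chain of focused prompts, each consuming the previous step's output. This is a minimal sketch, and `runFocusedSteps(input:steps:respond:)` is a hypothetical helper. `respond` is injected so the chaining logic can run without the model. In an app it would call `LanguageModelSession.respond(to:)` and return `.content`.

```swift
/// Hypothetical helper: runs a task as a chain of small, focused prompts.
/// Each step builds its prompt from the previous step's output, and
/// `respond` sends that prompt to the model (injected for testability).
func runFocusedSteps(
    input: String,
    steps: [(String) -> String],
    respond: (String) async throws -> String
) async throws -> String {
    var current = input
    for makePrompt in steps {
        current = try await respond(makePrompt(current))
    }
    return current
}

// Usage in an app (requires FoundationModels; `meetingNotes` is a placeholder):
// let titles = try await runFocusedSteps(
//     input: meetingNotes,
//     steps: [
//         { "List the action items in these notes:\n\($0)" },
//         { "Rewrite each action item as a short calendar event title:\n\($0)" },
//     ],
//     respond: { try await LanguageModelSession().respond(to: $0).content }
// )
```

Each step gets a narrow instruction and a small input, which also helps keep every request well inside the context window.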

When to Use

  • On-device text generation for privacy-sensitive apps
  • Structured data extraction from user input (forms, natural language commands)
  • AI-assisted features that must work offline
  • Streaming UI that progressively shows generated content
  • Domain-specific AI actions via tool calling (search, compute, lookup)