# Speech Recognition

Transcribe live and pre-recorded audio to text using Apple's Speech framework.
Covers `SFSpeechRecognizer` (iOS 10+) and the new `SpeechAnalyzer` API (iOS 26+).

## SpeechAnalyzer (iOS 26+)

`SpeechAnalyzer` is the Swift-concurrency successor to `SFSpeechRecognizer`: results are delivered as an `AsyncSequence`, and functionality is split into composable modules such as `SpeechTranscriber`.

### Basic transcription with SpeechAnalyzer
```swift
import Speech

// 1. Create a transcriber module
guard let locale = SpeechTranscriber.supportedLocale(
    equivalentTo: Locale.current
) else { return }
let transcriber = SpeechTranscriber(locale: locale, preset: .offlineTranscription)

// 2. Ensure assets are installed
if let request = try await AssetInventory.assetInstallationRequest(
    supporting: [transcriber]
) {
    try await request.downloadAndInstall()
}

// 3. Create the input stream and analyzer
let (inputSequence, inputBuilder) = AsyncStream.makeStream(of: AnalyzerInput.self)
let audioFormat = await SpeechAnalyzer.bestAvailableAudioFormat(
    compatibleWith: [transcriber]
)
let analyzer = SpeechAnalyzer(modules: [transcriber])

// 4. Feed audio buffers (from AVAudioEngine or a file)
Task {
    // Append PCM buffers converted to audioFormat
    let pcmBuffer: AVAudioPCMBuffer = // ... your audio buffer
    inputBuilder.yield(AnalyzerInput(buffer: pcmBuffer))
    inputBuilder.finish()
}

// 5. Consume results
Task {
    for try await result in transcriber.results {
        let text = String(result.text.characters)
        print(text)
    }
}

// 6. Run analysis
let lastSampleTime = try await analyzer.analyzeSequence(inputSequence)

// 7. Finalize
if let lastSampleTime {
    try await analyzer.finalizeAndFinish(through: lastSampleTime)
} else {
    try analyzer.cancelAndFinishNow()
}
```

### Transcribing an audio file with SpeechAnalyzer
```swift
let transcriber = SpeechTranscriber(locale: locale, preset: .offlineTranscription)
let audioFile = try AVAudioFile(forReading: fileURL)
let analyzer = SpeechAnalyzer(
    inputAudioFile: audioFile, modules: [transcriber], finishAfterFile: true
)

for try await result in transcriber.results {
    print(String(result.text.characters))
}
```

### Key differences from SFSpeechRecognizer
| Feature | SFSpeechRecognizer | SpeechAnalyzer |
|---|---|---|
| Concurrency | Callbacks/delegates | async/await + `AsyncSequence` |
| Result type | `SFSpeechRecognitionResult` (plain `String`) | `AttributedString` |
| Modules | Monolithic | Composable (`SpeechTranscriber`, etc.) |
| Audio input | Buffers appended to a request object | `AsyncStream` of `AnalyzerInput` |
| Availability | iOS 10+ | iOS 26+ |
| On-device | `supportsOnDeviceRecognition` flag | Asset-based via `AssetInventory` |
## SFSpeechRecognizer Setup

### Creating a recognizer with locale
```swift
import Speech

// Default locale (the user's current language)
let defaultRecognizer = SFSpeechRecognizer()

// Specific locale
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

// Check that recognition is available for this locale
guard let recognizer, recognizer.isAvailable else {
    print("Speech recognition not available")
    return
}
```

### Monitoring availability changes
```swift
final class SpeechManager: NSObject, SFSpeechRecognizerDelegate {
    private let recognizer = SFSpeechRecognizer()!

    override init() {
        super.init()
        recognizer.delegate = self
    }

    func speechRecognizer(
        _ speechRecognizer: SFSpeechRecognizer,
        availabilityDidChange available: Bool
    ) {
        // Update the UI: disable the record button when unavailable
    }
}
```

### Authorization
Request both speech recognition and microphone permissions before starting
live transcription. Add these keys to `Info.plist`: `NSSpeechRecognitionUsageDescription` and `NSMicrophoneUsageDescription`.
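The two keys can be added directly to the app target's `Info.plist`. A minimal sketch; the string values below are placeholder copy, so adjust them to describe your app's actual use:

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition is used to transcribe your voice to text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is needed to capture audio for live transcription.</string>
```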
```swift
import Speech
import AVFoundation

func requestPermissions() async -> Bool {
    let speechStatus = await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status)
        }
    }
    guard speechStatus == .authorized else { return false }

    let micStatus: Bool
    if #available(iOS 17, *) {
        micStatus = await AVAudioApplication.requestRecordPermission()
    } else {
        micStatus = await withCheckedContinuation { continuation in
            AVAudioSession.sharedInstance().requestRecordPermission { granted in
                continuation.resume(returning: granted)
            }
        }
    }
    return micStatus
}
```

## Live Microphone Transcription
The standard pattern: `AVAudioEngine` captures microphone audio → buffers are
appended to an `SFSpeechAudioBufferRecognitionRequest` → results stream in.
```swift
import Speech
import AVFoundation

final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func startTranscribing() throws {
        // Cancel any in-progress task
        recognitionTask?.cancel()
        recognitionTask = nil

        // Configure the audio session
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Create the request
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.recognitionRequest = request

        // Start the recognition task
        recognitionTask = recognizer.recognitionTask(with: request) { result, error in
            if let result {
                let text = result.bestTranscription.formattedString
                print("Transcription: \(text)")
                if result.isFinal {
                    self.stopTranscribing()
                }
            }
            if let error {
                print("Recognition error: \(error)")
                self.stopTranscribing()
            }
        }

        // Install the audio tap
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    func stopTranscribing() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionRequest = nil
        recognitionTask?.cancel()
        recognitionTask = nil
    }
}
```

## Pre-Recorded Audio File Recognition
Use `SFSpeechURLRecognitionRequest` for audio files on disk:
```swift
// Example error type used below
enum SpeechError: Error {
    case unavailable
}

func transcribeFile(at url: URL) async throws -> String {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        throw SpeechError.unavailable
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.shouldReportPartialResults = false

    return try await withCheckedThrowingContinuation { continuation in
        recognizer.recognitionTask(with: request) { result, error in
            if let error {
                continuation.resume(throwing: error)
            } else if let result, result.isFinal {
                continuation.resume(
                    returning: result.bestTranscription.formattedString
                )
            }
        }
    }
}
```

## On-Device vs Server Recognition
On-device recognition (iOS 13+) works offline but supports fewer locales:

```swift
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!

// Check if on-device is supported for this locale
if recognizer.supportsOnDeviceRecognition {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true // Force on-device
}
```

Tip: On-device recognition avoids network latency and the one-minute audio limit imposed by server-based recognition. However, accuracy may be lower and not all locales are supported. Check `supportsOnDeviceRecognition` before forcing on-device mode.

## Handling Results

### Partial vs final results
```swift
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true // default is true

recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }
    if result.isFinal {
        // Final transcription: recognition is complete
        let final = result.bestTranscription.formattedString
    } else {
        // Partial result: may change as more audio is processed
        let partial = result.bestTranscription.formattedString
    }
}
```

### Accessing alternative transcriptions and confidence
```swift
recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }

    // Best transcription
    let best = result.bestTranscription

    // All alternatives (sorted by confidence, descending)
    for transcription in result.transcriptions {
        for segment in transcription.segments {
            print("\(segment.substring): \(segment.confidence)")
        }
    }
}
```

### Adding punctuation (iOS 16+)
```swift
let request = SFSpeechAudioBufferRecognitionRequest()
request.addsPunctuation = true
```

### Contextual strings
Improve recognition of domain-specific terms:

```swift
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["SwiftUI", "Xcode", "CloudKit"]
```

## Common Mistakes
### Not requesting both speech and microphone authorization
```swift
// ❌ DON'T: Only request speech authorization for live audio
SFSpeechRecognizer.requestAuthorization { status in
    // Missing microphone permission; the audio engine will fail
    self.startRecording()
}

// ✅ DO: Request both permissions before recording
SFSpeechRecognizer.requestAuthorization { status in
    guard status == .authorized else { return }
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        guard granted else { return }
        self.startRecording()
    }
}
```

### Not handling availability changes
```swift
// ❌ DON'T: Assume the recognizer stays available after the initial check
let recognizer = SFSpeechRecognizer()!
// Recognition may fail if the network drops or the locale changes

// ✅ DO: Monitor availability via the delegate
recognizer.delegate = self

func speechRecognizer(
    _ speechRecognizer: SFSpeechRecognizer,
    availabilityDidChange available: Bool
) {
    recordButton.isEnabled = available
}
```

### Not stopping the audio engine when recognition ends
```swift
// ❌ DON'T: Leave the audio engine running after recognition finishes
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true {
        // Audio engine still running, wasting resources and battery
    }
}

// ✅ DO: Clean up all audio resources
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true || error != nil {
        self.audioEngine.stop()
        self.audioEngine.inputNode.removeTap(onBus: 0)
        self.recognitionRequest?.endAudio()
        self.recognitionRequest = nil
    }
}
```

### Assuming on-device recognition is available for all locales
```swift
// ❌ DON'T: Force on-device without checking support
let request = SFSpeechAudioBufferRecognitionRequest()
request.requiresOnDeviceRecognition = true // May silently fail

// ✅ DO: Check support before requiring on-device
if recognizer.supportsOnDeviceRecognition {
    request.requiresOnDeviceRecognition = true
} else {
    // Fall back to server-based recognition or inform the user
}
```

### Not handling the one-minute recognition limit
```swift
// ❌ DON'T: Start one long continuous recognition session
func startRecording() {
    // This will be cut off after ~60 seconds (server-based)
}

// ✅ DO: Restart recognition when approaching the limit
func startRecording() {
    // Use a timer to restart before the limit
    recognitionTimer = Timer.scheduledTimer(withTimeInterval: 55, repeats: false) { [weak self] _ in
        self?.restartRecognition()
    }
}

func restartRecognition() {
    // Sketch: tear down and restart using the cleanup/start routines shown earlier
    stopTranscribing()
    try? startTranscribing()
}
```

### Creating multiple simultaneous recognition tasks
```swift
// ❌ DON'T: Start a new task without canceling the previous one
func startRecording() {
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
    // The previous task is still running; behavior is undefined
}

// ✅ DO: Cancel the existing task before creating a new one
func startRecording() {
    recognitionTask?.cancel()
    recognitionTask = nil
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
}
```

## Review Checklist
- `NSSpeechRecognitionUsageDescription` is in Info.plist
- `NSMicrophoneUsageDescription` is in Info.plist (if using live audio)
- Authorization is requested before starting recognition
- `SFSpeechRecognizerDelegate` is set to handle `availabilityDidChange`
- Audio engine is stopped and the tap removed when recognition ends
- `recognitionRequest.endAudio()` is called when done recording
- The previous `recognitionTask` is canceled before starting a new one
- `supportsOnDeviceRecognition` is checked before requiring on-device mode
- Partial results are handled separately from final (`isFinal`) results
- The one-minute limit is accounted for in server-based recognition
- For iOS 26+: `AssetInventory` assets are installed before using `SpeechAnalyzer`
- For iOS 26+: `SpeechTranscriber.supportedLocale(equivalentTo:)` is checked