iOS Production Error Observability
Production-grade skill for eliminating silent failures in iOS apps. Most production errors don't crash; they vanish through `try?`, unobserved `Task {}` bodies, Combine `replaceError`, and `print()`-only catch blocks.
AI coding assistants systematically generate observability-blind code because their training data is overwhelmingly tutorial code: `print()` in every catch block, `try?` everywhere, `Task {}` with no error handling, no crash SDK integration, no privacy annotations, and zero consideration for MetricKit or PII compliance. This skill intercepts those patterns and enforces observable error handling from the start.
Logging is the key to debugging. When a bug appears in production across thousands of devices, you can't attach a debugger. Remote logging through crash reporting SDKs can turn a 3-day debugging mystery into a 15-minute investigation. This skill enforces observable error handling: every error is logged with `os.Logger` (with privacy annotations), reported to a remote crash/analytics SDK, and surfaced to the user or operator.
Three non-negotiable rules:
- No `print()` in production code: use `os.Logger` with privacy annotations
- No catch block without observability: every caught error must be logged AND reported to a remote service
- No `try?` on operations where failure matters: use `do/catch` for network, persistence, auth, and user-facing operations
Remote Logging SDK Selection
When setting up crash reporting, recommend one of these based on the project's needs:
What ecosystem is the project in?
├── Firebase-heavy (Auth, Firestore, Push) → Firebase Crashlytics (free, tight integration)
├── Standalone / wants rich observability → Sentry (best error context, breadcrumbs, performance)
├── Needs product analytics + errors → Sentry (crashes) + PostHog (analytics, session replay)
└── Enterprise / custom → Sentry or Datadog + Google Analytics for funnels
Recommend connecting these services via MCP servers or CLI tools so the AI assistant can query production errors, search crash patterns, and pull breadcrumb trails directly during debugging sessions:
- Sentry: MCP server available — query issues, search events, get stack traces
- PostHog: MCP server available — query analytics, check feature flags, error events
- Firebase: CLI tools (`firebase-tools`) for listing crashes and downloading reports
Observability Stack
Presentation Layer -> SwiftUI error state + centralized ErrorHandling
Application Layer -> ErrorReporter protocol (abstracts Sentry/Crashlytics)
Logging Layer -> os.Logger with subsystem/category and privacy annotations
Diagnostics Layer -> MetricKit (OOM, watchdog kills, hangs — out-of-process)
Crash Layer -> Sentry OR Crashlytics (not both for fatals) + dSYMs
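The Application Layer entry above names an `ErrorReporter` protocol that hides the concrete crash SDK. A minimal sketch of that abstraction follows. The method name `recordNonFatal(_:context:)` mirrors the call used elsewhere in this document, while `ReportedEvent` and `InMemoryErrorReporter` are hypothetical names introduced here for illustration; a real implementation would forward to Sentry or Crashlytics instead of an array.

```swift
import Foundation

/// Abstraction over the crash SDK. The exact requirement list is an assumption;
/// only `recordNonFatal(_:context:)` is implied by this document.
protocol ErrorReporter {
    func recordNonFatal(_ error: Error, context: [String: String])
}

/// Hypothetical value type describing one reported non-fatal error.
struct ReportedEvent {
    let error: Error
    let context: [String: String]
}

/// Hypothetical in-memory reporter, useful for unit tests and previews.
/// A production conformance would call into the chosen SDK here instead.
final class InMemoryErrorReporter: ErrorReporter {
    private(set) var events: [ReportedEvent] = []

    func recordNonFatal(_ error: Error, context: [String: String]) {
        events.append(ReportedEvent(error: error, context: context))
    }
}
```

Keeping call sites dependent on the protocol rather than on a vendor SDK is what makes it possible to swap Sentry for Crashlytics, or to assert on reported errors in tests.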
Quick Decision Trees
"Should I use try? here?"
Is this a best-effort operation where failure is genuinely irrelevant?
├── YES (temp file cleanup, optional cache read, cosmetic prefetch)
│ └── try? is acceptable
└── NO (network, persistence, auth, user-facing, payment, navigation)
└── MUST use do/catch with Logger.error() + ErrorReporter.recordNonFatal()
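To make the NO branch of the tree above concrete, here is a small before/after sketch for a persistence read. `Profile`, `loadProfileSilently`, and `loadProfile` are hypothetical names, and the `report` closure stands in for the `Logger.error()` plus `ErrorReporter.recordNonFatal()` pair so the example stays self-contained.

```swift
import Foundation

struct Profile: Codable, Equatable {
    let name: String
}

/// Silent version: a corrupt payload is indistinguishable from "no profile stored".
func loadProfileSilently(from data: Data) -> Profile? {
    try? JSONDecoder().decode(Profile.self, from: data)
}

/// Observable version: same return type, but the failure is handed to a reporter.
/// `report` is a stand-in for logging plus remote reporting.
func loadProfile(from data: Data,
                 report: (Error, [String: String]) -> Void) -> Profile? {
    do {
        return try JSONDecoder().decode(Profile.self, from: data)
    } catch {
        // In an app, a Logger.error call with privacy annotations would also go here.
        report(error, ["operation": "loadProfile"])
        return nil
    }
}
```

Both functions return `nil` on bad input, so callers do not change; the difference is that only the second leaves a trace that can be found later.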
"What logging API should I use?"
Is this production code?
├── YES -> os.Logger with privacy annotations
│ ├── Debug tracing -> .debug (free in production, not persisted)
│ ├── Contextual info -> .info (memory-only, captured on faults)
│ ├── Operational events -> .notice (persisted to disk)
│ ├── Recoverable errors -> .error (always persisted)
│ └── Bugs / unrecoverable -> .fault (persisted + process chain)
└── NO (unit tests, playgrounds, scripts)
└── print() is fine
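The YES branch above could look roughly like the following sketch. It assumes a deployment target where `os.Logger` is available (iOS 14 / macOS 11 or later); the category names, `fallbackSubsystem`, and `logLoginAttempt` are hypothetical and only illustrate how levels and privacy annotations combine.

```swift
import Foundation
#if canImport(os)
import os
#endif

/// Hypothetical fallback used when the bundle has no identifier (e.g. command-line tools).
let fallbackSubsystem = "com.example.app"

#if canImport(os)
extension Logger {
    private static let subsystem = Bundle.main.bundleIdentifier ?? fallbackSubsystem
    /// One static logger per category keeps filtering in Console.app simple.
    static let networking = Logger(subsystem: subsystem, category: "networking")
    static let auth = Logger(subsystem: subsystem, category: "auth")
}

/// Illustrative call site: level chosen per the tree above, dynamic values annotated.
func logLoginAttempt(userID: String, statusCode: Int) {
    Logger.auth.debug("Login request started")
    Logger.networking.notice("Login finished with status \(statusCode, privacy: .public)")
    if statusCode >= 400 {
        // Hash the identifier so events can be correlated without exposing it.
        Logger.auth.error("Login failed for user \(userID, privacy: .private(mask: .hash))")
    }
}
#endif
```

Dynamic strings are redacted by default in release logs, so being explicit with `.public` or `.private(mask: .hash)` is what keeps a log line both useful and safe.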
"How should this catch block look?"
catch {
    // 1. ALWAYS: Structured log with privacy annotations
    Logger.<category>.error("Operation failed: \(error.localizedDescription, privacy: .public)")
    // 2. ALWAYS: Report to crash SDK
    ErrorReporter.shared.recordNonFatal(error, context: ["operation": "..."])
    // 3. CONDITIONALLY: User feedback (if user-facing operation)
    // 4. CONDITIONALLY: Recovery action (retry, rollback, logout)
}
"Is this Task {} safe?"
Does the Task body contain try or await that can throw?
├── YES -> MUST wrap in do/catch with observability inside the Task
│ └── Also: distinguish CancellationError (normal) from real errors
└── NO -> Task {} is fine as-is
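A sketch of the YES branch above: the `do/catch` lives inside the `Task`, and cancellation is separated from real failures before anything is reported. `CaughtError`, `classify`, and `startRefresh` are hypothetical names, and the `report` closure stands in for the Logger plus ErrorReporter calls.

```swift
import Foundation

/// Hypothetical classification of an error caught inside a Task.
enum CaughtError: Equatable {
    case cancellation
    case failure(description: String)
}

/// Cancellation is normal control flow; everything else is a real error.
func classify(_ error: Error) -> CaughtError {
    if error is CancellationError { return .cancellation }
    return .failure(description: String(describing: error))
}

/// Starts an unstructured task whose body cannot lose an error silently.
@discardableResult
func startRefresh(fetch: @escaping @Sendable () async throws -> Void,
                  report: @escaping @Sendable (Error) -> Void) -> Task<Void, Never> {
    Task {
        do {
            try await fetch()
        } catch {
            switch classify(error) {
            case .cancellation:
                break // Expected when the view disappears or the task is replaced.
            case .failure:
                report(error) // Stand-in for Logger.error + recordNonFatal.
            }
        }
    }
}
```

Because the body handles every error itself, the task's type is `Task<Void, Never>`, which makes it visible at the type level that nothing can escape unobserved.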
Logging Configuration State
On first use of any scanning workflow, check for `.claude/ios-logging-config.md` in the project root. If it doesn't exist, run the Configuration Phase below before scanning. If it exists, read it and use those preferences for all fixes.
Configuration Phase (runs once per project)
Ask the user these questions and persist answers to `.claude/ios-logging-config.md`:
- Crash SDK: "Which crash reporting SDK does this project use?"
  - Sentry / Firebase Crashlytics / Datadog / Bugsnag / None (os.Logger only)
  - If none yet: recommend Sentry (best observability) or Crashlytics (if Firebase-heavy)
- ErrorReporter protocol: "Do you have a centralized ErrorReporter protocol?"
  - Yes → ask for the type name and import path
  - No → offer to create one wrapping their chosen SDK
- Logger setup: "Do you have an os.Logger extension with categories?"
  - Yes → ask for the extension location
  - No → offer to create one with standard categories (networking, persistence, auth, ui)
- PII sensitivity: "What data sensitivity level?"
  - Standard (default privacy annotations)
  - Health/HIPAA (aggressive redaction, no PHI in logs)
  - Finance/PCI (redact all financial data)
- Preferred fix style: "How should I apply fixes?"
  - Minimal: add logging to existing catch blocks, don't restructure code
  - Full: replace `try?` with `do/catch`, add error states, restructure where needed
Config file format: .claude/ios-logging-config.md
---
crash_sdk: sentry
error_reporter_type: ErrorReporter
error_reporter_import: "import ErrorReporting"
logger_extension: "Sources/Core/Logger+Extensions.swift"
logger_subsystem: "Bundle.main.bundleIdentifier!"
pii_level: standard
fix_style: full
---
The config is a simple YAML frontmatter file. The skill reads it at the start of every scanning workflow and uses the values to generate correct import statements, SDK calls, and Logger patterns without asking the user again.
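Reading that frontmatter does not need a full YAML library when the file only holds flat scalar keys. The sketch below is one way it might be done; `parseFrontmatter` is a hypothetical helper, and it deliberately handles only `key: value` pairs between the first two `---` lines.

```swift
import Foundation

/// Parses flat `key: value` pairs between the first two `---` lines.
/// Assumption: values are scalars; this is not a general YAML parser.
func parseFrontmatter(_ text: String) -> [String: String] {
    var result: [String: String] = [:]
    var insideFrontmatter = false
    for rawLine in text.components(separatedBy: .newlines) {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        if line == "---" {
            if insideFrontmatter { break } // Closing delimiter: stop reading.
            insideFrontmatter = true
            continue
        }
        guard insideFrontmatter, let colon = line.firstIndex(of: ":") else { continue }
        let key = line[..<colon].trimmingCharacters(in: .whitespaces)
        let value = line[line.index(after: colon)...]
            .trimmingCharacters(in: .whitespaces)
            .trimmingCharacters(in: CharacterSet(charactersIn: "\""))
        result[key] = value
    }
    return result
}
```

Splitting on the first colon only keeps quoted values such as `"import ErrorReporting"` intact, and anything after the closing `---` is ignored.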
Workflows
Workflow: Scan Project for Silent Failures
When: User asks to "scan for silent failures", "audit error handling", "find missing logging", "check for try?", or any variant of "make sure nothing fails silently."
This is the primary scanning workflow — modeled after ios-security-audit's Phase 0 → Scan → Report pattern.
Phase 0: Discover & Configure
- Check for `.claude/ios-logging-config.md`; if missing, run Configuration Phase above
- Discover project structure:
  - Scan the project files to list all targets
  - Count files per target
  - Detect if app has extensions (widget, notification service, watch, etc.)
- Present scope menu to user:
Silent Failure Scan — choose scope:
Target scope:
A: Main target only (~5 min)
B: All targets including extensions (~10 min)
C: Specific target (you specify)
Scan depth:
1: Critical patterns only (try?, Task {}, print(), empty catch)
2: Full scan (adds Combine, URLSession status, NotificationCenter, Core Data, BGTask)
3: Full + infrastructure (adds dSYM check, extension SDK init, MetricKit, privacy manifests)
Example: "B2" = all targets, full scan
- User confirms (e.g., "A1" or "B3") before scanning begins
Phase 1: Scan
Run grep-based detection first (zero-token, fast), then semantic review on flagged files.
Depth 1 — Critical patterns:
| Pattern | Detection | Fix |
|---|---|---|
| `try?` on non-trivial operations | `grep -rn 'try?' --include='*.swift'` | Replace with `do/catch` + Logger + ErrorReporter |
| `Task {}` with throwing code | `grep -rn 'Task\s*{' --include='*.swift'`, then check if body has `try` / `await` without `do/catch` | Wrap in `do/catch`, distinguish CancellationError |
| `print()` in production code | `grep -rn 'print(' --include='*.swift'`, excluding test targets | Replace with `Logger.<category>.<level>()` with privacy annotations |
| Empty catch blocks | `grep -rn 'catch\s*{' --include='*.swift'`, then check if body is empty or does nothing with the error | Add Logger.error + ErrorReporter.recordNonFatal |
| Catch blocks with only `print()` | Semantic: catch blocks where the only action is `print()` | Add Logger + ErrorReporter, remove print |
| `guard`/`if`-`else` with silent return | Semantic: guard/if-else where the else branch returns/breaks without logging | Add Logger.warning explaining which condition was unexpected |
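The grep pass in the table above can be approximated in code when a scan needs line numbers and pattern names in one structure. This is a sketch, not the skill's actual scanner: `Finding` and `scanForCriticalPatterns` are hypothetical names, and the regular expressions are slightly stricter than the grep patterns (word boundaries, and only literally empty catch bodies).

```swift
import Foundation

struct Finding: Equatable {
    let line: Int
    let pattern: String
}

/// Flags candidate lines only; a semantic pass still decides whether each hit matters.
func scanForCriticalPatterns(in source: String) -> [Finding] {
    let patterns: [(name: String, regex: String)] = [
        (name: "try?", regex: #"\btry\?"#),
        (name: "Task {}", regex: #"\bTask\s*\{"#),
        (name: "print()", regex: #"\bprint\("#),
        (name: "empty catch", regex: #"\bcatch\s*\{\s*\}"#),
    ]
    var findings: [Finding] = []
    for (index, line) in source.components(separatedBy: "\n").enumerated() {
        for pattern in patterns
        where line.range(of: pattern.regex, options: .regularExpression) != nil {
            findings.append(Finding(line: index + 1, pattern: pattern.name))
        }
    }
    return findings
}
```

A line-based scan like this misses catch blocks whose braces span several lines, which is one reason the table pairs each grep with a follow-up check of the flagged body.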
Depth 2 — Full scan (adds to depth 1):
| Pattern | Detection | Fix |
|---|---|---|
| `replaceError` killing Combine pipelines | `grep -rn '.replaceError' --include='*.swift'` | Move error handling inside flatMap |
| `sink` with only print | Semantic: sink completion handlers with just print | Add Logger + ErrorReporter |
| URLSession without status code check | Semantic: URLSession calls without an `HTTPURLResponse` status code check | Add HTTP status validation + error reporting |
| NotificationCenter observer not stored | Semantic: `addObserver(forName:object:queue:using:)` return value discarded | Store token, add typed Notification.Name |
| Core Data `try? save()` | `grep -rn 'try?.*save()' --include='*.swift'` | Replace with do/catch, NSError userInfo extraction, rollback |
| `.task {}` with `try?` | `grep -rn 'try?' --include='*.swift'` in `.task` context | Replace with do/catch, CancellationError filter |
| BGTask without do/catch | Semantic: `BGAppRefreshTask` / `BGProcessingTask` handlers | Add do/catch + expirationHandler |
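The URLSession row deserves a concrete illustration, because URLSession reports transport failures as thrown errors but delivers a 4xx or 5xx response as a normal result. A minimal validation sketch follows; `HTTPError`, `validateStatus`, and `validate` are hypothetical names, and treating only 2xx as success is an assumption that some APIs will need to relax.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLResponse lives here on non-Apple platforms.
#endif

enum HTTPError: Error, Equatable {
    case nonHTTPResponse
    case unexpectedStatus(Int)
}

/// Throws for anything outside 200..<300 so the caller's catch block sees it.
func validateStatus(_ statusCode: Int) throws {
    guard (200..<300).contains(statusCode) else {
        throw HTTPError.unexpectedStatus(statusCode)
    }
}

/// Convenience wrapper for the response value returned alongside the data.
func validate(_ response: URLResponse) throws {
    guard let http = response as? HTTPURLResponse else {
        throw HTTPError.nonHTTPResponse
    }
    try validateStatus(http.statusCode)
}
```

Calling `try validate(response)` right after the request means a 500 flows into the same `do/catch` that already logs and reports transport errors, instead of being decoded as if it were a valid payload.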
Depth 3 — Infrastructure (adds to depth 2):
| Check | Detection | Fix |
|---|---|---|
| dSYM configuration | Check the Debug Information Format build setting | Set to "DWARF with dSYM File" for all targets |
| Extension SDK initialization | Check extension entry points for crash SDK | Add separate SDK init + disable autoSessionTracking |
| MetricKit subscriber | `grep -rn 'MXMetricManager' --include='*.swift'` | Add MXMetricManagerSubscriber if missing |
| PrivacyInfo.xcprivacy | Check for file existence | Create if missing (required since May 2024) |
| Dual crash reporter conflicts | Check for both Sentry + Crashlytics initialization | Warn about signal handler conflicts |
Phase 2: Report
Output findings grouped by severity:
## Silent Failure Scan Report
### Configuration
- SDK: [from config]
- Scope: [user choice]
- Files scanned: N
### CRITICAL (errors vanishing completely)
[try? on network/auth/payment, Task {} swallowing, empty catch blocks]
### HIGH (errors logged locally but not reported remotely)
[catch blocks with only print() or Logger but no ErrorReporter]
### MEDIUM (weak observability)
[missing privacy annotations, missing CancellationError filter, URLSession status unchecked]
### Summary
| Severity | Count |
|----------|-------|
| Critical | N |
| High | N |
| Medium | N |
| **Total** | **N** |
### Auto-fix available
[List of files where the skill can apply fixes automatically using the config preferences]
After the report, offer: "Should I fix these? I'll use [SDK from config] and [fix style from config]."
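The severity buckets in the report can be expressed as a small classification plus a table renderer. This is a sketch under stated assumptions: `Severity`, `severity(loggedLocally:reportedRemotely:)`, and `summaryTable` are hypothetical names, and mapping "reported remotely but with other weaknesses" to Medium is an interpretation of the report headings rather than a rule stated in this document.

```swift
import Foundation

enum Severity: String, CaseIterable {
    case critical = "Critical"
    case high = "High"
    case medium = "Medium"
}

/// Maps what a catch site already does onto the report's severity buckets.
func severity(loggedLocally: Bool, reportedRemotely: Bool) -> Severity {
    switch (loggedLocally, reportedRemotely) {
    case (false, false): return .critical // Error vanishes completely.
    case (true, false): return .high      // Visible on device only.
    default: return .medium               // Reported, but observability is weak.
    }
}

/// Renders the Summary table in the same Markdown shape as the report template.
func summaryTable(for severities: [Severity]) -> String {
    var lines = ["| Severity | Count |", "|----------|-------|"]
    for level in Severity.allCases {
        let count = severities.filter { $0 == level }.count
        lines.append("| \(level.rawValue) | \(count) |")
    }
    lines.append("| **Total** | **\(severities.count)** |")
    return lines.joined(separator: "\n")
}
```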
Workflow: Add Logging to Existing Codebase
When: Setting up observability for an iOS project from scratch, or migrating from print() to Logger.
- Run Configuration Phase if `.claude/ios-logging-config.md` doesn't exist
- Create Logger extensions with subsystem/category (`references/logger-setup.md`)
- Create ErrorReporter protocol and SDK implementation (`references/crash-sdk-integration.md`)
- Audit all `print()` calls: replace with appropriate Logger level
- Audit all `try?` usages: convert critical ones to `do/catch` (`references/silent-failures.md`)
- Audit all `Task {}` blocks: ensure do/catch wraps any throwing code
- Audit Combine pipelines: move error handling inside `flatMap` (`references/silent-failures.md`)
- Add MetricKit subscriber for OOM/watchdog detection
- Verify dSYMs: Debug Information Format = "DWARF with dSYM File" for all targets
- If app has extensions: initialize crash SDK separately in each (`references/enterprise-patterns.md`)
Workflow: Review Error Handling in PR
When: Code review that touches error handling, networking, persistence, or async code.
- Check every `catch` block: does it have Logger + ErrorReporter? (`references/silent-failures.md`)
- Check every `try?`: is failure genuinely irrelevant? If not, flag it
- Check every `Task {}` with `try`: is there a do/catch inside?
- Check every `.task` modifier: CancellationError handled separately?
- Check Combine chains: error recovery inside `flatMap`, not at the pipeline end?
- Check Logger calls: privacy annotations on all dynamic strings? (`references/logger-setup.md`)
- Check for PII in log messages or crash report metadata (`references/pii-compliance.md`)
- Check URLSession usage: HTTP status codes validated? (`references/silent-failures.md`)
Workflow: Integrate Crash Reporting SDK
When: Adding Sentry, Crashlytics, or PostHog to an iOS project.
- Choose primary fatal crash reporter (only one!): `references/crash-sdk-integration.md`
- Implement ErrorReporter protocol wrapping chosen SDK
- Add breadcrumbs before risky operations (DB migrations, payments, auth flows)
- Configure dSYM upload in build phases
- If multiple SDKs needed: disable crash handler on secondary (`references/crash-sdk-integration.md`)
- Test with intentional crash and non-fatal to verify symbolication
- For extensions: separate SDK init per extension target (`references/enterprise-patterns.md`)
Workflow: Connect Remote Logging for AI-Assisted Debugging
When: Setting up the development environment to query production errors from your AI assistant.
- Sentry: add the Sentry MCP server to your Claude Code / IDE config
  - Enables: querying recent issues, searching events, getting stack traces and breadcrumbs
- PostHog: add the PostHog MCP server
  - Configure with your PostHog API key and project ID
  - Enables: querying analytics events, checking feature flags, searching error events
- Firebase: install the Firebase CLI with `npm install -g firebase-tools && firebase login`
  - Enables: `firebase crashlytics:symbols:upload`, listing recent crashes
- Verify connectivity: ask your AI assistant to "check recent crashes in Sentry" or "what errors happened today in PostHog" to confirm the integration works
This connectivity is what makes remote logging truly powerful — instead of context-switching to dashboards, your debugging workflow stays in the editor.
Reference Files
| File | When to read |
|---|---|
| `references/silent-failures.md` | Writing or reviewing error handling code, diagnosing vanishing errors |
| `references/logger-setup.md` | Setting up os.Logger, choosing log levels, adding privacy annotations |
| `references/crash-sdk-integration.md` | Integrating Sentry/Crashlytics/PostHog, ErrorReporter protocol, breadcrumbs |
| | Adding MetricKit for OOM/watchdog/hang detection |
| `references/objc-exceptions.md` | Bridging Swift/ObjC error handling, NSException edge cases |
| `references/pii-compliance.md` | GDPR/CCPA logging compliance, privacy manifests, redaction patterns |
| `references/enterprise-patterns.md` | Centralized error handling, retry with backoff, extension monitoring |