Loading...
Loading...
Use when debugging connection timeouts, TLS handshake failures, data not arriving, connection drops, performance issues, or proxy/VPN interference - systematic Network.framework diagnostics with production crisis defense
npx skill4agent add charleswiltgen/axiom axiom-networking-diag// 1. Enable Network.framework logging
// Add to Xcode scheme: Product → Scheme → Edit Scheme → Arguments
// -NWLoggingEnabled 1
// -NWConnectionLoggingEnabled 1
// 2. Check connection state history
connection.stateUpdateHandler = { state in
print("\(Date()): Connection state: \(state)")
// Log every state transition with timestamp
}
// 3. Check TLS configuration
// If using custom TLS parameters:
print("TLS version: \(tlsParameters.minimumTLSProtocolVersion)")
print("Cipher suites: \(tlsParameters.tlsCipherSuites ?? [])")
// 4. Test with packet capture (Charles Proxy or Wireshark)
// On device: Settings → WiFi → (i) → Configure Proxy → Manual
// Charles: Help → SSL Proxying → Install Charles Root Certificate on iOS
// 5. Test on different networks
// - WiFi
// - Cellular (disable WiFi)
// - Airplane Mode → WiFi (test waiting state)
// - VPN active
// - IPv6-only (some cellular carriers)| Observation | Diagnosis | Next Step |
|---|---|---|
| Stuck in .preparing > 5 seconds | DNS failure or network down | Pattern 1a |
| Moves to .waiting immediately | No connectivity (Airplane Mode, no signal) | Pattern 1b |
| .failed with POSIX error 61 | Connection refused (server not listening) | Pattern 1c |
| .failed with POSIX error 50 | Network down (interface disabled) | Pattern 1d |
| .ready then immediate .failed | TLS handshake failure | Pattern 2b |
| .ready, send succeeds, no data arrives | Framing problem or receiver not processing | Pattern 3a |
| Works WiFi, fails cellular | IPv6-only network (hardcoded IPv4) | Pattern 5a |
| Works without VPN, fails with VPN | Proxy interference or DNS override | Pattern 5b |
Network problem?
├─ Connection never reaches .ready?
│ ├─ Stuck in .preparing for >5 seconds?
│ │ ├─ DNS lookup timing out? → Pattern 1a (DNS Failure)
│ │ ├─ Network available but can't reach host? → Pattern 1c (Connection Refused)
│ │ └─ First connection slow, subsequent fast? → Pattern 1e (DNS Caching)
│ │
│ ├─ Moves to .waiting immediately?
│ │ ├─ Airplane Mode or no signal? → Pattern 1b (No Connectivity)
│ │ ├─ Cellular blocked by parameters? → Pattern 1b (Interface Restrictions)
│ │ └─ VPN connecting? → Wait and retry
│ │
│ ├─ .failed with POSIX error 61?
│ │ └─ → Pattern 1c (Connection Refused)
│ │
│ └─ .failed with POSIX error 50?
│ └─ → Pattern 1d (Network Down)
│
├─ Connection reaches .ready, then fails?
│ ├─ Fails immediately after .ready?
│ │ ├─ TLS error -9806? → Pattern 2b (Certificate Validation)
│ │ ├─ TLS error -9801? → Pattern 2b (Protocol Version)
│ │ └─ POSIX error 54? → Pattern 2d (Connection Reset)
│ │
│ ├─ Fails after network change (WiFi → cellular)?
│ │ ├─ No viabilityUpdateHandler? → Pattern 2a (Viability Not Handled)
│ │ ├─ Didn't detect better path? → Pattern 2a (Better Path)
│ │ └─ IPv6 → IPv4 transition? → Pattern 5a (Dual Stack)
│ │
│ ├─ Fails after timeout?
│ │ └─ → Pattern 2c (Receiver Not Responding)
│ │
│ └─ Random disconnects?
│ └─ → Pattern 2d (Network Instability)
│
├─ Data not arriving?
│ ├─ Send succeeds, receive never returns?
│ │ ├─ No message framing? → Pattern 3a (Framing Problem)
│ │ ├─ Wrong byte count? → Pattern 3b (Min/Max Bytes)
│ │ └─ Receiver not calling receive()? → Check receiver code
│ │
│ ├─ Partial data arrives?
│ │ ├─ receive(exactly:) too large? → Pattern 3b (Chunking)
│ │ ├─ Sender closing too early? → Check sender lifecycle
│ │ └─ Buffer overflow? → Pattern 3b (Buffer Management)
│ │
│ ├─ Data corrupted?
│ │ ├─ TLS disabled? → Pattern 3c (No Encryption)
│ │ ├─ Binary vs text encoding? → Check ContentType
│ │ └─ Byte order (endianness)? → Use network byte order
│ │
│ └─ Works sometimes, fails intermittently?
│ └─ → Pattern 3d (Race Condition)
│
├─ Performance degrading?
│ ├─ Latency increasing over time?
│ │ ├─ TCP congestion? → Pattern 4a (Congestion Control)
│ │ ├─ No contentProcessed pacing? → Pattern 4a (Buffering)
│ │ └─ Server overloaded? → Check server metrics
│ │
│ ├─ Throughput decreasing?
│ │ ├─ Network transition WiFi → cellular? → Pattern 4b (Bandwidth Change)
│ │ ├─ Packet loss increasing? → Pattern 4b (Network Quality)
│ │ └─ Multiple streams competing? → Pattern 4b (Prioritization)
│ │
│ ├─ High CPU usage?
│ │ ├─ Not using batch for UDP? → Pattern 4c (Batching)
│ │ ├─ Too many small sends? → Pattern 4c (Coalescing)
│ │ └─ Using sockets instead of Network.framework? → Migrate (30% CPU savings)
│ │
│ └─ Memory growing?
│ ├─ Not releasing connections? → Pattern 4d (Connection Leaks)
│ ├─ Not cancelling on deinit? → Pattern 4d (Lifecycle)
│ └─ Missing [weak self]? → Pattern 4d (Retain Cycles)
│
└─ Works on WiFi, fails on cellular/VPN?
├─ IPv6-only cellular network?
│ ├─ Hardcoded IPv4 address? → Pattern 5a (IPv4 Literal)
│ ├─ getaddrinfo with AF_INET only? → Pattern 5a (Address Family)
│ └─ Works on some carriers, not others? → Pattern 5a (Regional IPv6)
│
├─ Corporate VPN active?
│ ├─ Proxy configuration failing? → Pattern 5b (PAC)
│ ├─ DNS override blocking hostname? → Pattern 5b (DNS)
│ └─ Certificate pinning failing? → Pattern 5b (TLS in VPN)
│
├─ Port blocked by firewall?
│ ├─ Non-standard port? → Pattern 5c (Firewall)
│ ├─ Outbound only? → Pattern 5c (NATing)
│ └─ Works on port 443, not 8080? → Pattern 5c (Port Scanning)
│
└─ Peer-to-peer connection failing?
├─ NAT traversal issue? → Pattern 5d (STUN/TURN)
├─ Symmetric NAT? → Pattern 5d (NAT Type)
└─ Local network only? → Pattern 5d (Bonjour/mDNS)// Enable DNS logging
// -NWLoggingEnabled 1
// Check DNS resolution manually
// Terminal: nslookup example.com
// Terminal: dig example.com
// Logs show:
// "DNS lookup timed out"
// "getaddrinfo failed: 8 (nodename nor servname provided)"// ❌ WRONG — Adding timeout doesn't fix DNS
/*
let parameters = NWParameters.tls
parameters.expiredDNSBehavior = .allow // Doesn't help if DNS never resolves
*/
// ✅ CORRECT — Verify hostname, test DNS manually
// 1. Test DNS manually:
// $ nslookup your-hostname.com
// If this fails, DNS is the problem (not your code)
// 2. If DNS works manually but not in app:
// Check if VPN or enterprise config blocking app DNS
// 3. If hostname doesn't exist:
let connection = NWConnection(
host: NWEndpoint.Host("correct-hostname.com"), // Fix typo
port: 443,
using: .tls
)
// 4. If DNS caching issue (rare):
// Restart device to clear DNS cache
// Or use IP address temporarily while investigating DNS server issuenslookup your-hostname.com-9806-9807-9801# Test TLS manually with openssl
openssl s_client -connect example.com:443 -showcerts
# Check certificate details
openssl s_client -connect example.com:443 | openssl x509 -noout -dates
# notBefore: Jan 1 00:00:00 2024 GMT
# notAfter: Dec 31 23:59:59 2024 GMT ← Check if expired
# Check certificate chain
openssl s_client -connect example.com:443 -showcerts | grep "CN="
# Should show: Subject CN=example.com, Issuer CN=Trusted CA// ❌ WRONG — Never disable certificate validation in production
/*
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_verify_block(tlsOptions.securityProtocolOptions, { ... }, .main)
// This disables validation → security vulnerability
*/
// ✅ CORRECT — Fix the certificate on server
// 1. Renew expired certificate (Let's Encrypt, DigiCert, etc.)
// 2. Ensure hostname matches (CN=example.com or SAN includes example.com)
// 3. Include intermediate CA certificates on server
// 4. Test with: openssl s_client -connect example.com:443// ⚠️ ONLY for development/staging
#if DEBUG
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_verify_block(
tlsOptions.securityProtocolOptions,
{ (sec_protocol_metadata, sec_trust, sec_protocol_verify_complete) in
// Trust any certificate (DEV ONLY)
sec_protocol_verify_complete(true)
},
.main
)
let parameters = NWParameters(tls: tlsOptions)
let connection = NWConnection(host: "dev-server.example.com", port: 443, using: parameters)
#endif// Production-grade certificate pinning
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_verify_block(
tlsOptions.securityProtocolOptions,
{ (metadata, trust, complete) in
let trust = sec_protocol_metadata_copy_peer_public_key(metadata)
// Compare trust with pinned certificate
let pinnedCertificateData = Data(/* your cert */)
let serverCertificateData = SecCertificateCopyData(trust) as Data
if serverCertificateData == pinnedCertificateData {
complete(true)
} else {
complete(false) // Reject non-pinned certificates
}
},
.main
)openssl s_client -connect example.com:443Verify return code: 0 (ok)// Enable detailed logging
connection.send(content: data, completion: .contentProcessed { error in
if let error = error {
print("Send error: \(error)")
} else {
print("✅ Sent \(data.count) bytes at \(Date())")
}
})
connection.receive(minimumIncompleteLength: 1, maximumLength: 65536) { data, context, isComplete, error in
if let error = error {
print("Receive error: \(error)")
} else if let data = data {
print("✅ Received \(data.count) bytes at \(Date())")
}
}
// Use Charles Proxy or Wireshark to verify bytes on wire// Sender sends 3 messages:
send("Hello") // 5 bytes
send("World") // 5 bytes
send("!") // 1 byte
// Receiver might get:
receive() → "HelloWorld!" // All 11 bytes at once
// Or:
receive() → "Hel" // 3 bytes
receive() → "loWorld!" // 8 bytes
// Message boundaries lost!// NetworkConnection with TLV
let connection = NetworkConnection(
to: .hostPort(host: "example.com", port: 1029)
) {
TLV {
TLS()
}
}
// Send typed messages
enum MessageType: Int {
case chat = 1
case ping = 2
}
let chatData = Data("Hello".utf8)
try await connection.send(chatData, type: MessageType.chat.rawValue)
// Receive typed messages
let (data, metadata) = try await connection.receive()
if metadata.type == MessageType.chat.rawValue {
print("Chat message: \(String(data: data, encoding: .utf8)!)")
}// Sender: Prefix message with UInt32 length
func sendMessage(_ message: Data) {
var length = UInt32(message.count).bigEndian
let lengthData = Data(bytes: &length, count: 4)
connection.send(content: lengthData, completion: .contentProcessed { _ in
connection.send(content: message, completion: .contentProcessed { _ in
print("Sent message with length prefix")
})
})
}
// Receiver: Read length, then read message
func receiveMessage() {
// 1. Read 4-byte length
connection.receive(minimumIncompleteLength: 4, maximumLength: 4) { lengthData, _, _, error in
guard let lengthData = lengthData else { return }
let length = lengthData.withUnsafeBytes { $0.load(as: UInt32.self).bigEndian }
// 2. Read message of exact length
connection.receive(minimumIncompleteLength: Int(length), maximumLength: Int(length)) { messageData, _, _, error in
guard let messageData = messageData else { return }
print("Received complete message: \(messageData.count) bytes")
}
}
}// Monitor send completion time
let sendStart = Date()
connection.send(content: data, completion: .contentProcessed { error in
let elapsed = Date().timeIntervalSince(sendStart)
print("Send completed in \(elapsed)s") // Should be < 0.1s normally
// If > 1s, TCP congestion or receiver not draining fast enough
})
// Profile with Instruments
// Xcode → Product → Profile → Network template
// Check "Bytes Sent" vs "Time" graph
// Should be smooth line, not stepped/stalled// ❌ WRONG — Sending without pacing
/*
for frame in videoFrames {
connection.send(content: frame, completion: .contentProcessed { _ in })
// Buffers all frames immediately → memory spike → congestion
}
*/
// ✅ CORRECT — Pace with contentProcessed callback
func sendFrameWithPacing() {
guard let nextFrame = getNextFrame() else { return }
connection.send(content: nextFrame, completion: .contentProcessed { [weak self] error in
if let error = error {
print("Send error: \(error)")
return
}
// contentProcessed = network stack consumed frame
// NOW send next frame (pacing)
self?.sendFrameWithPacing()
})
}
// Start pacing
sendFrameWithPacing()// NetworkConnection with natural back pressure
func sendFrames() async throws {
for frame in videoFrames {
try await connection.send(frame)
// Suspends automatically if network can't keep up
// Built-in back pressure, no manual pacing needed
}
}# Check if hostname has IPv6
dig AAAA example.com
# Check if device is on IPv6-only network
# Settings → WiFi/Cellular → (i) → IP Address
# If starts with "2001:" or "fe80:" → IPv6
# If "192.168" or "10." → IPv4
# Test with IPv6-only simulator
# Xcode → Devices → (device) → Use as Development Target
# Settings → Developer → Networking → DNS64/NAT64// ❌ WRONG — Hardcoded IPv4
/*
let host = "192.168.1.100" // Fails on IPv6-only cellular
*/
// ❌ WRONG — Forcing IPv4
/*
let parameters = NWParameters.tcp
parameters.requiredInterfaceType = .wifi
parameters.ipOptions.version = .v4 // Fails on IPv6-only
*/
// ✅ CORRECT — Use hostname, let framework handle IPv4/IPv6
let connection = NWConnection(
host: NWEndpoint.Host("example.com"), // Hostname, not IP
port: 443,
using: .tls
)
// Framework automatically:
// 1. Resolves both A (IPv4) and AAAA (IPv6) records
// 2. Tries IPv6 first (if available)
// 3. Falls back to IPv4 (Happy Eyeballs)
// 4. Works on any network (IPv4, IPv6, dual-stack)dig AAAA your-hostname.com// Check what changed in v4.2
git diff v4.1 v4.2 -- NetworkClient.swift
// Most likely culprits:
// - TLS configuration changed
// - Added certificate pinning
// - Changed connection parameters
// - Updated hostname// Check failure pattern:
// - Random 15%? Or specific user segment?
// - Specific iOS version? (check analytics)
// - Specific network? (WiFi vs cellular)
// Enable logging on production builds (emergency flag):
#if PRODUCTION
if UserDefaults.standard.bool(forKey: "EnableNetworkLogging") {
// -NWLoggingEnabled 1
}
#endif
// Ask Customer Support to enable for affected users
// Check logs for specific error code// Found in git diff:
// v4.1:
let parameters = NWParameters.tls
// v4.2:
let tlsOptions = NWProtocolTLS.Options()
tlsOptions.minimumTLSProtocolVersion = .TLSv13 // ← SMOKING GUN
let parameters = NWParameters(tls: tlsOptions)// Fix: Support both TLS 1.2 and TLS 1.3
let tlsOptions = NWProtocolTLS.Options()
tlsOptions.minimumTLSProtocolVersion = .TLSv12 // ✅ Support older infrastructure
// TLS 1.3 will still be used where supported (automatic negotiation)
let parameters = NWParameters(tls: tlsOptions)# Build hotfix v4.2.1
# Test on affected user's network (critical!)
# Submit to App Store with expedited review request
# Explain: "Production outage affecting 15% of users"Found root cause: v4.2 requires TLS 1.3, but 15% of users on older infrastructure
(enterprise proxies, older load balancers) that only support TLS 1.2.
Fix: Change minimum TLS version to 1.2 (backward compatible, 1.3 still used when available).
ETA: Hotfix v4.2.1 in App Store in 1 hour (expedited review).
Full rollout to users: 24 hours.
Mitigation now: Telling affected users to update immediately when available.Root cause: TLS version requirement changed in v4.2 (TLS 1.3 only).
15% of users behind infrastructure that doesn't support TLS 1.3.
Technical fix: Set tlsOptions.minimumTLSProtocolVersion = .TLSv12
This allows backward compatibility while still using TLS 1.3 where supported.
Testing: Verified fix on user's network (enterprise VPN with old proxy).
Deployment: Hotfix build in progress, ETA 30 minutes to submit.
Prevention: Add TLS compatibility testing to pre-release checklist.Update: We've identified the issue and have a fix deploying within 1 hour.
Affected users: Those on enterprise networks or older ISP infrastructure.
Workaround: None (network level issue).
Expected resolution: v4.2.1 will be available in App Store in 1 hour.
Ask users to update immediately.
Updates: I'll notify you every 30 minutes.| Approach | Time to Resolution | User Impact |
|---|---|---|
| ❌ Panic rollback | 1-2 hours app review + 24 hours user updates = 26 hours | 10K users down for 26 hours |
| ❌ "Add more retries" | Unknown (doesn't fix root cause) | Permanent 15% failure rate |
| ❌ "Works for me" | Days of debugging wrong thing | Frustrated users, bad reviews |
| ✅ Systematic diagnosis | 30 min diagnosis + 20 min fix + 1 hour review = 2 hours | 10K users down for 2 hours |
| Symptom | Likely Cause | First Check | Pattern | Fix Time |
|---|---|---|---|---|
| Stuck in .preparing | DNS failure | | 1a | 10-15 min |
| .waiting immediately | No connectivity | Airplane Mode? | 1b | 5 min |
| .failed POSIX 61 | Connection refused | Server listening? | 1c | 5-10 min |
| .failed POSIX 50 | Network down | Check interface | 1d | 5 min |
| TLS error -9806 | Certificate invalid | | 2b | 15-20 min |
| Data not received | Framing problem | Packet capture | 3a | 20-30 min |
| Partial data | Min/max bytes wrong | Check receive() params | 3b | 10 min |
| Latency increasing | TCP congestion | contentProcessed pacing | 4a | 15-25 min |
| High CPU | No batching | Use connection.batch | 4c | 10 min |
| Memory growing | Connection leaks | Check [weak self] | 4d | 10-15 min |
| Works WiFi, fails cellular | IPv6-only network | | 5a | 10-15 min |
| Works without VPN, fails with VPN | Proxy interference | Test PAC file | 5b | 20-30 min |
| Port blocked | Firewall | Try 443 vs 8080 | 5c | 10 min |
// Add to Xcode scheme BEFORE debugging:
// -NWLoggingEnabled 1
// -NWConnectionLoggingEnabled 1
// Or programmatically:
#if DEBUG
ProcessInfo.processInfo.environment["NW_LOGGING_ENABLED"] = "1"
#endif.failed(let error)if case .failed(let error) = state {
let posixError = (error as NSError).code
switch posixError {
case 61: // ECONNREFUSED
print("Server not listening, check server logs")
case 50: // ENETDOWN
print("Network interface down, check WiFi/cellular")
case 60: // ETIMEDOUT
print("Connection timeout, check firewall/DNS")
default:
print("Connection failed: \(error)")
}
}// Test with Network Link Conditioner:
// 1. 100% Loss — verify .waiting state shows "Waiting for network"
// 2. WiFi → None → WiFi — verify automatic reconnection
// 3. 3% packet loss — verify performance graceful degradation