microservices-patterns

Microservices Patterns

A comprehensive skill for building, deploying, and managing production-grade microservices architectures. This skill covers service mesh patterns, traffic management, resilience engineering, observability, security, and modern microservices best practices using Istio and Kubernetes.

When to Use This Skill

Use this skill when:
  • Architecting microservices-based applications with distributed systems
  • Implementing service mesh infrastructure for service-to-service communication
  • Adding resilience patterns like circuit breakers, retries, and timeouts
  • Managing traffic routing, load balancing, and canary deployments
  • Implementing distributed tracing and observability across microservices
  • Securing microservices with mTLS and authorization policies
  • Troubleshooting cascading failures and service degradation
  • Building fault-tolerant distributed systems
  • Implementing blue-green deployments and A/B testing
  • Managing multi-cluster microservices deployments
  • Implementing chaos engineering and fault injection
  • Migrating from monolithic to microservices architecture

Core Concepts

Microservices Architecture

Microservices architecture structures an application as a collection of loosely coupled services:
  • Service Independence: Each service is independently deployable and scalable
  • Domain-Driven Design: Services align with business capabilities
  • Decentralized Data: Each service owns its data store
  • API-First: Services communicate via well-defined APIs
  • Polyglot Persistence: Different services can use different databases
  • Failure Isolation: Service failures don't cascade across the system

Service Mesh Fundamentals

A service mesh is an infrastructure layer for handling service-to-service communication:
  • Data Plane: Sidecar proxies (Envoy) deployed alongside each service
  • Control Plane: Manages and configures proxies (Istio, Linkerd, Consul)
  • Service Discovery: Automatic service registration and discovery
  • Load Balancing: Intelligent traffic distribution across service instances
  • Observability: Built-in metrics, logs, and distributed tracing
  • Security: mTLS, authentication, and authorization

Istio Architecture

Istio is one of the most widely adopted service mesh implementations:
Control Plane Components:
  • Istiod: Unified control plane for service discovery, configuration, and certificate management; since Istio 1.5 it consolidates the formerly separate components below
  • Pilot: Traffic management and service discovery (now part of Istiod)
  • Citadel: Certificate authority for mTLS (now part of Istiod)
  • Galley: Configuration validation and distribution (now part of Istiod)
Data Plane:
  • Envoy Proxy: High-performance sidecar proxy for each service
  • Iptables Rules: Transparent traffic interception
  • Service Proxy: Handles all network traffic for the service

Key Service Mesh Patterns

  1. Sidecar Pattern: Proxy deployed alongside application container
  2. Service Discovery: Automatic registration and discovery of services
  3. Traffic Splitting: Route percentage of traffic to different versions
  4. Circuit Breaker: Prevent cascading failures
  5. Retry Logic: Automatic retry with exponential backoff
  6. Timeout Policies: Request timeout configuration
  7. Fault Injection: Chaos testing in production
  8. Rate Limiting: Protect services from overload
  9. mTLS: Mutual TLS for service-to-service encryption
  10. Distributed Tracing: Request flow across services

Traffic Management

Virtual Services

Virtual services define routing rules for traffic within the mesh:
Key Features:
  • HTTP/TCP/TLS Routing: Protocol-specific routing rules
  • Match Conditions: Route based on headers, URIs, methods
  • Weighted Routing: Traffic splitting across versions
  • Redirects and Rewrites: URL manipulation
  • Fault Injection: Delay and abort injection
  • Retries: Automatic retry configuration
  • Timeouts: Request timeout policies
Virtual Service Structure:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v3
```

Destination Rules

Destination rules configure policies for traffic after routing:
Key Features:
  • Load Balancing: Round robin, random, least request
  • Connection Pools: Connection limits and timeouts
  • Outlier Detection: Circuit breaker configuration
  • TLS Settings: mTLS mode configuration
  • Subset Definitions: Version-based service subsets
Common Load Balancing Strategies:
  • ROUND_ROBIN: Default, distributes evenly
  • LEAST_REQUEST: Routes to instances with fewest requests
  • RANDOM: Random distribution
  • PASSTHROUGH: Use original destination
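
The subset definitions and load balancing policy above live in a single DestinationRule. A minimal sketch (the `reviews` host and version labels follow the VirtualService example earlier; adjust to your services):
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST   # route to the instance with the fewest active requests
  subsets:
  - name: v2
    labels:
      version: v2             # matches pods labeled version=v2
  - name: v3
    labels:
      version: v3
```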

Traffic Splitting

Traffic splitting enables gradual rollouts and A/B testing:
Use Cases:
  • Canary Deployments: Route small percentage to new version
  • Blue-Green Deployments: Switch traffic between versions
  • A/B Testing: Split traffic for experimentation
  • Dark Launches: Shadow traffic to new version
Progressive Delivery Pattern:
v1: 100% → 90% → 70% → 50% → 20% → 0%
v2:   0% → 10% → 30% → 50% → 80% → 100%
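
One step of the progression above is expressed as a weighted route. This sketch assumes `v1`/`v2` subsets are defined in a matching DestinationRule:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90      # stable version keeps most traffic
    - destination:
        host: reviews
        subset: v2
      weight: 10      # canary receives a small share
```
Advancing the canary is just a matter of editing the two weights and re-applying.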

Gateway Configuration

Gateways manage ingress and egress traffic:
Ingress Gateway:
  • External traffic entry point
  • TLS termination
  • Protocol-specific routing
  • Virtual hosting
Egress Gateway:
  • Control outbound traffic
  • Security policies for external services
  • Traffic monitoring and logging
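
An ingress gateway with TLS termination might look like the following sketch; the hostname and `credentialName` secret are placeholders:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway
spec:
  selector:
    istio: ingressgateway       # bind to the default Istio ingress gateway deployment
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE                      # terminate TLS at the gateway
      credentialName: example-com-cert  # Kubernetes secret holding the certificate (placeholder)
    hosts:
    - "example.com"                     # placeholder virtual host
```
A VirtualService then binds to this gateway via its `gateways` field to route the external traffic.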

Resilience Patterns

Circuit Breaker Pattern

Circuit breakers prevent cascading failures by detecting and isolating failing services:
States:
  • Closed: Normal operation, requests flow through
  • Open: Service failing, requests fail immediately
  • Half-Open: Testing if service recovered
Configuration Parameters:
  • Consecutive Errors: Errors before opening circuit
  • Interval: Time window for error counting
  • Base Ejection Time: How long to eject failing instances
  • Max Ejection Percentage: Maximum percentage of pool to eject
Benefits:
  • Prevents resource exhaustion
  • Fails fast instead of waiting for timeouts
  • Gives failing services time to recover
  • Monitors service health automatically
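
In Istio, the configuration parameters above map onto outlier detection in a DestinationRule; a hedged sketch with illustrative thresholds:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # consecutive errors before ejecting an instance
      interval: 30s              # time window for error counting
      baseEjectionTime: 30s      # how long an ejected instance stays out
      maxEjectionPercent: 50     # never eject more than half the pool
```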

Retry Logic

Automatic retry with intelligent backoff strategies:
Retry Strategies:
  • Fixed Delay: Constant delay between retries
  • Exponential Backoff: Increasing delay between retries
  • Jittered Backoff: Random jitter to prevent thundering herd
Configuration:
  • Attempts: Maximum number of retries
  • Per Try Timeout: Timeout for each attempt
  • Retry On: Conditions triggering retry (5xx, timeout, refused-stream)
  • Backoff: Base interval and maximum interval
Best Practices:
  • Only retry idempotent operations
  • Use exponential backoff with jitter
  • Set maximum retry attempts
  • Monitor retry rates
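
The retry configuration above maps onto a VirtualService; `ratings` is an illustrative host:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    retries:
      attempts: 3                          # maximum retry attempts
      perTryTimeout: 2s                    # timeout for each attempt
      retryOn: 5xx,reset,connect-failure   # conditions that trigger a retry
```
Envoy applies its default exponential backoff with jitter between attempts.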

Timeout Policies

Timeout policies prevent indefinite waiting:
Timeout Types:
  • Request Timeout: End-to-end request timeout
  • Per Try Timeout: Timeout for each retry attempt
  • Idle Timeout: Connection idle timeout
  • Connection Timeout: Initial connection timeout
Timeout Hierarchy:
Overall Request Timeout
├─ Retry 1 (Per Try Timeout)
├─ Retry 2 (Per Try Timeout)
└─ Retry 3 (Per Try Timeout)
Best Practices:
  • Set timeouts based on SLA requirements
  • Use shorter timeouts for critical paths
  • Configure per-try timeouts lower than overall timeout
  • Monitor timeout rates and adjust
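
The hierarchy above translates to an overall request timeout plus per-try timeouts; a sketch with an illustrative `ratings` service:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    timeout: 10s            # overall end-to-end request timeout
    retries:
      attempts: 3
      perTryTimeout: 3s     # kept below the overall timeout, as recommended above
```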

Bulkhead Pattern

Bulkheads isolate resources to prevent complete system failure:
Implementation:
  • Thread Pools: Separate thread pools per service
  • Connection Pools: Limited connections per upstream
  • Queue Limits: Bounded queues to prevent memory issues
  • Semaphores: Limit concurrent requests
Configuration:
  • Max Connections: Maximum concurrent connections
  • Max Requests Per Connection: HTTP/2 concurrent requests
  • Max Pending Requests: Queue size for pending requests
  • Connection Timeout: Time to establish connection
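
These limits correspond to connectionPool settings in a DestinationRule. A sketch with illustrative values and a hypothetical `payment` upstream:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment
spec:
  host: payment                      # illustrative upstream service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100          # cap concurrent TCP connections
        connectTimeout: 5s           # time allowed to establish a connection
      http:
        http1MaxPendingRequests: 50  # bounded queue for pending requests
        http2MaxRequests: 100        # concurrent HTTP/2 requests
        maxRequestsPerConnection: 10 # connection reuse limit
        maxRetries: 3                # outstanding retry budget
```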

Rate Limiting

Rate limiting protects services from overload:
Rate Limit Types:
  • Global Rate Limiting: Across all instances
  • Local Rate Limiting: Per instance
  • User-Based: Per user or API key
  • Endpoint-Based: Per API endpoint
Algorithms:
  • Token Bucket: Allows bursts while maintaining average rate
  • Leaky Bucket: Smooths out traffic spikes
  • Fixed Window: Simple time-window based limiting
  • Sliding Window: More accurate than fixed window

Load Balancing

Service-Level Load Balancing

Istio provides intelligent Layer 7 load balancing:
Load Balancing Algorithms:
  1. Round Robin
    • Default algorithm
    • Equal distribution across instances
    • Simple and predictable
    • Good for homogeneous instances
  2. Least Request
    • Routes to instance with fewest active requests
    • Better for heterogeneous instances
    • Adapts to varying response times
    • Requires request tracking overhead
  3. Random
    • Random instance selection
    • No state required
    • Good for large pools
    • Statistical distribution over time
  4. Consistent Hash
    • Hash-based routing (sticky sessions)
    • Same client → same backend
    • Good for caching scenarios
    • Uses headers, cookies, or source IP
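
Consistent-hash (sticky) routing is configured in the DestinationRule's load balancer. This sketch hashes on a hypothetical `x-user-id` header so that the same user lands on the same backend:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: cart
spec:
  host: cart                        # illustrative service
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: x-user-id   # same header value → same backend instance
```
A cookie (`httpCookie`) or the source IP (`useSourceIp`) can serve as the hash key instead.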

Connection Pool Management

Connection pools control resource usage:
TCP Settings:
  • Max Connections: Total connections to upstream
  • Connect Timeout: Connection establishment timeout
  • TCP Keep Alive: Keep-alive probe configuration
HTTP Settings:
  • HTTP1 Max Pending Requests: Queue size
  • HTTP2 Max Requests: Concurrent streams
  • Max Requests Per Connection: Connection reuse limit
  • Max Retries: Outstanding retry budget

Health Checking

Active and passive health checking:
Passive Health Checking (Outlier Detection):
  • Monitors actual traffic
  • No additional probe overhead
  • Detects failures automatically
  • Ejects unhealthy instances
Active Health Checking:
  • Explicit health probe requests
  • Independent of traffic
  • Configurable intervals
  • Custom health endpoints

Security

Mutual TLS (mTLS)

mTLS provides encryption and authentication for service-to-service communication:
mTLS Benefits:
  • Encryption: All traffic encrypted in transit
  • Authentication: Services authenticate to each other
  • Authorization: Service identity for policy enforcement
  • Certificate Rotation: Automatic certificate management
mTLS Modes:
  • STRICT: Require mTLS for all traffic
  • PERMISSIVE: Accept both mTLS and plaintext (migration mode)
  • DISABLE: No mTLS enforcement
Certificate Management:
  • Automatic certificate issuance via Citadel
  • Short-lived certificates (24 hours default)
  • Automatic rotation
  • SPIFFE-compliant identities
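
Mesh-wide STRICT mTLS is a single PeerAuthentication resource applied in the root namespace:
```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace → applies mesh-wide
spec:
  mtls:
    mode: STRICT            # reject plaintext; switch to PERMISSIVE during migration
```
Scoping the same resource to a workload namespace (or adding a `selector`) narrows the policy.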

Authorization Policies

Fine-grained access control between services:
Policy Types:
  • ALLOW: Explicitly allow traffic
  • DENY: Explicitly deny traffic
  • CUSTOM: Custom authorization logic
Match Conditions:
  • Source: Source service identity, namespace, IP
  • Destination: Target service, port, path
  • Request: HTTP methods, headers, parameters
  • JWT Claims: Token-based authorization
Policy Hierarchy:
Namespace-level default → Service-level → Specific paths
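
An ALLOW policy combining source identity with request conditions might look like this sketch; the workload, namespace, and service account names are illustrative:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: prod
spec:
  selector:
    matchLabels:
      app: orders           # policy applies to the orders workload
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/prod/sa/frontend"]  # SPIFFE identity of the caller
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
```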

Authentication Policies

Configure authentication requirements:
Peer Authentication:
  • Service-to-service authentication
  • mTLS mode configuration
  • Per-port settings
Request Authentication:
  • End-user authentication
  • JWT validation
  • Custom authentication providers
  • Token forwarding
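
End-user JWT validation is declared with a RequestAuthentication resource; the issuer and JWKS URL here are placeholders:
```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: prod
spec:
  selector:
    matchLabels:
      app: api-gateway      # illustrative workload
  jwtRules:
  - issuer: "https://auth.example.com"                          # placeholder issuer
    jwksUri: "https://auth.example.com/.well-known/jwks.json"   # placeholder JWKS endpoint
```
Note that RequestAuthentication alone only rejects invalid tokens; pair it with an AuthorizationPolicy requiring `requestPrincipals` to also reject requests with no token.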

Observability

Distributed Tracing

Track requests across microservices:
Key Concepts:
  • Trace: Complete request journey
  • Span: Individual service operation
  • Parent-Child Relationships: Service call hierarchy
  • Trace Context: Propagated metadata
Tracing Backends:
  • Jaeger: CNCF distributed tracing
  • Zipkin: Twitter's distributed tracing
  • Tempo: Grafana's tracing backend
  • AWS X-Ray: AWS distributed tracing
Trace Sampling:
  • Always Sample: 100% sampling (development)
  • Probabilistic: Sample percentage (e.g., 1%)
  • Rate Limiting: Maximum traces per second
  • Adaptive: Dynamic sampling based on traffic
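
Probabilistic sampling can be set through Istio's Telemetry API (available in recent releases); a 1% mesh-wide sketch:
```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system             # root namespace → mesh-wide default
spec:
  tracing:
  - randomSamplingPercentage: 1.0     # sample roughly 1% of requests
```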

Metrics Collection

Istio provides rich metrics automatically:
Service Metrics:
  • Request Rate: Requests per second
  • Error Rate: Percentage of failed requests
  • Duration: Request latency (p50, p95, p99)
  • Request Size: Request/response payload sizes
Infrastructure Metrics:
  • CPU/Memory: Resource utilization
  • Connection Pool: Pool statistics
  • Circuit Breaker: Circuit state and events
  • Retry/Timeout: Retry and timeout rates
Golden Signals:
  1. Latency: How long requests take
  2. Traffic: Request rate
  3. Errors: Error rate
  4. Saturation: Resource utilization

Logging

Structured logging for microservices:
Log Types:
  • Access Logs: Request/response logging
  • Application Logs: Service-specific logs
  • Proxy Logs: Envoy sidecar logs
  • Control Plane Logs: Istio component logs
Access Log Format:
```json
{
  "timestamp": "2025-10-18T10:30:00Z",
  "method": "GET",
  "path": "/api/users",
  "status": 200,
  "duration_ms": 45,
  "upstream_service": "user-service-v2",
  "trace_id": "abc123",
  "user_agent": "mobile-app/2.1"
}
```

Kiali Visualization

Kiali provides service mesh observability:
Features:
  • Service Graph: Visual topology of services
  • Traffic Flow: Request flow visualization
  • Health Status: Service health indicators
  • Configuration Validation: Istio config validation
  • Distributed Tracing: Integrated Jaeger traces

Best Practices

Service Design

  1. Single Responsibility: Each service does one thing well
  2. API-First Design: Define APIs before implementation
  3. Idempotency: Design idempotent operations for safety
  4. Versioning: Support multiple API versions
  5. Backward Compatibility: Don't break existing clients

Deployment Strategies

  1. Blue-Green Deployment
    • Maintain two identical environments
    • Switch traffic atomically
    • Easy rollback
    • Higher resource cost
  2. Canary Deployment
    • Gradual rollout to subset of users
    • Monitor metrics before full rollout
    • Lower risk than big-bang
    • More complex orchestration
  3. Rolling Update
    • Gradual replacement of instances
    • No additional resources needed
    • Kubernetes native support
    • Temporary version coexistence
  4. Dark Launch
    • Route shadow traffic to new version
    • Test with production traffic
    • No user impact
    • Validate before real traffic

Resilience Engineering

  1. Design for Failure: Assume services will fail
  2. Fail Fast: Don't wait for timeouts
  3. Graceful Degradation: Partial functionality better than none
  4. Idempotent Retries: Safe to retry operations
  5. Bulkhead Isolation: Isolate failure domains
  6. Circuit Breakers: Prevent cascading failures
  7. Timeouts Everywhere: Never wait indefinitely
  8. Chaos Engineering: Test failure scenarios

Configuration Management

  1. Namespace Isolation: Separate environments (dev, staging, prod)
  2. GitOps: Store configs in Git
  3. Validation: Validate configs before applying
  4. Incremental Rollout: Test configs in dev first
  5. Version Control: Track all config changes
  6. Documentation: Document configuration decisions

Security Best Practices

  1. mTLS by Default: Always encrypt service traffic
  2. Least Privilege: Minimal authorization policies
  3. Network Segmentation: Isolate services by namespace
  4. Secret Management: Never hardcode secrets
  5. Regular Updates: Keep Istio and Envoy updated
  6. Audit Logging: Log all authorization decisions

Monitoring and Alerting

  1. SLI/SLO/SLA: Define service level objectives
  2. Dashboard Design: Focus on actionable metrics
  3. Alert Fatigue: Only alert on actionable items
  4. Error Budgets: Balance reliability and velocity
  5. Runbooks: Document incident response
  6. Post-Mortems: Learn from failures

Performance Optimization

  1. Connection Pooling: Reuse connections
  2. Request Batching: Batch when possible
  3. Caching: Cache at multiple levels
  4. Compression: Enable response compression
  5. Protocol Selection: HTTP/2 or gRPC for efficiency
  6. Resource Limits: Set appropriate limits
  7. Horizontal Scaling: Scale out, not up

Migration Strategy

Strangler Pattern for monolith migration:
Phase 1: Route some traffic to microservices
Monolith (90%) + Microservices (10%)

Phase 2: Gradually increase microservice traffic
Monolith (70%) + Microservices (30%)

Phase 3: Continue migration
Monolith (40%) + Microservices (60%)

Phase 4: Complete migration
Monolith (0%) + Microservices (100%)

Common Patterns

Pattern 1: API Gateway Pattern

Single entry point for all client requests:
Components:
  • External gateway (Istio Ingress)
  • Virtual services for routing
  • Rate limiting and authentication
  • TLS termination
Benefits:
  • Simplified client interface
  • Centralized authentication
  • Protocol translation
  • Request aggregation

Pattern 2: Backend for Frontend (BFF)

Dedicated backend for each frontend type:
Use Cases:
  • Mobile app has different needs than web
  • Different data aggregation per client
  • Client-specific optimization
  • Reduced over-fetching
Implementation:
  • Separate BFF service per client type
  • Route by user-agent or subdomain
  • Optimize responses per client
  • Independent scaling

Pattern 3: Saga Pattern

Distributed transaction management:
Choreography-Based:
  • Services publish events
  • Other services react to events
  • No central coordinator
  • Loose coupling
Orchestration-Based:
  • Central orchestrator
  • Explicit transaction flow
  • Easier to understand
  • Single point of coordination

Pattern 4: CQRS (Command Query Responsibility Segregation)

Separate read and write models:
Benefits:
  • Optimized read and write paths
  • Independent scaling
  • Different data models
  • Event sourcing compatibility
Implementation:
  • Write service updates data
  • Read service queries optimized views
  • Event bus for synchronization
  • Eventually consistent reads

Pattern 5: Service Registry Pattern

Dynamic service discovery:
Components:
  • Service registry (Kubernetes DNS)
  • Service registration (automatic)
  • Service discovery (Istio pilot)
  • Health checking
Benefits:
  • Dynamic scaling
  • Automatic failover
  • No hardcoded endpoints
  • Location transparency

Pattern 6: Sidecar Pattern

Deploy auxiliary functionality alongside service:
Common Sidecars:
  • Envoy proxy (traffic management)
  • Log shipper (centralized logging)
  • Metric collector (monitoring)
  • Secret manager (credential injection)
Benefits:
  • Separation of concerns
  • Polyglot support
  • Consistent functionality
  • Independent updates

Pattern 7: Ambassador Pattern

Proxy for external service access:
Use Cases:
  • Legacy system integration
  • External API rate limiting
  • Protocol translation
  • Caching external responses
Implementation:
  • Sidecar for external calls
  • Circuit breaker for external service
  • Retry logic and timeouts
  • Monitoring and logging

Pattern 8: Anti-Corruption Layer

Isolate legacy system complexity:
Purpose:
  • Translate between domain models
  • Protect new architecture
  • Gradual migration support
  • Legacy system abstraction
Implementation:
  • Adapter service layer
  • Model translation
  • Protocol conversion
  • Versioning support

Advanced Techniques

Multi-Cluster Service Mesh

Extend service mesh across multiple clusters:
Use Cases:
  • Multi-region deployment
  • High availability
  • Disaster recovery
  • Compliance requirements
Implementation:
  • Single control plane or multi-primary
  • Service discovery across clusters
  • Cross-cluster load balancing
  • Consistent policies

Service Mesh Federation

Connect multiple independent service meshes:
Scenarios:
  • Multiple teams/organizations
  • Merger and acquisition
  • Legacy mesh migration
  • Different mesh implementations

Chaos Engineering

Proactively test system resilience:
Chaos Experiments:
  • Service failures (pods deleted)
  • Network latency injection
  • Error injection (HTTP 503)
  • Resource constraints (CPU/memory)
  • DNS failures
  • Certificate expiration
Tools:
  • Istio fault injection
  • Chaos Mesh
  • Litmus Chaos
  • Gremlin
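
Istio's built-in fault injection covers the latency and error experiments above without touching application code; an illustrative VirtualService:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 10.0      # delay 10% of requests
        fixedDelay: 5s
      abort:
        percentage:
          value: 5.0       # abort 5% of requests with HTTP 503
        httpStatus: 503
    route:
    - destination:
        host: ratings
```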

GitOps for Service Mesh

Declarative configuration management:
Workflow:
  1. Config changes in Git
  2. Automated validation
  3. Review and approval
  4. Automated deployment
  5. Continuous monitoring
Benefits:
  • Version control
  • Audit trail
  • Disaster recovery
  • Consistency

Troubleshooting

Common Issues

Issue 1: Service Not Accessible
  • Check sidecar injection
  • Verify VirtualService configuration
  • Check DestinationRule subsets
  • Validate service discovery
  • Review authorization policies
Issue 2: High Latency
  • Check retry and timeout settings
  • Review connection pool limits
  • Analyze distributed traces
  • Check resource constraints
  • Review load balancing algorithm
Issue 3: Circuit Breaker Not Working
  • Verify outlier detection config
  • Check error thresholds
  • Review consecutive errors setting
  • Validate base ejection time
  • Monitor ejection metrics
Issue 4: mTLS Failures
  • Check PeerAuthentication mode
  • Verify certificate validity
  • Review authorization policies
  • Check namespace mesh config
  • Validate Citadel operation
Issue 5: Traffic Routing Issues
  • Validate VirtualService hosts
  • Check subset definitions
  • Review match conditions
  • Verify gateway configuration
  • Check service selector labels

Debugging Tools

  1. istioctl: CLI for Istio management
    • istioctl analyze: Validate configuration
    • istioctl proxy-status: Check proxy sync status
    • istioctl proxy-config: View proxy configuration
    • istioctl dashboard: Access dashboards
  2. kubectl: Kubernetes management
    • Check pod status
    • View logs
    • Port forwarding
    • Resource inspection
  3. Kiali: Service mesh visualization
    • Service graph
    • Traffic flow
    • Configuration validation
    • Distributed tracing
  4. Jaeger: Distributed tracing
    • Request traces
    • Latency analysis
    • Service dependencies
    • Error identification
  5. Prometheus/Grafana: Metrics and visualization
    • Service metrics
    • Custom dashboards
    • Alerting rules
    • Historical analysis

Example Scenarios

Scenario 1: E-Commerce Microservices

Architecture:
  • Frontend (React SPA)
  • API Gateway
  • Product Service
  • Cart Service
  • Order Service
  • Payment Service
  • Inventory Service
  • Notification Service
Traffic Management:
  • Canary deployment for new product search
  • Circuit breaker on payment service
  • Retry logic for inventory checks
  • Timeout policies for external payment API
  • Rate limiting on API gateway
Resilience:
  • Graceful degradation if recommendations fail
  • Bulkhead isolation for payment processing
  • Fallback to cached product data
  • Queue for async notifications
架构:
  • 前端(React SPA)
  • API网关
  • 商品服务
  • 购物车服务
  • 订单服务
  • 支付服务
  • 库存服务
  • 通知服务
流量管理:
  • 新商品搜索功能的金丝雀发布
  • 支付服务的断路器
  • 库存检查的重试逻辑
  • 外部支付API的超时策略
  • API网关的速率限制
弹性:
  • 推荐服务故障时优雅降级
  • 支付处理的舱壁隔离
  • 回退到缓存的商品数据
  • 异步通知队列
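
The "fallback to cached product data" item above can be sketched as a simple try/except wrapper around the live call. The cache and lookup functions here are hypothetical stand-ins, not a real product-service client:

```python
# Graceful degradation sketch: serve stale cached data when the live service fails.
CACHE = {"sku-1": {"name": "Widget", "price": 9.99}}

def fetch_product(sku, live_lookup):
    """Try the live product service first; degrade to the cache on failure."""
    try:
        product = live_lookup(sku)
        CACHE[sku] = product              # refresh the cache on success
        return product, "live"
    except ConnectionError:
        cached = CACHE.get(sku)
        if cached is not None:
            return cached, "cache"        # degraded but still serving
        raise                             # no fallback available: surface the error

def down(_sku):
    raise ConnectionError("product service unavailable")
```

The same shape covers the "graceful degradation if recommendations fail" item: catch the failure at the caller, return a safe default, and keep the page rendering.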

Scenario 2: Streaming Platform

场景2:流媒体平台

Architecture:
  • Video Service (transcoding)
  • Metadata Service (content info)
  • Recommendation Service (ML-based)
  • User Service (profiles)
  • CDN Integration
  • Analytics Service
Traffic Management:
  • A/B testing for recommendation algorithm
  • Geographic routing to edge services
  • Load balancing based on server capacity
  • Traffic splitting for new video player
Performance:
  • HTTP/2 for reduced latency
  • Connection pooling for database
  • Caching at multiple levels
  • Adaptive bitrate streaming
架构:
  • 视频服务(转码)
  • 元数据服务(内容信息)
  • 推荐服务(基于ML)
  • 用户服务(资料)
  • CDN集成
  • 分析服务
流量管理:
  • 推荐算法的A/B测试
  • 按地理位置路由到边缘服务
  • 基于服务器容量的负载均衡
  • 新视频播放器的流量拆分
性能:
  • HTTP/2降低延迟
  • 数据库连接池
  • 多级缓存
  • 自适应码率流
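
"Caching at multiple levels" can be illustrated with a small two-tier cache: a bounded fast tier in front of a TTL-bounded second tier. The FIFO eviction and the lack of a TTL check on the hot tier are simplifying assumptions for the sketch; a real deployment might pair in-process memory with Redis:

```python
import time

class TwoTierCache:
    """Illustrative two-level cache (sketch only)."""
    def __init__(self, l1_size=2, ttl=60.0):
        self.l1 = {}                  # hot tier: bounded, no TTL check in this sketch
        self.l2 = {}                  # warm tier: key -> (value, expires_at)
        self.l1_size, self.ttl = l1_size, ttl

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.l2[key] = (value, now + self.ttl)
        self._promote(key, value)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        if key in self.l1:
            return self.l1[key]       # L1 hit
        entry = self.l2.get(key)
        if entry and now < entry[1]:  # L2 hit, not expired
            self._promote(key, entry[0])
            return entry[0]
        return None                   # miss (or expired)

    def _promote(self, key, value):
        if key not in self.l1 and len(self.l1) >= self.l1_size:
            self.l1.pop(next(iter(self.l1)))   # FIFO-evict oldest L1 entry
        self.l1[key] = value
```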

Scenario 3: Financial Services Platform

场景3:金融服务平台

Architecture:
  • Account Service
  • Transaction Service
  • Fraud Detection Service
  • Reporting Service
  • External Bank Integration
  • Audit Service
Security:
  • Strict mTLS for all services
  • Fine-grained authorization policies
  • Audit logging for compliance
  • Network segmentation by sensitivity
Resilience:
  • Circuit breaker for external banks
  • Idempotent transaction processing
  • Saga pattern for distributed transactions
  • Event sourcing for audit trail
架构:
  • 账户服务
  • 交易服务
  • 欺诈检测服务
  • 报表服务
  • 外部银行集成
  • 审计服务
安全:
  • 所有服务强制mTLS
  • 细粒度授权策略
  • 合规审计日志
  • 按敏感度网络分段
弹性:
  • 外部银行的断路器
  • 幂等交易处理
  • 分布式事务的Saga模式
  • 审计追踪的事件溯源
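
"Idempotent transaction processing" above typically means keying each request with an idempotency key, so that a client retry cannot apply the same transaction twice. A minimal sketch; the in-memory store and response fields are hypothetical:

```python
class TransactionProcessor:
    """Replaying a request with the same idempotency key returns the
    stored result instead of applying the transaction again."""
    def __init__(self):
        self.results = {}   # idempotency_key -> stored response
        self.ledger = []    # applied transaction amounts

    def process(self, idempotency_key, amount):
        if idempotency_key in self.results:
            return self.results[idempotency_key]   # replay: no side effects
        self.ledger.append(amount)                 # apply exactly once
        result = {"status": "applied", "balance": sum(self.ledger)}
        self.results[idempotency_key] = result
        return result
```

In a real system the key-to-result mapping lives in durable storage so that idempotency survives restarts, and keys eventually expire.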

Integration Patterns

集成模式

Database per Service

每个服务一个数据库

Each microservice owns its database:
Benefits:
  • Independent scaling
  • Technology choice freedom
  • Failure isolation
  • Clear ownership
Challenges:
  • Distributed transactions
  • Data consistency
  • Query complexity
  • Data duplication
Solutions:
  • Event-driven architecture
  • Saga pattern
  • CQRS
  • API composition
每个微服务拥有独立数据库:
优势:
  • 独立扩展
  • 技术选择自由
  • 故障隔离
  • 清晰的所有权
挑战:
  • 分布式事务
  • 数据一致性
  • 查询复杂度
  • 数据重复
解决方案:
  • 事件驱动架构
  • Saga模式
  • CQRS
  • API组合
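
Of the solutions listed, the saga pattern replaces a distributed transaction with a sequence of local steps plus compensations: if any step fails, the completed steps are undone in reverse order. A minimal orchestration sketch; the callable-pair signature is an assumption, not a framework API:

```python
def run_saga(steps):
    """steps: list of (action, compensation) zero-arg callables.
    Runs actions in order; on failure, runs the compensations of the
    completed steps in reverse. Returns (ok, log)."""
    done, log = [], []
    for action, compensate in steps:
        try:
            log.append(action())
            done.append(compensate)
        except Exception as exc:
            log.append(f"failed: {exc}")
            for comp in reversed(done):       # roll back completed steps
                log.append(comp())
            return False, log
    return True, log
```

For an order flow this might be (reserve inventory, release inventory), (charge payment, refund payment), (create shipment, cancel shipment): a shipment failure refunds the payment and then releases the inventory.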

Event-Driven Architecture

事件驱动架构

Asynchronous communication via events:
Components:
  • Event producers
  • Event bus (Kafka, RabbitMQ)
  • Event consumers
  • Event store
Patterns:
  • Event notification
  • Event-carried state transfer
  • Event sourcing
  • CQRS
通过事件进行异步通信:
组件:
  • 事件生产者
  • 事件总线(Kafka、RabbitMQ)
  • 事件消费者
  • 事件存储
模式:
  • 事件通知
  • 事件携带状态传输
  • 事件溯源
  • CQRS
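
The components above can be wired together in-process to show the flow. This toy bus is a stand-in for a real broker such as Kafka or RabbitMQ; the append-only list plays the role of the event store:

```python
from collections import defaultdict

class EventBus:
    """In-process stand-in for a message broker, for illustration only."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> handlers
        self.store = []                       # naive append-only event store

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.store.append((topic, event))     # persist before fan-out
        for handler in self.subscribers[topic]:
            handler(event)
```

The producer (say, an order service publishing "order.created") never knows who consumes the event, which is what decouples the services.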

API Composition

API组合

Aggregate data from multiple services:
Implementation:
  • API Gateway queries services
  • Parallel service calls
  • Response aggregation
  • Error handling
Optimization:
  • Caching
  • Request batching
  • Partial responses
  • Timeout management
聚合多个服务的数据:
实现:
  • API网关查询多个服务
  • 并行服务调用
  • 响应聚合
  • 错误处理
优化:
  • 缓存
  • 请求批处理
  • 部分响应
  • 超时管理
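
The implementation steps above (parallel service calls, response aggregation, error handling) plus the "partial responses" and "timeout management" optimizations can be sketched with a thread pool. The service names and response shape are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def compose(calls, timeout=1.0):
    """Fan out to backend calls in parallel and aggregate the responses.
    `calls` maps a response field to a zero-argument callable; failures
    are reported per field so the client still gets a partial response."""
    result, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = {name: pool.submit(fn) for name, fn in calls.items()}
        for name, fut in futures.items():
            try:
                result[name] = fut.result(timeout=timeout)
            except Exception as exc:          # timeout or downstream failure
                errors[name] = str(exc)
    return {"data": result, "errors": errors}
```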

Resources and References

资源与参考

Official Documentation

官方文档

  • Istio Documentation: https://istio.io/latest/docs
  • Kubernetes Documentation: https://kubernetes.io/docs
  • Envoy Proxy Documentation: https://www.envoyproxy.io/docs
  • Istio文档:https://istio.io/latest/docs
  • Kubernetes文档:https://kubernetes.io/docs
  • Envoy Proxy文档:https://www.envoyproxy.io/docs

Books and Papers

书籍与论文

  • "Building Microservices" by Sam Newman
  • "Microservices Patterns" by Chris Richardson
  • "Production-Ready Microservices" by Susan Fowler
  • "The Art of Scalability" by Martin Abbott and Michael Fisher
  • 《Building Microservices》 by Sam Newman
  • 《Microservices Patterns》 by Chris Richardson
  • 《Production-Ready Microservices》 by Susan Fowler
  • 《The Art of Scalability》 by Martin Abbott and Michael Fisher

Tools and Platforms

工具与平台

  • Istio: Service mesh control plane
  • Linkerd: Lightweight service mesh
  • Consul Connect: HashiCorp service mesh
  • AWS App Mesh: AWS-native service mesh
  • Kiali: Service mesh observability
  • Jaeger: Distributed tracing
  • Prometheus: Metrics collection
  • Grafana: Visualization
  • Istio:服务网格控制平面
  • Linkerd:轻量级服务网格
  • Consul Connect:HashiCorp服务网格
  • AWS App Mesh:AWS原生服务网格
  • Kiali:服务网格可观测性
  • Jaeger:分布式追踪
  • Prometheus:指标收集
  • Grafana:可视化

Community Resources

社区资源

  • Istio Blog: https://istio.io/blog
  • CNCF Slack: #istio channel
  • Stack Overflow: [istio] tag
  • GitHub: istio/istio repository

Skill Version: 1.0.0
Last Updated: October 2025
Skill Category: Microservices, Service Mesh, Cloud Native, DevOps
Compatible With: Istio 1.20+, Kubernetes 1.28+, Envoy Proxy
Prerequisites: Kubernetes knowledge, containerization, networking basics
  • Istio博客:https://istio.io/blog
  • CNCF Slack:#istio频道
  • Stack Overflow:[istio]标签
  • GitHub:istio/istio仓库

技能版本:1.0.0
最后更新:2025年10月
技能分类:微服务、服务网格、云原生、DevOps
兼容版本:Istio 1.20+、Kubernetes 1.28+、Envoy Proxy
前置要求:Kubernetes知识、容器化基础、网络基础