high-concurrency-scalability

High Concurrency & Scalability

When to Use

Choose or refactor concurrency models—threads, async/await, actors, coroutines—for target throughput and latency
Reduce lock contention and design low-contention, lock-free, or partitioned data paths
Size connection pools, file descriptors, thread pools, and memory limits per dependency
Design caching layers, TTL strategy, and stampede / thundering-herd mitigation
Plan horizontal scaling, load balancing, session affinity, and stateless vs sticky tradeoffs
Apply backpressure, bounded queues, rate limiting, and bulkheads under overload
Scale the data layer—read replicas, routing, sharding concepts, pool tuning, hot keys
Profile bottlenecks, model capacity, and tie scale triggers to SLOs and error budgets
Define autoscaling signals, warm pools, and cold-start vs cost tradeoffs
Architect multi-region read paths and CDN/edge caching at a design level

When NOT to Use

Decompose monoliths into bounded contexts and inter-service contracts only → `microservices-developer`
Event schemas, broker selection, and messaging topology only → `event-driven-architecture`
General feature delivery, RFCs, or CRUD without scale focus → `senior-software-engineer`
Org-wide SLO program, on-call, incident response, and error-budget policy → `site-reliability-engineer`
Deep flame graphs, load-test harnesses, and p99 regression hunts as the main task → `performance-engineer`
Kubernetes platform golden paths and IDP product work → `platform-engineer`
VPC, managed service provisioning, and landing-zone IaC → `cloud-engineer`
Cloud spend optimization and unit economics only → `cloud-economist`, `finops-analyst`

Related skills

Need	Skill
Service boundaries, sagas, circuit breakers between services	`microservices-developer`
Brokers, topics, event contracts, outbox	`event-driven-architecture`
Profiling, load/soak tests, latency budgets	`performance-engineer`
SLI/SLO programs, incident reliability, toil	`site-reliability-engineer`
Internal platform, K8s abstractions, golden paths	`platform-engineer`
Cloud compute, networking, DR multi-region deploy	`cloud-engineer`
Application design and refactoring	`senior-software-engineer`

Core Workflows

1. Scope and constraints

Clarify traffic shape, SLOs, statefulness, and failure modes.

See references/high_concurrency_scalability_scope.md.

2. Concurrency and synchronization

Pick execution model; partition work; minimize shared mutable state.

See references/concurrency_models_and_synchronization.md.
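
As a rough illustration of partitioning work to reduce shared mutable state, the sketch below splits per-key counters across independently locked stripes, so callers touching different keys rarely contend on the same lock. The class name `StripedCounter`, the stripe count of 16, and the `user:N` keys are all hypothetical choices for this example, not something the referenced document prescribes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StripedCounter:
    """Per-key counters partitioned across independently locked stripes.

    Hypothetical illustration: the stripe count is a tuning assumption.
    """

    def __init__(self, stripes: int = 16) -> None:
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [{} for _ in range(stripes)]

    def _stripe(self, key: str) -> int:
        # hash() is stable within one process, which is all this sketch needs.
        return hash(key) % len(self._locks)

    def incr(self, key: str, delta: int = 1) -> None:
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is blocked, not the whole map
            bucket = self._counts[i]
            bucket[key] = bucket.get(key, 0) + delta

    def get(self, key: str) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)


def run_demo(workers: int = 8, per_worker: int = 1000) -> dict:
    """Increment four keys from several threads and return the final counts."""
    counter = StripedCounter()
    keys = ["user:1", "user:2", "user:3", "user:4"]

    def work(n: int) -> None:
        for j in range(per_worker):
            counter.incr(keys[(n + j) % len(keys)])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(workers)))  # drain to surface any exception
    return {k: counter.get(k) for k in keys}
```

Calling `run_demo()` spreads 8,000 increments evenly over the four keys. A single global lock would give the same totals; the striped layout only changes how often threads wait on each other, which is the property worth measuring under a real workload.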

3. Caching and data-layer scale

Cache hierarchy, replica routing, sharding and hot-key mitigation.

See references/caching_and_data_layer_scale.md.
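
One common way to soften a cache stampede on a hot key is to let a single caller reload an expired entry while concurrent callers wait for its result, and to add jitter to the TTL so entries do not all expire together. The following is a minimal in-process sketch under those assumptions; `SingleFlightCache`, `demo_stampede`, the 30-second TTL, and the 10% jitter are hypothetical names and values chosen for illustration.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, List, Tuple


class SingleFlightCache:
    """In-process cache where only one caller loads a missing key at a time."""

    def __init__(self, ttl_s: float = 30.0, jitter: float = 0.1) -> None:
        self._ttl_s = ttl_s
        self._jitter = jitter  # +/- fraction applied to each entry's TTL
        self._lock = threading.Lock()
        self._values: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry)
        self._inflight: Dict[str, threading.Event] = {}

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        while True:
            with self._lock:
                hit = self._values.get(key)
                if hit is not None and hit[1] > time.monotonic():
                    return hit[0]
                waiter = self._inflight.get(key)
                if waiter is None:
                    # No load in progress: this caller becomes the leader.
                    event = threading.Event()
                    self._inflight[key] = event
                    break
            waiter.wait()  # follower: wait for the leader, then re-check
        try:
            value = loader(key)
            ttl = self._ttl_s * (1 + random.uniform(-self._jitter, self._jitter))
            with self._lock:
                self._values[key] = (value, time.monotonic() + ttl)
            return value
        finally:
            # Runs on success and failure, so followers are never left waiting.
            with self._lock:
                del self._inflight[key]
            event.set()


def demo_stampede(callers: int = 10) -> Tuple[int, List[str]]:
    """Hit one cold key from many threads; return (loader calls, results)."""
    cache = SingleFlightCache()
    calls: List[str] = []
    results: List[str] = []

    def slow_loader(key: str) -> str:
        calls.append(key)
        time.sleep(0.05)  # stands in for a slow backend read
        return f"value-for-{key}"

    threads = [
        threading.Thread(target=lambda: results.append(cache.get("hot", slow_loader)))
        for _ in range(callers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results
```

In `demo_stampede`, ten threads request the same cold key but the loader runs once. If the loader raises, the leader sees the exception and one of the waiting callers retries as the new leader. A distributed cache would need a cross-process equivalent, which this sketch does not attempt.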

4. Throughput, backpressure, and queues

Bounded queues, shedding, rate limits, and async pipelines.

See references/throughput_backpressure_and_queues.md.
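
To make the combination of rate limits, bounded queues, and shedding concrete, here is a small sketch of an admission step: a token bucket caps the arrival rate, and a bounded queue rejects work immediately when it is full instead of letting the backlog grow. `TokenBucket`, `Admission`, and the status strings are hypothetical names for this example; the rate, burst, and queue size are placeholder values.

```python
import queue
import threading
import time
from typing import Any, Callable


class TokenBucket:
    """Token-bucket rate limiter; rate and burst are illustrative parameters."""

    def __init__(
        self,
        rate_per_s: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_s
        self._capacity = float(burst)
        self._tokens = float(burst)
        self._clock = clock  # injectable so the limiter can be tested
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self, cost: float = 1.0) -> bool:
        with self._lock:
            now = self._clock()
            # Refill for the elapsed time, capped at the burst size.
            refill = (now - self._last) * self._rate
            self._tokens = min(self._capacity, self._tokens + refill)
            self._last = now
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False


class Admission:
    """Admit work through a rate limit, then a bounded queue that sheds when full."""

    def __init__(self, limiter: TokenBucket, max_queue: int) -> None:
        self._limiter = limiter
        self.queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_queue)

    def submit(self, job: Any) -> str:
        if not self._limiter.allow():
            return "rate_limited"
        try:
            self.queue.put_nowait(job)  # never block the caller on a full queue
        except queue.Full:
            return "shed"
        return "accepted"
```

A caller can map `"rate_limited"` and `"shed"` to fast failure responses (for example HTTP 429 or 503), which gives clients an explicit backpressure signal. In this sketch a shed request still consumes a token; whether that is desirable depends on the overload policy.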

5. Horizontal scale and load distribution

Replicas, LB algorithms, affinity, autoscaling triggers.

See references/horizontal_scaling_and_load_distribution.md.
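
When some affinity is needed (for example, routing a session or cache key to the same replica), consistent hashing is one widely used option because adding or removing a replica remaps only a fraction of keys. The sketch below is a minimal hash ring with virtual nodes; `HashRing`, the `app-N` replica names, and the choice of 100 virtual nodes are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    # First 8 bytes of SHA-256 as a position on the ring (stable across runs).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def route(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First ring entry at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With `ring = HashRing(["app-1", "app-2", "app-3"])`, calling `ring.remove("app-2")` leaves every key that was routed to `app-1` or `app-3` on the same replica; only keys owned by the removed node move. Stateless tiers usually do not need this and can use simpler algorithms such as round-robin or least-connections.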

6. Capacity, observability, and SLO-driven scale

Metrics, headroom models, scale policies tied to objectives.

See references/capacity_planning_observability_slo.md.
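
A headroom model can start from Little's law (average in-flight requests = arrival rate × average time in system) and a target utilization below 100%. The function below is a back-of-the-envelope sketch only; `required_replicas`, the 70% utilization target, and the minimum of two replicas are assumptions for this example, and real sizing should be checked against load tests.

```python
import math


def required_replicas(
    peak_rps: float,
    latency_s: float,
    concurrency_per_replica: int,
    target_utilization: float = 0.7,
    min_replicas: int = 2,
) -> int:
    """Estimate replica count from Little's law plus a utilization ceiling.

    All parameters are illustrative inputs, not measured values.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Little's law: L = lambda * W (average requests in flight at peak).
    in_flight = peak_rps * latency_s
    # Capacity each replica should carry while keeping headroom in reserve.
    usable_per_replica = concurrency_per_replica * target_utilization
    return max(min_replicas, math.ceil(in_flight / usable_per_replica))
```

For example, 2,000 requests per second at 50 ms average latency means about 100 requests in flight; with 16 concurrent slots per replica kept at 70% utilization, `required_replicas(2000, 0.05, 16)` returns 9. Because the estimate uses average latency, tail latency under load can push the real requirement higher.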

Outputs

Scale brief — workload profile, bottlenecks, target RPS/latency, state assumptions
Concurrency note — model choice, pool sizes, contention risks, partitioning plan
Cache and data plan — layers, TTL, invalidation, replica/shard routing, hot-key mitigations
Overload matrix — backpressure, rate limits, bulkheads, degradation modes
Capacity model — headroom, scale triggers, cold-start impact, cost sensitivity
Observability checklist — saturation, queue depth, pool wait, cache hit rate, tail latency

Principles

Measure saturation—CPU, memory, I/O, pool wait, queue depth—not averages alone
Bound everything—connections, threads, queue length, in-flight requests
Prefer partition over lock—shard by key, actor mailbox, or isolated replica
Design for overload—shed load deliberately; never unbounded retry or queue growth
Scale on SLO signals—error rate and tail latency, not CPU alone
Keep hot paths stateless where possible; isolate stateful tiers explicitly
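
To illustrate scaling on SLO signals rather than CPU alone, the sketch below computes an error-budget burn rate (observed error rate divided by the error rate the SLO allows) and combines it with a tail-latency check. The function names and every threshold here (a 99.9% SLO, a 300 ms p99 objective, a burn-rate trigger of 2.0) are hypothetical values for illustration, not recommendations.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly on schedule.
    """
    if requests == 0:
        return 0.0
    budget = 1.0 - slo_target  # allowed error fraction, e.g. 0.001 for 99.9%
    return (errors / requests) / budget


def should_scale_out(
    errors: int,
    requests: int,
    p99_ms: float,
    slo_target: float = 0.999,       # assumed availability SLO
    p99_objective_ms: float = 300.0,  # assumed latency objective
    burn_threshold: float = 2.0,      # assumed trigger level
) -> bool:
    """Trigger on fast budget burn or a breached tail-latency objective."""
    burning = burn_rate(errors, requests, slo_target) >= burn_threshold
    slow = p99_ms > p99_objective_ms
    return burning or slow
```

With these defaults, 30 errors in 10,000 requests is a burn rate of about 3, so `should_scale_out(30, 10000, 120.0)` is true even though latency is healthy. Scaling out only helps when errors stem from saturation, so a signal like this is usually paired with saturation metrics such as queue depth or pool wait.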
