scale-game


Scale Game

Overview

Test your approach at extreme scales to find what breaks and what surprisingly survives.
Core principle: Extremes expose fundamental truths hidden at normal scales.

Quick Reference

Scale Dimension   Test At Extremes               What It Reveals
Volume            1 item vs 1B items             Algorithmic complexity limits
Speed             Instant vs 1 year              Async requirements, caching needs
Users             1 user vs 1B users             Concurrency issues, resource limits
Duration          Milliseconds vs years          Memory leaks, state growth
Failure rate      Never fails vs always fails    Error handling adequacy

Process

  1. Pick dimension - What could vary extremely?
  2. Test minimum - What if this was 1000x smaller/faster/fewer?
  3. Test maximum - What if this was 1000x bigger/slower/more?
  4. Note what breaks - Where do limits appear?
  5. Note what survives - What's fundamentally sound?
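
A minimal sketch of the five steps, assuming the dimension picked is Volume. The two duplicate-check functions and the `scale_sweep` helper are hypothetical names chosen for illustration, not part of any library. The sweep runs the same function at a tiny and a much larger input size, so the timings show which version breaks and which survives.

```python
import time


def has_duplicates_quadratic(items):
    """Hypothetical function under test: O(n^2) pairwise comparison."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Hypothetical alternative: O(n) membership check using a set."""
    return len(set(items)) != len(items)


def scale_sweep(func, sizes):
    """Run func at each input size and record wall-clock seconds."""
    timings = {}
    for n in sizes:
        data = list(range(n))  # worst case for both: no duplicates at all
        start = time.perf_counter()
        func(data)
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Step 2 (minimum) and step 3 (maximum) around a "normal" size of 1,000.
    sizes = [1, 1_000, 3_000]
    for func in (has_duplicates_quadratic, has_duplicates_linear):
        for n, seconds in scale_sweep(func, sizes).items():
            print(f"{func.__name__:28s} n={n:>6} {seconds:.4f}s")
```

The sizes here are kept small so the sketch finishes quickly. In a real run, push the maximum far enough that one candidate becomes visibly unusable.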

Examples

Example 1: Error Handling

Normal scale: "Handle errors when they occur" works fine
At 1B scale: Error volume overwhelms logging, crashes system
Reveals: Need to make errors impossible (type systems) or expect them (chaos engineering)
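
One way to "expect" errors is to inject them on purpose and test the caller at both ends of the Failure rate dimension. The sketch below is an assumption-laden illustration: `InjectedFault`, `with_failure_rate`, `fetch_price`, and `total_price` are all hypothetical names. The caller counts failures instead of logging each one, so error volume cannot overwhelm it.

```python
import random


class InjectedFault(Exception):
    """Raised by the fault injector (hypothetical name)."""


def with_failure_rate(func, rate, rng):
    """Wrap func so each call raises InjectedFault with probability `rate`."""
    def wrapper(*args, **kwargs):
        # rng.random() is in [0.0, 1.0): rate 0.0 never fails, 1.0 always fails.
        if rng.random() < rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(item_id):
    """Hypothetical dependency that would normally call another service."""
    return 100 + item_id


def total_price(item_ids, fetch):
    """Caller under test: sums prices and counts failures, one counter total."""
    total, failures = 0, 0
    for item_id in item_ids:
        try:
            total += fetch(item_id)
        except InjectedFault:
            failures += 1
    return total, failures


if __name__ == "__main__":
    rng = random.Random(0)
    for rate in (0.0, 1.0):
        fetch = with_failure_rate(fetch_price, rate, rng)
        print(rate, total_price(range(1_000), fetch))
```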

Example 2: Synchronous APIs

Normal scale: Direct function calls work
At global scale: Network latency makes synchronous calls unusable
Reveals: Async/messaging becomes survival requirement, not optimization
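
The effect can be sketched with simulated latency. This is an illustration under assumptions, not a benchmark: `remote_call` is a hypothetical stand-in that only sleeps, and awaiting calls one at a time models the synchronous style. The total time grows with the number of calls in the sequential version and stays near one round trip in the concurrent one.

```python
import asyncio
import time


async def remote_call(latency_s):
    """Stand-in for a network call; latency is simulated with asyncio.sleep."""
    await asyncio.sleep(latency_s)
    return latency_s


async def call_sequentially(n, latency_s):
    """Wait for each call before starting the next: roughly n * latency."""
    return [await remote_call(latency_s) for _ in range(n)]


async def call_concurrently(n, latency_s):
    """Start all calls at once and wait for all: roughly one latency."""
    return await asyncio.gather(*(remote_call(latency_s) for _ in range(n)))


def timed(coro_factory):
    """Run a coroutine to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    asyncio.run(coro_factory())
    return time.perf_counter() - start


if __name__ == "__main__":
    # Test the Speed dimension at two extremes of per-call latency.
    for latency_s in (0.0001, 0.1):
        seq = timed(lambda: call_sequentially(20, latency_s))
        con = timed(lambda: call_concurrently(20, latency_s))
        print(f"latency={latency_s}s sequential={seq:.3f}s concurrent={con:.3f}s")
```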

Example 3: In-Memory State

Normal duration: Works for hours/days
At years: Memory grows unbounded, eventual crash
Reveals: Need persistence or periodic cleanup, can't rely on memory
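
Years of runtime can be simulated by passing the clock in as a parameter. The sketch below assumes a hypothetical `SessionStore` with a time-to-live and a `simulate` helper; the session rate and TTL are made-up numbers. It compares the state size with and without periodic cleanup.

```python
DAY_S = 86_400  # seconds per day


class SessionStore:
    """Hypothetical in-memory store with time-based cleanup."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._last_seen = {}  # session_id -> timestamp of last access

    def touch(self, session_id, now):
        """Record that a session was active at time `now`."""
        self._last_seen[session_id] = now

    def cleanup(self, now):
        """Drop sessions idle longer than ttl_s; return how many were removed."""
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > self.ttl_s]
        for sid in expired:
            del self._last_seen[sid]
        return len(expired)

    def __len__(self):
        return len(self._last_seen)


def simulate(days, cleanup_enabled, sessions_per_day=100, ttl_s=DAY_S):
    """Advance a fake clock one day at a time and return the final store size."""
    store = SessionStore(ttl_s)
    for day in range(days):
        now = day * DAY_S
        for i in range(sessions_per_day):
            store.touch(f"{day}-{i}", now)
        if cleanup_enabled:
            store.cleanup(now)
    return len(store)


if __name__ == "__main__":
    for days in (1, 365, 3_650):
        print(days, simulate(days, False), simulate(days, True))
```

Without cleanup the store grows linearly with the number of simulated days. With cleanup it levels off at two days of sessions under these assumed parameters.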

Red Flags That You Need This

  • "It works in dev" (but will it work in production?)
  • No idea where limits are
  • "Should scale fine" (without testing)
  • Surprised by production behavior

Remember

  • Extremes reveal fundamentals
  • What works at one scale fails at another
  • Test both directions (bigger AND smaller)
  • Use insights to validate architecture early