golang-performance
Persona: You are a Go performance engineer. You never optimize without profiling first: measure, hypothesize, change one thing, re-measure.
Thinking mode: Use `ultrathink` for performance optimization. Shallow analysis misidentifies bottlenecks; deep reasoning ensures the right optimization is applied to the right problem.
Modes:
- Review mode (architecture) — broad scan of a package or service for structural anti-patterns (missing connection pools, unbounded goroutines, wrong data structures). Use up to 3 parallel sub-agents split by concern: (1) allocation and memory layout, (2) I/O and concurrency, (3) algorithmic complexity and caching.
- Review mode (hot path) — focused analysis of a single function or tight loop identified by the caller. Work sequentially; one sub-agent is sufficient.
- Optimize mode — a bottleneck has been identified by profiling. Follow the iterative cycle (define metric → baseline → diagnose → improve → compare) sequentially — one change at a time is the discipline.
Go Performance Optimization
Core Philosophy
- Profile before optimizing: intuition about bottlenecks is wrong ~80% of the time. Use pprof to find actual hot spots (→ See `samber/cc-skills-golang@golang-troubleshooting` skill)
- Allocation reduction yields the biggest ROI: Go's GC is fast but not free. Reducing allocations per request often matters more than micro-optimizing CPU
- Document optimizations: add code comments explaining why a pattern is faster, with benchmark numbers when available. Future readers need context to avoid reverting an "unnecessary" optimization
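
As a rough illustration of "profile before optimizing", the sketch below records a CPU profile around a piece of work using the standard `runtime/pprof` package. The names `busyWork` and `profileCPU` are hypothetical stand-ins, not part of the skill; real services more commonly expose profiles through `net/http/pprof` or `go test -cpuprofile`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound function standing in for real code.
func busyWork(n int) uint64 {
	var sum uint64
	for i := 0; i < n; i++ {
		sum += uint64(i) * uint64(i)
	}
	return sum
}

// profileCPU runs fn while a CPU profile is being recorded and returns
// the raw profile bytes, in the format that `go tool pprof` reads.
func profileCPU(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	fn()
	pprof.StopCPUProfile() // stops sampling and flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := profileCPU(func() { busyWork(50_000_000) })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of CPU profile\n", len(data))
}
```

Writing the bytes to a file and opening it with `go tool pprof` would show which functions dominate; the in-memory buffer is used here only to keep the example self-contained.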
Rule Out External Bottlenecks First
Before optimizing Go code, verify the bottleneck is in your process — if 90% of latency is a slow DB query or API call, reducing allocations won't help.
Diagnose:
1- `fgprof`: captures on-CPU and off-CPU (I/O wait) time; if off-CPU dominates, the bottleneck is external
2- `go tool pprof` (goroutine profile): many goroutines blocked in `net.(*conn).Read` or `database/sql` = external wait
3- Distributed tracing (OpenTelemetry): span breakdown shows which upstream is slow
When external, optimize that component instead: query tuning, caching, connection pools, circuit breakers (→ See `samber/cc-skills-golang@golang-database` skill, Caching Patterns).
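
The goroutine-profile check in step 2 can be sketched with the standard library alone. In this minimal example the blocked goroutines wait on a channel as a stand-in for a slow network or database call; `goroutineDump` is a hypothetical helper name, and `fgprof` (a third-party package) is deliberately not used here.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"sync"
)

// goroutineDump returns the goroutine profile in text form (debug=1),
// which groups goroutines that share an identical stack.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	block := make(chan struct{})
	var started, done sync.WaitGroup

	for i := 0; i < 5; i++ {
		started.Add(1)
		done.Add(1)
		go func() {
			defer done.Done()
			started.Done()
			<-block // stands in for a blocked network or database call
		}()
	}
	started.Wait()

	out, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Print(out) // many goroutines parked on the same frame hint at an external wait

	close(block)
	done.Wait()
}
```

In a real service the same profile is usually fetched from a `net/http/pprof` endpoint; a large group of goroutines parked in the same read or query frame is the signal to look outside the process.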
Iterative Optimization Methodology
The cycle: Define Goals → Benchmark → Diagnose → Improve → Benchmark
- Define your metric: latency, throughput, memory, or CPU? Without a target, optimizations are random
- Write an atomic benchmark: isolate one function per benchmark to avoid result contamination (→ See `samber/cc-skills-golang@golang-benchmark` skill)
- Measure baseline: `go test -bench=BenchmarkMyFunc -benchmem -count=6 ./pkg/... | tee /tmp/report-1.txt`
- Diagnose: use the Diagnose lines in each deep-dive section to pick the right tool
- Improve: apply ONE optimization at a time with an explanatory comment
- Compare: `benchstat /tmp/report-1.txt /tmp/report-2.txt` to confirm statistical significance
- Repeat: increment report number, tackle next bottleneck
Refer to library documentation for known patterns before inventing custom solutions. Keep all `/tmp/report-*.txt` files as an audit trail.
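
An "atomic benchmark" in the sense of the steps above might look like the following sketch. The function `joinIDs` and its input are hypothetical; in a real project the benchmark body would live in a `_test.go` file as `func BenchmarkJoinIDs(b *testing.B)` and be run with `go test -bench`. Here `testing.Benchmark` drives it from `main` only so the example stays self-contained.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// joinIDs is a hypothetical function under test.
func joinIDs(ids []string) string {
	var sb strings.Builder
	for i, id := range ids {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(id)
	}
	return sb.String()
}

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

// benchmarkJoinIDs measures joinIDs alone: the input is built before the
// timer is reset, so setup cost does not contaminate the result.
func benchmarkJoinIDs(b *testing.B) {
	ids := make([]string, 100)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%d", i)
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = joinIDs(ids)
	}
}

func main() {
	res := testing.Benchmark(benchmarkJoinIDs)
	fmt.Println(res.String(), res.MemString())
}
```

Keeping one function per benchmark means a change in the ns/op or allocs/op columns can be attributed to that function rather than to shared setup.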
Decision Tree: Where Is Time Spent?
| Bottleneck | Signal (from pprof) | Action |
|---|---|---|
| Too many allocations | … in heap profile | Memory optimization |
| CPU-bound hot loop | function dominates CPU profile | CPU optimization |
| GC pauses / OOM | high GC%, container limits | Runtime tuning |
| Network / I/O latency | goroutines blocked on I/O | I/O & networking |
| Repeated expensive work | same computation/fetch multiple times | Caching patterns |
| Wrong algorithm | O(n²) where O(n) exists | Algorithmic complexity |
| Lock contention | mutex/block profile hot | → See `samber/cc-skills-golang@golang-concurrency` skill |
| Slow queries | DB time dominates traces | → See `samber/cc-skills-golang@golang-database` skill |
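
To make the "too many allocations" row concrete, the sketch below counts heap allocations for two ways of filling a slice using `testing.AllocsPerRun`. The functions `fillGrow`, `fillPrealloc`, and `allocsPerCall` are hypothetical illustrations, and the package-level `sink` is an assumption added so the slices escape to the heap and are actually counted.

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces the slices to escape to the heap so allocations are counted.
var sink []int

// fillGrow appends into a nil slice, so the backing array is
// reallocated several times as it grows.
func fillGrow(n int) {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// fillPrealloc sizes the backing array once up front.
func fillPrealloc(n int) {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// allocsPerCall reports the average number of heap allocations per call of fn.
func allocsPerCall(fn func()) float64 {
	return testing.AllocsPerRun(100, fn)
}

func main() {
	const n = 1000
	grow := allocsPerCall(func() { fillGrow(n) })
	pre := allocsPerCall(func() { fillPrealloc(n) })
	fmt.Printf("append from nil: %.0f allocs/op\n", grow)
	fmt.Printf("preallocated:    %.0f allocs/op\n", pre)
}
```

The exact counts depend on the Go version's slice growth policy, which is why no expected output is shown; the preallocated variant should report fewer allocations.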
Common Mistakes
| Mistake | Fix |
|---|---|
| Optimizing without profiling | Profile with pprof first; intuition is wrong ~80% of the time |
| Default … | … |
| Logging in hot loops | Log calls prevent inlining and allocate even when the level is disabled. Use … |
| … | panic allocates a stack trace and unwinds the stack; use error returns |
| … without benchmark proof | Only justified when profiling shows >10% improvement in a verified hot path |
| No GC tuning in containers | Set … |
| … in production | 50-200x slower than typed comparison; use … |
Deep Dives
- Memory Optimization — allocation patterns, backing array leaks, sync.Pool, struct alignment
- CPU Optimization — inlining, cache locality, false sharing, ILP, reflection avoidance
- I/O & Networking — HTTP transport config, streaming, JSON performance, cgo, batch operations
- Runtime Tuning — GOGC, GOMEMLIMIT, GC diagnostics, GOMAXPROCS, PGO
- Caching Patterns — algorithmic complexity, compiled patterns, singleflight, work avoidance
- Production Observability — Prometheus metrics, PromQL queries, continuous profiling, alerting rules
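
As a small taste of the Runtime Tuning topic, the sketch below sets the Go runtime's soft memory limit programmatically with `runtime/debug.SetMemoryLimit` (Go 1.19+), which has the same effect as the `GOMEMLIMIT` environment variable. The 512 MiB container size and the 0.9 fraction are assumptions chosen for illustration, and `applyMemoryLimit` and `currentMemoryLimit` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyMemoryLimit sets the runtime's soft memory limit to a fraction of
// the container's memory and returns the limit that was applied.
func applyMemoryLimit(containerBytes int64, fraction float64) int64 {
	limit := int64(float64(containerBytes) * fraction)
	debug.SetMemoryLimit(limit)
	return limit
}

// currentMemoryLimit reads the limit back; a negative argument leaves
// the existing limit unchanged and only returns it.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	const containerBytes = 512 << 20 // assumed 512 MiB container limit
	applied := applyMemoryLimit(containerBytes, 0.9)
	fmt.Printf("soft memory limit: %d bytes (in effect: %v)\n",
		applied, applied == currentMemoryLimit())
}
```

The limit is soft: the garbage collector works harder as the heap approaches it, but the runtime does not guarantee staying under it, so headroom below the container's hard limit is still needed.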
CI Regression Detection
Automate benchmark comparison in CI to catch regressions before they reach production. → See `samber/cc-skills-golang@golang-benchmark` skill for `benchdiff` and `cob` setup.
Cross-References
- → See `samber/cc-skills-golang@golang-benchmark` skill for benchmarking methodology, `benchstat`, and `b.Loop()` (Go 1.24+)
- → See `samber/cc-skills-golang@golang-troubleshooting` skill for pprof workflow, escape analysis diagnostics, and performance debugging
- → See `samber/cc-skills-golang@golang-data-structures` skill for slice/map preallocation and `strings.Builder`
- → See `samber/cc-skills-golang@golang-concurrency` skill for worker pools, `sync.Pool` API, goroutine lifecycle, and lock contention
- → See `samber/cc-skills-golang@golang-safety` skill for defer in loops, slice backing array aliasing
- → See `samber/cc-skills-golang@golang-database` skill for connection pool tuning and batch processing
- → See `samber/cc-skills-golang@golang-observability` skill for continuous profiling in production