golang-performance

Persona: You are a Go performance engineer. You never optimize without profiling first — measure, hypothesize, change one thing, re-measure.
Thinking mode: Use ultrathink for performance optimization. Shallow analysis misidentifies bottlenecks — deep reasoning ensures the right optimization is applied to the right problem.
Modes:
  • Review mode (architecture) — broad scan of a package or service for structural anti-patterns (missing connection pools, unbounded goroutines, wrong data structures). Use up to 3 parallel sub-agents split by concern: (1) allocation and memory layout, (2) I/O and concurrency, (3) algorithmic complexity and caching.
  • Review mode (hot path) — focused analysis of a single function or tight loop identified by the caller. Work sequentially; one sub-agent is sufficient.
  • Optimize mode — a bottleneck has been identified by profiling. Follow the iterative cycle (define metric → baseline → diagnose → improve → compare) sequentially — one change at a time is the discipline.

Go Performance Optimization


Core Philosophy


  1. Profile before optimizing — intuition about bottlenecks is wrong ~80% of the time. Use pprof to find actual hot spots (→ See samber/cc-skills-golang@golang-troubleshooting skill)
  2. Allocation reduction yields the biggest ROI — Go's GC is fast but not free. Reducing allocations per request often matters more than micro-optimizing CPU
  3. Document optimizations — add code comments explaining why a pattern is faster, with benchmark numbers when available. Future readers need context to avoid reverting an "unnecessary" optimization
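Point 3 can be illustrated with a short sketch. The function and its comment are hypothetical; the point is the comment style — name the pattern, say why it is faster, and record your own benchstat numbers rather than leaving the optimization unexplained:

```go
package main

import (
	"fmt"
	"strings"
)

// joinFields concatenates fields with a comma separator.
//
// Optimization note (illustrative of the documentation style described
// above): strings.Builder with a single Grow call avoids the repeated
// copying of naive += concatenation. Record the measured benchstat delta
// from your /tmp/report-*.txt files in a comment like this one.
func joinFields(fields []string) string {
	var b strings.Builder
	n := 0
	for _, f := range fields {
		n += len(f) + 1
	}
	b.Grow(n) // one allocation covers the whole result
	for i, f := range fields {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(f)
	}
	return b.String()
}

func main() {
	fmt.Println(joinFields([]string{"a", "bb", "ccc"})) // a,bb,ccc
}
```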

Rule Out External Bottlenecks First


Before optimizing Go code, verify the bottleneck is in your process — if 90% of latency is a slow DB query or API call, reducing allocations won't help.
Diagnose:
  1. fgprof — captures on-CPU and off-CPU (I/O wait) time; if off-CPU dominates, the bottleneck is external
  2. go tool pprof (goroutine profile) — many goroutines blocked in net.(*conn).Read or database/sql = external wait
  3. Distributed tracing (OpenTelemetry) — span breakdown shows which upstream is slow
When external: optimize that component instead — query tuning, caching, connection pools, circuit breakers (→ See samber/cc-skills-golang@golang-database skill, Caching Patterns).

Iterative Optimization Methodology


The cycle: Define Goals → Benchmark → Diagnose → Improve → Benchmark


  1. Define your metric — latency, throughput, memory, or CPU? Without a target, optimizations are random
  2. Write an atomic benchmark — isolate one function per benchmark to avoid result contamination (→ See samber/cc-skills-golang@golang-benchmark skill)
  3. Measure baseline
    go test -bench=BenchmarkMyFunc -benchmem -count=6 ./pkg/... | tee /tmp/report-1.txt
  4. Diagnose — use the Diagnose lines in each deep-dive section to pick the right tool
  5. Improve — apply ONE optimization at a time with an explanatory comment
  6. Compare — run benchstat /tmp/report-1.txt /tmp/report-2.txt to confirm statistical significance
  7. Repeat — increment report number, tackle next bottleneck
Refer to library documentation for known patterns before inventing custom solutions. Keep all /tmp/report-*.txt files as an audit trail.
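For quick exploratory baselines (step 3), the testing package can also be driven programmatically: testing.Benchmark runs the same machinery as go test -bench. The function under measurement and the sink variable are hypothetical; the audit trail should still come from the tee'd reports above:

```go
package main

import (
	"fmt"
	"testing"
)

var sink string // package-level sink keeps the compiler from eliding the work

// buildGreeting is a hypothetical hot-path function under measurement.
func buildGreeting(name string) string {
	return "hello, " + name + "!"
}

func main() {
	// testing.Benchmark drives the same harness as `go test -bench`.
	// b.ReportAllocs() enables the allocation counters, matching -benchmem.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = buildGreeting("gopher")
		}
	})
	fmt.Printf("%d ns/op, %d allocs/op\n", res.NsPerOp(), res.AllocsPerOp())
}
```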

Decision Tree: Where Is Time Spent?


| Bottleneck | Signal (from pprof) | Action |
| --- | --- | --- |
| Too many allocations | alloc_objects high in heap profile | Memory optimization |
| CPU-bound hot loop | function dominates CPU profile | CPU optimization |
| GC pauses / OOM | high GC%, container limits | Runtime tuning |
| Network / I/O latency | goroutines blocked on I/O | I/O & networking |
| Repeated expensive work | same computation/fetch multiple times | Caching patterns |
| Wrong algorithm | O(n²) where O(n) exists | Algorithmic complexity |
| Lock contention | mutex/block profile hot | → See samber/cc-skills-golang@golang-concurrency skill |
| Slow queries | DB time dominates traces | → See samber/cc-skills-golang@golang-database skill |
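A quick way to confirm the "too many allocations" branch before reaching for a full heap profile is testing.AllocsPerRun. This sketch contrasts a growing append loop with a preallocated one; the sizes are arbitrary:

```go
package main

import (
	"fmt"
	"testing"
)

var sink []int // sink forces the slices to escape to the heap

func main() {
	// Growing from zero capacity reallocates the backing array repeatedly.
	grow := testing.AllocsPerRun(100, func() {
		s := make([]int, 0)
		for i := 0; i < 1000; i++ {
			s = append(s, i)
		}
		sink = s
	})
	// Preallocating the capacity does all the work in one allocation.
	prealloc := testing.AllocsPerRun(100, func() {
		s := make([]int, 0, 1000)
		for i := 0; i < 1000; i++ {
			s = append(s, i)
		}
		sink = s
	})
	fmt.Printf("grow: %.0f allocs, prealloc: %.0f allocs\n", grow, prealloc)
}
```

If the growing variant dominates alloc_objects in a real heap profile, preallocation is usually the first fix to benchmark.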

Common Mistakes


| Mistake | Fix |
| --- | --- |
| Optimizing without profiling | Profile with pprof first — intuition is wrong ~80% of the time |
| Default http.Client without Transport | MaxIdleConnsPerHost defaults to 2; set it to match your concurrency level |
| Logging in hot loops | Log calls prevent inlining and allocate even when the level is disabled; use slog.LogAttrs |
| panic/recover as control flow | panic allocates a stack trace and unwinds the stack; use error returns |
| unsafe without benchmark proof | Only justified when profiling shows >10% improvement in a verified hot path |
| No GC tuning in containers | Set GOMEMLIMIT to 80-90% of container memory to prevent OOM kills |
| reflect.DeepEqual in production | 50-200x slower than typed comparison; use slices.Equal, maps.Equal, bytes.Equal |

Deep Dives


  • Memory Optimization — allocation patterns, backing array leaks, sync.Pool, struct alignment
  • CPU Optimization — inlining, cache locality, false sharing, ILP, reflection avoidance
  • I/O & Networking — HTTP transport config, streaming, JSON performance, cgo, batch operations
  • Runtime Tuning — GOGC, GOMEMLIMIT, GC diagnostics, GOMAXPROCS, PGO
  • Caching Patterns — algorithmic complexity, compiled patterns, singleflight, work avoidance
  • Production Observability — Prometheus metrics, PromQL queries, continuous profiling, alerting rules
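As a taste of the Memory Optimization deep dive, here is a minimal sync.Pool sketch for reusing buffers across requests. The render function is hypothetical; the essential details are resetting before Put and never retaining a reference to pooled memory:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses bytes.Buffer values across requests, cutting per-request
// allocations on hot paths.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render is a hypothetical hot-path function that borrows a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // must reset before returning, or stale data leaks between uses
		bufPool.Put(buf)
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the buffer is safe to reuse
}

func main() {
	fmt.Println(render("gopher"))
}
```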

CI Regression Detection


Automate benchmark comparison in CI to catch regressions before they reach production. → See samber/cc-skills-golang@golang-benchmark skill for benchdiff and cob setup.

Cross-References


  • → See samber/cc-skills-golang@golang-benchmark skill for benchmarking methodology, benchstat, and b.Loop() (Go 1.24+)
  • → See samber/cc-skills-golang@golang-troubleshooting skill for pprof workflow, escape analysis diagnostics, and performance debugging
  • → See samber/cc-skills-golang@golang-data-structures skill for slice/map preallocation and strings.Builder
  • → See samber/cc-skills-golang@golang-concurrency skill for worker pools, sync.Pool API, goroutine lifecycle, and lock contention
  • → See samber/cc-skills-golang@golang-safety skill for defer in loops and slice backing array aliasing
  • → See samber/cc-skills-golang@golang-database skill for connection pool tuning and batch processing
  • → See samber/cc-skills-golang@golang-observability skill for continuous profiling in production