# Performance Profiler
Tier: POWERFUL
Category: Engineering
Domain: Performance Engineering
## Overview
Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU, memory, and I/O bottlenecks; generates flamegraphs; analyzes bundle sizes; optimizes database queries; detects memory leaks; and runs load tests with k6 and Artillery. Always measures before and after.
## Core Capabilities
- CPU profiling — flamegraphs for Node.js, py-spy for Python, pprof for Go
- Memory profiling — heap snapshots, leak detection, GC pressure
- Bundle analysis — webpack-bundle-analyzer, Next.js bundle analyzer
- Database optimization — EXPLAIN ANALYZE, slow query log, N+1 detection
- Load testing — k6 scripts, Artillery scenarios, ramp-up patterns
- Before/after measurement — establish baseline, profile, optimize, verify
## When to Use
- App is slow and you don't know where the bottleneck is
- P99 latency exceeds SLA before a release
- Memory usage grows over time (suspected leak)
- Bundle size increased after adding dependencies
- Preparing for a traffic spike (load test before launch)
- Database queries taking >100ms
## Quick Start
```bash
# Analyze a project for performance risk indicators
python3 scripts/performance_profiler.py /path/to/project

# JSON output for CI integration
python3 scripts/performance_profiler.py /path/to/project --json

# Custom large-file threshold
python3 scripts/performance_profiler.py /path/to/project --large-file-threshold-kb 256
```

---

## Golden Rule: Measure First
```bash
# Establish baseline BEFORE any optimization
# Record: P50, P95, P99 latency | RPS | error rate | memory usage
```

Wrong: "I think the N+1 query is slow, let me fix it"
Right: Profile → confirm bottleneck → fix → measure again → verify improvement
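A baseline is just arithmetic over collected latency samples. A minimal sketch of the nearest-rank percentile convention (the sample data is made up for illustration; other tools may use interpolated percentiles and report slightly different values):

```javascript
// Nearest-rank percentile over raw latency samples (in ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// Record these before any optimization, then again after, to prove the win.
function baseline(samples) {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Made-up workload: 95 fast requests plus a slow tail.
const samples = Array.from({ length: 100 }, (_, i) => (i < 95 ? 40 + i : 900 + i));
console.log(baseline(samples));
```

Note how the slow tail barely moves P50: this is why the pitfalls below insist on watching P99, not just the median.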
---

## Node.js Profiling

→ See references/profiling-recipes.md for details
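The recipes file is not reproduced here. As a self-contained illustration of the leak-detection idea (watch heap usage climb while a suspected retainer grows), with a deliberately leaky `cache` stand-in:

```javascript
// Deliberate leak: a module-level cache that is never evicted.
const cache = [];
function handleRequest(i) {
  cache.push({ id: i, payload: 'x'.repeat(1024) }); // retained forever
}

function heapUsedMB() {
  return process.memoryUsage().heapUsed / 1024 / 1024;
}

// heapUsed keeps climbing across GC cycles if references are retained.
const before = heapUsedMB();
for (let i = 0; i < 50_000; i++) handleRequest(i);
const after = heapUsedMB();
console.log(`heapUsed: ${before.toFixed(1)} MB -> ${after.toFixed(1)} MB`);
```

A real investigation would confirm with two heap snapshots and diff the retained objects between them, rather than trusting `heapUsed` alone, since a single reading also includes not-yet-collected garbage.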
## Before/After Measurement Template
```markdown
# Performance Optimization: [What You Fixed]

Date: 2026-03-01
Engineer: @username
Ticket: PROJ-123

## Problem
[1-2 sentences: what was slow, how it was observed]

## Root Cause
[What the profiler revealed]

## Baseline (Before)
| Metric | Value |
|---|---|
| P50 latency | 480ms |
| P95 latency | 1,240ms |
| P99 latency | 3,100ms |
| RPS @ 50 VUs | 42 |
| Error rate | 0.8% |
| DB queries/req | 23 (N+1) |

Profiler evidence: [link to flamegraph or screenshot]

## Fix Applied
[What changed — code diff or description]

## After
| Metric | Before | After | Delta |
|---|---|---|---|
| P50 latency | 480ms | 48ms | -90% |
| P95 latency | 1,240ms | 120ms | -90% |
| P99 latency | 3,100ms | 280ms | -91% |
| RPS @ 50 VUs | 42 | 380 | +804% |
| Error rate | 0.8% | 0% | -100% |
| DB queries/req | 23 | 1 | -96% |

## Verification
Load test run: [link to k6 output]
```

---
## Optimization Checklist
Quick wins (check these first):
### Database
- [ ] Missing indexes on WHERE/ORDER BY columns
- [ ] N+1 queries (check query count per request)
- [ ] Loading all columns when only 2-3 needed (SELECT *)
- [ ] No LIMIT on unbounded queries
- [ ] Missing connection pool (creating new connection per request)
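The N+1 item is often the single biggest win. A hedged sketch of the before/after query shape, using a stub `db` that only counts round trips (a real fix would go through your driver or ORM with parameterized queries):

```javascript
// Stub client that counts round trips; stands in for pg/mysql2/etc.
const db = {
  calls: 0,
  async query(_sql, params) {
    this.calls++;
    const ids = Array.isArray(params[0]) ? params[0] : [params[0]];
    return ids.map((id) => ({ userId: id, total: 100 }));
  },
};

// N+1: one round trip per user, so query count grows with the list.
async function ordersNPlusOne(userIds) {
  const out = [];
  for (const id of userIds) {
    out.push(...(await db.query('SELECT * FROM orders WHERE user_id = $1', [id])));
  }
  return out;
}

// Fix: a single batched round trip with IN / ANY, grouping in memory if needed.
async function ordersBatched(userIds) {
  return db.query('SELECT * FROM orders WHERE user_id = ANY($1)', [userIds]);
}
```

Watching queries-per-request (here `db.calls`, in production the driver's query log or an APM metric) is how this checklist item is verified.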
### Node.js
- [ ] Sync I/O (fs.readFileSync) in hot path
- [ ] JSON.parse/stringify of large objects in hot loop
- [ ] Missing caching for expensive computations
- [ ] No compression (gzip/brotli) on responses
- [ ] Dependencies loaded in request handler (move to module level)
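For the missing-caching item, a minimal Map-based memoizer sketch (no eviction, so only safe when the key space is bounded; all names are illustrative):

```javascript
// Minimal memoizer: caches results of an expensive pure function by key.
// No eviction policy, so this is itself a leak for unbounded key spaces.
function memoize(fn) {
  const cache = new Map();
  let misses = 0;
  const wrapped = (key) => {
    if (!cache.has(key)) {
      misses++;
      cache.set(key, fn(key));
    }
    return cache.get(key);
  };
  wrapped.stats = () => ({ size: cache.size, misses });
  return wrapped;
}

// Example: an "expensive" computation in a hot path.
const slowSquare = (n) => {
  for (let i = 0; i < 1e5; i++); // simulate work
  return n * n;
};
const fastSquare = memoize(slowSquare);
fastSquare(12); // computed
fastSquare(12); // served from cache
```

For unbounded keys, swap the bare Map for an LRU or TTL cache so the fix for CPU does not become the memory leak from the checklist above.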
### Bundle
- [ ] Moment.js → dayjs/date-fns
- [ ] Lodash (full) → lodash/function imports
- [ ] Static imports of heavy components → dynamic imports
- [ ] Images not optimized / not using next/image
- [ ] No code splitting on routes
### API
- [ ] No pagination on list endpoints
- [ ] No response caching (Cache-Control headers)
- [ ] Serial awaits that could be parallel (Promise.all)
- [ ] Fetching related data in a loop instead of JOIN
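The serial-await item can be made concrete by counting in-flight calls; `fetchOne` is a stub with simulated latency:

```javascript
// Stub async call that tracks how many requests are in flight at once.
let inFlight = 0;
let peak = 0;
function fetchOne(id) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  return new Promise((resolve) =>
    setTimeout(() => { inFlight--; resolve({ id }); }, 10));
}

// Serial: total time ~ n * latency, peak concurrency 1.
async function serial(ids) {
  const out = [];
  for (const id of ids) out.push(await fetchOne(id));
  return out;
}

// Parallel: total time ~ 1 * latency, peak concurrency n.
async function parallel(ids) {
  return Promise.all(ids.map(fetchOne));
}
```

One caveat: `Promise.all` rejects as soon as any call fails; use `Promise.allSettled` when partial results are acceptable, and cap fan-out when calling a rate-limited upstream.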
## Common Pitfalls
- Optimizing without measuring — you'll optimize the wrong thing
- Testing in development — profile against production-like data volumes
- Ignoring P99 — P50 can look fine while P99 is catastrophic
- Premature optimization — fix correctness first, then performance
- Not re-measuring — always verify the fix actually improved things
- Load testing production — use staging with production-size data
## Best Practices
- Baseline first, always — record metrics before touching anything
- One change at a time — isolate the variable to confirm causation
- Profile with realistic data — 10 rows in dev, millions in prod — different bottlenecks
- Set performance budgets — enforce thresholds in CI with k6 (e.g. p(95) < 200ms)
- Monitor continuously — add Datadog/Prometheus metrics for key paths
- Cache invalidation strategy — cache aggressively, invalidate precisely
- Document the win — before/after in the PR description motivates the team