performance-profiler


Performance Profiler


Tier: POWERFUL
Category: Engineering
Domain: Performance Engineering


Overview

Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU, memory, and I/O bottlenecks; generates flamegraphs; analyzes bundle sizes; optimizes database queries; detects memory leaks; and runs load tests with k6 and Artillery. Always measures before and after.

Core Capabilities


  • CPU profiling — flamegraphs for Node.js, py-spy for Python, pprof for Go
  • Memory profiling — heap snapshots, leak detection, GC pressure
  • Bundle analysis — webpack-bundle-analyzer, Next.js bundle analyzer
  • Database optimization — EXPLAIN ANALYZE, slow query log, N+1 detection
  • Load testing — k6 scripts, Artillery scenarios, ramp-up patterns
  • Before/after measurement — establish baseline, profile, optimize, verify
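
The capabilities above name py-spy for Python; when installing nothing is an option, the stdlib `cProfile` and `pstats` modules give a quick first look at CPU hotspots. A minimal sketch (the `slow_sum` workload is a made-up stand-in for real application code):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately quadratic work so a hotspot shows up in the profile.
    total = 0
    for i in range(n):
        for j in range(i):
            total += j
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(300)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # print the top 5 entries by cumulative time
report = stream.getvalue()
print(report)
```

For a running production process, py-spy remains the better fit since it attaches without code changes; `cProfile` is for code you can wrap locally.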


When to Use


  • App is slow and you don't know where the bottleneck is
  • P99 latency exceeds SLA before a release
  • Memory usage grows over time (suspected leak)
  • Bundle size increased after adding dependencies
  • Preparing for a traffic spike (load test before launch)
  • Database queries taking >100ms
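
Several of these triggers amount to "a call crossed a latency threshold." Before a full profiling session, a timing decorator can surface such calls in application logs. A rough sketch, where `warn_if_slow`, `SLOW_MS`, and `fetch_report` are invented names for illustration:

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("perf")

SLOW_MS = 100  # the >100ms threshold from the list above

def warn_if_slow(fn):
    """Log a warning whenever the wrapped call exceeds SLOW_MS."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            if elapsed_ms > SLOW_MS:
                log.warning("%s took %.1fms (> %dms)",
                            fn.__name__, elapsed_ms, SLOW_MS)
    return wrapper

@warn_if_slow
def fetch_report():
    time.sleep(0.15)  # stand-in for a slow database query
    return "report"

result = fetch_report()  # emits a warning because 150ms > 100ms
```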


Quick Start

```bash
# Analyze a project for performance risk indicators
python3 scripts/performance_profiler.py /path/to/project

# JSON output for CI integration
python3 scripts/performance_profiler.py /path/to/project --json

# Custom large-file threshold
python3 scripts/performance_profiler.py /path/to/project --large-file-threshold-kb 256
```

---

Golden Rule: Measure First

```bash
# Establish baseline BEFORE any optimization
# Record: P50, P95, P99 latency | RPS | error rate | memory usage
```

Wrong: "I think the N+1 query is slow, let me fix it"

Right: Profile → confirm bottleneck → fix → measure again → verify improvement

---
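
The baseline metrics named above (P50/P95/P99) can be computed from raw latency samples with a few lines of stdlib Python. A sketch using the nearest-rank percentile method; `latencies_ms` here is dummy data standing in for numbers pulled from a load-test run or access logs:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Dummy data: 1..100 ms. Real samples would come from k6, Artillery,
# or parsed access logs.
latencies_ms = list(range(1, 101))

baseline = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
print(baseline)  # {50: 50, 95: 95, 99: 99}
```

Record the same dictionary again after the fix; the before/after template below is built around exactly this comparison.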

Node.js Profiling


→ See references/profiling-recipes.md for details

Before/After Measurement Template

```markdown
# Performance Optimization: [What You Fixed]

Date: 2026-03-01
Engineer: @username
Ticket: PROJ-123

## Problem

[1-2 sentences: what was slow, how it was observed]

## Root Cause

[What the profiler revealed]

## Baseline (Before)

| Metric         | Value    |
|----------------|----------|
| P50 latency    | 480ms    |
| P95 latency    | 1,240ms  |
| P99 latency    | 3,100ms  |
| RPS @ 50 VUs   | 42       |
| Error rate     | 0.8%     |
| DB queries/req | 23 (N+1) |

Profiler evidence: [link to flamegraph or screenshot]

## Fix Applied

[What changed — code diff or description]

## After

| Metric         | Before  | After | Delta |
|----------------|---------|-------|-------|
| P50 latency    | 480ms   | 48ms  | -90%  |
| P95 latency    | 1,240ms | 120ms | -90%  |
| P99 latency    | 3,100ms | 280ms | -91%  |
| RPS @ 50 VUs   | 42      | 380   | +804% |
| Error rate     | 0.8%    | 0%    | -100% |
| DB queries/req | 23      | 1     | -96%  |

## Verification

Load test run: [link to k6 output]
```

---

Optimization Checklist


Quick wins (check these first)


Database
□ Missing indexes on WHERE/ORDER BY columns
□ N+1 queries (check query count per request)
□ Loading all columns when only 2-3 needed (SELECT *)
□ No LIMIT on unbounded queries
□ Missing connection pool (creating new connection per request)
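
The N+1 item is easiest to recognize with a query counter. A self-contained sketch against an in-memory SQLite database (tables, rows, and names are made up for illustration): the loop issues 1 + N queries, the JOIN issues one, and both return the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Diving');
""")

# N+1 pattern: one query for the authors, then one more per author.
queries = 0
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries += 1
n_plus_1 = {}
for author_id, name in authors:
    rows = conn.execute(
        "SELECT title FROM books WHERE author_id = ?", (author_id,)
    ).fetchall()
    queries += 1
    n_plus_1[name] = [title for (title,) in rows]
print(f"N+1 issued {queries} queries")  # 1 + number of authors

# JOIN pattern: one query total, same result.
joined = {}
for name, title in conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
):
    joined.setdefault(name, []).append(title)
```

In an ORM, the equivalent fix is usually an eager-loading option rather than a hand-written JOIN.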

Node.js
□ Sync I/O (fs.readFileSync) in hot path
□ JSON.parse/stringify of large objects in hot loop
□ Missing caching for expensive computations
□ No compression (gzip/brotli) on responses
□ Dependencies loaded in request handler (move to module level)
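
For the caching item, Python's stdlib `functools.lru_cache` is a zero-dependency starting point. A sketch where `expensive_report` is a made-up stand-in for a costly computation; the `CALLS` counter shows the body runs only once per distinct argument:

```python
import time
from functools import lru_cache

CALLS = 0

@lru_cache(maxsize=256)
def expensive_report(day: str) -> str:
    global CALLS
    CALLS += 1
    time.sleep(0.01)  # stand-in for heavy computation
    return f"report-{day}"

first = expensive_report("2026-03-01")
second = expensive_report("2026-03-01")  # served from cache; body not re-run
```

Note the cache-invalidation caveat from Best Practices below still applies: `lru_cache` never expires entries on its own, so it fits pure functions, not data that goes stale.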

Bundle
□ Moment.js → dayjs/date-fns
□ Lodash (full) → lodash/function imports
□ Static imports of heavy components → dynamic imports
□ Images not optimized / not using next/image
□ No code splitting on routes

API
□ No pagination on list endpoints
□ No response caching (Cache-Control headers)
□ Serial awaits that could be parallel (Promise.all)
□ Fetching related data in a loop instead of JOIN
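
The serial-await item maps to `asyncio.gather` in Python, the counterpart of the checklist's `Promise.all`. A runnable sketch with fake I/O delays: three independent 50ms waits take ~150ms when awaited one by one and ~50ms when gathered.

```python
import asyncio
import time

async def fetch(name: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for an I/O-bound call
    return name

async def serial():
    # Awaits run one after another: roughly 3 x 0.05s.
    return [await fetch("a"), await fetch("b"), await fetch("c")]

async def parallel():
    # Independent awaits run concurrently: roughly 0.05s total.
    return await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))

start = time.perf_counter()
s = asyncio.run(serial())
serial_t = time.perf_counter() - start

start = time.perf_counter()
p = asyncio.run(parallel())
parallel_t = time.perf_counter() - start

print(f"serial {serial_t:.2f}s vs parallel {parallel_t:.2f}s")
```

This only helps when the calls really are independent; awaits that feed each other's inputs must stay serial.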

Common Pitfalls


  • Optimizing without measuring — you'll optimize the wrong thing
  • Testing in development — profile against production-like data volumes
  • Ignoring P99 — P50 can look fine while P99 is catastrophic
  • Premature optimization — fix correctness first, then performance
  • Not re-measuring — always verify the fix actually improved things
  • Load testing production — use staging with production-size data

Best Practices


  1. Baseline first, always — record metrics before touching anything
  2. One change at a time — isolate the variable to confirm causation
  3. Profile with realistic data — 10 rows in dev vs. millions in prod means different bottlenecks
  4. Set performance budgets — e.g. `p(95) < 200ms` as a k6 threshold in CI
  5. Monitor continuously — add Datadog/Prometheus metrics for key paths
  6. Cache invalidation strategy — cache aggressively, invalidate precisely
  7. Document the win — before/after numbers in the PR description motivate the team