performance-profiler


Performance Profiler


Tier: POWERFUL
Category: Engineering
Domain: Performance Engineering


Overview

Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU, memory, and I/O bottlenecks; generates flamegraphs; analyzes bundle sizes; optimizes database queries; detects memory leaks; and runs load tests with k6 and Artillery. Always measures before and after.

Core Capabilities


  • CPU profiling — flamegraphs for Node.js, py-spy for Python, pprof for Go
  • Memory profiling — heap snapshots, leak detection, GC pressure
  • Bundle analysis — webpack-bundle-analyzer, Next.js bundle analyzer
  • Database optimization — EXPLAIN ANALYZE, slow query log, N+1 detection
  • Load testing — k6 scripts, Artillery scenarios, ramp-up patterns
  • Before/after measurement — establish baseline, profile, optimize, verify
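
The capabilities above name py-spy for Python; when installing nothing is an option, the stdlib `cProfile` and `pstats` modules give a quick first look at CPU hotspots. A minimal sketch (the `slow_sum` workload is a made-up stand-in for real application code):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately quadratic work so a hotspot shows up in the profile.
    total = 0
    for i in range(n):
        for j in range(i):
            total += j
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(300)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # print the top 5 entries by cumulative time
report = stream.getvalue()
print(report)
```

For a running production process, py-spy remains the better fit since it attaches without code changes; `cProfile` is for code you can wrap locally.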


When to Use


  • App is slow and you don't know where the bottleneck is
  • P99 latency exceeds SLA before a release
  • Memory usage grows over time (suspected leak)
  • Bundle size increased after adding dependencies
  • Preparing for a traffic spike (load test before launch)
  • Database queries taking >100ms
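
Several of these triggers amount to "a call crossed a latency threshold." Before a full profiling session, a timing decorator can surface such calls in application logs. A rough sketch, where `warn_if_slow`, `SLOW_MS`, and `fetch_report` are invented names for illustration:

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("perf")

SLOW_MS = 100  # the >100ms threshold from the list above

def warn_if_slow(fn):
    """Log a warning whenever the wrapped call exceeds SLOW_MS."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            if elapsed_ms > SLOW_MS:
                log.warning("%s took %.1fms (> %dms)",
                            fn.__name__, elapsed_ms, SLOW_MS)
    return wrapper

@warn_if_slow
def fetch_report():
    time.sleep(0.15)  # stand-in for a slow database query
    return "report"

result = fetch_report()  # emits a warning because 150ms > 100ms
```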


Quick Start

```bash
# Analyze a project for performance risk indicators
python3 scripts/performance_profiler.py /path/to/project

# JSON output for CI integration
python3 scripts/performance_profiler.py /path/to/project --json

# Custom large-file threshold
python3 scripts/performance_profiler.py /path/to/project --large-file-threshold-kb 256
```

---

Golden Rule: Measure First

```bash
# Establish baseline BEFORE any optimization
# Record: P50, P95, P99 latency | RPS | error rate | memory usage
```

Wrong: "I think the N+1 query is slow, let me fix it"

Right: Profile → confirm bottleneck → fix → measure again → verify improvement

---
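
The baseline metrics named above (P50/P95/P99) can be computed from raw latency samples with a few lines of stdlib Python. A sketch using the nearest-rank percentile method; `latencies_ms` here is dummy data standing in for numbers pulled from a load-test run or access logs:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Dummy data: 1..100 ms. Real samples would come from k6, Artillery,
# or parsed access logs.
latencies_ms = list(range(1, 101))

baseline = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
print(baseline)  # {50: 50, 95: 95, 99: 99}
```

Record the same dictionary again after the fix; the before/after template below is built around exactly this comparison.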

Node.js Profiling


→ See references/profiling-recipes.md for details

Before/After Measurement Template

```markdown
# Performance Optimization: [What You Fixed]

Date: 2026-03-01
Engineer: @username
Ticket: PROJ-123

## Problem

[1-2 sentences: what was slow, how it was observed]

## Root Cause

[What the profiler revealed]

## Baseline (Before)

| Metric         | Value    |
|----------------|----------|
| P50 latency    | 480ms    |
| P95 latency    | 1,240ms  |
| P99 latency    | 3,100ms  |
| RPS @ 50 VUs   | 42       |
| Error rate     | 0.8%     |
| DB queries/req | 23 (N+1) |

Profiler evidence: [link to flamegraph or screenshot]

## Fix Applied

[What changed — code diff or description]

## After

| Metric         | Before  | After | Delta |
|----------------|---------|-------|-------|
| P50 latency    | 480ms   | 48ms  | -90%  |
| P95 latency    | 1,240ms | 120ms | -90%  |
| P99 latency    | 3,100ms | 280ms | -91%  |
| RPS @ 50 VUs   | 42      | 380   | +804% |
| Error rate     | 0.8%    | 0%    | -100% |
| DB queries/req | 23      | 1     | -96%  |

## Verification

Load test run: [link to k6 output]
```

---

Optimization Checklist


Quick wins (check these first)


Database
□ Missing indexes on WHERE/ORDER BY columns
□ N+1 queries (check query count per request)
□ Loading all columns when only 2-3 needed (SELECT *)
□ No LIMIT on unbounded queries
□ Missing connection pool (creating new connection per request)
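
The N+1 item is easiest to recognize with a query counter. A self-contained sketch against an in-memory SQLite database (tables, rows, and names are made up for illustration): the loop issues 1 + N queries, the JOIN issues one, and both return the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Diving');
""")

# N+1 pattern: one query for the authors, then one more per author.
queries = 0
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries += 1
n_plus_1 = {}
for author_id, name in authors:
    rows = conn.execute(
        "SELECT title FROM books WHERE author_id = ?", (author_id,)
    ).fetchall()
    queries += 1
    n_plus_1[name] = [title for (title,) in rows]
print(f"N+1 issued {queries} queries")  # 1 + number of authors

# JOIN pattern: one query total, same result.
joined = {}
for name, title in conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
):
    joined.setdefault(name, []).append(title)
```

In an ORM, the equivalent fix is usually an eager-loading option rather than a hand-written JOIN.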

Node.js
□ Sync I/O (fs.readFileSync) in hot path
□ JSON.parse/stringify of large objects in hot loop
□ Missing caching for expensive computations
□ No compression (gzip/brotli) on responses
□ Dependencies loaded in request handler (move to module level)
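
For the caching item, Python's stdlib `functools.lru_cache` is a zero-dependency starting point. A sketch where `expensive_report` is a made-up stand-in for a costly computation; the `CALLS` counter shows the body runs only once per distinct argument:

```python
import time
from functools import lru_cache

CALLS = 0

@lru_cache(maxsize=256)
def expensive_report(day: str) -> str:
    global CALLS
    CALLS += 1
    time.sleep(0.01)  # stand-in for heavy computation
    return f"report-{day}"

first = expensive_report("2026-03-01")
second = expensive_report("2026-03-01")  # served from cache; body not re-run
```

Note the cache-invalidation caveat from Best Practices below still applies: `lru_cache` never expires entries on its own, so it fits pure functions, not data that goes stale.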

Bundle
□ Moment.js → dayjs/date-fns
□ Lodash (full) → lodash/function imports
□ Static imports of heavy components → dynamic imports
□ Images not optimized / not using next/image
□ No code splitting on routes

API
□ No pagination on list endpoints
□ No response caching (Cache-Control headers)
□ Serial awaits that could be parallel (Promise.all)
□ Fetching related data in a loop instead of JOIN
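
The serial-await item maps to `asyncio.gather` in Python, the counterpart of the checklist's `Promise.all`. A runnable sketch with fake I/O delays: three independent 50ms waits take ~150ms when awaited one by one and ~50ms when gathered.

```python
import asyncio
import time

async def fetch(name: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for an I/O-bound call
    return name

async def serial():
    # Awaits run one after another: roughly 3 x 0.05s.
    return [await fetch("a"), await fetch("b"), await fetch("c")]

async def parallel():
    # Independent awaits run concurrently: roughly 0.05s total.
    return await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))

start = time.perf_counter()
s = asyncio.run(serial())
serial_t = time.perf_counter() - start

start = time.perf_counter()
p = asyncio.run(parallel())
parallel_t = time.perf_counter() - start

print(f"serial {serial_t:.2f}s vs parallel {parallel_t:.2f}s")
```

This only helps when the calls really are independent; awaits that feed each other's inputs must stay serial.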

Common Pitfalls


  • Optimizing without measuring — you'll optimize the wrong thing
  • Testing in development — profile against production-like data volumes
  • Ignoring P99 — P50 can look fine while P99 is catastrophic
  • Premature optimization — fix correctness first, then performance
  • Not re-measuring — always verify the fix actually improved things
  • Load testing production — use staging with production-size data

Best Practices


  1. Baseline first, always — record metrics before touching anything
  2. One change at a time — isolate the variable to confirm causation
  3. Profile with realistic data — 10 rows in dev vs. millions in prod means different bottlenecks
  4. Set performance budgets — e.g. `p(95) < 200ms` as a k6 threshold in CI
  5. Monitor continuously — add Datadog/Prometheus metrics for key paths
  6. Cache invalidation strategy — cache aggressively, invalidate precisely
  7. Document the win — before/after numbers in the PR description motivate the team