CPU Profiling

Overview

CPU profiling identifies which functions consume the most CPU time, so that optimization effort can be aimed at the code paths that are actually expensive.

When to Use

High CPU usage
Slow execution
Performance regression
Before optimization
Production monitoring

Instructions

1. Profiling Tools

yaml

Browser Profiling:

Chrome DevTools:
  Steps:
    1. DevTools → Performance
    2. Click record
    3. Perform action
    4. Stop recording
    5. Analyze flame chart
  Metrics:
    - Function call duration
    - Call frequency
    - Total time vs self time

Firefox Profiler:
  - Built-in performance profiler
  - Flame graphs
  - Timeline view
  - Export and share

React Profiler:
  - DevTools → Profiler
  - Component render times
  - Phase: render vs commit
  - Why component re-rendered

---

Node.js Profiling:

node --prof app.js
node --prof-process isolate-*.log > profile.txt

Clinic.js:
  clinic doctor -- node app.js   # health overview: CPU, memory, event loop delay
  clinic flame -- node app.js    # flame graph of where CPU time goes

V8 Inspector:
  node --inspect app.js
  Open chrome://inspect
  Profiler tab
  Take CPU profile
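
The same V8 CPU profiler can also be driven from inside the process through Node's built-in `inspector` module, which helps when a profile should cover only one code path. The following is a minimal sketch, assuming CommonJS; `post`, `profileCpu`, and `busyWork` are hypothetical helper names, not part of any library.

```javascript
// Sketch: capture a CPU profile around a single function call using the
// built-in inspector module. Helper names are illustrative only.
const inspector = require('inspector');

// Wrap the callback-style session.post() in a Promise.
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Run `fn` with the V8 profiler enabled and resolve with the raw profile.
async function profileCpu(fn) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    await fn();
    const { profile } = await post(session, 'Profiler.stop');
    return profile;
  } finally {
    session.disconnect();
  }
}

// Hypothetical CPU-bound workload to profile.
function busyWork(iterations) {
  let sum = 0;
  for (let i = 0; i < iterations; i++) sum += Math.sqrt(i);
  return sum;
}
```

Calling `profileCpu(() => busyWork(1e7))` resolves with an object containing `nodes`, `samples`, and timing fields. Writing `JSON.stringify(profile)` to a file ending in `.cpuprofile` allows it to be loaded into Chrome DevTools for inspection.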

2. Analysis & Interpretation

javascript

// Understanding profiles

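Flame Graph Reading:
- Wider = more time spent (in the function plus everything it calls)
- Taller = deeper call stack (depth alone is not a cost)
- Hot path = bars that stay wide through many stack levels
- Idle = gaps (in time-ordered flame charts)

Self Time vs Total Time:
- Self: time spent in the function's own code
- Total: self + time spent in everything it calls
- Example:
  main() calls work(), which takes 1s overall
  work() own code = 0.5s (self)
  work() own code + children = 1s (total)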

Hot Spots Identification:
- Find widest bars (most time)
- Check if avoidable
- Check if optimizable
- Profile before/after changes

Example (V8 Analysis):
Function: dataProcessing
  Self time: 500ms (50% of its total time)
  Total time: 1000ms
  Calls: 1000 times
  Self time per call: 0.5ms (1ms per call including children)
  Optimization: Reduce call frequency
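
The self/total relationship above can be checked mechanically. This sketch assumes a hypothetical call tree in which each node stores only its self time; `callTree`, `helper`, `totalMs`, and `hotSpots` are invented for illustration and do not reflect any profiler's actual output format.

```javascript
// Hypothetical call tree mirroring the example: each node stores only its
// own (self) time in milliseconds. `helper` is an invented child of work().
const callTree = {
  name: 'main',
  selfMs: 0,
  children: [
    {
      name: 'work',
      selfMs: 500,
      children: [{ name: 'helper', selfMs: 500, children: [] }],
    },
  ],
};

// Total time = self time + total time of every child.
function totalMs(node) {
  return node.children.reduce((sum, child) => sum + totalMs(child), node.selfMs);
}

// Flatten the tree into rows and sort by self time, largest first.
function hotSpots(root) {
  const rows = [];
  const visit = (node) => {
    rows.push({ name: node.name, selfMs: node.selfMs, totalMs: totalMs(node) });
    node.children.forEach(visit);
  };
  visit(root);
  return rows.sort((a, b) => b.selfMs - a.selfMs);
}
```

Sorting by self time rather than total time keeps `main` (total 1000ms, self 0ms) at the bottom of the list, which is the reason the two numbers need to be read separately.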

3. Optimization Process

yaml

Steps:

1. Establish Baseline
  - Profile current behavior
  - Note hottest functions
  - Record total time
  - Check system resources

2. Identify Bottlenecks
  - Find top 5 time consumers
  - Analyze call frequency
  - Understand what they do
  - Check if necessary

3. Create Hypothesis
  - Why is function slow?
  - Can algorithm improve?
  - Can we cache results?
  - Can we parallelize?

4. Implement Changes
  - Single change at a time
  - Measure impact
  - Profile after change
  - Compare flame graphs

5. Verify Improvement
  - Baseline: 1s
  - After optimization: 500ms
  - Result: 50% less time (2x faster), confirmed by re-profiling
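
A baseline and an after-change measurement are only comparable if both are taken the same way. The sketch below is one possible approach, assuming a synchronous workload; `medianMs` and `improvementPercent` are hypothetical helpers. It uses the median of several runs to dampen noise from JIT warm-up and garbage collection.

```javascript
// Time a synchronous function several times and return the median in ms.
function medianMs(fn, runs = 15) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    fn();
    samples.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Percentage of time saved relative to the baseline.
function improvementPercent(baselineMs, optimizedMs) {
  return ((baselineMs - optimizedMs) / baselineMs) * 100;
}
```

With the figures from the steps above, `improvementPercent(1000, 500)` evaluates to `50`.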

---

Common Optimizations:

Algorithm Improvement:
  Before: O(n²) nested loop, e.g. 100ms for 1000 items
  After: O(n log n) with sort+search, e.g. 10ms
  Impact: ~10x faster in this example; the gap widens as n grows
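
As an illustration of the nested-loop versus sort+search trade-off, the sketch below counts how many values from one array appear in another. The task and all function names are assumptions chosen for the example, not taken from a real codebase.

```javascript
// Before: for every needle, scan the whole haystack. O(n * m).
function countMatchesNaive(needles, haystack) {
  let count = 0;
  for (const n of needles) {
    for (const h of haystack) {
      if (h === n) {
        count++;
        break;
      }
    }
  }
  return count;
}

// Binary search over an ascending numeric array.
function binarySearch(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] === target) return true;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

// After: sort once, then binary-search each needle. O((n + m) log m).
function countMatchesSorted(needles, haystack) {
  const sorted = [...haystack].sort((a, b) => a - b);
  let count = 0;
  for (const n of needles) {
    if (binarySearch(sorted, n)) count++;
  }
  return count;
}
```

For pure membership tests a `Set` would likely be simpler still; sort+search is shown here because it matches the optimization named above.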

Caching:
  Before: Recalculate each call
  After: Cache result, return it on later calls
  Impact: up to ~1000x faster for repeated calls, depending on how expensive the original computation is

Memoization:
  Before: fib(40) recalculates each branch
  After: Cache computed values
  Impact: Exponential to linear
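
A minimal sketch of the fib example, covering both caching and memoization. `fibNaive` and `fibMemo` are illustrative names; the cache here is a `Map` passed through the recursion, which is one of several reasonable designs.

```javascript
// Before: every call recomputes both branches, so work grows exponentially.
function fibNaive(n) {
  return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
}

// After: each value is computed once and then read from the cache,
// so work grows linearly with n.
function fibMemo(n, cache = new Map()) {
  if (n < 2) return n;
  if (cache.has(n)) return cache.get(n);
  const value = fibMemo(n - 1, cache) + fibMemo(n - 2, cache);
  cache.set(n, value);
  return value;
}
```

`fibMemo(40)` returns `102334155` after a few dozen additions, whereas `fibNaive(40)` performs hundreds of millions of calls to reach the same value.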

Lazy Evaluation:
  Before: Calculate all values upfront
  After: Calculate only needed values
  Impact: 90%+ reduction for partial results
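
Lazy evaluation can be sketched with generators, which compute a value only when the consumer asks for it. `lazyMap`, `take`, and `expensive` are hypothetical names; the call counter exists only to make the saved work visible.

```javascript
// Apply fn to each item on demand instead of building a full array upfront.
function* lazyMap(iterable, fn) {
  for (const item of iterable) yield fn(item);
}

// Pull at most n items from an iterable, then stop consuming it.
function take(iterable, n) {
  const out = [];
  if (n <= 0) return out;
  for (const item of iterable) {
    out.push(item);
    if (out.length >= n) break;
  }
  return out;
}

// Stand-in for an expensive computation; counts how often it runs.
let calls = 0;
const expensive = (x) => {
  calls++;
  return x * x;
};

const firstThree = take(lazyMap([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], expensive), 3);
// firstThree is [1, 4, 9] and expensive() ran 3 times, not 10.
```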

Parallelization:
  Before: Sequential processing, 1000ms
  After: 4 cores, 250ms in the ideal case
  Impact: up to 4x faster (up to 8x on 8 cores); serial work and coordination overhead reduce the real gain
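
Speedup from adding cores is bounded by the portion of the work that can actually run in parallel. A common way to estimate this is Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of cores. The sketch below applies that formula; `amdahlSpeedup` is a hypothetical helper and the fractions are illustrative, not measurements.

```javascript
// Amdahl's law: theoretical speedup for a workload whose parallelizable
// fraction is `parallelFraction` (0..1) when run on `cores` cores.
function amdahlSpeedup(parallelFraction, cores) {
  if (parallelFraction < 0 || parallelFraction > 1) {
    throw new RangeError('parallelFraction must be between 0 and 1');
  }
  if (cores < 1) throw new RangeError('cores must be at least 1');
  return 1 / ((1 - parallelFraction) + parallelFraction / cores);
}

// Fully parallel work on 4 cores reaches the ideal 4x.
const ideal = amdahlSpeedup(1, 4);
// With 10% serial work, 4 cores give roughly 3.08x and 8 cores roughly 4.71x.
const fourCores = amdahlSpeedup(0.9, 4);
const eightCores = amdahlSpeedup(0.9, 8);
```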

4. Monitoring & Best Practices

yaml

Monitoring:

Production Profiling:
  - Lightweight sampling profiler
  - 1-5% overhead typical
  - Tools: New Relic, Datadog, Clinic.js
  - Alert on CPU spikes

Key Metrics:
  - CPU usage % per function
  - Call frequency
  - Time per call
  - GC pause times
  - P95/P99 latency
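
P95/P99 latency can be derived from raw timing samples. The sketch below uses the nearest-rank method, which is one of several percentile definitions; monitoring tools may interpolate differently. `percentile` is a hypothetical helper.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) throw new RangeError('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

Given latency samples of 1ms through 100ms, `percentile(samples, 95)` returns `95` and `percentile(samples, 99)` returns `99`.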

---

Best Practices:

Before Optimizing:
  [ ] Profile to find actual bottleneck
  [ ] Don't guess (verify with data)
  [ ] Establish baseline
  [ ] Decide how improvement will be measured

During Optimization:
  [ ] Change one thing at a time
  [ ] Profile after each change
  [ ] Verify improvement
  [ ] Don't prematurely optimize

Avoiding Premature Optimization:
  - Profile first
  - Hot path only (80/20 rule)
  - Measure impact
  - Consider readability

---

Tools Summary:

Profilers: Chrome DevTools, Firefox Profiler, Node.js profiler (--prof, --inspect)
Analysis: Flame graphs, Call trees, Timeline
Monitoring: APM tools, Clinic.js
Comparison: Before/after profiles

---

Red Flags:

- Unexpectedly high CPU usage
- GC pauses >100ms
- Function called 1M times per request
- Deep call stacks
- Synchronous I/O in loops
- Repeated calculations
- Memory allocation in hot loop

Key Points

Profile before optimizing (measure, not guess)
Look for wide bars in flame graphs (width = time spent)
Distinguish self time vs total time
Optimize top bottlenecks first
Verify improvements with measurement
Consider caching and memoization
Use production profiling for real issues
Algorithm improvements beat micro-optimizations
Measure before and after
Focus on hot paths (80/20 rule)
