java-profiling
Java/JVM Performance Profiling
When NOT to Use This Skill
- Node.js/JavaScript profiling - Use the `nodejs-profiling` skill for V8 profiler and heap analysis
- Python profiling - Use the `python-profiling` skill for cProfile and tracemalloc
- Application-level optimization - This is for JVM-level profiling, not algorithm optimization
- Database query tuning - Use database-specific profiling tools
- Frontend performance - Use browser DevTools for client-side profiling

Deep Knowledge: Use `mcp__documentation__fetch_docs` with technology: `java` for comprehensive JFR configuration, GC tuning, and JVM diagnostics.
Java Flight Recorder (JFR)
Starting JFR
bash
# Start recording with application
# (on JDK 11+, -XX:StartFlightRecording alone enables JFR; -XX:+FlightRecorder is not required)
java -XX:+FlightRecorder \
  -XX:StartFlightRecording=duration=60s,filename=recording.jfr \
  -jar app.jar

# Start recording on running JVM
jcmd <pid> JFR.start duration=60s filename=recording.jfr

# Continuous recording (always-on)
java -XX:+FlightRecorder \
  -XX:FlightRecorderOptions=stackdepth=256 \
  -XX:StartFlightRecording=disk=true,maxsize=500m,maxage=1d \
  -jar app.jar

# Dump current recording
jcmd <pid> JFR.dump filename=dump.jfr
JFR Configuration
custom.jfc
```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration version="2.0">
  <event name="jdk.CPULoad">
    <setting name="enabled">true</setting>
    <setting name="period">1 s</setting>
  </event>
  <event name="jdk.GCHeapSummary">
    <setting name="enabled">true</setting>
  </event>
  <event name="jdk.ObjectAllocationInNewTLAB">
    <setting name="enabled">true</setting>
    <setting name="stackTrace">true</setting>
  </event>
</configuration>
```
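The same event settings can also be applied from inside the application through the `jdk.jfr` API (JDK 11+). The sketch below is one possible way to do that, not a required setup: the class name `JfrRecorder`, the `record` helper, and the placeholder workload are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

/** Hypothetical helper: enables the events from custom.jfc via the jdk.jfr API. */
final class JfrRecorder {

    /** Runs the workload under a recording and writes the result to target. */
    static Path record(Path target, Runnable workload) throws IOException {
        try (Recording recording = new Recording()) {
            // Mirrors the three <event> entries in custom.jfc.
            recording.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
            recording.enable("jdk.GCHeapSummary");
            recording.enable("jdk.ObjectAllocationInNewTLAB").withStackTrace();

            recording.start();
            workload.run();
            recording.stop();
            recording.dump(target); // write the recorded data to disk
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path out = record(Path.of("recording.jfr"), () -> {
            // Placeholder workload: allocate some short-lived arrays.
            long total = 0;
            for (int i = 0; i < 100_000; i++) {
                total += new byte[1024].length;
            }
            System.out.println("allocated bytes: " + total);
        });
        System.out.println("wrote " + out + " (" + Files.size(out) + " bytes)");
    }
}
```

The resulting file can be inspected with the `jfr` tool or JMC like any other recording.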
Analyzing JFR Files
bash
# Print JFR summary
jfr summary recording.jfr

# Print specific events
jfr print --events jdk.CPULoad recording.jfr
jfr print --events jdk.ExecutionSample --json recording.jfr

# Export to JSON
jfr print --json recording.jfr > recording.json
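When the `jfr` tool output needs further processing, a recording can also be read with the `jdk.jfr.consumer` API. This is a minimal sketch under that assumption; `JfrEventCounter` and `countByType` are hypothetical names, and the default file name is just the one used in the commands above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical helper: counts the events in a .jfr file by event type name. */
final class JfrEventCounter {

    static Map<String, Long> countByType(Path recording) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        // RecordingFile streams events one at a time, so large files are not loaded whole.
        try (RecordingFile file = new RecordingFile(recording)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                counts.merge(event.getEventType().getName(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path recording = Path.of(args.length > 0 ? args[0] : "recording.jfr");
        countByType(recording).forEach((type, n) -> System.out.println(n + "\t" + type));
    }
}
```

The counts give roughly the same overview as `jfr summary`, but in a form that can be filtered or compared across recordings in code.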
jcmd Diagnostics
Process Information
bash
# List all Java processes
jcmd

# VM info
jcmd <pid> VM.version
jcmd <pid> VM.flags
jcmd <pid> VM.system_properties
jcmd <pid> VM.command_line

# Thread dump
jcmd <pid> Thread.print

# Heap info
jcmd <pid> GC.heap_info

# Class histogram
jcmd <pid> GC.class_histogram
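A thread dump similar to `Thread.print` is also available in-process through `java.lang.management.ThreadMXBean`, which additionally reports deadlocks. The following is a sketch of that idea; the class `ThreadDiagnostics` and its method names are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: in-process thread dump and deadlock check. */
final class ThreadDiagnostics {

    /** Returns the number of deadlocked threads (0 when none are found). */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads also covers java.util.concurrent locks when supported.
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length; // both methods return null when no deadlock exists
    }

    /** Formats all live threads, including held monitors and synchronizers where supported. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append(info); // ThreadInfo.toString() truncates very deep stacks
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
        System.out.println(dump());
    }
}
```

This can be useful when attaching `jcmd` is not possible, for example behind a health-check endpoint.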
Memory Analysis
bash
# Native memory tracking (requires -XX:NativeMemoryTracking=summary)
jcmd <pid> VM.native_memory summary

# Heap dump
jcmd <pid> GC.heap_dump /path/to/dump.hprof

# Force GC
jcmd <pid> GC.run
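On HotSpot-based JVMs a heap dump can also be triggered from code through `com.sun.management.HotSpotDiagnosticMXBean`. The sketch below assumes a HotSpot JVM; `HeapDumper` is a hypothetical class name. The target file must not already exist, and recent JDKs require the `.hprof` extension.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: programmatic counterpart of `jcmd <pid> GC.heap_dump`. */
final class HeapDumper {

    /** Writes a heap dump to target; liveOnly=true dumps only reachable objects. */
    static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path out = dump(Path.of(args.length > 0 ? args[0] : "dump.hprof"), true);
        System.out.println("wrote " + out + " (" + Files.size(out) + " bytes)");
    }
}
```

Dumping only live objects usually produces a smaller file, at the cost of a full collection before the dump is written.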
GC Tuning
GC Selection
bash
# G1GC (default in JDK 9+, recommended for heap > 4GB)
java -XX:+UseG1GC -jar app.jar

# ZGC (low latency, JDK 15+)
java -XX:+UseZGC -jar app.jar

# Shenandoah (low latency, OpenJDK)
java -XX:+UseShenandoahGC -jar app.jar

# Parallel GC (throughput)
java -XX:+UseParallelGC -jar app.jar
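To confirm which collector a running JVM actually selected, the standard `GarbageCollectorMXBean`s can be listed from inside the process. This is a small sketch; `GcReport` is a hypothetical name, and the bean names printed depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the active collectors with their counters. */
final class GcReport {

    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "not available".
            lines.add(String.format("%s: collections=%d, time=%d ms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Sampling these counters periodically gives a rough view of GC frequency and cumulative pause cost without parsing GC logs.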
GC Logging
bash
# JDK 9+ unified logging
java -Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=10m \
  -jar app.jar

# JDK 8 and earlier (legacy flags, superseded by -Xlog in JDK 9+)
java -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintTenuringDistribution \
  -Xloggc:gc.log \
  -jar app.jar
G1GC Tuning
bash
# MaxGCPauseMillis: target pause time
# G1HeapRegionSize: region size
# InitiatingHeapOccupancyPercent: start concurrent marking at 45% heap occupancy
# G1ReservePercent: heap reserved for promotions
# ConcGCThreads / ParallelGCThreads: concurrent and parallel GC threads
# (comments are kept above the command because text after a trailing "\" breaks the line continuation)
java -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:G1HeapRegionSize=16m \
  -XX:InitiatingHeapOccupancyPercent=45 \
  -XX:G1ReservePercent=10 \
  -XX:ConcGCThreads=4 \
  -XX:ParallelGCThreads=8 \
  -jar app.jar
Memory Optimization
Heap Sizing
bash
# Set heap size
java -Xms4g -Xmx4g -jar app.jar  # Fixed heap (recommended for production)

# Metaspace sizing
java -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -jar app.jar

# Direct memory
java -XX:MaxDirectMemorySize=256m -jar app.jar
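Whether these sizing flags took effect can be checked from inside the JVM with `Runtime` and the platform MXBeans. The sketch below assumes a HotSpot JVM, where the pool names `Metaspace` and `direct` are used; `MemorySettings` and its methods are hypothetical names.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

/** Hypothetical helper: reads back the effective memory limits and usage. */
final class MemorySettings {

    /** Maximum heap the JVM will attempt to use (reflects -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Bytes currently used in Metaspace, or -1 if no pool with that name exists. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    /** Bytes currently held by direct ByteBuffers, or -1 if the pool is not exposed. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max heap bytes:       " + maxHeapBytes());
        System.out.println("metaspace used bytes: " + metaspaceUsedBytes());
        System.out.println("direct buffer bytes:  " + directBytesUsed());
    }
}
```

Logging these values at startup is a cheap way to catch a container or launcher script that silently overrides the intended flags.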
Memory Leak Detection
java
// Common leak patterns (fragments; Caffeine is a third-party library)
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.time.Duration;

// ❌ Bad: Static collections that grow
private static List<Object> cache = new ArrayList<>();
public void process(Object obj) {
    cache.add(obj); // Never removed
}

// ✅ Good: Bounded cache with eviction
private static final Cache<String, Object> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(10))
    .build();

// ❌ Bad: Unclosed resources
public void readFile(String path) throws IOException {
    InputStream is = new FileInputStream(path);
    // Missing is.close()
}

// ✅ Good: Try-with-resources
public void readFile(String path) throws IOException {
    try (InputStream is = new FileInputStream(path)) {
        // Process
    }
}

// ❌ Bad: ThreadLocal not cleaned
private static final ThreadLocal<Connection> connHolder = new ThreadLocal<>();
public void process() {
    connHolder.set(getConnection());
    // Missing connHolder.remove()
}

// ✅ Good: Always clean ThreadLocal
public void process() {
    try {
        connHolder.set(getConnection());
        // Process
    } finally {
        connHolder.remove();
    }
}
CPU Profiling
async-profiler (recommended)
bash
# async-profiler 3.x replaces profiler.sh with the asprof launcher
# Profile CPU
./profiler.sh -d 30 -f profile.html <pid>

# Profile allocations
./profiler.sh -d 30 -e alloc -f alloc.html <pid>

# Profile locks
./profiler.sh -d 30 -e lock -f lock.html <pid>

# Flame graph output
./profiler.sh -d 30 -f flamegraph.html -o flamegraph <pid>
JMC (Java Mission Control)
bash
# Open JFR recording in JMC
jmc recording.jfr
Common Bottleneck Patterns
Synchronization Issues
java
// ❌ Bad: Coarse-grained locking
public synchronized void process(String key, Object value) {
    cache.put(key, value);
    compute(value);
}

// ✅ Good: Fine-grained locking
private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
public void process(String key, Object value) {
    cache.put(key, value); // Locks only the affected bin, so different keys rarely contend
    compute(value);
}

// ✅ Good: Read-write lock (for a non-concurrent map that is read far more often than written)
private final ReadWriteLock lock = new ReentrantReadWriteLock();
public Object get(String key) {
    lock.readLock().lock();
    try {
        return cache.get(key);
    } finally {
        lock.readLock().unlock();
    }
}
String Operations
java
// ❌ Bad: String concatenation in loop
String result = "";
for (String s : list) {
    result += s; // Creates new String each iteration
}

// ✅ Good: StringBuilder
StringBuilder sb = new StringBuilder();
for (String s : list) {
    sb.append(s);
}
String result = sb.toString();

// ✅ Good: String.join for simple cases
String result = String.join("", list);
Collection Optimization
java
// ❌ Bad: ArrayList when size known
List<String> list = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    list.add(getData(i)); // Multiple resizes
}

// ✅ Good: Pre-size collections
List<String> list = new ArrayList<>(10000);

// ✅ Good: Use primitive collections for performance
// Use Eclipse Collections, Trove, or fastutil (fastutil types shown)
IntList list = new IntArrayList(10000);
Boxing/Unboxing
java
// ❌ Bad: Autoboxing in hot path
public long sum(List<Long> numbers) {
    long sum = 0;
    for (Long n : numbers) {
        sum += n; // Unboxing each iteration
    }
    return sum;
}

// ✅ Good: Use primitive streams (each element is still unboxed once by mapToLong)
public long sum(List<Long> numbers) {
    return numbers.stream().mapToLong(Long::longValue).sum();
}

// ✅ Good: Primitive arrays when possible (no boxing at all)
public long sum(long[] numbers) {
    return Arrays.stream(numbers).sum();
}
JIT Optimization
Warm-up
java
// Warm-up critical paths before measuring
// (for reliable numbers, prefer a benchmark harness such as JMH)
public static void main(String[] args) {
    // Warm-up phase: give the JIT a chance to compile the hot method
    for (int i = 0; i < 10_000; i++) {
        criticalMethod(i);
    }

    // Measurement phase
    long start = System.nanoTime();
    for (int i = 0; i < 100_000; i++) {
        criticalMethod(i);
    }
    long duration = System.nanoTime() - start;
    System.out.printf("100000 calls took %d ns%n", duration);
}
JIT Logging
bash
# Print JIT compilation
java -XX:+PrintCompilation -jar app.jar

# Print inlining decisions
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar app.jar

# Disable specific optimizations for debugging
java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_hashCode -jar app.jar
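For a coarse view of JIT activity without parsing `-XX:+PrintCompilation` output, the standard `CompilationMXBean` exposes the compiler name and the cumulative compilation time. This is a sketch only; `JitInfo` is a hypothetical name, and the bean may be absent on JVMs that run without a JIT compiler.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports the JIT compiler name and total compile time. */
final class JitInfo {

    /** Name of the JIT compiler, or a fallback text when the JVM has none. */
    static String compilerName() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        return jit == null ? "none (interpreter only)" : jit.getName();
    }

    /** Approximate cumulative compilation time in milliseconds, or -1 when unavailable. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();
        long acc = 0;
        for (int i = 0; i < 5_000_000; i++) {
            acc += Integer.toString(i).hashCode(); // placeholder hot loop
        }
        long after = totalCompilationMillis();
        System.out.println("compiler: " + compilerName());
        System.out.println("compile time before/after loop (ms): " + before + " / " + after);
        System.out.println("checksum: " + acc); // keeps the loop result observable
    }
}
```

If the compile time is still climbing while measurements are taken, the warm-up phase was probably too short.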
Profiling Checklist
| Check | Tool | Command |
|---|---|---|
| CPU hotspots | JFR | |
| Memory usage | jcmd | |
| GC behavior | GC logs | |
| Thread contention | JFR | |
| Memory leaks | Heap dump | |
| Class loading | jcmd | |
Production Flags
bash
java \
  -server \
  -Xms4g -Xmx4g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:+UseStringDeduplication \
  -XX:+AlwaysPreTouch \
  -XX:+DisableExplicitGC \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/heap.hprof \
  -Xlog:gc*:file=/var/log/gc.log:time,uptime:filecount=5,filesize=10m \
  -jar app.jar
Anti-Patterns
| Anti-Pattern | Why It's Wrong | Correct Approach |
|---|---|---|
| […] in hot paths | Extremely slow, synchronous I/O | Use SLF4J with async appenders |
| Creating many short-lived objects | GC pressure, allocation overhead | Reuse objects, use object pools |
| […] in hot paths | Thread contention, poor scalability | Use […] |
| String concatenation with […] in loops | Creates many intermediate strings | Use […] |
| Autoboxing in loops | Creates wrapper objects | Use primitive types |
| Not sizing collections | Frequent resizing, memory churn | Pre-size with an initial capacity |
| Using […] | Unpredictable, deprecated | Use try-with-resources or explicit cleanup |
| Ignoring GC logs | Miss performance degradation | Always enable GC logging in production |
| One-size-fits-all heap | Wrong GC pauses for workload | Tune heap based on app behavior |
| Not using connection pooling | Connection creation overhead | Use HikariCP or similar |
Quick Troubleshooting
| Issue | Diagnosis | Solution |
|---|---|---|
| Long GC pauses | Heap too large or wrong GC | Use ZGC/Shenandoah or tune G1GC pause targets |
| […] | Memory leak or undersized heap | Analyze heap dump, increase […] if necessary |
| […] | Too many classes loaded | Increase […] |
| High CPU usage | Hot loop, inefficient algorithm | CPU profile with JFR or async-profiler |
| Thread contention | Lock competition | Thread dump analysis, reduce lock scope |
| Slow startup | Class loading, initialization | Use AppCDS, lazy initialization |
| Memory leak | Unclosed resources, static collections | Heap dump comparison, find growing objects |
| GC overhead limit exceeded | GC taking > 98% of time | Fix memory leak or increase heap |
| Full GC too frequent | Old gen filling up | Tune heap ratio, fix object tenure issues |
| Application unresponsive | Deadlock or long GC | Thread dump to find deadlock, GC logs |
Related Skills
- Spring Boot
- Performance Profiling MCP