# Python Performance Profiling
## When NOT to Use This Skill
- Java/JVM profiling - Use the `java-profiling` skill for JFR and GC tuning
- Node.js profiling - Use the `nodejs-profiling` skill for the V8 profiler
- NumPy/Pandas optimization - Use library-specific profiling tools and vectorization guides
- Database query optimization - Use database-specific profiling tools
- Web server performance - Use application-level profiling (Django Debug Toolbar, Flask-DebugToolbar)

Deep Knowledge: Use `mcp__documentation__fetch_docs` with `technology: python` for comprehensive profiling guides, optimization techniques, and best practices.
## cProfile (CPU Profiling)
### Command Line Usage
```bash
# Profile entire script
python -m cProfile -o output.prof script.py

# Sort by cumulative time
python -m cProfile -s cumtime script.py

# Sort by total time in function
python -m cProfile -s tottime script.py

# Analyze saved profile
python -m pstats output.prof
```
### pstats Analysis
```python
import pstats

# Load and analyze profile
stats = pstats.Stats('output.prof')
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions

# Filter by module
stats.print_stats('mymodule')

# Show callers
stats.print_callers('slow_function')

# Show callees
stats.print_callees('main')
```
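End to end, the same analysis can be exercised inline, without a separate script — a minimal sketch that dumps a profile to a throwaway file and feeds it to pstats (the `work` function and temp path are illustrative):

```python
import cProfile
import os
import pstats
import tempfile

def work():
    return sum(i * i for i in range(100_000))

# Profile to a file, as `python -m cProfile -o output.prof` would
path = os.path.join(tempfile.mkdtemp(), 'output.prof')
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
profiler.dump_stats(path)

# Load it back and inspect the top entries
stats = pstats.Stats(path)
stats.strip_dirs().sort_stats('cumulative').print_stats(5)
```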
### Programmatic Profiling
```python
import cProfile
import pstats
from io import StringIO

def profile_function(func, *args, **kwargs):
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()

    # Analyze
    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative')
    stats.print_stats(10)
    print(stream.getvalue())
    return result

# Context manager
from contextlib import contextmanager

@contextmanager
def profile_block(name='profile'):
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield
    finally:
        profiler.disable()
        profiler.dump_stats(f'{name}.prof')
```
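The same machinery also packages nicely as a decorator — a sketch (the `profiled` name and its `top` parameter are illustrative, not part of cProfile):

```python
import cProfile
import functools
import pstats
from io import StringIO

def profiled(top=10):
    """Decorator: profile each call and print the hottest `top` functions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                stream = StringIO()
                stats = pstats.Stats(profiler, stream=stream)
                stats.sort_stats('cumulative').print_stats(top)
                print(stream.getvalue())
        return wrapper
    return decorator

@profiled(top=5)
def work(n):
    return sum(i * i for i in range(n))
```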
## Memory Profiling
### tracemalloc (Built-in)
```python
import tracemalloc

# Start tracking
tracemalloc.start()

# Your code here
result = process_data()

# Get snapshot
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("Top 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)

# Compare snapshots
snapshot1 = tracemalloc.take_snapshot()
# ... code ...
snapshot2 = tracemalloc.take_snapshot()
diff = snapshot2.compare_to(snapshot1, 'lineno')
for stat in diff[:10]:
    print(stat)

# Stop tracking
tracemalloc.stop()
```
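The snapshot-comparison workflow, made concrete with a deliberate allocation between the two snapshots (sizes are approximate and platform-dependent):

```python
import tracemalloc

tracemalloc.start()

before = tracemalloc.take_snapshot()
leaky = [bytes(1000) for _ in range(1000)]  # roughly 1 MB of new allocations
after = tracemalloc.take_snapshot()

# Positive size_diff entries show where memory grew between snapshots
diff = after.compare_to(before, 'lineno')
grew = sum(stat.size_diff for stat in diff if stat.size_diff > 0)
print(f"net growth: {grew / 1024:.0f} KiB")

tracemalloc.stop()
```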
### memory_profiler (Line-by-line)
```python
# Install: pip install memory_profiler
from memory_profiler import profile

@profile
def my_function():
    a = [1] * 1_000_000
    b = [2] * 2_000_000
    del b
    return a

# Command line usage:
#   python -m memory_profiler script.py
# Record memory usage over time and plot it:
#   mprof run script.py
#   mprof plot
```
### objgraph (Object References)
```python
# Install: pip install objgraph
import objgraph

# Most common types
objgraph.show_most_common_types(limit=20)

# Growth since last call
objgraph.show_growth()

# Find reference chain (memory leak detection)
objgraph.show_backrefs([leaked_object], filename='refs.png')
```
## Line Profiler
```python
# Install: pip install line_profiler

# Decorate functions to profile (kernprof injects `profile` into builtins)
@profile
def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

# Run with: kernprof -l -v script.py
```
## High-Resolution Timing
### time Module
```python
import time

# Monotonic clock (best for measuring durations)
start = time.perf_counter()
result = do_work()
duration = time.perf_counter() - start
print(f"Duration: {duration:.4f}s")

# Nanosecond precision (Python 3.7+)
start = time.perf_counter_ns()
result = do_work()
duration_ns = time.perf_counter_ns() - start
print(f"Duration: {duration_ns}ns")
```
### timeit Module
```python
import timeit

# Time small code snippets
duration = timeit.timeit('sum(range(1000))', number=10000)
print(f"Average: {duration / 10000:.6f}s")

# Compare implementations
setup = "data = list(range(10000))"
time1 = timeit.timeit('sum(data)', setup, number=1000)
time2 = timeit.timeit('sum(x for x in data)', setup, number=1000)
print(f"sum(): {time1:.4f}s, generator: {time2:.4f}s")
```
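For less noisy numbers, `timeit.repeat` plus `min()` is usually more robust than a single run — the minimum over several runs filters out scheduler and cache interference:

```python
import timeit

# Five independent runs; the minimum is the least noise-contaminated estimate
times = timeit.repeat('sum(range(1000))', number=1000, repeat=5)
best = min(times)
print(f"best of 5: {best:.6f}s")
```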
## Common Bottleneck Patterns
### List Operations
```python
# ❌ Bad: Concatenating lists in loop
result = []
for item in items:
    result = result + [process(item)]  # O(n²)

# ✅ Good: Use append
result = []
for item in items:
    result.append(process(item))  # O(n)

# ✅ Better: List comprehension
result = [process(item) for item in items]

# ❌ Bad: Checking membership in list
if item in large_list:  # O(n)
    pass

# ✅ Good: Use set for membership
large_set = set(large_list)
if item in large_set:  # O(1)
    pass
```
### String Operations
```python
# ❌ Bad: String concatenation in loop
result = ""
for s in strings:
    result += s  # Creates new string each time

# ✅ Good: Use join
result = "".join(strings)

# ❌ Bad: Format in loop
for item in items:
    log(f"Processing {item}")

# ✅ Good: Lazy formatting
import logging
for item in items:
    logging.debug("Processing %s", item)  # Only formats if needed
```
### Dictionary Operations
```python
# ❌ Bad: Repeated key lookup
if key in d:
    value = d[key]
    process(value)

# ✅ Good: Use get or setdefault
value = d.get(key)
if value is not None:
    process(value)

# ❌ Bad: Checking then setting
if key not in d:
    d[key] = []
d[key].append(value)

# ✅ Good: Use defaultdict
from collections import defaultdict
d = defaultdict(list)
d[key].append(value)
```
### Generator vs List
```python
# ❌ Bad: Creating large intermediate lists
result = sum([x * 2 for x in range(10_000_000)])  # Uses memory

# ✅ Good: Use generator
result = sum(x * 2 for x in range(10_000_000))  # Lazy evaluation

# Process large files
# ❌ Bad
data = open('large.csv').readlines()  # All in memory
for line in data:
    process(line)

# ✅ Good
with open('large.csv') as f:  # Stream line by line
    for line in f:
        process(line)
```
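For record-oriented work, the same streaming idea extends to fixed-size batches — a sketch using only the standard library (the `chunked` helper is illustrative, not a stdlib function):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

# Process a large (here simulated) stream in batches of 3
batches = list(chunked(range(10), 3))
```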
## NumPy Optimization
```python
import numpy as np

# ❌ Bad: Python loops over arrays
result = []
for i in range(len(arr)):
    result.append(arr[i] * 2)

# ✅ Good: Vectorized operations
result = arr * 2  # SIMD operations

# ❌ Bad: Creating many temporary arrays
result = (arr1 + arr2) * arr3 / arr4  # 3 temporaries

# ✅ Good: In-place operations when possible
result = arr1.copy()
result += arr2
result *= arr3
result /= arr4

# Use appropriate dtypes
arr = np.array(data, dtype=np.float32)  # Half memory of float64
```
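NumPy ufuncs also accept an `out=` argument, which writes the result into an existing buffer and avoids even the initial `copy()` — a small sketch:

```python
import numpy as np

arr1 = np.arange(4, dtype=np.float64)  # [0, 1, 2, 3]
arr2 = np.ones(4)
out = np.empty(4)

np.add(arr1, arr2, out=out)      # out = arr1 + arr2, no temporary array
np.multiply(out, 2.0, out=out)   # scale in place
```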
## Async Optimization
```python
import asyncio
import aiohttp

# ❌ Bad: Sequential async
async def fetch_all_sequential(urls):
    results = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            async with session.get(url) as resp:
                results.append(await resp.text())
    return results

# ✅ Good: Concurrent async
async def fetch_all_concurrent(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [session.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return [await r.text() for r in responses]

# ✅ Better: With concurrency limit (and a single shared session)
from asyncio import Semaphore

async def fetch_with_limit(urls, limit=10):
    semaphore = Semaphore(limit)
    async with aiohttp.ClientSession() as session:
        async def fetch_one(url):
            async with semaphore:
                async with session.get(url) as resp:
                    return await resp.text()
        return await asyncio.gather(*[fetch_one(url) for url in urls])
```
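The sequential-vs-concurrent difference can be seen without any network at all — a sketch using `asyncio.sleep` in place of HTTP calls:

```python
import asyncio
import time

async def task(i):
    # Stand-in for an I/O-bound call; each "request" takes 0.1 s
    await asyncio.sleep(0.1)
    return i

async def main():
    start = time.perf_counter()
    # Ten concurrent 0.1 s tasks finish in ~0.1 s total, not ~1 s
    results = await asyncio.gather(*[task(i) for i in range(10)])
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
```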
## Multiprocessing
```python
from multiprocessing import Pool, cpu_count
from concurrent.futures import ProcessPoolExecutor

# CPU-bound work
def cpu_intensive(x):
    return sum(i * i for i in range(x))

# Using Pool
with Pool(cpu_count()) as pool:
    results = pool.map(cpu_intensive, range(100))

# Using ProcessPoolExecutor
with ProcessPoolExecutor() as executor:
    results = list(executor.map(cpu_intensive, range(100)))

# Shared memory (Python 3.8+)
from multiprocessing import shared_memory
import numpy as np

# Create shared array
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
shared_arr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
shared_arr[:] = arr[:]

# Release when done: every process calls close(); the creator also unlink()
shm.close()
shm.unlink()
```
## Profiling Checklist
| Check | Tool | Command |
|---|---|---|
| CPU hotspots | cProfile | `python -m cProfile -o output.prof script.py` |
| Line-by-line | line_profiler | `kernprof -l -v script.py` |
| Memory usage | tracemalloc | `tracemalloc.start()` / `take_snapshot()` |
| Memory per line | memory_profiler | `python -m memory_profiler script.py` |
| Object references | objgraph | `objgraph.show_growth()` |
| Quick benchmarks | timeit | `python -m timeit "sum(range(1000))"` |
## py-spy (Sampling Profiler)
```bash
# Install: pip install py-spy

# Record profile
py-spy record -o profile.svg -- python script.py

# Top-like view of running process
py-spy top --pid <pid>

# Dump current stack
py-spy dump --pid <pid>

# Profile subprocesses
py-spy record --subprocesses -o profile.svg -- python script.py
```
## Production Optimization
```python
# Use __slots__ for memory efficiency
class Point:
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Use lru_cache for memoization
from functools import lru_cache

@lru_cache(maxsize=1000)
def expensive_computation(x):
    return x ** 2

# Use dataclasses with slots (Python 3.10+)
from dataclasses import dataclass

@dataclass(slots=True)
class Point:
    x: float
    y: float
```
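A quick way to see the `__slots__` saving: slotted instances carry no per-instance `__dict__` at all (exact byte counts vary by Python version; the class names below are illustrative):

```python
import sys

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlotPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PlainPoint(1.0, 2.0)
s = SlotPoint(1.0, 2.0)

# The plain instance pays for a dict; the slotted one has none
dict_overhead = sys.getsizeof(p.__dict__)
print(f"per-instance dict overhead: {dict_overhead} bytes")
```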
## Anti-Patterns
| Anti-Pattern | Why It's Wrong | Correct Approach |
|---|---|---|
| Using `+` to concatenate lists in a loop | O(n²) time complexity | Use `append()` or a list comprehension |
| List comprehension when generator suffices | Unnecessary memory allocation | Use generator expression for one-time iteration |
| `range(len(items))` when the index is needed | Manual index tracking, error-prone | Use `enumerate()` |
| Checking membership in list | O(n) lookup | Use `set` for repeated membership tests |
| Heavy use of global variables | Hard to profile, side effects | Pass parameters, return values |
| Not using NumPy for numerical work | Orders of magnitude slower | Vectorize with NumPy for array operations |
| Premature optimization | Wasted effort, harder to maintain | Profile first, optimize bottlenecks |
| Using `from module import *` | Namespace pollution, slower imports | Import specific names |
| Growing a list with `append` when the length is known | Multiple reallocations | Pre-allocate with a list comprehension or `[None] * n` |
| Not using `__slots__` for classes with many instances | Higher memory usage | Use `__slots__` to drop the per-instance dict |
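Two of the rows above in code form — the `enumerate()` fix and pre-allocation — as a quick sketch:

```python
items = ['a', 'b', 'c']

# ❌ Manual index tracking (the range(len(...)) anti-pattern)
pairs_bad = []
for i in range(len(items)):
    pairs_bad.append((i, items[i]))

# ✅ enumerate yields the index for free
pairs_good = [(i, item) for i, item in enumerate(items)]

# ✅ Pre-allocate when the final length is known (avoids reallocations)
doubled = [None] * len(items)
for i, item in enumerate(items):
    doubled[i] = item * 2
```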
## Quick Troubleshooting
| Issue | Diagnosis | Solution |
|---|---|---|
| Slow loops over large data | Python loops are slow | Vectorize with NumPy, use list comprehensions |
| High memory usage | Creating large intermediate objects | Use generators, process in chunks |
| GIL contention | Multi-threading doesn't speed up CPU work | Use `multiprocessing` for CPU-bound work |
| Slow imports | Large modules with side effects | Lazy import, reduce module-level code |
| Memory leak | Objects not being garbage collected | Check for circular references, use `objgraph` |
| Recursion too deep | Hitting the default recursion limit | Increase limit with `sys.setrecursionlimit()` or convert to iteration |
| Slow dictionary operations | Hash collisions | Ensure keys are hashable and well-distributed |
| High CPU in profiler | C extensions not showing | Use sampling profiler like `py-spy` |
| Out of memory with large file | Loading entire file | Stream line by line or process in chunks |
| Slow JSON parsing | Large JSON file | Use streaming parser (ijson) or pandas |
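For the recursion-depth row, converting to an explicit loop is usually safer than raising the limit — a sketch (the two helper functions are illustrative):

```python
def depth_recursive(node):
    # Raises RecursionError once the chain exceeds sys.getrecursionlimit()
    if node is None:
        return 0
    return 1 + depth_recursive(node.get('child'))

def depth_iterative(node):
    # Explicit loop: bounded by memory, not the interpreter's recursion limit
    depth = 0
    while node is not None:
        depth += 1
        node = node.get('child')
    return depth

# Build a chain far deeper than the default limit (usually 1000)
deep = None
for _ in range(5000):
    deep = {'child': deep}
```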
## Related Skills

- FastAPI
- Django
- NumPy/Pandas