python-performance-optimization


Python Performance Optimization

Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.

When to Use This Skill

  • Code runs too slowly for production requirements
  • High CPU usage or memory consumption issues
  • Need to reduce API response times or batch processing duration
  • Application fails to scale under load
  • Optimizing data processing pipelines or scientific computing
  • Reducing cloud infrastructure costs through efficiency gains
  • Profile-guided optimization after measuring performance bottlenecks

Core Concepts

The Golden Rule: Never optimize without profiling first. Typically, 80% of execution time is spent in 20% of the code.
Optimization Hierarchy (in priority order):
  1. Algorithm complexity - O(n²) → O(n log n) yields gains that grow with input size
  2. Data structure choice - List → Set for lookups (up to ~10,000x faster on large collections)
  3. Language features - Comprehensions, built-ins, generators
  4. Caching - Memoization for repeated calculations
  5. Compiled extensions - NumPy, Numba, Cython for hot paths
  6. Parallelism - Multiprocessing for CPU-bound work
Key Principle: Algorithmic improvements almost always beat micro-optimizations.

Quick Reference

Load detailed guides for specific optimization areas:
| Task | Load reference |
| --- | --- |
| Profile code and find bottlenecks | skills/python-performance-optimization/references/profiling.md |
| Algorithm and data structure optimization | skills/python-performance-optimization/references/algorithms.md |
| Memory optimization and generators | skills/python-performance-optimization/references/memory.md |
| String concatenation and file I/O | skills/python-performance-optimization/references/string-io.md |
| NumPy, Numba, Cython, multiprocessing | skills/python-performance-optimization/references/acceleration.md |

Optimization Workflow

Phase 1: Measure

  1. Profile with cProfile - Identify slow functions
  2. Line profile hot paths - Find exact slow lines
  3. Memory profile - Check for memory bottlenecks
  4. Benchmark baseline - Record current performance
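The first two steps can be sketched with the standard library's `cProfile` and `pstats` modules; `slow_sum` below is a made-up stand-in for whatever hot path your application has:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive Python-level loop so it shows up in the profile
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Sort by cumulative time and print the top offenders
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

For line-level detail on the functions this surfaces, the third-party `line_profiler` package is a common next step.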

Phase 2: Analyze

  1. Check algorithm complexity - Is it O(n²) or worse?
  2. Evaluate data structures - Are you using lists for lookups?
  3. Identify repeated work - Can results be cached?
  4. Find I/O bottlenecks - Database queries, file operations
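Step 2 can be checked empirically with the standard-library `timeit` module; a minimal sketch (the collection size and repeat count are arbitrary illustrations):

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)
missing = -1  # worst case for the list: every element is scanned

# Membership test on a list is O(n); on a set it is O(1)
list_time = timeit.timeit(lambda: missing in data_list, number=100)
set_time = timeit.timeit(lambda: missing in data_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```

If the gap is large here, the fix belongs in Phase 3, step 2.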

Phase 3: Optimize

  1. Improve algorithms first - Biggest impact
  2. Use appropriate data structures - Set/dict for O(1) lookups
  3. Apply caching - `@lru_cache` for expensive functions
  4. Use generators - For large datasets
  5. Leverage NumPy/Numba - For numerical code
  6. Parallelize - Multiprocessing for CPU-bound tasks

Phase 4: Validate

  1. Re-profile - Verify improvements
  2. Benchmark - Measure speedup quantitatively
  3. Test correctness - Ensure optimizations didn't break functionality
  4. Document - Explain why optimization was needed
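Steps 2 and 3 can be combined in one small sketch; `slow_sum` and `fast_sum` are hypothetical before/after versions of an optimized function:

```python
import timeit

def slow_sum(n):
    # Original: Python-level loop
    total = 0
    for i in range(n):
        total += i
    return total

def fast_sum(n):
    # Optimized: closed-form formula for 0 + 1 + ... + (n - 1)
    return n * (n - 1) // 2

# Test correctness first: the optimization must preserve behavior
assert slow_sum(10_000) == fast_sum(10_000)

# Then benchmark to measure the speedup quantitatively
slow = timeit.timeit(lambda: slow_sum(10_000), number=200)
fast = timeit.timeit(lambda: fast_sum(10_000), number=200)
print(f"speedup: {slow / fast:.0f}x")
```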

Common Optimization Patterns

Pattern 1: Replace List with Set for Lookups

```python
large_list = list(range(100_000))
large_set = set(large_list)
item = 99_999

# Slow: O(n) lookup - scans the list element by element
if item in large_list:  # Bad
    pass

# Fast: O(1) lookup - a single hash probe
if item in large_set:  # Good
    pass
```

Pattern 2: Use Comprehensions

```python
n = 1_000_000

# Slower: explicit loop with repeated append calls
result = []
for i in range(n):
    result.append(i * 2)

# Faster (~35% speedup): list comprehension
result = [i * 2 for i in range(n)]
```

Pattern 3: Cache Expensive Calculations

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_function(n):
    # Result cached automatically
    return complex_calculation(n)
```

Pattern 4: Use Generators for Large Data

```python
# Memory inefficient - loads the entire file into a list
def read_file(path):
    return [line for line in open(path)]

# Memory efficient - streams line by line via a generator
def read_file(path):
    for line in open(path):
        yield line.strip()
```

Pattern 5: Vectorize with NumPy

```python
# Pure Python: ~500ms
result = sum(i**2 for i in range(1_000_000))

# NumPy: ~5ms (about 100x faster)
import numpy as np
result = np.sum(np.arange(1_000_000) ** 2)
```

Common Mistakes to Avoid

  1. Optimizing before profiling - You'll optimize the wrong code
  2. Using lists for membership tests - Use sets/dicts instead
  3. String concatenation in loops - Use `"".join()` or `StringIO`
  4. Loading entire files into memory - Use generators
  5. N+1 database queries - Use JOINs or batch queries
  6. Ignoring built-in functions - They're C-optimized and fast
  7. Premature optimization - Focus on algorithmic improvements first
  8. Not benchmarking - Always measure improvements quantitatively
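Mistake 3 can be sketched as follows (sizes are arbitrary): repeated `+=` on a string copies the accumulated result each time, while `"".join()` builds the output in a single pass:

```python
parts = [str(i) for i in range(10_000)]

# Quadratic-time pattern: each += may copy the growing string
s = ""
for p in parts:
    s += p

# Linear-time pattern: join builds the result in one pass
joined = "".join(parts)

assert s == joined
```

(CPython sometimes optimizes `+=` on strings in place, but the `join` form is the portable, guaranteed-linear idiom.)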

Decision Tree

Start here: Profile with cProfile to find bottlenecks
Hot path is algorithm?
  • Yes → Check complexity, improve algorithm, use better data structures
  • No → Continue
Hot path is computation?
  • Numerical loops → NumPy or Numba
  • CPU-bound → Multiprocessing
  • Already fast enough → Done
Hot path is memory?
  • Large data → Generators, streaming
  • Many objects → `__slots__`, object pooling
  • Caching needed → `@lru_cache` or custom cache
Hot path is I/O?
  • Database → Batch queries, indexes, connection pooling
  • Files → Buffering, streaming
  • Network → Async I/O, request batching
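The `__slots__` branch of the tree can be sketched like this: a slotted class stores attributes in fixed slots instead of a per-instance `__dict__`, which cuts per-object memory when many small objects are created (the `Point` classes are illustrative):

```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")  # fixed slots replace the per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

# The slotted instance has no attribute dict - that dict is where the memory goes
assert not hasattr(PointSlots(1, 2), "__dict__")
assert hasattr(PointDict(1, 2), "__dict__")
print(sys.getsizeof(PointDict(1, 2).__dict__), "bytes just for the attribute dict")
```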

Best Practices

  1. Profile before optimizing - Measure to find real bottlenecks
  2. Optimize algorithms first - O(n²) → O(n) beats micro-optimizations
  3. Use appropriate data structures - Set/dict for lookups, not lists
  4. Leverage built-ins - C-implemented built-ins are faster than pure Python
  5. Avoid premature optimization - Optimize hot paths identified by profiling
  6. Use generators for large data - Reduce memory usage with lazy evaluation
  7. Batch operations - Minimize overhead from syscalls and network requests
  8. Cache expensive computations - Use `@lru_cache` or custom caching
  9. Consider NumPy/Numba - Vectorization and JIT for numerical code
  10. Parallelize CPU-bound work - Use multiprocessing to utilize all cores

Resources
