python-performance-optimization

Python Performance Optimization

Comprehensive guide to profiling, analyzing, and optimizing Python code for better performance, including CPU profiling, memory optimization, and implementation best practices.

When to Use This Skill

  • Identifying performance bottlenecks in Python applications
  • Reducing application latency and response times
  • Optimizing CPU-intensive operations
  • Reducing memory consumption and memory leaks
  • Improving database query performance
  • Optimizing I/O operations
  • Speeding up data processing pipelines
  • Implementing high-performance algorithms
  • Profiling production applications

Core Concepts

1. Profiling Types

  • CPU Profiling: Identify time-consuming functions
  • Memory Profiling: Track memory allocation and leaks
  • Line Profiling: Profile at line-by-line granularity
  • Call Graph: Visualize function call relationships

2. Performance Metrics

  • Execution Time: How long operations take
  • Memory Usage: Peak and average memory consumption
  • CPU Utilization: Processor usage patterns
  • I/O Wait: Time spent on I/O operations

3. Optimization Strategies

  • Algorithmic: Better algorithms and data structures
  • Implementation: More efficient code patterns
  • Parallelization: Multi-threading/processing
  • Caching: Avoid redundant computation
  • Native Extensions: C/Rust for critical paths
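As a minimal illustration of the native-extension idea, `ctypes` can call an existing C library directly, without writing an extension module (library lookup is platform-dependent; this is a sketch, not a build recipe):

```python
import ctypes
import ctypes.util

# Locate the C math library; the name differs per platform
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL("libm.so.6")

# Declare the C signature so ctypes converts arguments correctly
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # 1.4142135623730951
```

For real hot paths, Cython or a Rust extension follows the same principle: move the inner loop out of the interpreter.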

Quick Start

Basic Timing

```python
import time

def measure_time():
    """Simple timing measurement."""
    start = time.time()

    # Your code here
    result = sum(range(1000000))

    elapsed = time.time() - start
    print(f"Execution time: {elapsed:.4f} seconds")
    return result
```

Better: use timeit for accurate measurements:

```python
import timeit

execution_time = timeit.timeit("sum(range(1000000))", number=100)
print(f"Average time: {execution_time/100:.6f} seconds")
```

Profiling Tools

Pattern 1: cProfile - CPU Profiling

```python
import cProfile
import pstats
from pstats import SortKey

def slow_function():
    """Function to profile."""
    total = 0
    for i in range(1000000):
        total += i
    return total

def another_function():
    """Another function."""
    return [i**2 for i in range(100000)]

def main():
    """Main function to profile."""
    result1 = slow_function()
    result2 = another_function()
    return result1, result2

# Profile the code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()

    main()

    profiler.disable()

    # Print stats
    stats = pstats.Stats(profiler)
    stats.sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(10)  # Top 10 functions

    # Save to file for later analysis
    stats.dump_stats("profile_output.prof")
```

**Command-line profiling:**

```bash
# Profile a script
python -m cProfile -o output.prof script.py

# View results (interactive)
python -m pstats output.prof

# In pstats:
#   sort cumtime
#   stats 10
```

Pattern 2: line_profiler - Line-by-Line Profiling

```python
# Install: pip install line-profiler

# Add the @profile decorator (kernprof injects it as a builtin at runtime)
@profile
def process_data(data):
    """Process data with line profiling."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result
```

Run with:

```bash
kernprof -l -v script.py
```


**Manual line profiling:**

```python
from line_profiler import LineProfiler

def process_data(data):
    """Function to profile."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

if __name__ == "__main__":
    lp = LineProfiler()
    lp.add_function(process_data)

    data = list(range(100000))

    lp_wrapper = lp(process_data)
    lp_wrapper(data)

    lp.print_stats()
```

Pattern 3: memory_profiler - Memory Usage

```python
# Install: pip install memory-profiler
from memory_profiler import profile

@profile
def memory_intensive():
    """Function that uses lots of memory."""
    # Create large list
    big_list = [i for i in range(1000000)]

    # Create large dict
    big_dict = {i: i**2 for i in range(100000)}

    # Process data
    result = sum(big_list)

    return result

if __name__ == "__main__":
    memory_intensive()
```

Run with:

```bash
python -m memory_profiler script.py
```

Pattern 4: py-spy - Production Profiling

```bash
# Install: pip install py-spy

# Profile a running Python process
py-spy top --pid 12345

# Generate flamegraph
py-spy record -o profile.svg --pid 12345

# Profile a script
py-spy record -o profile.svg -- python script.py

# Dump current call stack
py-spy dump --pid 12345
```

Optimization Patterns

Pattern 5: List Comprehensions vs Loops

```python
import timeit

# Slow: Traditional loop
def slow_squares(n):
    """Create list of squares using loop."""
    result = []
    for i in range(n):
        result.append(i**2)
    return result

# Fast: List comprehension
def fast_squares(n):
    """Create list of squares using comprehension."""
    return [i**2 for i in range(n)]

# Benchmark
n = 100000
slow_time = timeit.timeit(lambda: slow_squares(n), number=100)
fast_time = timeit.timeit(lambda: fast_squares(n), number=100)

print(f"Loop: {slow_time:.4f}s")
print(f"Comprehension: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")

# map is another option for simple operations, but note that with a
# Python lambda it is usually no faster than a comprehension; map only
# wins when paired with a function implemented in C
def map_squares(n):
    """Alternative using map."""
    return list(map(lambda x: x**2, range(n)))
```

Pattern 6: Generator Expressions for Memory

```python
import sys

def list_approach():
    """Memory-intensive list."""
    data = [i**2 for i in range(1000000)]
    return sum(data)

def generator_approach():
    """Memory-efficient generator."""
    data = (i**2 for i in range(1000000))
    return sum(data)

# Memory comparison
list_data = [i for i in range(1000000)]
gen_data = (i for i in range(1000000))

print(f"List size: {sys.getsizeof(list_data)} bytes")
print(f"Generator size: {sys.getsizeof(gen_data)} bytes")

# Generators use constant memory regardless of size
```

Pattern 7: String Concatenation

```python
import timeit

def slow_concat(items):
    """Slow string concatenation."""
    result = ""
    for item in items:
        result += str(item)
    return result

def fast_concat(items):
    """Fast string concatenation with join."""
    return "".join(str(item) for item in items)

def faster_concat(items):
    """Even faster with list."""
    parts = [str(item) for item in items]
    return "".join(parts)

items = list(range(10000))

# Benchmark
slow = timeit.timeit(lambda: slow_concat(items), number=100)
fast = timeit.timeit(lambda: fast_concat(items), number=100)
faster = timeit.timeit(lambda: faster_concat(items), number=100)

print(f"Concatenation (+): {slow:.4f}s")
print(f"Join (generator): {fast:.4f}s")
print(f"Join (list): {faster:.4f}s")
```

Pattern 8: Dictionary Lookups vs List Searches

```python
import timeit

# Create test data
size = 10000
items = list(range(size))
lookup_dict = {i: i for i in range(size)}

def list_search(items, target):
    """O(n) search in list."""
    return target in items

def dict_search(lookup_dict, target):
    """O(1) search in dict."""
    return target in lookup_dict

target = size - 1  # Worst case for list

# Benchmark
list_time = timeit.timeit(lambda: list_search(items, target), number=1000)
dict_time = timeit.timeit(lambda: dict_search(lookup_dict, target), number=1000)

print(f"List search: {list_time:.6f}s")
print(f"Dict search: {dict_time:.6f}s")
print(f"Speedup: {list_time/dict_time:.0f}x")
```

Pattern 9: Local Variable Access

```python
import timeit

# Global variable (slow)
GLOBAL_VALUE = 100

def use_global():
    """Access global variable."""
    total = 0
    for i in range(10000):
        total += GLOBAL_VALUE
    return total

def use_local():
    """Use local variable."""
    local_value = 100
    total = 0
    for i in range(10000):
        total += local_value
    return total

# Local is faster
global_time = timeit.timeit(use_global, number=1000)
local_time = timeit.timeit(use_local, number=1000)

print(f"Global access: {global_time:.4f}s")
print(f"Local access: {local_time:.4f}s")
print(f"Speedup: {global_time/local_time:.2f}x")
```

Pattern 10: Function Call Overhead

```python
import timeit

def calculate_inline():
    """Inline calculation."""
    total = 0
    for i in range(10000):
        total += i * 2 + 1
    return total

def helper_function(x):
    """Helper function."""
    return x * 2 + 1

def calculate_with_function():
    """Calculation with function calls."""
    total = 0
    for i in range(10000):
        total += helper_function(i)
    return total

# Inline is faster due to no call overhead
inline_time = timeit.timeit(calculate_inline, number=1000)
function_time = timeit.timeit(calculate_with_function, number=1000)

print(f"Inline: {inline_time:.4f}s")
print(f"Function calls: {function_time:.4f}s")
```

Advanced Optimization

Pattern 11: NumPy for Numerical Operations

```python
import timeit
import numpy as np

def python_sum(n):
    """Sum using pure Python."""
    return sum(range(n))

def numpy_sum(n):
    """Sum using NumPy."""
    return np.arange(n).sum()

n = 1000000

python_time = timeit.timeit(lambda: python_sum(n), number=100)
numpy_time = timeit.timeit(lambda: numpy_sum(n), number=100)

print(f"Python: {python_time:.4f}s")
print(f"NumPy: {numpy_time:.4f}s")
print(f"Speedup: {python_time/numpy_time:.2f}x")

# Vectorized operations
def python_multiply():
    """Element-wise multiplication in Python."""
    a = list(range(100000))
    b = list(range(100000))
    return [x * y for x, y in zip(a, b)]

def numpy_multiply():
    """Vectorized multiplication in NumPy."""
    a = np.arange(100000)
    b = np.arange(100000)
    return a * b

py_time = timeit.timeit(python_multiply, number=100)
np_time = timeit.timeit(numpy_multiply, number=100)

print(f"\nPython multiply: {py_time:.4f}s")
print(f"NumPy multiply: {np_time:.4f}s")
print(f"Speedup: {py_time/np_time:.2f}x")
```

Pattern 12: Caching with functools.lru_cache

```python
from functools import lru_cache
import timeit

def fibonacci_slow(n):
    """Recursive fibonacci without caching."""
    if n < 2:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

@lru_cache(maxsize=None)
def fibonacci_fast(n):
    """Recursive fibonacci with caching."""
    if n < 2:
        return n
    return fibonacci_fast(n-1) + fibonacci_fast(n-2)

# Massive speedup for recursive algorithms
n = 30
slow_time = timeit.timeit(lambda: fibonacci_slow(n), number=1)
fast_time = timeit.timeit(lambda: fibonacci_fast(n), number=1000)

print(f"Without cache (1 run): {slow_time:.4f}s")
print(f"With cache (1000 runs): {fast_time:.4f}s")

# Cache info
print(f"Cache info: {fibonacci_fast.cache_info()}")
```

Pattern 13: Using `__slots__` for Memory

```python
import sys

class RegularClass:
    """Regular class with __dict__."""
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlottedClass:
    """Class with __slots__ for memory efficiency."""
    __slots__ = ['x', 'y', 'z']

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

# Memory comparison
regular = RegularClass(1, 2, 3)
slotted = SlottedClass(1, 2, 3)

print(f"Regular class size: {sys.getsizeof(regular)} bytes")
print(f"Slotted class size: {sys.getsizeof(slotted)} bytes")

# Significant savings with many instances; note sys.getsizeof does not
# count the per-instance __dict__, so the real gap is larger than shown
regular_objects = [RegularClass(i, i+1, i+2) for i in range(10000)]
slotted_objects = [SlottedClass(i, i+1, i+2) for i in range(10000)]

print(f"\nMemory for 10000 regular objects: ~{sys.getsizeof(regular) * 10000} bytes")
print(f"Memory for 10000 slotted objects: ~{sys.getsizeof(slotted) * 10000} bytes")
```
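Because `sys.getsizeof` ignores the per-instance `__dict__`, a more honest comparison measures total allocation with `tracemalloc`. A sketch (the `peak_memory` helper is ours):

```python
import tracemalloc

class RegularClass:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

class SlottedClass:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def peak_memory(cls, count=10_000):
    """Peak bytes allocated while building `count` instances."""
    tracemalloc.start()
    objs = [cls(i, i + 1, i + 2) for i in range(count)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

regular_peak = peak_memory(RegularClass)
slotted_peak = peak_memory(SlottedClass)
print(f"Regular: {regular_peak} bytes, slotted: {slotted_peak} bytes")
```

The slotted total comes out well below the regular one, since each regular instance also carries its attribute dictionary.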

Pattern 14: Multiprocessing for CPU-Bound Tasks

```python
import multiprocessing as mp
import time

def cpu_intensive_task(n):
    """CPU-intensive calculation."""
    return sum(i**2 for i in range(n))

def sequential_processing():
    """Process tasks sequentially."""
    start = time.time()
    results = [cpu_intensive_task(1000000) for _ in range(4)]
    elapsed = time.time() - start
    return elapsed, results

def parallel_processing():
    """Process tasks in parallel."""
    start = time.time()
    with mp.Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, [1000000] * 4)
    elapsed = time.time() - start
    return elapsed, results

if __name__ == "__main__":
    seq_time, seq_results = sequential_processing()
    par_time, par_results = parallel_processing()

    print(f"Sequential: {seq_time:.2f}s")
    print(f"Parallel: {par_time:.2f}s")
    print(f"Speedup: {seq_time/par_time:.2f}x")
```

Pattern 15: Async I/O for I/O-Bound Tasks

```python
import asyncio
import time

import aiohttp
import requests

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

def synchronous_requests():
    """Synchronous HTTP requests."""
    start = time.time()
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.status_code)
    elapsed = time.time() - start
    return elapsed, results

async def async_fetch(session, url):
    """Async HTTP request."""
    async with session.get(url) as response:
        return response.status

async def asynchronous_requests():
    """Asynchronous HTTP requests."""
    start = time.time()
    async with aiohttp.ClientSession() as session:
        tasks = [async_fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    elapsed = time.time() - start
    return elapsed, results

# Async is much faster for I/O-bound work
sync_time, sync_results = synchronous_requests()
async_time, async_results = asyncio.run(asynchronous_requests())

print(f"Synchronous: {sync_time:.2f}s")
print(f"Asynchronous: {async_time:.2f}s")
print(f"Speedup: {sync_time/async_time:.2f}x")
```

Database Optimization

Pattern 16: Batch Database Operations

```python
import sqlite3
import time

def create_db():
    """Create test database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def slow_inserts(conn, count):
    """Insert records one at a time."""
    start = time.time()
    cursor = conn.cursor()
    for i in range(count):
        cursor.execute("INSERT INTO users (name) VALUES (?)", (f"User {i}",))
        conn.commit()  # Commit each insert
    elapsed = time.time() - start
    return elapsed

def fast_inserts(conn, count):
    """Batch insert with single commit."""
    start = time.time()
    cursor = conn.cursor()
    data = [(f"User {i}",) for i in range(count)]
    cursor.executemany("INSERT INTO users (name) VALUES (?)", data)
    conn.commit()  # Single commit
    elapsed = time.time() - start
    return elapsed

# Benchmark
conn1 = create_db()
slow_time = slow_inserts(conn1, 1000)

conn2 = create_db()
fast_time = fast_inserts(conn2, 1000)

print(f"Individual inserts: {slow_time:.4f}s")
print(f"Batch insert: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")
```

Pattern 17: Query Optimization

```python
# Use indexes for frequently queried columns:
#
#   -- Slow: No index
#   SELECT * FROM users WHERE email = 'user@example.com';
#
#   -- Fast: With index
#   CREATE INDEX idx_users_email ON users(email);
#   SELECT * FROM users WHERE email = 'user@example.com';

import sqlite3

conn = sqlite3.connect("example.db")
cursor = conn.cursor()

# Analyze query performance with the query planner
cursor.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("test@example.com",),
)
print(cursor.fetchall())

# Select only the columns you need:
#   Slow: SELECT *
#   Fast: SELECT id, name
```
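The effect of an index is easy to verify end to end with an in-memory SQLite database. A sketch (the table and index names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

def query_plan(conn):
    """Return the query planner's description of an email lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
        ("user9999@example.com",),
    ).fetchall()
    return " ".join(str(row) for row in rows)

print(query_plan(conn))  # typically reports a full table scan

conn.execute("CREATE INDEX idx_users_email ON users(email)")
print(query_plan(conn))  # the plan now mentions idx_users_email
```

The second plan searches via the index instead of scanning every row, turning an O(n) lookup into an O(log n) one.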

Memory Optimization

Pattern 18: Detecting Memory Leaks

```python
import tracemalloc
import gc

def memory_leak_example():
    """Example that leaks memory."""
    leaked_objects = []

    for i in range(100000):
        # Objects added but never removed
        leaked_objects.append([i] * 100)

    # In real code, this would be an unintended reference

def track_memory_usage():
    """Track memory allocations."""
    tracemalloc.start()

    # Take snapshot before
    snapshot1 = tracemalloc.take_snapshot()

    # Run code
    memory_leak_example()

    # Take snapshot after
    snapshot2 = tracemalloc.take_snapshot()

    # Compare
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("Top 10 memory allocations:")
    for stat in top_stats[:10]:
        print(stat)

    tracemalloc.stop()

# Monitor memory
track_memory_usage()

# Force garbage collection
gc.collect()
```

Pattern 19: Iterators vs Lists

python
import sys

def process_file_list(filename):
    """Load entire file into memory."""
    with open(filename) as f:
        lines = f.readlines()  # Loads all lines
        return sum(1 for line in lines if line.strip())

def process_file_iterator(filename):
    """Process file line by line."""
    with open(filename) as f:
        return sum(1 for line in f if line.strip())

# Iterator uses constant memory
# List loads entire file into memory
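The memory difference is easy to see with `sys.getsizeof` — a rough sketch (exact byte counts vary by Python version, but the generator stays small no matter how large the range is):

```python
import sys

# A list materializes every element up front; a generator holds only its state.
squares_list = [i * i for i in range(100_000)]
squares_gen = (i * i for i in range(100_000))

print(f"list: {sys.getsizeof(squares_list)} bytes")      # grows with the range
print(f"generator: {sys.getsizeof(squares_gen)} bytes")  # small and constant
```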

Pattern 20: Weakref for Caches

模式20:使用Weakref实现缓存

python
import weakref

class CachedResource:
    """Resource that can be garbage collected."""
    def __init__(self, data):
        self.data = data

# Regular cache prevents garbage collection
regular_cache = {}

def get_resource_regular(key):
    """Get resource from regular cache."""
    if key not in regular_cache:
        regular_cache[key] = CachedResource(f"Data for {key}")
    return regular_cache[key]

# Weak reference cache allows garbage collection
weak_cache = weakref.WeakValueDictionary()

def get_resource_weak(key):
    """Get resource from weak cache."""
    resource = weak_cache.get(key)
    if resource is None:
        resource = CachedResource(f"Data for {key}")
        weak_cache[key] = resource
    return resource

# When no strong references exist, objects can be GC'd
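A quick sketch of the pattern in action. In CPython, reference counting reclaims the object as soon as the last strong reference disappears; on other interpreters an explicit `gc.collect()` may be needed, so it is included here:

```python
import gc
import weakref

class CachedResource:
    """Resource that can be garbage collected."""
    def __init__(self, data):
        self.data = data

weak_cache = weakref.WeakValueDictionary()

# Populate the cache and hold a strong reference
resource = CachedResource("Data for user:1")
weak_cache["user:1"] = resource
print("user:1" in weak_cache)  # True: the strong reference keeps it alive

# Drop the strong reference; the entry vanishes from the weak cache
del resource
gc.collect()
print("user:1" in weak_cache)  # False: the object was reclaimed
```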

Benchmarking Tools

基准测试工具

Custom Benchmark Decorator

自定义基准测试装饰器

python
import time
from functools import wraps

def benchmark(func):
    """Decorator to benchmark function execution."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} seconds")
        return result
    return wrapper

@benchmark
def slow_function():
    """Function to benchmark."""
    time.sleep(0.5)
    return sum(range(1000000))

result = slow_function()

Performance Testing with pytest-benchmark

使用pytest-benchmark进行性能测试

python
# Install: pip install pytest-benchmark

def test_list_comprehension(benchmark):
    """Benchmark list comprehension."""
    result = benchmark(lambda: [i**2 for i in range(10000)])
    assert len(result) == 10000

def test_map_function(benchmark):
    """Benchmark map function."""
    result = benchmark(lambda: list(map(lambda x: x**2, range(10000))))
    assert len(result) == 10000

# Run with: pytest test_performance.py --benchmark-compare

Best Practices

最佳实践

  1. Profile before optimizing - Measure to find real bottlenecks
  2. Focus on hot paths - Optimize code that runs most frequently
  3. Use appropriate data structures - Dict for lookups, set for membership
  4. Avoid premature optimization - Clarity first, then optimize
  5. Use built-in functions - They're implemented in C
  6. Cache expensive computations - Use lru_cache
  7. Batch I/O operations - Reduce system calls
  8. Use generators for large datasets
  9. Consider NumPy for numerical operations
  10. Profile production code - Use py-spy for live systems
  1. 先分析再优化 - 通过测量找到真正的瓶颈
  2. 聚焦热点路径 - 优化运行最频繁的代码
  3. 使用合适的数据结构 - 字典用于查找,集合用于成员判断
  4. 避免过早优化 - 先保证代码清晰,再进行优化
  5. 使用内置函数 - 它们由C实现,性能更高
  6. 缓存昂贵的计算 - 使用lru_cache
  7. 批量处理I/O操作 - 减少系统调用
  8. 对大数据集使用生成器
  9. 数值运算考虑使用NumPy
  10. 分析生产环境代码 - 使用py-spy分析实时系统
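Practice 6 in action — a minimal sketch showing how `functools.lru_cache` turns an exponential recursion into a linear one by serving repeated calls from the cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Memoized Fibonacci: each n is computed once, then cached."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))          # 354224848179261915075, returned instantly
print(fib.cache_info())  # hits/misses show the cache at work
```

Without the decorator, `fib(100)` would make on the order of 2^100 recursive calls and never finish; with it, each value from 0 to 100 is computed exactly once.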

Common Pitfalls

常见陷阱

  • Optimizing without profiling
  • Using global variables unnecessarily
  • Not using appropriate data structures
  • Creating unnecessary copies of data
  • Not using connection pooling for databases
  • Ignoring algorithmic complexity
  • Over-optimizing rare code paths
  • Not considering memory usage
  • 未进行分析就直接优化
  • 不必要地使用全局变量
  • 未使用合适的数据结构
  • 创建不必要的数据副本
  • 未对数据库使用连接池
  • 忽略算法复杂度
  • 过度优化不常用的代码路径
  • 未考虑内存使用情况
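The data-structure pitfall is easy to measure directly — a sketch using `timeit` (absolute numbers depend on the machine, but the gap is consistently large):

```python
import timeit

# Membership test: a list scans elements one by one; a set uses a hash lookup.
items_list = list(range(100_000))
items_set = set(items_list)

list_time = timeit.timeit(lambda: 99_999 in items_list, number=100)
set_time = timeit.timeit(lambda: 99_999 in items_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")  # set wins by orders of magnitude
```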

Resources

资源

  • cProfile: Built-in CPU profiler
  • memory_profiler: Memory usage profiling
  • line_profiler: Line-by-line profiling
  • py-spy: Sampling profiler for production
  • NumPy: High-performance numerical computing
  • Cython: Compile Python to C
  • PyPy: Alternative Python interpreter with JIT
  • cProfile:内置CPU分析器
  • memory_profiler:内存使用分析工具
  • line_profiler:逐行分析工具
  • py-spy:生产环境采样分析器
  • NumPy:高性能数值计算库
  • Cython:将Python编译为C语言
  • PyPy:带有JIT的替代Python解释器

Performance Checklist

性能检查清单

  • Profiled code to identify bottlenecks
  • Used appropriate data structures
  • Implemented caching where beneficial
  • Optimized database queries
  • Used generators for large datasets
  • Considered multiprocessing for CPU-bound tasks
  • Used async I/O for I/O-bound tasks
  • Minimized function call overhead in hot loops
  • Checked for memory leaks
  • Benchmarked before and after optimization
  • 已分析代码以识别瓶颈
  • 使用了合适的数据结构
  • 在有益的场景实现了缓存
  • 优化了数据库查询
  • 对大数据集使用了生成器
  • 考虑使用多进程处理CPU密集型任务
  • 使用异步I/O处理I/O密集型任务
  • 减少了热点循环中的函数调用开销
  • 检查了内存泄漏
  • 优化前后都进行了基准测试