portfolio-optimization

Portfolio Optimization

Overview

This skill provides guidance for implementing high-performance portfolio optimization algorithms using Python C extensions. It covers the workflow for creating C extensions that interface with NumPy arrays, proper verification strategies, and common pitfalls to avoid when optimizing numerical computations.

When to Apply This Skill

Apply this skill when:
  • Implementing portfolio risk calculations (variance, volatility, Sharpe ratio)
  • Optimizing matrix-vector operations for large asset portfolios
  • Creating C extensions for Python numerical code
  • Performance requirements specify a minimum speedup over the baseline (e.g., >= 1.2x)
  • Working with covariance matrices and portfolio weights

Recommended Workflow

Phase 1: Codebase Understanding

Before writing any code:
  1. Read all relevant source files completely - Understand the baseline implementation, data structures, and expected interfaces
  2. Identify the mathematical operations - Common operations include:
    • Matrix-vector multiplication (covariance matrix times weights)
    • Dot products (weights times returns)
    • Square root operations (for volatility from variance)
  3. Understand the test suite - Know what correctness tolerances are expected (e.g., 1e-10) and what performance benchmarks must be met
  4. Document the input/output contracts - Array shapes, data types (typically float64), and return value specifications
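
As a concrete reference point, the three operations above can be sketched in plain Python. This is a minimal sketch, not the baseline of any particular codebase: the function names and the list-of-lists input format are assumptions for illustration. A small, obviously correct version like this is useful later as an oracle to compare the C extension against.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def portfolio_return(weights: Sequence[float], returns: Sequence[float]) -> float:
    """Dot product: expected portfolio return = sum_i w_i * r_i."""
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    return sum(w * r for w, r in zip(weights, returns))


def portfolio_variance(cov: Matrix, weights: Sequence[float]) -> float:
    """Matrix-vector product followed by a dot product: w^T (Sigma w)."""
    n = len(weights)
    if len(cov) != n or any(len(row) != n for row in cov):
        raise ValueError("cov must be n x n with n == len(weights)")
    # Sigma @ w, computed one row at a time
    sigma_w = [sum(c * w for c, w in zip(row, weights)) for row in cov]
    # w . (Sigma @ w)
    return sum(w * s for w, s in zip(weights, sigma_w))


def portfolio_volatility(cov: Matrix, weights: Sequence[float]) -> float:
    """Volatility is the square root of the variance."""
    return math.sqrt(portfolio_variance(cov, weights))
```

For cov = [[0.04, 0.006], [0.006, 0.09]] and weights = [0.6, 0.4], the variance can be checked by hand: Sigma w = [0.0264, 0.0396], so w^T Sigma w = 0.6 * 0.0264 + 0.4 * 0.0396 = 0.03168.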

Phase 2: Implementation Planning

Consider these factors before implementation:
  1. Why C provides a speedup:
    • Eliminates Python interpreter overhead
    • Enables direct memory access without bounds checking
    • Allows compiler optimizations (vectorization, loop unrolling)
    • Reduces temporary array allocations
  2. Design decisions to make:
    • Whether to use NumPy C API for zero-copy array access
    • Memory layout assumptions (C-contiguous vs Fortran-contiguous)
    • Error handling strategy for type mismatches and dimension errors
  3. Potential algorithmic optimizations:
    • Cache-friendly memory access patterns (row-major iteration for C arrays)
    • SIMD vectorization opportunities
    • Minimizing Python-to-C data conversion overhead
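
To make the row-major and temporary-array points concrete, here is the loop structure a C kernel would typically use, written in Python over a flat array('d') buffer. It illustrates the access pattern only (in Python it is slow), and the function name is hypothetical. Element (i, j) of a C-contiguous n x n matrix lives at index i*n + j, so the inner loop walks consecutive memory, and folding each row's partial sum into the total immediately avoids allocating the intermediate Sigma @ w vector.

```python
from array import array


def variance_row_major(cov_flat: array, weights: array) -> float:
    """Compute w^T Sigma w from a flat, row-major (C-order) buffer in one pass."""
    n = len(weights)
    if len(cov_flat) != n * n:
        raise ValueError("cov_flat must hold n*n values")
    total = 0.0
    for i in range(n):
        row = i * n  # offset of row i in the flat buffer
        acc = 0.0
        for j in range(n):  # consecutive memory: cov_flat[row], cov_flat[row + 1], ...
            acc += cov_flat[row + j] * weights[j]
        total += weights[i] * acc  # no temporary Sigma @ w vector is stored
    return total
```

Translated to C, the same two nested loops over a const double * become the hot path of the extension.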

Phase 3: C Extension Implementation

When implementing the C extension:
  1. Include proper headers:
    • Python.h (must be included before any standard headers)
    • numpy/arrayobject.h for NumPy array access
  2. Initialize NumPy in the module init function:
    • Call import_array() to initialize the NumPy C API before any other NumPy C API call
  3. Use the NumPy C API for array access:
    • PyArray_DATA() for the data pointer
    • PyArray_DIM() for dimensions
    • PyArray_STRIDE() for memory strides (in bytes)
    • PyArray_IS_C_CONTIGUOUS() to check the memory layout
  4. Implement robust error handling:
    • Validate that array dimensions match the expected shapes
    • Check data types (expect NPY_FLOAT64 for double precision)
    • Handle non-contiguous arrays (either reject them or honor their strides)
    • Set an appropriate Python exception on error
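
Stride arithmetic is easy to get wrong in C, so it can help to check it from Python first. The sketch below assumes NumPy is installed, and strided_read is a hypothetical helper. It reads element (i, j) of a float64 matrix from raw bytes with the same formula C code would apply to the data pointer from PyArray_DATA() (cast to char *) and the PyArray_STRIDE() values: byte offset = i * stride0 + j * stride1. A transposed view shares memory with the original but swaps its strides, which is exactly the kind of non-C-contiguous input PyArray_IS_C_CONTIGUOUS() reports.

```python
import struct

import numpy as np


def strided_read(raw: bytes, strides: tuple, i: int, j: int) -> float:
    """Read float64 element (i, j) from raw bytes using byte strides.

    Mirrors the C expression
        *(double *)((char *)PyArray_DATA(arr) + i * stride0 + j * stride1)
    """
    offset = i * strides[0] + j * strides[1]
    return struct.unpack_from("d", raw, offset)[0]  # native byte order


cov = np.arange(9, dtype=np.float64).reshape(3, 3)  # C-contiguous
raw = cov.tobytes()  # same bytes as cov's buffer, because cov is C-contiguous
cov_t = cov.T        # view over the same memory, strides swapped

print(cov.strides, cov.flags["C_CONTIGUOUS"])      # (24, 8) True
print(cov_t.strides, cov_t.flags["C_CONTIGUOUS"])  # (8, 24) False
print(strided_read(raw, cov_t.strides, 0, 1), cov_t[0, 1])  # 3.0 3.0
```

A C function that honors strides uses this offset formula for every access; one that rejects non-contiguous input can leave the copy to the Python side.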

Phase 4: Python Wrapper Implementation

Create a Python module that:
  1. Imports the C extension module
  2. Provides a clean interface matching the baseline API
  3. Handles any necessary array preparation (ensuring contiguity)
  4. Documents the interface clearly

Phase 5: Verification Strategy

Critical: Verify every change completely
  1. After editing files, re-read them - Confirm edits were applied correctly, especially for multi-line changes
  2. Test incrementally:
    • Build the C extension first and verify it compiles
    • Test individual functions before running full benchmarks
    • Use small test cases for correctness verification before scaling up
  3. Correctness verification:
    • Compare outputs against baseline implementation
    • Use appropriate numerical tolerances (typically 1e-10 for double precision)
    • Test with known inputs where expected outputs can be calculated manually
  4. Performance verification:
    • Run benchmarks with representative data sizes
    • Verify speedup meets requirements across different portfolio sizes
    • Test edge cases: small portfolios (n=1, n=10), large portfolios (n=5000+)
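
The correctness comparison can be as simple as the helper below: run baseline and candidate on the same inputs and collect every case that falls outside tolerance rather than stopping at the first failure. It is a stdlib-only sketch; the function name and the default tolerances (rel_tol=1e-10, abs_tol=1e-12) are assumptions to adapt to the test suite's actual requirements. The absolute tolerance matters for results near zero, where a purely relative check is too strict.

```python
import math
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    baseline: Callable[..., float],
    candidate: Callable[..., float],
    cases: Iterable[Tuple],
    rel_tol: float = 1e-10,
    abs_tol: float = 1e-12,
) -> List[Tuple[Tuple, float, float]]:
    """Return (args, expected, actual) for every case outside tolerance."""
    mismatches = []
    for args in cases:
        expected = baseline(*args)
        actual = candidate(*args)
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((args, expected, actual))
    return mismatches
```

A hand-checkable case such as a diagonal covariance matrix, where the variance reduces to the sum of w_i^2 * sigma_i^2, makes a good first entry in cases.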

Edge Cases to Handle

Ensure the implementation addresses:
  1. Empty portfolios (n=0) - Return an appropriate default or raise an error
  2. Single-asset portfolios (n=1) - Degenerate case: the covariance matrix is 1x1, so the variance reduces to w^2 times that asset's variance
  3. Dimension mismatches - Weights vector length vs covariance matrix dimensions
  4. Invalid inputs:
    • Non-square covariance matrices
    • NaN or infinity values in inputs
    • Negative variance (mathematically invalid; its square root is NaN)
  5. Memory considerations:
    • Non-contiguous NumPy arrays
    • Memory allocation failures in C code
    • Large portfolios that may stress memory
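
One way to cover these cases is to validate inputs in the Python wrapper before calling into C, where raising exceptions is simpler. The sketch below assumes NumPy; the function names are hypothetical, and treating n=0 as an error (rather than returning a default such as 0.0) is one of the two options listed above. It checks only the covariance diagonal for negative variances; a matrix that is not positive semidefinite can still produce a negative portfolio variance, so the result is worth checking as well.

```python
from typing import Optional

import numpy as np


def find_input_problem(cov, weights) -> Optional[str]:
    """Return a description of the first invalid-input problem, or None."""
    cov = np.asarray(cov, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        return f"covariance matrix must be square, got shape {cov.shape}"
    n = cov.shape[0]
    if n == 0:
        return "empty portfolio (n=0)"
    if weights.shape != (n,):
        return f"weights must have shape ({n},), got {weights.shape}"
    if not (np.isfinite(cov).all() and np.isfinite(weights).all()):
        return "inputs contain NaN or infinity"
    if (np.diagonal(cov) < 0).any():
        return "negative variance on the covariance diagonal"
    return None  # n == 1 passes: a 1x1 covariance matrix is valid


def validate_portfolio_inputs(cov, weights) -> None:
    """Raise ValueError describing the first problem found, if any."""
    problem = find_input_problem(cov, weights)
    if problem is not None:
        raise ValueError(problem)
```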

Common Pitfalls to Avoid

Code Completeness

  • Never truncate code in edit operations - always provide complete implementations
  • Verify file contents after editing to confirm changes applied correctly
  • Document all design choices explicitly

Testing Approach

  • Avoid going directly from implementation to full benchmark testing
  • Test each function individually before integration testing
  • Do not rely solely on "tests pass" for validation - understand why they pass

C Extension Specific

  • Always check NumPy array types before accessing data
  • Handle reference counting properly to avoid memory leaks
  • Initialize the NumPy API with import_array() in the module init function
  • Use PyErr_SetString() to set an exception on error, then return NULL from the function
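
A cheap way to catch reference-counting mistakes on arguments (an extra Py_INCREF, or a Py_DECREF on a borrowed reference) is to call the extension function many times and check that its inputs' refcounts have not drifted. The helper below is a stdlib sketch with a hypothetical name. It does not catch leaks of objects the C code creates internally; a tool such as tracemalloc or an external memory profiler is a better fit for those.

```python
import sys
from typing import Any, Callable


def refcounts_drift(func: Callable[..., Any], *args: Any, calls: int = 1000) -> bool:
    """Return True if repeatedly calling func(*args) changes any argument's refcount.

    A correct C function leaves its arguments' refcounts where it found them,
    so any drift points at an unbalanced Py_INCREF / Py_DECREF.
    """
    before = [sys.getrefcount(a) for a in args]
    for _ in range(calls):
        func(*args)  # the result is discarded immediately
    after = [sys.getrefcount(a) for a in args]
    return before != after
```

In practice the check would be run against the compiled function with real NumPy inputs, e.g. refcounts_drift(c_func, cov, weights), where c_func stands for the extension function under test.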

Performance Validation

  • Verify speedup is consistent across different input sizes
  • Profile to locate hotspots if further optimization is needed
  • Consider the overhead of Python-to-C transitions for small inputs
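
Measuring speedup at several sizes in one run makes inconsistencies easy to spot, including the small-n regime where call overhead dominates. This stdlib sketch (the function name is hypothetical) times baseline and candidate with timeit, keeps the best of several repeats to reduce noise from other processes, and reports baseline_time / candidate_time per size.

```python
import functools
import timeit
from typing import Any, Callable, Dict, Iterable, Tuple


def speedup_by_size(
    baseline: Callable[..., Any],
    candidate: Callable[..., Any],
    make_args: Callable[[int], Tuple],
    sizes: Iterable[int],
    number: int = 50,
    repeat: int = 5,
) -> Dict[int, float]:
    """Return {n: best baseline time / best candidate time} for each size n."""
    results = {}
    for n in sizes:
        args = make_args(n)  # build inputs once so their construction is not timed
        t_base = min(timeit.repeat(functools.partial(baseline, *args), number=number, repeat=repeat))
        t_cand = min(timeit.repeat(functools.partial(candidate, *args), number=number, repeat=repeat))
        results[n] = t_base / t_cand
    return results
```

Sizes spanning 1, 10, 100, 1000 and 5000 cover the range suggested in Phase 5; a ratio that drops below the target only at the smallest sizes usually points at per-call overhead rather than the kernel itself.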

Build and Test Commands

Typical workflow commands:

```bash
# Build the C extension
python setup.py build_ext --inplace

# Run correctness tests
python -c "from portfolio_optimized import *; # test calls"

# Run benchmark
python benchmark.py

# Run full test suite
pytest test_portfolio.py -v
```

Verification Checklist

Before considering the task complete:
  • All source files read and understood
  • C extension compiles without warnings
  • Individual functions tested for correctness
  • Numerical results match baseline within tolerance
  • Performance meets speedup requirements
  • Edge cases explicitly tested or handled
  • Error handling implemented for invalid inputs
  • File contents verified after all edits
  • No memory leaks in C code (proper reference counting)