portfolio-optimization
Portfolio Optimization
Overview
This skill provides guidance for implementing high-performance portfolio optimization algorithms using Python C extensions. It covers the workflow for creating C extensions that interface with NumPy arrays, proper verification strategies, and common pitfalls to avoid when optimizing numerical computations.
When to Apply This Skill
Apply this skill when:
- Implementing portfolio risk calculations (variance, volatility, Sharpe ratio)
- Optimizing matrix-vector operations for large asset portfolios
- Creating C extensions for Python numerical code
- Meeting performance requirements that specify a minimum speedup ratio (e.g., >= 1.2x)
- Working with covariance matrices and portfolio weights
Recommended Workflow
Phase 1: Codebase Understanding
Before writing any code:
- Read all relevant source files completely - Understand the baseline implementation, data structures, and expected interfaces
- Identify the mathematical operations - Common operations include:
  - Matrix-vector multiplication (covariance matrix times weights)
  - Dot products (weights times returns)
  - Square root operations (for volatility from variance)
- Understand the test suite - Know what correctness tolerances are expected (e.g., 1e-10) and what performance benchmarks must be met
- Document the input/output contracts - Array shapes, data types (typically float64), and return value specifications
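The operations listed above can be sketched as a plain NumPy baseline. The function names below are illustrative assumptions, not names taken from any particular codebase; inputs are assumed to be float64 arrays, with a length-n weights vector and an n x n covariance matrix.

```python
import numpy as np


def portfolio_variance(weights: np.ndarray, cov: np.ndarray) -> float:
    """Variance w^T (Sigma w): a matrix-vector product followed by a dot product."""
    return float(weights @ (cov @ weights))


def portfolio_volatility(weights: np.ndarray, cov: np.ndarray) -> float:
    """Volatility is the square root of the variance."""
    return float(np.sqrt(portfolio_variance(weights, cov)))


def portfolio_return(weights: np.ndarray, expected_returns: np.ndarray) -> float:
    """Expected return is the dot product of weights and per-asset returns."""
    return float(weights @ expected_returns)


# Two-asset example small enough to check by hand:
# cov @ w = [0.025, 0.05], so the variance is 0.5*0.025 + 0.5*0.05 = 0.0375.
w = np.array([0.5, 0.5])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
mu = np.array([0.10, 0.20])
```

Writing the baseline down this way also fixes the input/output contract (shapes, dtype, scalar return) that an optimized version would have to match.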
Phase 2: Implementation Planning
Consider these factors before implementation:
- Why C provides speedup:
  - Eliminates Python interpreter overhead
  - Enables direct memory access without bounds checking
  - Allows compiler optimizations (vectorization, loop unrolling)
  - Reduces temporary array allocations
- Design decisions to make:
  - Whether to use NumPy C API for zero-copy array access
  - Memory layout assumptions (C-contiguous vs Fortran-contiguous)
  - Error handling strategy for type mismatches and dimension errors
- Potential algorithmic optimizations:
  - Cache-friendly memory access patterns (row-major iteration for C arrays)
  - SIMD vectorization opportunities
  - Minimizing Python-to-C data conversion overhead
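The memory layout and access-pattern points can be explored from Python before any C is written. The sketch below is illustrative only: `variance_row_major` is a hypothetical name, and the Python loop merely mirrors the loop structure a C kernel might use (one pass over the matrix, no temporary arrays).

```python
import numpy as np


def variance_row_major(weights: np.ndarray, cov: np.ndarray) -> float:
    """Single pass over cov in row-major order, with no temporary arrays.

    The inner index j walks along one row, which is contiguous in memory
    for a C-ordered matrix.
    """
    n = weights.shape[0]
    total = 0.0
    for i in range(n):
        row_sum = 0.0
        for j in range(n):          # contiguous walk along row i
            row_sum += cov[i, j] * weights[j]
        total += weights[i] * row_sum
    return float(total)


cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
w = np.array([0.5, 0.5])

# Layout can be inspected from Python before data is handed to C.
cov_t = cov.T                                              # transposed view, no copy
print(cov.flags["C_CONTIGUOUS"])                           # True
print(cov_t.flags["C_CONTIGUOUS"])                         # False: the view is Fortran-ordered
print(np.ascontiguousarray(cov_t).flags["C_CONTIGUOUS"])   # True, after a copy
```

If the C code assumes C-contiguous input, a Fortran-ordered array either has to be rejected or copied, and that copy is part of the Python-to-C conversion overhead mentioned above.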
Phase 3: C Extension Implementation
When implementing the C extension:
- Include proper headers:
  - Python.h (must be first)
  - numpy/arrayobject.h - for NumPy array access
- Initialize NumPy in the module init function:
  - Call import_array() to initialize NumPy C API
- Use NumPy C API for array access:
  - PyArray_DATA() for getting data pointer
  - PyArray_DIM() for dimensions
  - PyArray_STRIDE() for memory strides
  - Check PyArray_IS_C_CONTIGUOUS() for memory layout
- Implement robust error handling:
  - Validate array dimensions match expected shapes
  - Check data types (expect NPY_FLOAT64 for double precision)
  - Handle non-contiguous arrays (either reject or handle strides)
  - Set appropriate Python exceptions on error
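The error-handling checks above are easier to reason about when written out in Python first. The sketch below is an assumption about how such checks might be ordered, not a prescribed implementation; `check_inputs` is a hypothetical helper, and the comments name the NumPy C API call that would play the same role in the C entry point.

```python
import numpy as np


def check_inputs(weights, cov) -> int:
    """Validate inputs the way a C entry point might; returns n on success."""
    for name, arr in (("weights", weights), ("cov", cov)):
        if not isinstance(arr, np.ndarray):            # C: PyArray_Check
            raise TypeError(f"{name} must be a NumPy array")
        if arr.dtype != np.float64:                    # C: PyArray_TYPE == NPY_FLOAT64
            raise TypeError(f"{name} must have dtype float64")
        if not arr.flags["C_CONTIGUOUS"]:              # C: PyArray_IS_C_CONTIGUOUS
            raise ValueError(f"{name} must be C-contiguous")
    if weights.ndim != 1 or cov.ndim != 2:             # C: PyArray_NDIM
        raise ValueError("weights must be 1-D and cov must be 2-D")
    n = weights.shape[0]                               # C: PyArray_DIM(arr, 0)
    if cov.shape != (n, n):
        raise ValueError("cov must have shape (n, n) matching weights")
    return n
```

This version rejects non-contiguous arrays outright; the alternative named in the list, walking the strides, trades simpler calling code for a more complicated kernel.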
Phase 4: Python Wrapper Implementation
Create a Python module that:
- Imports the C extension module
- Provides a clean interface matching the baseline API
- Handles any necessary array preparation (ensuring contiguity)
- Documents the interface clearly
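A wrapper along these lines might look like the sketch below. The module name `_portfolio_c` and its `portfolio_volatility` entry point are hypothetical, and the NumPy fallback is an addition made only so the sketch runs without a compiled extension.

```python
import numpy as np

try:
    # Hypothetical compiled module; the real name depends on the project.
    import _portfolio_c as _impl
except ImportError:
    _impl = None


def portfolio_volatility(weights, cov) -> float:
    """Return sqrt(w^T Sigma w), preparing arrays before any call into C."""
    # ascontiguousarray copies only when dtype or layout actually needs changing.
    w = np.ascontiguousarray(weights, dtype=np.float64)
    c = np.ascontiguousarray(cov, dtype=np.float64)
    if w.ndim != 1 or c.shape != (w.shape[0], w.shape[0]):
        raise ValueError("expected weights of shape (n,) and cov of shape (n, n)")
    if _impl is not None:
        return _impl.portfolio_volatility(w, c)   # assumed C entry point
    # NumPy fallback so the sketch runs without the compiled extension.
    return float(np.sqrt(w @ (c @ w)))
```

Doing the contiguity and dtype conversion in the wrapper keeps the C code simple, at the cost of a copy whenever the caller passes a non-conforming array.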
Phase 5: Verification Strategy
Critical: Verify every change completely
- After editing files, re-read them - Confirm edits were applied correctly, especially for multi-line changes
- Test incrementally:
  - Build the C extension first and verify it compiles
  - Test individual functions before running full benchmarks
  - Use small test cases for correctness verification before scaling up
- Correctness verification:
  - Compare outputs against baseline implementation
  - Use appropriate numerical tolerances (typically 1e-10 for double precision)
  - Test with known inputs where expected outputs can be calculated manually
- Performance verification:
  - Run benchmarks with representative data sizes
  - Verify speedup meets requirements across different portfolio sizes
  - Test edge cases: small portfolios (n=1, n=10), large portfolios (n=5000+)
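The correctness steps can be sketched as a small comparison harness. Everything named here is an assumption for illustration: `candidate_variance` stands in for the optimized function under test, and the 1e-10 tolerance is applied as a relative tolerance, which may differ from what a given test suite expects.

```python
import numpy as np


def baseline_variance(weights, cov):
    """Reference result computed with plain NumPy."""
    return float(weights @ cov @ weights)


def candidate_variance(weights, cov):
    """Stand-in for the optimized function under test (hypothetical)."""
    return float(np.einsum("i,ij,j->", weights, cov, weights))


def check_against_baseline(n, seed=0, rtol=1e-10):
    """Compare candidate and baseline on a random n-asset portfolio."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    cov = a @ a.T / n                 # symmetric positive semi-definite
    w = rng.random(n)
    w /= w.sum()                      # weights sum to 1
    return bool(np.isclose(candidate_variance(w, cov),
                           baseline_variance(w, cov), rtol=rtol, atol=0.0))


# Known input first: equal weights on the identity covariance give 1/n.
hand_checked = bool(np.isclose(candidate_variance(np.full(4, 0.25), np.eye(4)), 0.25))
# Then small sizes, then larger ones.
results = {n: check_against_baseline(n) for n in (1, 10, 200)}
```

Starting with a hand-checkable input catches errors that a baseline comparison alone would miss if both implementations shared the same mistake.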
Edge Cases to Handle
Ensure the implementation addresses:
- Empty portfolios (n=0) - Return appropriate default or error
- Single-asset portfolios (n=1) - Degenerate case for covariance
- Dimension mismatches - Weights vector length vs covariance matrix dimensions
- Invalid inputs:
  - Non-square covariance matrices
  - NaN or infinity values in inputs
  - Negative variance (mathematically invalid)
- Memory considerations:
  - Non-contiguous NumPy arrays
  - Memory allocation failures in C code
  - Large portfolios that may stress memory
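One possible way to handle the input-related cases is sketched below. The function name `safe_volatility` and the policy choices (raising on an empty portfolio and on any negative variance rather than returning a default or clamping) are assumptions; a project may reasonably decide otherwise.

```python
import math
import numpy as np


def safe_volatility(weights, cov) -> float:
    """Volatility with explicit handling of the edge cases listed above."""
    w = np.asarray(weights, dtype=np.float64)
    c = np.asarray(cov, dtype=np.float64)
    if w.ndim != 1:
        raise ValueError("weights must be 1-D")
    n = w.shape[0]
    if n == 0:
        raise ValueError("empty portfolio")            # policy choice: error, not 0.0
    if c.ndim != 2 or c.shape[0] != c.shape[1]:
        raise ValueError("covariance matrix must be square")
    if c.shape[0] != n:
        raise ValueError("weights length does not match covariance dimensions")
    if not (np.isfinite(w).all() and np.isfinite(c).all()):
        raise ValueError("inputs contain NaN or infinity")
    variance = float(w @ (c @ w))
    if variance < 0.0:
        # Can happen when cov is not positive semi-definite.
        raise ValueError("negative variance")
    return math.sqrt(variance)
```

For a single asset the result reduces to the absolute weight times that asset's standard deviation, which makes n=1 a convenient hand-checkable test.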
Common Pitfalls to Avoid
Code Completeness
- Never truncate code in edit operations - always provide complete implementations
- Verify file contents after editing to confirm changes applied correctly
- Document all design choices explicitly
Testing Approach
- Avoid going directly from implementation to full benchmark testing
- Test each function individually before integration testing
- Do not rely solely on "tests pass" for validation - understand why they pass
C Extension Specific
- Always check NumPy array types before accessing data
- Handle reference counting properly to avoid memory leaks
- Initialize NumPy API with import_array() in module init
- Use PyErr_SetString() to set exceptions on errors
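Reference-count mistakes in C can often be spotted from the Python side. The sketch below assumes CPython and uses `sys.getrefcount`; `refcount_delta` is a hypothetical helper, and `clean` and `leaky` are pure-Python stand-ins for a well-behaved and a leaking extension function.

```python
import sys
import numpy as np


def refcount_delta(func, arr, calls=1000):
    """Change in arr's reference count after calling func(arr) repeatedly.

    A correct function leaves the count unchanged; a missing Py_DECREF in C
    typically shows up as a delta that grows with the number of calls.
    """
    before = sys.getrefcount(arr)
    for _ in range(calls):
        func(arr)
    return sys.getrefcount(arr) - before


def clean(arr):
    """Stand-in for a well-behaved extension function."""
    return float(arr.sum())


_held = []


def leaky(arr):
    """Simulates a leak by keeping an extra reference on every call."""
    _held.append(arr)
    return float(arr.sum())


weights = np.ones(8)
print(refcount_delta(clean, weights))        # 0
print(refcount_delta(leaky, weights, 100))   # 100
```

A check like this only covers references to the input arrays; leaks of objects created inside the extension need a memory profiler or a debug build of Python.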
Performance Validation
- Verify speedup is consistent across different input sizes
- Profile the code if the speedup falls short and further optimization is needed
- Consider the overhead of Python-to-C transitions for small inputs
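Speedup across input sizes can be measured with a short `timeit` harness. In the sketch below the baseline is assumed to be a pure-Python loop and `optimized` is a NumPy stand-in for the C-backed function; both names, and the sizes chosen, are illustrative.

```python
import timeit
import numpy as np


def baseline(weights, cov):
    """Deliberately slow pure-Python reference loop."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += weights[i] * cov[i][j] * weights[j]
    return total ** 0.5


def optimized(weights, cov):
    """Stand-in for the C-backed function (hypothetical); NumPy here."""
    return float(np.sqrt(weights @ (cov @ weights)))


def measure_speedup(n, repeat=3, number=3, seed=0):
    """Best-of-`repeat` baseline time divided by best-of-`repeat` optimized time."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    cov = a @ a.T / n
    w = np.full(n, 1.0 / n)
    cov_list, w_list = cov.tolist(), w.tolist()     # plain lists for the baseline
    t_base = min(timeit.repeat(lambda: baseline(w_list, cov_list),
                               repeat=repeat, number=number))
    t_opt = min(timeit.repeat(lambda: optimized(w, cov),
                              repeat=repeat, number=number))
    return t_base / t_opt


for n in (10, 50, 200):
    print(f"n={n:4d}  speedup={measure_speedup(n):.1f}x")
```

Taking the minimum over several repeats reduces noise from other processes. For very small n the ratio tends to shrink, because fixed call overhead dominates both implementations.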
Build and Test Commands
Typical workflow commands:
```bash
# Build the C extension
python setup.py build_ext --inplace

# Run correctness tests
python -c "from portfolio_optimized import *; # test calls"

# Run benchmark
python benchmark.py

# Run full test suite
pytest test_portfolio.py -v
```
Verification Checklist
Before considering the task complete:
- All source files read and understood
- C extension compiles without warnings
- Individual functions tested for correctness
- Numerical results match baseline within tolerance
- Performance meets speedup requirements
- Edge cases explicitly tested or handled
- Error handling implemented for invalid inputs
- File contents verified after all edits
- No memory leaks in C code (proper reference counting)