adaptive-rejection-sampler
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseAdaptive Rejection Sampler
Adaptive Rejection Sampler
Overview
概述
This skill provides guidance for implementing Adaptive Rejection Sampling (ARS) algorithms. ARS is a method for generating samples from log-concave probability distributions by constructing piecewise linear upper and lower envelopes of the log-density function. The skill focuses on procedural approaches, performance optimization, and verification strategies rather than providing implementation code.
本技能提供Adaptive Rejection Sampling(ARS)算法的实现指南。ARS是一种通过构建对数密度函数的分段线性上下包络,从log-concave概率分布中生成样本的方法。本技能聚焦于流程方法、性能优化和验证策略,而非提供具体的实现代码。
When to Use This Skill
适用场景
Use this skill when:
- Implementing adaptive rejection sampling from scratch
- Working with log-concave distribution samplers
- Building statistical sampling algorithms that require envelope construction
- Debugging or optimizing existing ARS implementations
- The task involves R, Python, or other statistical computing environments
在以下场景中使用本技能:
- 从零开始实现自适应拒绝采样
- 开发log-concave分布采样器
- 构建需要包络构建的统计采样算法
- 调试或优化现有的ARS实现
- 涉及R、Python或其他统计计算环境的任务
Implementation Approach
实现步骤
Phase 1: Algorithm Design Before Coding
阶段1:编码前的算法设计
Before writing any code:
-
Understand the mathematical foundations
- Review the ARS algorithm requirements for log-concave functions
- Understand how piecewise linear envelopes are constructed from tangent lines
- Identify the squeeze function optimization
-
Design with performance in mind
- Plan iteration limits and safeguards from the start
- Consider worst-case computational complexity of the sampling loop
- Design for timeout constraints that may be stricter than development testing
-
Plan the module structure
- Log-concavity verification module
- Envelope construction and update module
- Sampling loop with rejection logic
- Initialization point selection
在编写任何代码之前:
-
理解数学基础
- 回顾ARS算法对log-concave函数的要求
- 理解如何通过切线构建分段线性包络
- 明确压缩函数的优化方式
-
以性能为导向进行设计
- 从一开始就规划迭代限制和防护措施
- 考虑采样循环的最坏情况计算复杂度
- 针对可能比开发测试更严格的超时约束进行设计
-
规划模块结构
- log-concavity验证模块
- 包络构建与更新模块
- 带有拒绝逻辑的采样循环
- 初始点选择模块
Phase 2: Critical Implementation Considerations
阶段2:关键实现注意事项
Log-Concavity Checking
Log-Concavity检查
When implementing log-concavity verification:
- Use appropriate tolerance values for numerical comparison
- Consider that some valid distributions have constant second derivatives (e.g., exponential distribution)
- Use instead of strict
<= tolerancecomparisons when checking for non-positive second derivatives< - Test with edge cases like exponential, normal, and gamma distributions
在实现log-concavity验证时:
- 使用合适的数值比较容差
- 注意部分有效分布的二阶导数为常数(例如指数分布)
- 检查非正二阶导数时使用而非严格的
<= 容差比较< - 使用指数分布、正态分布和伽马分布等边缘案例进行测试
Initialization Points
初始点设置
Proper initialization significantly affects sampling quality and performance:
- Handle shifted distributions by adjusting initialization points relative to the mode
- Consider the support bounds when selecting initial points
- Use multiple initial points to ensure good envelope coverage
- Test with distributions that have modes far from the origin
合理的初始化对采样质量和性能有显著影响:
- 通过相对于众数调整初始点来处理偏移分布
- 选择初始点时考虑分布的支持边界
- 使用多个初始点以确保良好的包络覆盖
- 对众数远离原点的分布进行测试
Iteration Limits and Safeguards
迭代限制与防护措施
Critical for preventing infinite loops and timeouts:
- Implement maximum iteration limits in all sampling loops
- Add progress indicators or logging for long-running computations
- Include timeout protection mechanisms
- Design early exit conditions for convergence
这对防止无限循环和超时至关重要:
- 在所有采样循环中实现最大迭代限制
- 为长时间运行的计算添加进度指示器或日志
- 包含超时保护机制
- 设计收敛的提前退出条件
Phase 3: Performance-First Development
阶段3:以性能为核心的开发
Timeout Considerations
超时考量
- External test frameworks may use shorter timeouts than development testing
- If development tests use 180-second timeouts, production may use 60 seconds
- Profile performance early to catch issues before they become blocking
- Test with varying timeout constraints to ensure robustness
- 外部测试框架的超时时间可能比开发测试更短
- 如果开发测试使用180秒超时,生产环境可能使用60秒
- 尽早进行性能分析,在问题成为阻塞之前发现它们
- 使用不同的超时约束进行测试,以确保鲁棒性
Computational Efficiency
计算效率
- Analyze the computational complexity of the sampling loop
- Optimize envelope updates to minimize recalculation
- Consider caching frequently computed values
- Profile with realistic sample sizes
- 分析采样循环的计算复杂度
- 优化包络更新以最小化重复计算
- 考虑缓存频繁计算的值
- 使用真实样本量进行性能分析
Verification Strategies
验证策略
Unit Testing Approach
单元测试方法
-
Test each module independently
- Log-concavity checker with known log-concave and non-log-concave functions
- Envelope construction with simple distributions
- Sampling loop with controlled random seeds
-
Test known distributions
- Standard normal distribution
- Exponential distribution (constant second derivative edge case)
- Truncated normal distributions
- Gamma distributions with various parameters
-
Test edge cases explicitly
- Non-function inputs
- Negative sample counts
- Invalid bounds (lower > upper)
- Very small and very large sample sizes
-
独立测试每个模块
- 使用已知的log-concave和非log-concave函数测试log-concavity检查器
- 使用简单分布测试包络构建
- 使用受控随机种子测试采样循环
-
测试已知分布
- 标准正态分布
- 指数分布(二阶导数为常数的边缘案例)
- 截断正态分布
- 不同参数的伽马分布
-
显式测试边缘案例
- 非函数输入
- 负样本数量
- 无效边界(下限 > 上限)
- 极小和极大样本量
Performance Testing
性能测试
- Match testing timeouts to expected production constraints
- Test worst-case behavior, not just average case
- Profile with different random seeds to catch stochastic failures
- Measure time per sample to identify performance degradation
- 使测试超时时间与预期的生产约束匹配
- 测试最坏情况行为,而非仅平均情况
- 使用不同的随机种子进行性能分析,以发现随机故障
- 测量每个样本的耗时,以识别性能下降
Integration Testing
集成测试
- Test the complete pipeline from input to samples
- Verify statistical properties of generated samples (mean, variance, distribution shape)
- Use statistical tests (KS test, chi-square) to verify sample quality
- Test with distributions that have known analytical moments
- 测试从输入到生成样本的完整流程
- 验证生成样本的统计特性(均值、方差、分布形状)
- 使用统计检验(KS test、chi-square)验证样本质量
- 对具有已知解析矩的分布进行测试
Common Pitfalls and Mistakes
常见陷阱与错误
Critical Mistakes to Avoid
需避免的关键错误
-
Missing iteration limits
- Always include maximum iteration safeguards in rejection sampling loops
- An unbounded rejection loop will cause timeouts on difficult distributions
-
Inadequate performance testing
- Development tests passing does not guarantee production success
- Test with the same timeout constraints as the target environment
-
Tolerance issues in log-concavity checking
- Strict inequality checks () may incorrectly reject valid distributions
< 0 - Use tolerant comparisons () for numerical stability
<= tolerance
- Strict inequality checks (
-
Poor initialization for shifted distributions
- Hard-coded initialization points fail for distributions with non-zero modes
- Always compute initialization relative to the distribution's characteristics
-
Incomplete file verification
- After writing large code blocks, verify the complete content was written
- Truncated files can cause subtle bugs
-
缺少迭代限制
- 始终在拒绝采样循环中添加最大迭代防护
- 无界的拒绝循环会在处理复杂分布时导致超时
-
性能测试不充分
- 开发测试通过并不代表生产环境能成功运行
- 使用与目标环境相同的超时约束进行测试
-
Log-Concavity检查中的容差问题
- 严格的不等式检查()可能会错误地拒绝有效分布
< 0 - 为了数值稳定性,使用容错比较()
<= 容差
- 严格的不等式检查(
-
偏移分布的初始化不当
- 硬编码的初始点在处理非零众数的分布时会失效
- 始终根据分布的特征计算初始点
-
文件验证不完整
- 编写大段代码后,验证内容是否完整写入
- 截断的文件会导致难以察觉的bug
Debugging Approach
调试方法
When tests fail or timeout:
-
Systematic debugging over trial-and-error
- Understand root causes before implementing fixes
- Use logging to identify where time is being spent
-
Isolate the problematic component
- Test log-concavity checking separately
- Test envelope construction separately
- Test sampling loop with mock envelopes
-
Check for infinite loops
- Add iteration counters to all loops
- Log when iteration limits are approached
- Verify exit conditions are reachable
当测试失败或超时时:
-
系统化调试而非试错
- 在实施修复前先理解根本原因
- 使用日志确定时间消耗的位置
-
隔离有问题的组件
- 单独测试log-concavity检查
- 单独测试包络构建
- 使用模拟包络测试采样循环
-
检查是否存在无限循环
- 为所有循环添加迭代计数器
- 当接近迭代限制时记录日志
- 验证退出条件是否可达
Quality Checklist
质量检查清单
Before considering the implementation complete:
- All sampling loops have maximum iteration limits
- Log-concavity checking uses appropriate tolerances
- Initialization points adapt to distribution characteristics
- Performance tested with target timeout constraints
- Edge cases for invalid inputs are handled
- Statistical properties of samples verified
- Code verified to be completely written (no truncation)
- Tested with multiple random seeds
在认为实现完成之前:
- 所有采样循环都有最大迭代限制
- Log-Concavity检查使用了合适的容差
- 初始点可根据分布特征自适应调整
- 已使用目标超时约束进行性能测试
- 已处理无效输入的边缘案例
- 已验证样本的统计特性
- 已验证代码完整写入(无截断)
- 已使用多个随机种子进行测试