adaptive-rejection-sampler

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Adaptive Rejection Sampler

Adaptive Rejection Sampler

Overview

概述

This skill provides guidance for implementing Adaptive Rejection Sampling (ARS) algorithms. ARS is a method for generating samples from log-concave probability distributions by constructing piecewise linear upper and lower envelopes of the log-density function. The skill focuses on procedural approaches, performance optimization, and verification strategies rather than providing implementation code.
本技能提供Adaptive Rejection Sampling(ARS)算法的实现指南。ARS是一种通过构建对数密度函数的分段线性上下包络,从log-concave概率分布中生成样本的方法。本技能聚焦于流程方法、性能优化和验证策略,而非提供具体的实现代码。

When to Use This Skill

适用场景

Use this skill when:
  • Implementing adaptive rejection sampling from scratch
  • Working with log-concave distribution samplers
  • Building statistical sampling algorithms that require envelope construction
  • Debugging or optimizing existing ARS implementations
  • The task involves R, Python, or other statistical computing environments
在以下场景中使用本技能:
  • 从零开始实现自适应拒绝采样
  • 开发log-concave分布采样器
  • 构建需要包络构建的统计采样算法
  • 调试或优化现有的ARS实现
  • 涉及R、Python或其他统计计算环境的任务

Implementation Approach

实现步骤

Phase 1: Algorithm Design Before Coding

阶段1:编码前的算法设计

Before writing any code:
  1. Understand the mathematical foundations
    • Review the ARS algorithm requirements for log-concave functions
    • Understand how piecewise linear envelopes are constructed from tangent lines
    • Identify the squeeze function optimization
  2. Design with performance in mind
    • Plan iteration limits and safeguards from the start
    • Consider worst-case computational complexity of the sampling loop
    • Design for timeout constraints that may be stricter than development testing
  3. Plan the module structure
    • Log-concavity verification module
    • Envelope construction and update module
    • Sampling loop with rejection logic
    • Initialization point selection
在编写任何代码之前:
  1. 理解数学基础
    • 回顾ARS算法对log-concave函数的要求
    • 理解如何通过切线构建分段线性包络
    • 明确压缩函数的优化方式
  2. 以性能为导向进行设计
    • 从一开始就规划迭代限制和防护措施
    • 考虑采样循环的最坏情况计算复杂度
    • 针对可能比开发测试更严格的超时约束进行设计
  3. 规划模块结构
    • log-concavity验证模块
    • 包络构建与更新模块
    • 带有拒绝逻辑的采样循环
    • 初始点选择模块

Phase 2: Critical Implementation Considerations

阶段2:关键实现注意事项

Log-Concavity Checking

Log-Concavity检查

When implementing log-concavity verification:
  • Use appropriate tolerance values for numerical comparison
  • Consider that some valid distributions have constant second derivatives (e.g., exponential distribution)
  • Use
    <= tolerance
    instead of strict
    <
    comparisons when checking for non-positive second derivatives
  • Test with edge cases like exponential, normal, and gamma distributions
在实现log-concavity验证时:
  • 使用合适的数值比较容差
  • 注意部分有效分布的二阶导数为常数(例如指数分布)
  • 检查非正二阶导数时使用
    <= 容差
    而非严格的
    <
    比较
  • 使用指数分布、正态分布和伽马分布等边缘案例进行测试

Initialization Points

初始点设置

Proper initialization significantly affects sampling quality and performance:
  • Handle shifted distributions by adjusting initialization points relative to the mode
  • Consider the support bounds when selecting initial points
  • Use multiple initial points to ensure good envelope coverage
  • Test with distributions that have modes far from the origin
合理的初始化对采样质量和性能有显著影响:
  • 通过相对于众数调整初始点来处理偏移分布
  • 选择初始点时考虑分布的支持边界
  • 使用多个初始点以确保良好的包络覆盖
  • 对众数远离原点的分布进行测试

Iteration Limits and Safeguards

迭代限制与防护措施

Critical for preventing infinite loops and timeouts:
  • Implement maximum iteration limits in all sampling loops
  • Add progress indicators or logging for long-running computations
  • Include timeout protection mechanisms
  • Design early exit conditions for convergence
这对防止无限循环和超时至关重要:
  • 在所有采样循环中实现最大迭代限制
  • 为长时间运行的计算添加进度指示器或日志
  • 包含超时保护机制
  • 设计收敛的提前退出条件

Phase 3: Performance-First Development

阶段3:以性能为核心的开发

Timeout Considerations

超时考量

  • External test frameworks may use shorter timeouts than development testing
  • If development tests use 180-second timeouts, production may use 60 seconds
  • Profile performance early to catch issues before they become blocking
  • Test with varying timeout constraints to ensure robustness
  • 外部测试框架的超时时间可能比开发测试更短
  • 如果开发测试使用180秒超时,生产环境可能使用60秒
  • 尽早进行性能分析,在问题成为阻塞之前发现它们
  • 使用不同的超时约束进行测试,以确保鲁棒性

Computational Efficiency

计算效率

  • Analyze the computational complexity of the sampling loop
  • Optimize envelope updates to minimize recalculation
  • Consider caching frequently computed values
  • Profile with realistic sample sizes
  • 分析采样循环的计算复杂度
  • 优化包络更新以最小化重复计算
  • 考虑缓存频繁计算的值
  • 使用真实样本量进行性能分析

Verification Strategies

验证策略

Unit Testing Approach

单元测试方法

  1. Test each module independently
    • Log-concavity checker with known log-concave and non-log-concave functions
    • Envelope construction with simple distributions
    • Sampling loop with controlled random seeds
  2. Test known distributions
    • Standard normal distribution
    • Exponential distribution (constant second derivative edge case)
    • Truncated normal distributions
    • Gamma distributions with various parameters
  3. Test edge cases explicitly
    • Non-function inputs
    • Negative sample counts
    • Invalid bounds (lower > upper)
    • Very small and very large sample sizes
  1. 独立测试每个模块
    • 使用已知的log-concave和非log-concave函数测试log-concavity检查器
    • 使用简单分布测试包络构建
    • 使用受控随机种子测试采样循环
  2. 测试已知分布
    • 标准正态分布
    • 指数分布(二阶导数为常数的边缘案例)
    • 截断正态分布
    • 不同参数的伽马分布
  3. 显式测试边缘案例
    • 非函数输入
    • 负样本数量
    • 无效边界(下限 > 上限)
    • 极小和极大样本量

Performance Testing

性能测试

  • Match testing timeouts to expected production constraints
  • Test worst-case behavior, not just average case
  • Profile with different random seeds to catch stochastic failures
  • Measure time per sample to identify performance degradation
  • 使测试超时时间与预期的生产约束匹配
  • 测试最坏情况行为,而非仅平均情况
  • 使用不同的随机种子进行性能分析,以发现随机故障
  • 测量每个样本的耗时,以识别性能下降

Integration Testing

集成测试

  • Test the complete pipeline from input to samples
  • Verify statistical properties of generated samples (mean, variance, distribution shape)
  • Use statistical tests (KS test, chi-square) to verify sample quality
  • Test with distributions that have known analytical moments
  • 测试从输入到生成样本的完整流程
  • 验证生成样本的统计特性(均值、方差、分布形状)
  • 使用统计检验(KS test、chi-square)验证样本质量
  • 对具有已知解析矩的分布进行测试

Common Pitfalls and Mistakes

常见陷阱与错误

Critical Mistakes to Avoid

需避免的关键错误

  1. Missing iteration limits
    • Always include maximum iteration safeguards in rejection sampling loops
    • An unbounded rejection loop will cause timeouts on difficult distributions
  2. Inadequate performance testing
    • Development tests passing does not guarantee production success
    • Test with the same timeout constraints as the target environment
  3. Tolerance issues in log-concavity checking
    • Strict inequality checks (
      < 0
      ) may incorrectly reject valid distributions
    • Use tolerant comparisons (
      <= tolerance
      ) for numerical stability
  4. Poor initialization for shifted distributions
    • Hard-coded initialization points fail for distributions with non-zero modes
    • Always compute initialization relative to the distribution's characteristics
  5. Incomplete file verification
    • After writing large code blocks, verify the complete content was written
    • Truncated files can cause subtle bugs
  1. 缺少迭代限制
    • 始终在拒绝采样循环中添加最大迭代防护
    • 无界的拒绝循环会在处理复杂分布时导致超时
  2. 性能测试不充分
    • 开发测试通过并不代表生产环境能成功运行
    • 使用与目标环境相同的超时约束进行测试
  3. Log-Concavity检查中的容差问题
    • 严格的不等式检查(
      < 0
      )可能会错误地拒绝有效分布
    • 为了数值稳定性,使用容错比较(
      <= 容差
  4. 偏移分布的初始化不当
    • 硬编码的初始点在处理非零众数的分布时会失效
    • 始终根据分布的特征计算初始点
  5. 文件验证不完整
    • 编写大段代码后,验证内容是否完整写入
    • 截断的文件会导致难以察觉的bug

Debugging Approach

调试方法

When tests fail or timeout:
  1. Systematic debugging over trial-and-error
    • Understand root causes before implementing fixes
    • Use logging to identify where time is being spent
  2. Isolate the problematic component
    • Test log-concavity checking separately
    • Test envelope construction separately
    • Test sampling loop with mock envelopes
  3. Check for infinite loops
    • Add iteration counters to all loops
    • Log when iteration limits are approached
    • Verify exit conditions are reachable
当测试失败或超时时:
  1. 系统化调试而非试错
    • 在实施修复前先理解根本原因
    • 使用日志确定时间消耗的位置
  2. 隔离有问题的组件
    • 单独测试log-concavity检查
    • 单独测试包络构建
    • 使用模拟包络测试采样循环
  3. 检查是否存在无限循环
    • 为所有循环添加迭代计数器
    • 当接近迭代限制时记录日志
    • 验证退出条件是否可达

Quality Checklist

质量检查清单

Before considering the implementation complete:
  • All sampling loops have maximum iteration limits
  • Log-concavity checking uses appropriate tolerances
  • Initialization points adapt to distribution characteristics
  • Performance tested with target timeout constraints
  • Edge cases for invalid inputs are handled
  • Statistical properties of samples verified
  • Code verified to be completely written (no truncation)
  • Tested with multiple random seeds
在认为实现完成之前:
  • 所有采样循环都有最大迭代限制
  • Log-Concavity检查使用了合适的容差
  • 初始点可根据分布特征自适应调整
  • 已使用目标超时约束进行性能测试
  • 已处理无效输入的边缘案例
  • 已验证样本的统计特性
  • 已验证代码完整写入(无截断)
  • 已使用多个随机种子进行测试