algorithm-designer


Algorithm Designer


You are an expert in designing and documenting statistical algorithms.

Algorithm Documentation Standards


Required Components


  1. Purpose: What problem does this solve?
  2. Input/Output: Precise specifications
  3. Pseudocode: Language-agnostic description
  4. Complexity: Time and space analysis
  5. Convergence: Conditions and guarantees
  6. Implementation notes: Practical considerations

Input/Output Specification


Formal Specification Template


Every algorithm must have precise input/output documentation:
INPUT SPECIFICATION:
- Data: D = {(Y_i, A_i, M_i, X_i)}_{i=1}^n where:
  - Y_i ∈ ℝ (continuous outcome)
  - A_i ∈ {0,1} (binary treatment)
  - M_i ∈ ℝ^d (d-dimensional mediator)
  - X_i ∈ ℝ^p (p covariates)
- Parameters: θ ∈ Θ ⊆ ℝ^k (parameter space)
- Tolerance: ε > 0 (convergence criterion)
- Max iterations: T_max ∈ ℕ

OUTPUT SPECIFICATION:
- Estimate: θ̂ ∈ ℝ^k (point estimate)
- Variance: V̂ ∈ ℝ^{k×k} (covariance matrix)
- Convergence: boolean (did algorithm converge?)
- Iterations: t ∈ ℕ (iterations used)

R implementation of formal I/O specification


define_algorithm_io <- function() {
  list(
    input = list(
      data = "data.frame with columns Y, A, M, X",
      params = "list(tol = 1e-6, max_iter = 1000)",
      models = "list(outcome_formula, mediator_formula, propensity_formula)"
    ),
    output = list(
      estimate = "numeric vector of parameter estimates",
      se = "numeric vector of standard errors",
      vcov = "variance-covariance matrix",
      converged = "logical indicating convergence",
      iterations = "integer count of iterations"
    ),
    complexity = list(
      time = "O(n * p^2) per iteration",
      space = "O(n * p)",
      iterations = "O(log(1/epsilon)) for Newton-type"
    )
  )
}


Convergence Criteria


Standard Convergence Conditions


| Criterion | Formula | Use case |
|-----------|---------|----------|
| Absolute | $\lVert\theta^{(t+1)} - \theta^{(t)}\rVert < \varepsilon$ | Parameter convergence |
| Relative | $\lVert\theta^{(t+1)} - \theta^{(t)}\rVert / \lVert\theta^{(t)}\rVert < \varepsilon$ | Scale-invariant |
| Gradient | $\lVert\nabla L(\theta^{(t)})\rVert < \varepsilon$ | Optimization |
| Function | $\lvert L(\theta^{(t+1)}) - L(\theta^{(t)})\rvert < \varepsilon$ | Objective convergence |
| Cauchy | $\max_{i} \lvert\theta_i^{(t+1)} - \theta_i^{(t)}\rvert < \varepsilon$ | Component-wise convergence |

Mathematical Formulation


Convergence tolerance: $\varepsilon = 10^{-6}$ (typical default)
Standard tolerances by application:
  • Numerical optimization: $\varepsilon = 10^{-8}$
  • Statistical estimation: $\varepsilon = 10^{-6}$
  • Approximate methods: $\varepsilon = 10^{-4}$

Complexity Formulas


Linear complexity $O(n)$: Operations grow proportionally to input size $$T(n) = c \cdot n + O(1)$$
Quadratic complexity $O(n^2)$: Nested iterations over input $$T(n) = c \cdot n^2 + O(n)$$
Linearithmic complexity $O(n \log n)$: Divide-and-conquer with linear work per level $$T(n) = c \cdot n \log_2 n + O(n)$$
Space-Time Tradeoff: $$\text{Time} \times \text{Space} \geq \Omega(\text{Information Content})$$
Convergence rate analysis:
  • Linear convergence: $|\theta^{(t)} - \theta^*| \leq C \cdot \rho^t$ where $0 < \rho < 1$
  • Quadratic convergence: $|\theta^{(t+1)} - \theta^*| \leq C \cdot |\theta^{(t)} - \theta^*|^2$
  • Superlinear: $\lim_{t \to \infty} \frac{|\theta^{(t+1)} - \theta^*|}{|\theta^{(t)} - \theta^*|} = 0$
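To make these rates concrete, here is a small illustrative sketch (not from the original document): solving $x^2 = 2$ with a damped fixed-point iteration (linear convergence) versus a Newton step (quadratic convergence). The iteration counts, step sizes, and starting point are assumptions chosen for the demonstration.

```r
# Linear vs quadratic convergence when solving x^2 = 2.
# Damped fixed-point step: x <- x - 0.1 * (x^2 - 2)   (linear rate)
# Newton step:             x <- x - (x^2 - 2) / (2*x) (quadratic rate)
root <- sqrt(2)

x_lin <- 1
x_newt <- 1
err_lin <- numeric(6)
err_newt <- numeric(6)
for (t in 1:6) {
  x_lin  <- x_lin - 0.1 * (x_lin^2 - 2)
  x_newt <- x_newt - (x_newt^2 - 2) / (2 * x_newt)
  err_lin[t]  <- abs(x_lin - root)
  err_newt[t] <- abs(x_newt - root)
}

# Linear: error shrinks by a roughly constant factor per iteration.
# Quadratic: correct digits roughly double per iteration.
print(err_lin)
print(err_newt)
```

After six iterations the fixed-point error is still of order $10^{-1}$ while the Newton error is at machine precision, matching the rate bounds above.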

Comprehensive convergence checking


check_convergence <- function(theta_new, theta_old,
                              gradient = NULL,
                              objective_new = NULL, objective_old = NULL,
                              tol = 1e-6, method = "relative") {
  switch(method,
    "absolute" = {
      # |θ^(t+1) - θ^t| < ε
      criterion <- max(abs(theta_new - theta_old))
      converged <- criterion < tol
    },
    "relative" = {
      # |θ^(t+1) - θ^t| / |θ^t| < ε
      denom <- pmax(abs(theta_old), 1)  # avoid division by zero
      criterion <- max(abs(theta_new - theta_old) / denom)
      converged <- criterion < tol
    },
    "gradient" = {
      # |∇L(θ)| < ε
      stopifnot(!is.null(gradient))
      criterion <- sqrt(sum(gradient^2))
      converged <- criterion < tol
    },
    "objective" = {
      # |L(θ^(t+1)) - L(θ^t)| < ε
      stopifnot(!is.null(objective_new), !is.null(objective_old))
      criterion <- abs(objective_new - objective_old)
      converged <- criterion < tol
    }
  )
  list(converged = converged, criterion = criterion, method = method)
}

Newton-Raphson with convergence monitoring


newton_raphson <- function(f, grad, hess, theta0, tol = 1e-6, max_iter = 100) {
  theta <- theta0
  history <- list()

  for (t in 1:max_iter) {
    g <- grad(theta)
    H <- hess(theta)

    # Newton step: θ^(t+1) = θ^t - H^(-1) g
    # Time complexity: O(p^3) for solving the linear system
    delta <- solve(H, g)
    theta_new <- theta - delta

    # Check convergence
    conv <- check_convergence(theta_new, theta, gradient = g, tol = tol)
    history[[t]] <- list(theta = theta, gradient_norm = sqrt(sum(g^2)))

    if (conv$converged) {
      return(list(
        estimate = theta_new,
        iterations = t,
        converged = TRUE,
        history = history
      ))
    }

    theta <- theta_new
  }

  list(estimate = theta, iterations = max_iter, converged = FALSE, history = history)
}

Complexity and Convergence Relationship


| Algorithm | Convergence rate | Iterations to $\varepsilon$ |
|-----------|------------------|-----------------------------|
| Gradient descent | $O(1/t)$ | $O(1/\varepsilon)$ |
| Accelerated GD | $O(1/t^2)$ | $O(1/\sqrt{\varepsilon})$ |
| Newton-Raphson | Quadratic | $O(\log\log(1/\varepsilon))$ |
| EM algorithm | Linear | $O(\log(1/\varepsilon))$ |
| Coordinate descent | Linear | $O(p \cdot \log(1/\varepsilon))$ |

Pseudocode Conventions


Standard Format


ALGORITHM: [Name]
INPUT: [List inputs with types]
OUTPUT: [List outputs with types]

1. [Initialize]
2. [Main loop or procedure]
   2.1 [Sub-step]
   2.2 [Sub-step]
3. [Return]

Example: AIPW Estimator


ALGORITHM: Augmented IPW for Mediation
INPUT:
  - Data (Y, A, M, X) of size n
  - Propensity model specification
  - Outcome model specification
  - Mediator model specification
OUTPUT:
  - Point estimate ψ̂
  - Standard error SE(ψ̂)
  - 95% confidence interval

1. ESTIMATE NUISANCE FUNCTIONS
   1.1 Fit propensity score: π̂(x) = P̂(A=1|X=x)
   1.2 Fit mediator density: f̂(m|a,x)
   1.3 Fit outcome regression: μ̂(a,m,x) = Ê[Y|A=a,M=m,X=x]

2. COMPUTE PSEUDO-OUTCOMES
   For i = 1 to n:
     2.1 Compute IPW weight: w_i = A_i/π̂(X_i) + (1-A_i)/(1-π̂(X_i))
     2.2 Compute outcome prediction: μ̂_i = μ̂(A_i, M_i, X_i)
     2.3 Compute augmentation term
     2.4 φ_i = w_i(Y_i - μ̂_i) + [integration term]

3. ESTIMATE AND INFERENCE
   3.1 ψ̂ = n⁻¹ Σᵢ φ_i
   3.2 SE = √(V̂/n), where V̂ = n⁻¹ Σᵢ (φ_i - ψ̂)²
   3.3 CI = [ψ̂ - 1.96·SE, ψ̂ + 1.96·SE]

4. RETURN (ψ̂, SE, CI)
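The mediation version above needs the mediator-density integration term in step 2.4; as a simpler sketch of the same AIPW pattern, the following illustrates steps 1–3 for the average treatment effect on simulated data. The data-generating process, model choices, and variable names here are assumptions of this example, not part of the document.

```r
# Minimal AIPW sketch for the average treatment effect (illustrative only;
# the mediation estimator additionally integrates over f̂(m|a,x)).
set.seed(1)
n <- 500
X <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * X))
Y <- 1 + 2 * A + X + rnorm(n)  # true effect = 2

# Step 1: nuisance functions (propensity score and outcome regressions)
pi_hat <- fitted(glm(A ~ X, family = binomial))
mu1 <- predict(lm(Y ~ X, subset = A == 1), newdata = data.frame(X = X))
mu0 <- predict(lm(Y ~ X, subset = A == 0), newdata = data.frame(X = X))

# Step 2: influence-function values φ_i (IPW residual + augmentation)
phi <- (A / pi_hat) * (Y - mu1) -
       ((1 - A) / (1 - pi_hat)) * (Y - mu0) + mu1 - mu0

# Step 3: estimate, standard error, 95% CI
psi_hat <- mean(phi)
se <- sd(phi) / sqrt(n)
ci <- psi_hat + c(-1.96, 1.96) * se
```

With both nuisance models correctly specified, ψ̂ should land close to the true effect of 2.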

Complexity Analysis


Big-O Notation Guide


Formal Definition: $f(n) = O(g(n))$ if $\exists c, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$
| Complexity | Name | Example | Operations at n=1000 |
|------------|------|---------|----------------------|
| $O(1)$ | Constant | Array access | 1 |
| $O(\log n)$ | Logarithmic | Binary search | ~10 |
| $O(n)$ | Linear | Single loop | 1,000 |
| $O(n \log n)$ | Linearithmic | Merge sort, FFT | ~10,000 |
| $O(n^2)$ | Quadratic | Nested loops | 1,000,000 |
| $O(n^3)$ | Cubic | Matrix multiplication | 1,000,000,000 |
| $O(2^n)$ | Exponential | Subset enumeration | ~$10^{301}$ |

Key Formulas


Master Theorem for recurrences $T(n) = aT(n/b) + f(n)$:
  • If $f(n) = O(n^{\log_b a - \epsilon})$ then $T(n) = \Theta(n^{\log_b a})$
  • If $f(n) = \Theta(n^{\log_b a})$ then $T(n) = \Theta(n^{\log_b a} \log n)$
  • If $f(n) = \Omega(n^{\log_b a + \epsilon})$ then $T(n) = \Theta(f(n))$
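As a worked instance of the theorem, merge sort's recurrence falls into the second case:

```latex
% Merge sort: T(n) = 2T(n/2) + \Theta(n), so a = 2, b = 2, f(n) = \Theta(n).
% Here n^{\log_b a} = n^{\log_2 2} = n, and f(n) = \Theta(n^{\log_b a}),
% which matches case 2 of the Master Theorem:
T(n) = \Theta\left(n^{\log_2 2} \log n\right) = \Theta(n \log n)
```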
Sorting lower bound: Any comparison-based sort requires $\Omega(n \log n)$ comparisons
Matrix operations:
  • Naive multiplication: $O(n^3)$
  • Strassen: $O(n^{2.807})$
  • Matrix inversion: $O(n^3)$ (same as multiplication)

Complexity analysis helper


analyze_complexity <- function(f, n_values = c(100, 500, 1000, 5000)) {
  times <- sapply(n_values, function(n) {
    system.time(f(n))[["elapsed"]]
  })

  # Fit log-log regression to estimate complexity
  fit <- lm(log(times) ~ log(n_values))
  estimated_power <- coef(fit)[2]

  list(
    times = data.frame(n = n_values, time = times),
    estimated_complexity = paste0("O(n^", round(estimated_power, 2), ")"),
    power = estimated_power
  )
}
undefined

Statistical Algorithm Complexities


| Algorithm | Time | Space |
|-----------|------|-------|
| OLS | O(np² + p³) | O(np) |
| Logistic (Newton) | O(np² + p³) per iteration | O(np) |
| Bootstrap (B reps) | O(B × base) | O(n) |
| MCMC (T iters) | O(T × per-iteration cost) | O(n + T) |
| Cross-validation (K folds) | O(K × base) | O(n) |
| Random forest | O(n log n × B × p) | O(n × B) |

Template for Analysis


TIME COMPLEXITY:
- Initialization: O(...)
- Per iteration: O(...)
- Total (T iterations): O(...)
- Convergence typically in T = O(...) iterations

SPACE COMPLEXITY:
- Data storage: O(n × p)
- Working memory: O(...)
- Output: O(...)

Convergence Analysis


Types of Convergence


  1. Finite termination: Exact solution in finite steps
  2. Linear: $|x_{k+1} - x^*| \leq c\,|x_k - x^*|$, $c < 1$
  3. Superlinear: $|x_{k+1} - x^*| / |x_k - x^*| \to 0$
  4. Quadratic: $|x_{k+1} - x^*| \leq c\,|x_k - x^*|^2$

Convergence Documentation Template


CONVERGENCE:
- Type: [Linear/Superlinear/Quadratic]
- Rate: [Expression]
- Conditions: [What must hold]
- Stopping criterion: [When to stop]
- Typical iterations: [Order of magnitude]

Optimization Algorithms


Gradient-Based Methods


ALGORITHM: Gradient Descent
INPUT: f (objective), ∇f (gradient), x₀ (initial), η (step size), ε (tolerance)
OUTPUT: x* (minimizer)

1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
   2.1 xₖ₊₁ ← xₖ - η∇f(xₖ)
   2.2 k ← k + 1
3. RETURN xₖ

COMPLEXITY: O(iterations × gradient_cost)
CONVERGENCE: Linear with rate (1 - η·μ) for μ-strongly convex f
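The pseudocode above can be transcribed almost line for line into R. This is a minimal sketch with a fixed step size η; the function name, the quadratic test objective, and the default parameters are assumptions of this example.

```r
# Direct transcription of the gradient-descent pseudocode (fixed step size).
gradient_descent <- function(grad, x0, eta = 0.1, eps = 1e-8, max_iter = 10000) {
  x <- x0
  for (k in 1:max_iter) {
    g <- grad(x)
    if (sqrt(sum(g^2)) <= eps) break  # stop when ‖∇f(x_k)‖ ≤ ε
    x <- x - eta * g                  # x_{k+1} ← x_k − η ∇f(x_k)
  }
  x
}

# Minimize f(x) = (x1 - 3)^2 + (x2 + 1)^2; the minimizer is (3, -1).
x_star <- gradient_descent(function(x) 2 * (x - c(3, -1)), c(0, 0))
```

For this strongly convex quadratic the iterates contract by the factor (1 − η·μ) per step, as stated in the convergence note above.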

Newton's Method


ALGORITHM: Newton-Raphson
INPUT: f, ∇f, ∇²f, x₀, ε
OUTPUT: x*

1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
   2.1 Solve ∇²f(xₖ)·d = -∇f(xₖ) for direction d
   2.2 xₖ₊₁ ← xₖ + d
   2.3 k ← k + 1
3. RETURN xₖ

COMPLEXITY: O(iterations × p³) for p-dimensional
CONVERGENCE: Quadratic near solution

EM Algorithm Template


ALGORITHM: Expectation-Maximization
INPUT: Data Y, model parameters θ₀, tolerance ε
OUTPUT: MLE θ̂

1. θ ← θ₀
2. REPEAT:
   2.1 E-STEP: Compute Q(θ'|θ) = E[log L(θ'|Y,Z) | Y, θ]
   2.2 M-STEP: θ_new ← argmax_θ' Q(θ'|θ)
   2.3 Δ ← |θ_new - θ|
   2.4 θ ← θ_new
3. UNTIL Δ < ε
4. RETURN θ

CONVERGENCE: Monotonic increase in likelihood
             Linear rate near optimum
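The template can be instantiated concretely. Below is a minimal sketch for a two-component Gaussian mixture with unit variances and equal mixing weights, so only the two component means are estimated; the simulated data, starting values, and function name are assumptions of this example.

```r
# EM for a two-component Gaussian mixture (unit variances, equal weights).
set.seed(42)
y <- c(rnorm(300, mean = 0), rnorm(300, mean = 4))

em_two_means <- function(y, mu = c(-1, 1), tol = 1e-8, max_iter = 500) {
  for (t in 1:max_iter) {
    # E-step: posterior responsibility of component 2 for each observation
    d1 <- dnorm(y, mean = mu[1])
    d2 <- dnorm(y, mean = mu[2])
    r2 <- d2 / (d1 + d2)

    # M-step: responsibility-weighted means maximize Q(θ'|θ)
    mu_new <- c(sum((1 - r2) * y) / sum(1 - r2),
                sum(r2 * y) / sum(r2))

    # Stopping rule: Δ = |θ_new − θ| < ε
    if (max(abs(mu_new - mu)) < tol) {
      return(list(mu = mu_new, iterations = t))
    }
    mu <- mu_new
  }
  list(mu = mu, iterations = max_iter)
}

fit <- em_two_means(y)
```

Each pass increases the observed-data likelihood monotonically, and for well-separated components the linear rate is fast.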

Bootstrap Algorithms


Nonparametric Bootstrap


ALGORITHM: Nonparametric Bootstrap
INPUT: Data X of size n, statistic T, B (number of replicates)
OUTPUT: SE estimate, CI

1. FOR b = 1 to B:
   1.1 Draw X*_b by sampling n observations with replacement from X
   1.2 Compute T*_b = T(X*_b)
2. SE_boot ← SD({T*_1, ..., T*_B})
3. CI_percentile ← [quantile(T*, 0.025), quantile(T*, 0.975)]
4. RETURN (SE_boot, CI_percentile)

COMPLEXITY: O(B × cost(T))
NOTES: B ≥ 1000 for SE, B ≥ 10000 for percentile CI
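The steps above map directly onto a few lines of R. This sketch bootstraps the median of simulated exponential data; the statistic, sample size, and B = 2000 (chosen only for speed of illustration, below the B ≥ 10000 guidance for percentile CIs) are assumptions of this example.

```r
# Nonparametric bootstrap of the median, following the pseudocode above.
set.seed(7)
x <- rexp(200, rate = 1)  # observed data; true median = log(2)
B <- 2000

# Steps 1.1-1.2: resample with replacement, recompute the statistic
t_star <- replicate(B, median(sample(x, replace = TRUE)))

se_boot <- sd(t_star)                               # step 2
ci_percentile <- quantile(t_star, c(0.025, 0.975))  # step 3
```

The percentile interval straddles the sample median, and se_boot approximates the sampling variability of the median without any distributional assumptions.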

Parametric Bootstrap


ALGORITHM: Parametric Bootstrap
INPUT: Data X, parametric model M, B replicates
OUTPUT: SE estimate

1. Fit θ̂ = MLE(X, M)
2. FOR b = 1 to B:
   2.1 Generate X*_b ~ M(θ̂)
   2.2 Compute θ̂*_b = MLE(X*_b, M)
3. SE_boot ← SD({θ̂*_1, ..., θ̂*_B})
4. RETURN SE_boot
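As a concrete sketch of this pseudocode, the following estimates the rate of an exponential model, where the MLE has the closed form 1/mean(x); the model, sample size, and B are assumptions chosen for illustration.

```r
# Parametric bootstrap for the rate of an exponential model.
set.seed(11)
x <- rexp(150, rate = 2)
rate_hat <- 1 / mean(x)  # step 1: MLE of the exponential rate

B <- 2000
rate_star <- replicate(B, {    # step 2: simulate from M(θ̂), refit
  x_star <- rexp(length(x), rate = rate_hat)
  1 / mean(x_star)
})

se_boot <- sd(rate_star)       # step 3
```

Unlike the nonparametric version, every replicate is drawn from the fitted model, so the SE is valid only to the extent the parametric model holds.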

Numerical Stability Notes


Common Issues


  1. Overflow/Underflow: Work on log scale
  2. Cancellation: Reformulate subtractions
  3. Ill-conditioning: Use regularization or pivoting
  4. Convergence: Add damping or line search

Stability Techniques



Log-sum-exp trick


log_sum_exp <- function(x) {
  max_x <- max(x)
  max_x + log(sum(exp(x - max_x)))
}

Numerically stable variance


stable_var <- function(x) {
  n <- length(x)
  m <- mean(x)
  sum((x - m)^2) / (n - 1)  # two-pass: centre the data first, then sum squares
}

Implementation Checklist


Before Coding


  • Pseudocode written and reviewed
  • Complexity analyzed
  • Convergence conditions identified
  • Edge cases documented
  • Numerical stability considered

During Implementation


  • Match pseudocode structure
  • Add convergence monitoring
  • Handle edge cases
  • Log intermediate values (debug mode)
  • Add early stopping

After Implementation


  • Unit tests for components
  • Integration tests for full algorithm
  • Benchmark against reference implementation
  • Profile for bottlenecks
  • Document deviations from pseudocode

Key References


  • CLRS (Cormen, Leiserson, Rivest & Stein, Introduction to Algorithms)
  • Numerical Recipes (Press, Teukolsky, Vetterling & Flannery)