algorithm-designer
Algorithm Designer
You are an expert in designing and documenting statistical algorithms.
Algorithm Documentation Standards
Required Components
- Purpose: What problem does this solve?
- Input/Output: Precise specifications
- Pseudocode: Language-agnostic description
- Complexity: Time and space analysis
- Convergence: Conditions and guarantees
- Implementation notes: Practical considerations
Input/Output Specification
Formal Specification Template
Every algorithm must have precise input/output documentation:
INPUT SPECIFICATION:
- Data: D = {(Y_i, A_i, M_i, X_i)}_{i=1}^n where:
- Y_i ∈ ℝ (continuous outcome)
- A_i ∈ {0,1} (binary treatment)
- M_i ∈ ℝ^d (d-dimensional mediator)
- X_i ∈ ℝ^p (p covariates)
- Parameters: θ ∈ Θ ⊆ ℝ^k (parameter space)
- Tolerance: ε > 0 (convergence criterion)
- Max iterations: T_max ∈ ℕ
OUTPUT SPECIFICATION:
- Estimate: θ̂ ∈ ℝ^k (point estimate)
- Variance: V̂ ∈ ℝ^{k×k} (covariance matrix)
- Convergence: boolean (did algorithm converge?)
- Iterations: t ∈ ℕ (iterations used)
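The input contract above can be enforced mechanically before any iteration starts. A minimal sketch in R (the column names and the specific checks are illustrative assumptions layered on the formal spec, not part of the original):

```r
# Validate a dataset against the INPUT SPECIFICATION before running an algorithm
validate_input <- function(data, tol = 1e-6, max_iter = 1000L) {
  stopifnot(
    is.data.frame(data),
    all(c("Y", "A", "M", "X") %in% names(data)),  # required columns
    is.numeric(data$Y),                           # Y_i continuous
    all(data$A %in% c(0, 1)),                     # A_i binary treatment
    tol > 0,                                      # convergence criterion ε > 0
    max_iter >= 1                                 # T_max a positive integer
  )
  invisible(TRUE)
}
```

Failing fast on malformed input keeps the later convergence diagnostics meaningful.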
R implementation of formal I/O specification
define_algorithm_io <- function() {
  list(
    input = list(
      data = "data.frame with columns Y, A, M, X",
      params = "list(tol = 1e-6, max_iter = 1000)",
      models = "list(outcome_formula, mediator_formula, propensity_formula)"
    ),
    output = list(
      estimate = "numeric vector of parameter estimates",
      se = "numeric vector of standard errors",
      vcov = "variance-covariance matrix",
      converged = "logical indicating convergence",
      iterations = "integer count of iterations"
    ),
    complexity = list(
      time = "O(n * p^2) per iteration",
      space = "O(n * p)",
      iterations = "O(log(1/epsilon)) for Newton-type"
    )
  )
}
Convergence Criteria
Standard Convergence Conditions
| Criterion | Formula | Use Case |
|---|---|---|
| Absolute | $\lvert\theta^{(t+1)} - \theta^{(t)}\rvert < \varepsilon$ | Parameter convergence |
| Relative | $\lvert\theta^{(t+1)} - \theta^{(t)}\rvert / \lvert\theta^{(t)}\rvert < \varepsilon$ | Scale-invariant |
| Gradient | $\lvert\nabla L(\theta^{(t)})\rvert < \varepsilon$ | Optimization |
| Function | $\lvert L(\theta^{(t+1)}) - L(\theta^{(t)})\rvert < \varepsilon$ | Objective convergence |
| Cauchy | $\max_i \lvert\theta_i^{(t+1)} - \theta_i^{(t)}\rvert < \varepsilon$ | Component-wise convergence |
Mathematical Formulation
Convergence tolerance: $\varepsilon = 10^{-6}$ (typical default)
Standard tolerances by application:
- Numerical optimization: $\varepsilon = 10^{-8}$
- Statistical estimation: $\varepsilon = 10^{-6}$
- Approximate methods: $\varepsilon = 10^{-4}$
Complexity Formulas
Linear complexity $O(n)$: Operations grow proportionally to input size
$$T(n) = c \cdot n + O(1)$$
Quadratic complexity $O(n^2)$: Nested iterations over input
$$T(n) = c \cdot n^2 + O(n)$$
Linearithmic complexity $O(n \log n)$: Divide-and-conquer with linear work per level
$$T(n) = c \cdot n \log_2 n + O(n)$$
Space-Time Tradeoff:
$$\text{Time} \times \text{Space} \geq \Omega(\text{Information Content})$$
Convergence rate analysis:
- Linear convergence: $|\theta^{(t)} - \theta^*| \leq C \cdot \rho^t$ where $0 < \rho < 1$
- Quadratic convergence: $|\theta^{(t+1)} - \theta^*| \leq C \cdot |\theta^{(t)} - \theta^*|^2$
- Superlinear: $\lim_{t \to \infty} \frac{|\theta^{(t+1)} - \theta^*|}{|\theta^{(t)} - \theta^*|} = 0$
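These rates can be observed numerically. As an illustrative sketch (not from the original), compute √2 two ways: a damped fixed-point iteration converges linearly, while a Newton step converges quadratically:

```r
# Compare linear vs quadratic convergence when solving x^2 = 2
target <- sqrt(2)

# Linear convergence: damped fixed-point iteration g(x) = x - 0.1 * (x^2 - 2)
x <- 2
err_linear <- numeric(5)
for (t in 1:5) {
  x <- x - 0.1 * (x^2 - 2)
  err_linear[t] <- abs(x - target)
}

# Quadratic convergence: Newton iteration x <- x - (x^2 - 2) / (2x)
y <- 2
err_newton <- numeric(5)
for (t in 1:5) {
  y <- y - (y^2 - 2) / (2 * y)
  err_newton[t] <- abs(y - target)
}

# The fixed-point error shrinks by a near-constant factor each step;
# Newton's error roughly squares each step (hits machine precision by t = 4).
```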
Comprehensive convergence checking
check_convergence <- function(theta_new, theta_old, gradient = NULL,
                              objective_new = NULL, objective_old = NULL,
                              tol = 1e-6, method = "relative") {
  switch(method,
    "absolute" = {
      # |θ^(t+1) - θ^t| < ε
      converged <- max(abs(theta_new - theta_old)) < tol
      criterion <- max(abs(theta_new - theta_old))
    },
    "relative" = {
      # |θ^(t+1) - θ^t| / |θ^t| < ε
      denom <- pmax(abs(theta_old), 1)  # Avoid division by zero
      converged <- max(abs(theta_new - theta_old) / denom) < tol
      criterion <- max(abs(theta_new - theta_old) / denom)
    },
    "gradient" = {
      # |∇L(θ)| < ε
      stopifnot(!is.null(gradient))
      converged <- sqrt(sum(gradient^2)) < tol
      criterion <- sqrt(sum(gradient^2))
    },
    "objective" = {
      # |L(θ^(t+1)) - L(θ^t)| < ε
      stopifnot(!is.null(objective_new), !is.null(objective_old))
      converged <- abs(objective_new - objective_old) < tol
      criterion <- abs(objective_new - objective_old)
    }
  )
  list(converged = converged, criterion = criterion, method = method)
}
Newton-Raphson with convergence monitoring
newton_raphson <- function(f, grad, hess, theta0, tol = 1e-6, max_iter = 100) {
  theta <- theta0
  history <- list()
  for (t in 1:max_iter) {
    g <- grad(theta)
    H <- hess(theta)
    # Newton step: θ^(t+1) = θ^t - H^(-1) * g
    # Time complexity: O(p^3) for matrix inversion
    delta <- solve(H, g)
    theta_new <- theta - delta
    # Check convergence
    conv <- check_convergence(theta_new, theta, gradient = g, tol = tol)
    history[[t]] <- list(theta = theta, gradient_norm = sqrt(sum(g^2)))
    if (conv$converged) {
      return(list(
        estimate = theta_new,
        iterations = t,
        converged = TRUE,
        history = history
      ))
    }
    theta <- theta_new
  }
  list(estimate = theta, iterations = max_iter, converged = FALSE, history = history)
}
Complexity and Convergence Relationship
| Algorithm | Convergence Rate | Iterations to $\varepsilon$ |
|---|---|---|
| Gradient Descent | $O(1/t)$ | $O(1/\varepsilon)$ |
| Accelerated GD | $O(1/t^2)$ | $O(1/\sqrt{\varepsilon})$ |
| Newton-Raphson | Quadratic | $O(\log\log(1/\varepsilon))$ |
| EM Algorithm | Linear | $O(\log(1/\varepsilon))$ |
| Coordinate Descent | Linear | $O(p \cdot \log(1/\varepsilon))$ |
Pseudocode Conventions
Standard Format
ALGORITHM: [Name]
INPUT: [List inputs with types]
OUTPUT: [List outputs with types]
1. [Initialize]
2. [Main loop or procedure]
2.1 [Sub-step]
2.2 [Sub-step]
3. [Return]

Example: AIPW Estimator
ALGORITHM: Augmented IPW for Mediation
INPUT:
- Data (Y, A, M, X) of size n
- Propensity model specification
- Outcome model specification
- Mediator model specification
OUTPUT:
- Point estimate ψ̂
- Standard error SE(ψ̂)
- 95% confidence interval
1. ESTIMATE NUISANCE FUNCTIONS
1.1 Fit propensity score: π̂(x) = P̂(A=1|X=x)
1.2 Fit mediator density: f̂(m|a,x)
1.3 Fit outcome regression: μ̂(a,m,x) = Ê[Y|A=a,M=m,X=x]
2. COMPUTE PSEUDO-OUTCOMES
For i = 1 to n:
2.1 Compute IPW weight: w_i = A_i/π̂(X_i) + (1-A_i)/(1-π̂(X_i))
2.2 Compute outcome prediction: μ̂_i = μ̂(A_i, M_i, X_i)
2.3 Compute augmentation term
2.4 φ_i = w_i(Y_i - μ̂_i) + [integration term]
3. ESTIMATE AND INFERENCE
3.1 ψ̂ = n⁻¹ Σᵢ φ_i
3.2 SE = √(n⁻¹ Σᵢ (φ_i - ψ̂)²)
3.3 CI = [ψ̂ - 1.96·SE, ψ̂ + 1.96·SE]
4. RETURN (ψ̂, SE, CI)
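A runnable sketch of steps 1–3, simplified to the average treatment effect so it stays short (the mediator models and the mediation integration term of step 2.4 are omitted; the model formulas and the helper name `aipw_ate` are illustrative assumptions, not the original algorithm in full):

```r
# Simplified AIPW point estimate and 95% CI for E[Y(1)] - E[Y(0)]
# (illustrative; omits the mediation-specific integration term)
aipw_ate <- function(data) {
  n <- nrow(data)
  # 1. Estimate nuisance functions
  ps  <- glm(A ~ X, data = data, family = binomial)  # propensity π̂(x)
  out <- lm(Y ~ A + X, data = data)                  # outcome μ̂(a,x)
  pi_hat <- predict(ps, type = "response")
  mu1 <- predict(out, newdata = transform(data, A = 1))
  mu0 <- predict(out, newdata = transform(data, A = 0))
  # 2. Pseudo-outcomes (efficient influence function values)
  phi <- (data$A / pi_hat) * (data$Y - mu1) -
         ((1 - data$A) / (1 - pi_hat)) * (data$Y - mu0) +
         (mu1 - mu0)
  # 3. Estimate and inference
  psi <- mean(phi)
  se  <- sd(phi) / sqrt(n)
  list(estimate = psi, se = se, ci = psi + c(-1.96, 1.96) * se)
}
```

Note the double-robustness structure: the augmentation term $(μ̂_1 - μ̂_0)$ corrects the IPW residual term, so the estimator is consistent if either nuisance model is correct.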
Complexity Analysis
Big-O Notation Guide
Formal Definition: $f(n) = O(g(n))$ if $\exists c, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$
| Complexity | Name | Example | Operations at n=1000 |
|---|---|---|---|
| $O(1)$ | Constant | Array access | 1 |
| $O(\log n)$ | Logarithmic | Binary search | ~10 |
| $O(n)$ | Linear | Single loop | 1,000 |
| $O(n \log n)$ | Linearithmic | Merge sort, FFT | ~10,000 |
| $O(n^2)$ | Quadratic | Nested loops | 1,000,000 |
| $O(n^3)$ | Cubic | Matrix multiplication | 1,000,000,000 |
| $O(2^n)$ | Exponential | Subset enumeration | ~10^301 |
Key Formulas
Master Theorem for recurrences $T(n) = aT(n/b) + f(n)$:
- If $f(n) = O(n^{\log_b a - \epsilon})$ then $T(n) = \Theta(n^{\log_b a})$
- If $f(n) = \Theta(n^{\log_b a})$ then $T(n) = \Theta(n^{\log_b a} \log n)$
- If $f(n) = \Omega(n^{\log_b a + \epsilon})$ then $T(n) = \Theta(f(n))$
Sorting lower bound: Any comparison-based sort requires $\Omega(n \log n)$ comparisons
Matrix operations:
- Naive multiplication: $O(n^3)$
- Strassen: $O(n^{2.807})$
- Matrix inversion: $O(n^3)$ (same as multiplication)
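As a worked check of the second Master Theorem case, take merge sort: $T(n) = 2T(n/2) + \Theta(n)$, so $a = 2$, $b = 2$, and $n^{\log_b a} = n$. Since $f(n) = \Theta(n) = \Theta(n^{\log_b a})$, case 2 applies and $T(n) = \Theta(n \log n)$, consistent with the comparison-sort lower bound above.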
undefinedComplexity analysis helper
Complexity analysis helper
analyze_complexity <- function(f, n_values = c(100, 500, 1000, 5000)) {
  times <- sapply(n_values, function(n) {
    system.time(f(n))[["elapsed"]]
  })
  # Fit log-log regression to estimate complexity
  fit <- lm(log(times) ~ log(n_values))
  estimated_power <- coef(fit)[2]
  list(
    times = data.frame(n = n_values, time = times),
    estimated_complexity = paste0("O(n^", round(estimated_power, 2), ")"),
    power = estimated_power
  )
}
Statistical Algorithm Complexities
| Algorithm | Time | Space |
|---|---|---|
| OLS | O(np² + p³) | O(np) |
| Logistic (Newton) | O(np² + p³) per iter | O(np) |
| Bootstrap (B reps) | O(B × base) | O(n) |
| MCMC (T iters) | O(T × per_iter) | O(n + T) |
| Cross-validation (K) | O(K × base) | O(n) |
| Random forest | O(n log n × B × p) | O(n × B) |
Template for Analysis
TIME COMPLEXITY:
- Initialization: O(...)
- Per iteration: O(...)
- Total (T iterations): O(...)
- Convergence typically in T = O(...) iterations
SPACE COMPLEXITY:
- Data storage: O(n × p)
- Working memory: O(...)
- Output: O(...)

Convergence Analysis
Types of Convergence
- Finite termination: Exact solution in finite steps
- Linear: $|x_{k+1} - x^*| \leq c|x_k - x^*|$, $c < 1$
- Superlinear: $|x_{k+1} - x^*| / |x_k - x^*| \to 0$
- Quadratic: $|x_{k+1} - x^*| \leq c|x_k - x^*|^2$
Convergence Documentation Template
CONVERGENCE:
- Type: [Linear/Superlinear/Quadratic]
- Rate: [Expression]
- Conditions: [What must hold]
- Stopping criterion: [When to stop]
- Typical iterations: [Order of magnitude]

Optimization Algorithms
Gradient-Based Methods
ALGORITHM: Gradient Descent
INPUT: f (objective), ∇f (gradient), x₀ (initial), η (step size), ε (tolerance)
OUTPUT: x* (minimizer)
1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
2.1 xₖ₊₁ ← xₖ - η∇f(xₖ)
2.2 k ← k + 1
3. RETURN xₖ
COMPLEXITY: O(iterations × gradient_cost)
CONVERGENCE: Linear with rate (1 - η·μ) for μ-strongly convex f
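The pseudocode transcribes directly into R. A minimal sketch (fixed step size, no line search; the function name and defaults are assumptions):

```r
# Gradient descent with fixed step size, following the pseudocode above
gradient_descent <- function(grad, x0, eta = 0.1, eps = 1e-8, max_iter = 10000L) {
  x <- x0
  for (k in seq_len(max_iter)) {
    g <- grad(x)
    # Stopping criterion: ||∇f(x)|| <= ε
    if (sqrt(sum(g^2)) <= eps) {
      return(list(x = x, iterations = k - 1L, converged = TRUE))
    }
    x <- x - eta * g
  }
  list(x = x, iterations = max_iter, converged = FALSE)
}
```

For $f(x) = \lVert x - a \rVert^2$ the error contracts by a factor $(1 - 2\eta)$ per step, matching the linear rate stated above.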
Newton's Method
ALGORITHM: Newton-Raphson
INPUT: f, ∇f, ∇²f, x₀, ε
OUTPUT: x*
1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
2.1 Solve ∇²f(xₖ)·d = -∇f(xₖ) for direction d
2.2 xₖ₊₁ ← xₖ + d
2.3 k ← k + 1
3. RETURN xₖ
COMPLEXITY: O(iterations × p³) for p-dimensional
CONVERGENCE: Quadratic near solution

EM Algorithm Template
ALGORITHM: Expectation-Maximization
INPUT: Data Y, model parameters θ₀, tolerance ε
OUTPUT: MLE θ̂
1. θ ← θ₀
2. REPEAT:
2.1 E-STEP: Compute Q(θ'|θ) = E[log L(θ'|Y,Z) | Y, θ]
2.2 M-STEP: θ_new ← argmax_θ' Q(θ'|θ)
2.3 Δ ← |θ_new - θ|
2.4 θ ← θ_new
3. UNTIL Δ < ε
4. RETURN θ
CONVERGENCE: Monotonic increase in likelihood
             Linear rate near optimum
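As a concrete instance of the template (a sketch, not from the original): EM for a two-component univariate Gaussian mixture with known unit variances. The E-step computes responsibilities, the M-step takes weighted means, and the observed-data log-likelihood is monitored for the monotonic increase noted above.

```r
# EM for a two-component mixture of N(mu[1], 1) and N(mu[2], 1)
em_mixture <- function(y, mu = c(-1, 1), pi1 = 0.5, eps = 1e-8, max_iter = 500L) {
  loglik_old <- -Inf
  for (t in seq_len(max_iter)) {
    # E-step: posterior responsibility of component 1 for each observation
    d1 <- pi1 * dnorm(y, mu[1], 1)
    d2 <- (1 - pi1) * dnorm(y, mu[2], 1)
    r1 <- d1 / (d1 + d2)
    # M-step: weighted means and mixing weight maximize Q(θ'|θ)
    pi1 <- mean(r1)
    mu  <- c(sum(r1 * y) / sum(r1), sum((1 - r1) * y) / sum(1 - r1))
    # Observed-data log-likelihood (at the pre-update parameters); monotone in t
    loglik <- sum(log(d1 + d2))
    if (abs(loglik - loglik_old) < eps) break
    loglik_old <- loglik
  }
  list(mu = mu, pi1 = pi1, loglik = loglik, iterations = t)
}
```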
Bootstrap Algorithms
Nonparametric Bootstrap
ALGORITHM: Nonparametric Bootstrap
INPUT: Data X of size n, statistic T, B (number of replicates)
OUTPUT: SE estimate, CI
1. FOR b = 1 to B:
1.1 Draw X*_b by sampling n observations with replacement from X
1.2 Compute T*_b = T(X*_b)
2. SE_boot ← SD({T*_1, ..., T*_B})
3. CI_percentile ← [quantile(T*, 0.025), quantile(T*, 0.975)]
4. RETURN (SE_boot, CI_percentile)
COMPLEXITY: O(B × cost(T))
NOTES: B ≥ 1000 for SE, B ≥ 10000 for percentile CI
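The pseudocode translates almost line-for-line into R (a sketch; the function name `np_bootstrap` is an assumption):

```r
# Nonparametric bootstrap SE and percentile CI for an arbitrary statistic
np_bootstrap <- function(x, statistic, B = 2000L) {
  # Steps 1.1-1.2: resample with replacement and recompute the statistic B times
  t_star <- replicate(B, statistic(sample(x, length(x), replace = TRUE)))
  # Steps 2-3: bootstrap SE and percentile interval
  list(
    se = sd(t_star),
    ci = unname(quantile(t_star, c(0.025, 0.975)))
  )
}
```

For the sample mean, the bootstrap SE should track the analytic value $s/\sqrt{n}$, which makes a useful sanity check.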
Parametric Bootstrap
ALGORITHM: Parametric Bootstrap
INPUT: Data X, parametric model M, B replicates
OUTPUT: SE estimate
1. Fit θ̂ = MLE(X, M)
2. FOR b = 1 to B:
2.1 Generate X*_b ~ M(θ̂)
2.2 Compute θ̂*_b = MLE(X*_b, M)
3. SE_boot ← SD({θ̂*_1, ..., θ̂*_B})
4. RETURN SE_boot
Numerical Stability Notes
Common Issues
- Overflow/Underflow: Work on log scale
- Cancellation: Reformulate subtractions
- Ill-conditioning: Use regularization or pivoting
- Convergence: Add damping or line search
Stability Techniques
Log-sum-exp trick
log_sum_exp <- function(x) {
  max_x <- max(x)
  max_x + log(sum(exp(x - max_x)))
}
Numerically stable variance
stable_var <- function(x) {
  n <- length(x)
  m <- mean(x)
  sum((x - m)^2) / (n - 1)  # Two-pass: center first to avoid catastrophic cancellation
}
Implementation Checklist
Before Coding
- Pseudocode written and reviewed
- Complexity analyzed
- Convergence conditions identified
- Edge cases documented
- Numerical stability considered
During Implementation
- Match pseudocode structure
- Add convergence monitoring
- Handle edge cases
- Log intermediate values (debug mode)
- Add early stopping
After Implementation
- Unit tests for components
- Integration tests for full algorithm
- Benchmark against reference implementation
- Profile for bottlenecks
- Document deviations from pseudocode
Key References
- CLRS (Cormen, Leiserson, Rivest, and Stein, *Introduction to Algorithms*)
- Press et al., *Numerical Recipes*