numerical-methods

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Numerical Methods

数值方法

You are an expert in numerical stability and computational aspects of statistical methods.
您是数值稳定性和统计方法计算方面的专家。

Floating-Point Fundamentals

浮点数基础

IEEE 754 Double Precision

IEEE 754双精度

  • Precision: ~15-17 significant decimal digits
  • Range: ~10⁻³⁰⁸ to 10³⁰⁸
  • Machine epsilon: ε ≈ 2.2 × 10⁻¹⁶
  • Special values: Inf, -Inf, NaN
  • 精度:约15-17位有效十进制数字
  • 范围:约10⁻³⁰⁸ 至 10³⁰⁸
  • 机器精度:ε ≈ 2.2 × 10⁻¹⁶
  • 特殊值:Inf, -Inf, NaN

Key Constants in R

R语言中的关键常量

r
.Machine$double.eps      # ~2.22e-16 (machine epsilon)
.Machine$double.xmax     # ~1.80e+308 (max finite)
.Machine$double.xmin     # ~2.23e-308 (min positive normalized)
.Machine$double.neg.eps  # ~1.11e-16 (negative epsilon)
r
.Machine$double.eps      # ~2.22e-16 (机器精度)
.Machine$double.xmax     # ~1.80e+308 (最大有限值)
.Machine$double.xmin     # ~2.23e-308 (最小正规格化值)
.Machine$double.neg.eps  # ~1.11e-16 (负精度值)

Common Numerical Issues

常见数值问题

1. Catastrophic Cancellation

1. 灾难性抵消

When subtracting nearly equal numbers:
r
undefined
当减去几乎相等的数字时:
r
undefined

BAD: loses precision

糟糕:丢失精度

x <- 1e10 + 1 y <- 1e10 result <- x - y # Should be 1, may have errors
x <- 1e10 + 1 y <- 1e10 result <- x - y # 应为1,但可能存在误差

BETTER: reformulate to avoid subtraction

更佳:重新表述以避免减法

Example: Computing variance

示例:计算方差

var_bad <- mean(x^2) - mean(x)^2 # Can be negative! var_good <- sum((x - mean(x))^2) / (n-1) # Always non-negative
undefined
var_bad <- mean(x^2) - mean(x)^2 # 可能为负数! var_good <- sum((x - mean(x))^2) / (n-1) # 始终非负
undefined

2. Overflow/Underflow

2. 上溢/下溢

r
undefined
r
undefined

BAD: overflow

糟糕:上溢

prod(1:200) # Inf
prod(1:200) # Inf

GOOD: work on log scale

更佳:在对数尺度上计算

sum(log(1:200)) # Then exp() if needed
sum(log(1:200)) # 如有需要再取exp()

BAD: underflow in probabilities

糟糕:概率计算中的下溢

prod(dnorm(x)) # 0 for large x
prod(dnorm(x)) # 当x很大时结果为0

GOOD: sum log probabilities

更佳:对对数概率求和

sum(dnorm(x, log = TRUE))
undefined
sum(dnorm(x, log = TRUE))
undefined

3. Log-Sum-Exp Trick

3. Log-Sum-Exp技巧

Essential for working with log probabilities:
r
log_sum_exp <- function(log_x) {
  max_log <- max(log_x)
  if (is.infinite(max_log)) return(max_log)
  max_log + log(sum(exp(log_x - max_log)))
}
处理对数概率的必备技巧:
r
log_sum_exp <- function(log_x) {
  max_log <- max(log_x)
  if (is.infinite(max_log)) return(max_log)
  max_log + log(sum(exp(log_x - max_log)))
}

Example: log(exp(-1000) + exp(-1001))

示例:log(exp(-1000) + exp(-1001))

log_sum_exp(c(-1000, -1001)) # Correct: ~-999.69 log(exp(-1000) + exp(-1001)) # Wrong: -Inf
undefined
log_sum_exp(c(-1000, -1001)) # 正确结果:~-999.69 log(exp(-1000) + exp(-1001)) # 错误结果:-Inf
undefined

4. Softmax Stability

4. Softmax稳定性

r
undefined
r
undefined

BAD

糟糕的实现

softmax_bad <- function(x) exp(x) / sum(exp(x))
softmax_bad <- function(x) exp(x) / sum(exp(x))

GOOD

更佳的实现

softmax <- function(x) { x_max <- max(x) exp_x <- exp(x - x_max) exp_x / sum(exp_x) }
undefined
softmax <- function(x) { x_max <- max(x) exp_x <- exp(x - x_max) exp_x / sum(exp_x) }
undefined

Matrix Computations

矩阵计算

Conditioning

条件数

The condition number κ(A) measures sensitivity to perturbation:
  • κ(A) = ‖A‖ · ‖A⁻¹‖
  • Rule: Expect to lose log₁₀(κ) digits of accuracy
  • κ > 10¹⁵ means matrix is numerically singular
r
undefined
条件数κ(A)衡量矩阵对扰动的敏感度:
  • κ(A) = ‖A‖ · ‖A⁻¹‖
  • 规则:预计会丢失log₁₀(κ)位精度
  • κ > 10¹⁵ 表示矩阵在数值上是奇异的
r
undefined

Check condition number

检查条件数

kappa(X, exact = TRUE)
kappa(X, exact = TRUE)

For regression: check X'X conditioning

回归场景:检查X'X的条件数

kappa(crossprod(X))
undefined
kappa(crossprod(X))
undefined

Solving Linear Systems

线性方程组求解

Prefer: Decomposition methods over explicit inversion
r
undefined
优先选择:分解方法而非显式求逆
r
undefined

BAD: explicit inverse

糟糕:显式求逆

beta <- solve(t(X) %% X) %% t(X) %*% y
beta <- solve(t(X) %% X) %% t(X) %*% y

GOOD: QR decomposition

更佳:QR分解

beta <- qr.coef(qr(X), y)
beta <- qr.coef(qr(X), y)

BETTER for positive definite: Cholesky

对正定矩阵更优:Cholesky分解

R <- chol(crossprod(X)) beta <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))
R <- chol(crossprod(X)) beta <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))

For ill-conditioned: SVD/pseudoinverse

针对病态矩阵:SVD/伪逆

beta <- MASS::ginv(X) %*% y
undefined
beta <- MASS::ginv(X) %*% y
undefined

Symmetric Positive Definite Matrices

对称正定矩阵

Always use specialized methods:
r
undefined
始终使用专门的方法:
r
undefined

Cholesky for SPD

对称正定矩阵的Cholesky分解

L <- chol(Sigma)
L <- chol(Sigma)

Eigendecomposition

特征分解

eig <- eigen(Sigma, symmetric = TRUE)
eig <- eigen(Sigma, symmetric = TRUE)

Check positive definiteness

检查正定性

all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0)
undefined
all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0)
undefined

Optimization Stability

优化稳定性

Gradient Computation

梯度计算

r
undefined
r
undefined

Numerical gradient (for verification)

数值梯度(用于验证)

numerical_grad <- function(f, x, h = sqrt(.Machine$double.eps)) { sapply(seq_along(x), function(i) { x_plus <- x_minus <- x x_plus[i] <- x[i] + h x_minus[i] <- x[i] - h (f(x_plus) - f(x_minus)) / (2 * h) }) }
numerical_grad <- function(f, x, h = sqrt(.Machine$double.eps)) { sapply(seq_along(x), function(i) { x_plus <- x_minus <- x x_plus[i] <- x[i] + h x_minus[i] <- x[i] - h (f(x_plus) - f(x_minus)) / (2 * h) }) }

Central difference is O(h²) accurate

中心差分的精度为O(h²)

Forward difference is O(h) accurate

前向差分的精度为O(h)

undefined
undefined

Hessian Stability

海森矩阵稳定性

r
undefined
r
undefined

Check Hessian is positive definite at optimum

检查最优解处的海森矩阵是否为正定矩阵

check_hessian <- function(H, tol = 1e-8) { eigs <- eigen(H, symmetric = TRUE, only.values = TRUE)$values min_eig <- min(eigs)
list( positive_definite = min_eig > tol, min_eigenvalue = min_eig, condition_number = max(eigs) / min_eig ) }
undefined
check_hessian <- function(H, tol = 1e-8) { eigs <- eigen(H, symmetric = TRUE, only.values = TRUE)$values min_eig <- min(eigs)
list( positive_definite = min_eig > tol, min_eigenvalue = min_eig, condition_number = max(eigs) / min_eig ) }
undefined

Line Search

线搜索

For gradient descent stability:
r
backtracking_line_search <- function(f, x, d, grad, alpha = 1, rho = 0.5, c = 1e-4) {
  # Armijo condition
  while (f(x + alpha * d) > f(x) + c * alpha * sum(grad * d)) {
    alpha <- rho * alpha
    if (alpha < 1e-10) break
  }
  alpha
}
为梯度下降提供稳定性:
r
backtracking_line_search <- function(f, x, d, grad, alpha = 1, rho = 0.5, c = 1e-4) {
  # Armijo条件
  while (f(x + alpha * d) > f(x) + c * alpha * sum(grad * d)) {
    alpha <- rho * alpha
    if (alpha < 1e-10) break
  }
  alpha
}

Integration and Quadrature

积分与数值积分

Numerical Integration Guidelines

数值积分指南

r
undefined
r
undefined

Adaptive quadrature (default choice)

自适应积分(默认选择)

integrate(f, lower, upper)
integrate(f, lower, upper)

For infinite limits

无限区间积分

integrate(f, -Inf, Inf)
integrate(f, -Inf, Inf)

For highly oscillatory or peaked functions

针对高振荡或峰值函数

Increase subdivisions

增加细分次数

integrate(f, lower, upper, subdivisions = 1000)
integrate(f, lower, upper, subdivisions = 1000)

For known singularities, split the domain

对于已知奇点,拆分积分域

undefined
undefined

Monte Carlo Integration

蒙特卡洛积分

r
mc_integrate <- function(f, n, lower, upper) {
  x <- runif(n, lower, upper)
  fx <- sapply(x, f)

  estimate <- (upper - lower) * mean(fx)
  se <- (upper - lower) * sd(fx) / sqrt(n)

  list(value = estimate, se = se)
}
r
mc_integrate <- function(f, n, lower, upper) {
  x <- runif(n, lower, upper)
  fx <- sapply(x, f)

  estimate <- (upper - lower) * mean(fx)
  se <- (upper - lower) * sd(fx) / sqrt(n)

  list(value = estimate, se = se)
}

Root Finding

根查找

Newton-Raphson Stability

牛顿-拉夫逊法稳定性

r
newton_raphson <- function(f, df, x0, tol = 1e-8, max_iter = 100) {
  x <- x0
  for (i in 1:max_iter) {
    fx <- f(x)
    dfx <- df(x)

    # Check for near-zero derivative
    if (abs(dfx) < .Machine$double.eps * 100) {
      warning("Near-zero derivative")
      break
    }

    x_new <- x - fx / dfx

    if (abs(x_new - x) < tol) break
    x <- x_new
  }
  x
}
r
newton_raphson <- function(f, df, x0, tol = 1e-8, max_iter = 100) {
  x <- x0
  for (i in 1:max_iter) {
    fx <- f(x)
    dfx <- df(x)

    # 检查导数是否接近零
    if (abs(dfx) < .Machine$double.eps * 100) {
      warning("Near-zero derivative")
      break
    }

    x_new <- x - fx / dfx

    if (abs(x_new - x) < tol) break
    x <- x_new
  }
  x
}

Brent's Method

Brent方法

For robust root finding without derivatives:
r
uniroot(f, interval = c(lower, upper), tol = .Machine$double.eps^0.5)
无需导数的稳健根查找方法:
r
uniroot(f, interval = c(lower, upper), tol = .Machine$double.eps^0.5)

Statistical Computing Patterns

统计计算模式

Safe Likelihood Computation

安全的似然计算

r
undefined
r
undefined

Always work with log-likelihood

始终使用对数似然

log_lik <- function(theta, data) {

Compute log-likelihood, not likelihood

sum(dnorm(data, mean = theta[1], sd = theta[2], log = TRUE)) }
undefined
log_lik <- function(theta, data) {

计算对数似然而非似然

sum(dnorm(data, mean = theta[1], sd = theta[2], log = TRUE)) }
undefined

Robust Standard Errors

稳健标准误差

r
undefined
r
undefined

Sandwich estimator with numerical stability

具有数值稳定性的三明治估计量

sandwich_se <- function(score, hessian) {

Check Hessian conditioning

H_inv <- tryCatch( solve(hessian), error = function(e) MASS::ginv(hessian) )
meat <- crossprod(score) V <- H_inv %% meat %% H_inv
sqrt(diag(V)) }
undefined
sandwich_se <- function(score, hessian) {

检查海森矩阵的条件数

H_inv <- tryCatch( solve(hessian), error = function(e) MASS::ginv(hessian) )
meat <- crossprod(score) V <- H_inv %% meat %% H_inv
sqrt(diag(V)) }
undefined

Bootstrap with Error Handling

带错误处理的Bootstrap方法

r
safe_bootstrap <- function(data, statistic, R = 1000) {
  results <- numeric(R)
  failures <- 0

  for (i in 1:R) {
    boot_data <- data[sample(nrow(data), replace = TRUE), ]
    result <- tryCatch(
      statistic(boot_data),
      error = function(e) NA
    )
    results[i] <- result
    if (is.na(result)) failures <- failures + 1
  }

  if (failures > 0.1 * R) {
    warning(sprintf("%.1f%% bootstrap failures", 100 * failures / R))
  }

  list(
    estimate = mean(results, na.rm = TRUE),
    se = sd(results, na.rm = TRUE),
    failures = failures
  )
}
r
safe_bootstrap <- function(data, statistic, R = 1000) {
  results <- numeric(R)
  failures <- 0

  for (i in 1:R) {
    boot_data <- data[sample(nrow(data), replace = TRUE), ]
    result <- tryCatch(
      statistic(boot_data),
      error = function(e) NA
    )
    results[i] <- result
    if (is.na(result)) failures <- failures + 1
  }

  if (failures > 0.1 * R) {
    warning(sprintf("%.1f%% bootstrap failures", 100 * failures / R))
  }

  list(
    estimate = mean(results, na.rm = TRUE),
    se = sd(results, na.rm = TRUE),
    failures = failures
  )
}

Debugging Numerical Issues

数值问题调试

Diagnostic Checklist

诊断清单

  1. Check for NaN/Inf:
    any(is.nan(x))
    ,
    any(is.infinite(x))
  2. Check conditioning:
    kappa(matrix)
  3. Check eigenvalues: For PD matrices
  4. Check gradients: Numerically vs analytically
  5. Check scale: Variables on similar scales?
  1. 检查NaN/Inf
    any(is.nan(x))
    ,
    any(is.infinite(x))
  2. 检查条件数
    kappa(matrix)
  3. 检查特征值:针对正定矩阵
  4. 检查梯度:数值梯度与解析梯度对比
  5. 检查尺度:变量是否处于相似尺度?

Debugging Functions

调试函数

r
undefined
r
undefined

Trace NaN/Inf sources

追踪NaN/Inf的来源

debug_numeric <- function(x, name = "x") { cat(sprintf("%s: range [%.3g, %.3g], ", name, min(x), max(x))) cat(sprintf("NaN: %d, Inf: %d, -Inf: %d\n", sum(is.nan(x)), sum(x == Inf), sum(x == -Inf))) }
debug_numeric <- function(x, name = "x") { cat(sprintf("%s: range [%.3g, %.3g], ", name, min(x), max(x))) cat(sprintf("NaN: %d, Inf: %d, -Inf: %d\n", sum(is.nan(x)), sum(x == Inf), sum(x == -Inf))) }

Check relative error

检查相对误差

rel_error <- function(computed, true) { abs(computed - true) / max(abs(true), 1) }
undefined
rel_error <- function(computed, true) { abs(computed - true) / max(abs(true), 1) }
undefined

Best Practices Summary

最佳实践总结

  1. Always work on log scale for products of probabilities
  2. Use QR or Cholesky instead of matrix inversion
  3. Check conditioning before solving linear systems
  4. Center and scale predictors in regression
  5. Handle edge cases (empty data, singular matrices)
  6. Use existing implementations (LAPACK, BLAS) when possible
  7. Test with extreme values (very small, very large, near-zero)
  8. Compare analytical and numerical gradients
  9. Monitor convergence in iterative algorithms
  10. Document numerical assumptions and limitations
  1. 始终在对数尺度上操作概率的乘积
  2. 使用QR或Cholesky分解而非矩阵求逆
  3. 求解线性方程组前检查条件数
  4. 回归中的预测变量要中心化和标准化
  5. 处理边缘情况(空数据、奇异矩阵)
  6. 尽可能使用现有实现(LAPACK、BLAS)
  7. 用极端值测试(极小、极大、接近零的值)
  8. 对比解析梯度与数值梯度
  9. 监控迭代算法的收敛性
  10. 记录数值假设和局限性

Key References

关键参考文献

  • Higham
  • Golub & Van Loan
  • Higham
  • Golub & Van Loan