hardhat

Creating Modeling Packages with hardhat

The hardhat package provides infrastructure for building modeling packages with consistent interfaces. It standardizes preprocessing via `mold()` (training) and `forge()` (prediction), handling formula, XY, and recipe inputs uniformly.

Quick Reference

| Task | Function |
|---|---|
| Preprocess training data | `mold(x, y)` or `mold(formula, data)` |
| Preprocess prediction data | `forge(new_data, blueprint)` |
| Create model object | `new_model(..., blueprint, class)` |
| XY blueprint | `default_xy_blueprint(intercept = TRUE)` |
| Formula blueprint | `default_formula_blueprint(intercept = TRUE)` |
| Recipe blueprint | `default_recipe_blueprint(intercept = TRUE)` |
| Format numeric predictions | `spruce_numeric(pred)` |
| Format class predictions | `spruce_class(pred)` |
| Format probability predictions | `spruce_prob(pred_levels, prob_matrix)` |
| Validate univariate outcome | `validate_outcomes_are_univariate(outcomes)` |
| Validate prediction size | `validate_prediction_size(pred, new_data)` |

Package Architecture

Stage 1: Model Fitting

User → simple_lm() methods → bridge → implementation → constructor
         (formula/xy/recipe)    ↓           ↓              ↓
                            mold()    lm.fit()      new_model()

Stage 2: Model Prediction

User → predict.simple_lm() → bridge → implementation
              ↓                ↓            ↓
          forge()          switch()   predict_*_numeric()

Model Constructor

Create objects of your model class. Name: `new_<model_class>()`.

```r
new_simple_lm <- function(coefs, coef_names, blueprint) {
  if (!is.numeric(coefs)) {
    stop("`coefs` should be a numeric vector.", call. = FALSE)
  }
  if (!is.character(coef_names)) {
    stop("`coef_names` should be a character vector.", call. = FALSE)
  }

  new_model(
    coefs = coefs,
    coef_names = coef_names,
    blueprint = blueprint,
    class = "simple_lm"
  )
}
```

Implementation Function

Core algorithm. Name: `<model_class>_impl()`. Returns a named list of model elements.

```r
simple_lm_impl <- function(predictors, outcomes) {
  lm_fit <- lm.fit(predictors, outcomes)
  coefs <- lm_fit$coefficients

  list(
    coefs = unname(coefs),
    coef_names = names(coefs)
  )
}
```

Bridge Function

Connects the user-facing methods to the implementation. Converts `mold()` output to the format the implementation expects.

```r
simple_lm_bridge <- function(processed) {
  validate_outcomes_are_univariate(processed$outcomes)

  predictors <- as.matrix(processed$predictors)
  outcomes <- processed$outcomes[[1]]

  fit <- simple_lm_impl(predictors, outcomes)

  new_simple_lm(
    coefs = fit$coefs,
    coef_names = fit$coef_names,
    blueprint = processed$blueprint
  )
}
```

User-Facing Fitting Function

A generic with one method per interface. Each method calls `mold()`, then the bridge.

```r
simple_lm <- function(x, ...) {
  UseMethod("simple_lm")
}

simple_lm.default <- function(x, ...) {
  stop("`simple_lm()` is not defined for a '", class(x)[1], "'.", call. = FALSE)
}

simple_lm.data.frame <- function(x, y, intercept = TRUE, ...) {
  blueprint <- default_xy_blueprint(intercept = intercept)
  processed <- mold(x, y, blueprint = blueprint)
  simple_lm_bridge(processed)
}

simple_lm.matrix <- function(x, y, intercept = TRUE, ...) {
  blueprint <- default_xy_blueprint(intercept = intercept)
  processed <- mold(x, y, blueprint = blueprint)
  simple_lm_bridge(processed)
}

simple_lm.formula <- function(formula, data, intercept = TRUE, ...) {
  blueprint <- default_formula_blueprint(intercept = intercept)
  processed <- mold(formula, data, blueprint = blueprint)
  simple_lm_bridge(processed)
}

simple_lm.recipe <- function(x, data, intercept = TRUE, ...) {
  blueprint <- default_recipe_blueprint(intercept = intercept)
  processed <- mold(x, data, blueprint = blueprint)
  simple_lm_bridge(processed)
}
```

Prediction Implementation

One function per prediction type. Use `spruce_*()` for standardized output.

```r
predict_simple_lm_numeric <- function(object, predictors) {
  coefs <- object$coefs
  pred <- as.vector(predictors %*% coefs)
  spruce_numeric(pred)  # Returns tibble with .pred column
}
```

Prediction Bridge

Converts `forge()` output and switches on the prediction type.

```r
predict_simple_lm_bridge <- function(type, object, predictors) {
  type <- rlang::arg_match(type, "numeric")
  predictors <- as.matrix(predictors)

  switch(
    type,
    numeric = predict_simple_lm_numeric(object, predictors)
  )
}
```

User-Facing Predict Method

Call `forge()` with the stored blueprint, pass the result to the bridge, then validate the output size.

```r
predict.simple_lm <- function(object, new_data, type = "numeric", ...) {
  processed <- forge(new_data, object$blueprint)
  out <- predict_simple_lm_bridge(type, object, processed$predictors)
  validate_prediction_size(out, new_data)
  out
}
```
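
As a rough end-to-end check, the fitting and prediction stages can be condensed into one script. This is only a sketch under stated assumptions: the implementation and prediction helpers are folded into their callers to keep it short, only the formula method is defined, and the built-in `mtcars` data with `stats::lm()` is used as an assumed reference for comparison.

```r
library(hardhat)

# Condensed constructor (input checks omitted for brevity)
new_simple_lm <- function(coefs, coef_names, blueprint) {
  new_model(
    coefs = coefs,
    coef_names = coef_names,
    blueprint = blueprint,
    class = "simple_lm"
  )
}

# Bridge with the lm.fit() implementation folded in
simple_lm_bridge <- function(processed) {
  validate_outcomes_are_univariate(processed$outcomes)
  fit <- lm.fit(as.matrix(processed$predictors), processed$outcomes[[1]])
  new_simple_lm(
    coefs = unname(fit$coefficients),
    coef_names = names(fit$coefficients),
    blueprint = processed$blueprint
  )
}

simple_lm <- function(x, ...) {
  UseMethod("simple_lm")
}

simple_lm.formula <- function(formula, data, intercept = TRUE, ...) {
  blueprint <- default_formula_blueprint(intercept = intercept)
  simple_lm_bridge(mold(formula, data, blueprint = blueprint))
}

# Predict method with the numeric prediction folded in
predict.simple_lm <- function(object, new_data, type = "numeric", ...) {
  processed <- forge(new_data, object$blueprint)
  pred <- as.vector(as.matrix(processed$predictors) %*% object$coefs)
  out <- spruce_numeric(pred)
  validate_prediction_size(out, new_data)
  out
}

fit <- simple_lm(mpg ~ wt + hp, mtcars)
preds <- predict(fit, head(mtcars))

# Reference predictions from base R for comparison
reference <- unname(predict(lm(mpg ~ wt + hp, mtcars), head(mtcars)))
all.equal(preds$.pred, reference)
```

Because the blueprint is stored on the model object, `predict()` needs no knowledge of which interface was used at fit time.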

mold() Details

Returns a named list with four elements: `predictors` (tibble), `outcomes` (tibble), `extras`, and `blueprint`.
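
A minimal sketch of inspecting that return value, assuming the built-in `iris` data and the default formula blueprint (the variable name `processed` is just for illustration):

```r
library(hardhat)

# Formula interface: the log() transform is applied during molding
processed <- mold(Sepal.Length ~ log(Sepal.Width) + Species, iris)

# The four elements of the returned list
names(processed)

# Predictors and outcomes come back as tibbles with one row per input row
class(processed$predictors)
names(processed$outcomes)
nrow(processed$predictors)

# The blueprint records how to repeat the preprocessing in forge()
class(processed$blueprint)
```

Keeping `processed$blueprint` on the fitted model is what lets `forge()` apply the same preprocessing to new data.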

Blueprint Options

| Blueprint | Key Options |
|---|---|
| `default_xy_blueprint()` | `intercept` |
| `default_formula_blueprint()` | `intercept`, `indicators` ("traditional", "none", "one_hot") |
| `default_recipe_blueprint()` | `intercept` |
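
To see what the `indicators` option changes, the sketch below molds the same formula under each setting and lists the resulting predictor columns. It assumes the built-in `iris` data; `predictor_names()` is a hypothetical helper written for this example, not part of hardhat.

```r
library(hardhat)

# Hypothetical helper: predictor column names produced by a given setting
predictor_names <- function(indicators) {
  blueprint <- default_formula_blueprint(indicators = indicators)
  processed <- mold(Sepal.Length ~ Species, iris, blueprint = blueprint)
  names(processed$predictors)
}

# Species has three levels; compare how each setting encodes it
predictor_names("traditional")  # model.matrix()-style dummy columns
predictor_names("one_hot")      # one column per level
predictor_names("none")         # factor column left as is
```

Models that handle factors natively would typically use `"none"`, while `"one_hot"` suits methods that should not treat one level as a baseline.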

Formula Special Behaviors

  • No intercept by default (unlike base R)
  • `indicators = "none"` keeps factors unexpanded
  • Multivariate outcomes: `y1 + y2 ~ x1 + x2` (not `cbind()`)
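
The three behaviors listed above can be checked with a short sketch, assuming the built-in `iris` data (all variable names here are illustrative):

```r
library(hardhat)

# 1. No intercept column unless the blueprint asks for one
no_int <- mold(Sepal.Length ~ Sepal.Width, iris)
with_int <- mold(
  Sepal.Length ~ Sepal.Width, iris,
  blueprint = default_formula_blueprint(intercept = TRUE)
)
names(no_int$predictors)
names(with_int$predictors)

# 2. indicators = "none" passes the factor through unexpanded
raw <- mold(
  Sepal.Length ~ Species, iris,
  blueprint = default_formula_blueprint(indicators = "none")
)
class(raw$predictors$Species)

# 3. Multivariate outcomes are written with `+` on the left-hand side
multi <- mold(Sepal.Length + Sepal.Width ~ Petal.Length, iris)
names(multi$outcomes)
```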

forge() Validation

`forge()` automatically validates that new data matches the training data:
  • Column names must match
  • Column types must be compatible
  • Factor levels must be a subset of the training levels
  • Lossy conversions emit warnings (novel levels → NA)

Outcome by case:
  • Missing column → error
  • Wrong type (double for factor) → error
  • Character for factor → silent conversion
  • Novel factor level → warning + NA
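
The four cases can be reproduced with a small sketch. The toy data frame is invented for this example, and `outcome_of()` is a hypothetical helper (not part of hardhat) that reports whether an expression ran cleanly, warned, or errored.

```r
library(hardhat)

# Toy training data: one numeric and one factor predictor
train <- data.frame(
  x = c(0.5, 1.5, 2.5, 3.5),
  f = factor(c("a", "b", "a", "b"))
)
y <- c(1, 2, 3, 4)

blueprint <- mold(train, y)$blueprint

# Hypothetical helper: classify how an expression terminates
outcome_of <- function(expr) {
  tryCatch(
    {
      force(expr)
      "ok"
    },
    warning = function(w) "warning",
    error = function(e) "error"
  )
}

results <- c(
  missing_column = outcome_of(forge(data.frame(x = 1), blueprint)),
  wrong_type     = outcome_of(forge(data.frame(x = 1, f = 2), blueprint)),
  character      = outcome_of(forge(data.frame(x = 1, f = "a"), blueprint)),
  novel_level    = outcome_of(forge(data.frame(x = 1, f = factor("c")), blueprint))
)
results

# With the warning silenced, the novel level shows up as NA
lossy <- suppressWarnings(forge(data.frame(x = 1, f = factor("c")), blueprint))
is.na(lossy$predictors$f)
```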

Spruce Functions

Standardize prediction output to tidymodels conventions:

| Function | Output Column |
|---|---|
| `spruce_numeric(pred)` | `.pred` |
| `spruce_class(pred)` | `.pred_class` |
| `spruce_prob(pred_levels, prob_matrix)` | `.pred_{class_name}` |
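
A brief sketch of the three helpers on made-up predictions; the values and level names are invented for illustration. Note that `spruce_prob()` takes the class levels and a probability matrix with one column per level.

```r
library(hardhat)

# Numeric predictions -> tibble with a .pred column
numeric_out <- spruce_numeric(c(1.5, 2.5))

# Class predictions (a factor) -> tibble with a .pred_class column
class_out <- spruce_class(factor(c("a", "b")))

# Class probabilities: one row per observation, one column per level
prob_matrix <- matrix(c(0.2, 0.8, 0.6, 0.4), ncol = 2, byrow = TRUE)
prob_out <- spruce_prob(pred_levels = c("a", "b"), prob_matrix = prob_matrix)

names(numeric_out)
names(class_out)
names(prob_out)
```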

Validation Functions

| Function | Checks |
|---|---|
| `validate_outcomes_are_univariate()` | Single outcome column |
| `validate_prediction_size()` | Output rows == input rows |
| `validate_outcomes_are_numeric()` | Numeric outcomes |
| `validate_predictors_are_numeric()` | Numeric predictors |
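
The validators signal an error when a check fails and otherwise return quietly. The sketch below exercises each one on toy inputs; `fails()` is a hypothetical helper for this example, not a hardhat function.

```r
library(hardhat)

# Hypothetical helper: TRUE when the validator signals an error
fails <- function(expr) {
  tryCatch(
    {
      force(expr)
      FALSE
    },
    error = function(e) TRUE
  )
}

one_outcome <- data.frame(y = c(1, 2, 3))
two_outcomes <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
new_data <- data.frame(x = c(10, 20, 30))

checks <- c(
  univariate_ok    = fails(validate_outcomes_are_univariate(one_outcome)),
  univariate_bad   = fails(validate_outcomes_are_univariate(two_outcomes)),
  size_ok          = fails(validate_prediction_size(spruce_numeric(c(1, 2, 3)), new_data)),
  size_bad         = fails(validate_prediction_size(spruce_numeric(c(1, 2)), new_data)),
  numeric_pred_ok  = fails(validate_predictors_are_numeric(iris[1:4])),
  numeric_pred_bad = fails(validate_predictors_are_numeric(iris))
)
checks
```

Calling these in the bridge, as `simple_lm_bridge()` does, keeps error messages consistent across the formula, XY, and recipe interfaces.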

See Also

  • designing-tidy-r-functions: Function API design
  • r-metaprogramming: Expression manipulation (if customizing blueprints)
  • testing-r-packages: Testing patterns

Vignettes

Access detailed documentation via R:

```r
# Open vignette in browser
RShowDoc("mold", package = "hardhat")     # Molding data for modeling
RShowDoc("forge", package = "hardhat")    # Forging data for predictions
RShowDoc("package", package = "hardhat")  # Creating modeling packages

# Or browse all vignettes
browseVignettes("hardhat")
```

External Resources