hardhat

Creating Modeling Packages with hardhat

The hardhat package provides infrastructure for building modeling packages with consistent interfaces. It standardizes preprocessing via `mold()` (training) and `forge()` (prediction), handling formula, XY, and recipe inputs uniformly.
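
Here is a minimal round trip. It assumes hardhat is installed and uses a made-up data frame. `mold()` learns the preprocessing from the training data and records it in a blueprint, and `forge()` replays that blueprint on new data.

```r
library(hardhat)

# Made-up training data, for illustration only
train <- data.frame(y = c(1.1, 2.3, 2.9, 4.2), x = c(1, 2, 3, 4))

# Training time: preprocess and capture the steps in a blueprint
processed <- mold(y ~ x, train)
names(processed)      # contains predictors, outcomes, blueprint and extras
processed$predictors  # tibble with a single column `x` (no intercept by default)

# Prediction time: apply the same blueprint to new rows
new_data <- data.frame(x = c(5, 6))
forged <- forge(new_data, processed$blueprint)
forged$predictors
```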

Quick Reference

Task	Function
Preprocess training data	`mold(x, y)` or `mold(formula, data)`
Preprocess prediction data	`forge(new_data, blueprint)`
Create model object	`new_model(..., blueprint, class)`
XY blueprint	`default_xy_blueprint(intercept = TRUE)`
Formula blueprint	`default_formula_blueprint(intercept = TRUE)`
Recipe blueprint	`default_recipe_blueprint(intercept = TRUE)`
Format numeric predictions	`spruce_numeric(pred)`
Format class predictions	`spruce_class(pred)`
Format probability predictions	`spruce_prob(pred_levels, prob_matrix)`
Validate univariate outcome	`validate_outcomes_are_univariate(outcomes)`
Validate prediction size	`validate_prediction_size(pred, new_data)`

Package Architecture

Stage 1: Model Fitting

```
User → simple_lm() methods → bridge → implementation → constructor
       (formula/xy/recipe)
                ↓                           ↓               ↓
             mold()                     lm.fit()       new_model()
```

Stage 2: Model Prediction

```
User → predict.simple_lm() → bridge → implementation
              ↓                ↓            ↓
          forge()          switch()   predict_*_numeric()
```

Model Constructor

Create objects of your model class. Name: `new_<model_class>()`.

```r
new_simple_lm <- function(coefs, coef_names, blueprint) {
  if (!is.numeric(coefs)) {
    stop("`coefs` should be a numeric vector.", call. = FALSE)
  }
  if (!is.character(coef_names)) {
    stop("`coef_names` should be a character vector.", call. = FALSE)
  }

  new_model(
    coefs = coefs,
    coef_names = coef_names,
    blueprint = blueprint,
    class = "simple_lm"
  )
}
```
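
Calling `new_model()` directly with made-up coefficients shows what the constructor produces. The result is a list-like object that carries the named elements and the blueprint, with the requested class first in its class vector. hardhat appends its own base class after it, but this sketch does not rely on the exact trailing classes.

```r
library(hardhat)

# Made-up coefficients; a real package would get these from the implementation
m <- new_model(
  coefs = c(0.5, 2),
  coef_names = c("(Intercept)", "x"),
  blueprint = default_xy_blueprint(intercept = TRUE),
  class = "simple_lm"
)

class(m)      # "simple_lm" comes first, so predict.simple_lm() dispatches
m$coefs
m$coef_names
```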

Implementation Function

Core algorithm. Name: `<model_class>_impl()`. Returns a named list of model elements.

```r
simple_lm_impl <- function(predictors, outcomes) {
  lm_fit <- lm.fit(predictors, outcomes)
  coefs <- lm_fit$coefficients

  list(
    coefs = unname(coefs),
    coef_names = names(coefs)
  )
}
```
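
The implementation needs only base R. This sketch uses made-up data generated exactly from y = 1 + 2x. It shows why the bridge must pass a matrix that already contains the intercept column: `lm.fit()` does not add one, and the coefficient names it returns come from the matrix's column names.

```r
# Made-up data that follow y = 1 + 2 * x exactly
x <- c(1, 2, 3, 4)
predictors <- cbind("(Intercept)" = 1, x = x)  # intercept supplied as a column
outcomes <- 1 + 2 * x

lm_fit <- lm.fit(predictors, outcomes)
coefs <- lm_fit$coefficients

unname(coefs)  # approximately 1 and 2
names(coefs)   # taken from colnames(predictors)
```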

Bridge Function

Connects the user-facing methods to the implementation. It converts `mold()` output into the format the implementation expects, then passes the fitted elements to the constructor.

```r
simple_lm_bridge <- function(processed) {
  validate_outcomes_are_univariate(processed$outcomes)

  predictors <- as.matrix(processed$predictors)
  outcomes <- processed$outcomes[[1]]

  fit <- simple_lm_impl(predictors, outcomes)

  new_simple_lm(
    coefs = fit$coefs,
    coef_names = fit$coef_names,
    blueprint = processed$blueprint
  )
}
```
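
This sketch runs `mold()` with the XY blueprint on made-up data to show what the bridge receives. With `intercept = TRUE`, the intercept is already a predictor column. A bare outcome vector arrives as a one-column tibble, which is why the bridge uses `[[1]]`. A two-column outcome fails the univariate check.

```r
library(hardhat)

# Made-up predictors and outcome
x <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3))
y <- c(3.1, 2.9, 7.2, 6.8)

processed <- mold(x, y, blueprint = default_xy_blueprint(intercept = TRUE))

predictors <- as.matrix(processed$predictors)
colnames(predictors)                                  # includes "(Intercept)"

validate_outcomes_are_univariate(processed$outcomes)  # one column: passes
outcomes <- processed$outcomes[[1]]                   # back to a plain numeric vector

# Two outcome columns are rejected with an error
rejected <- tryCatch(
  {
    validate_outcomes_are_univariate(data.frame(y1 = y, y2 = -y))
    FALSE
  },
  error = function(e) TRUE
)
```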

User-Facing Fitting Function

A generic with one method per interface. Each method builds the matching blueprint, calls `mold()`, and then passes the result to the bridge.

```r
simple_lm <- function(x, ...) {
  UseMethod("simple_lm")
}

simple_lm.default <- function(x, ...) {
  stop("`simple_lm()` is not defined for a '", class(x)[1], "'.", call. = FALSE)
}

simple_lm.data.frame <- function(x, y, intercept = TRUE, ...) {
  blueprint <- default_xy_blueprint(intercept = intercept)
  processed <- mold(x, y, blueprint = blueprint)
  simple_lm_bridge(processed)
}

simple_lm.matrix <- function(x, y, intercept = TRUE, ...) {
  blueprint <- default_xy_blueprint(intercept = intercept)
  processed <- mold(x, y, blueprint = blueprint)
  simple_lm_bridge(processed)
}

simple_lm.formula <- function(formula, data, intercept = TRUE, ...) {
  blueprint <- default_formula_blueprint(intercept = intercept)
  processed <- mold(formula, data, blueprint = blueprint)
  simple_lm_bridge(processed)
}

simple_lm.recipe <- function(x, data, intercept = TRUE, ...) {
  blueprint <- default_recipe_blueprint(intercept = intercept)
  processed <- mold(x, data, blueprint = blueprint)
  simple_lm_bridge(processed)
}
```
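
All methods can share one bridge because every interface yields the same `mold()` structure. This quick check uses made-up data and assumes both the XY and formula blueprints are built with `intercept = TRUE`. The predictor values should agree once the columns are put in the same order.

```r
library(hardhat)

# Made-up data
df <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3), y = c(3.1, 2.9, 7.2, 6.8))

from_xy <- mold(df[c("x1", "x2")], df$y,
                blueprint = default_xy_blueprint(intercept = TRUE))
from_formula <- mold(y ~ x1 + x2, df,
                     blueprint = default_formula_blueprint(intercept = TRUE))

# Align column order before comparing, rather than assuming it matches
cols <- names(from_formula$predictors)
same_predictors <- isTRUE(all.equal(
  as.matrix(from_xy$predictors[cols]),
  as.matrix(from_formula$predictors)
))
same_outcome <- isTRUE(all.equal(from_xy$outcomes[[1]], from_formula$outcomes[[1]]))
```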

Prediction Implementation

One function per prediction type. Use `spruce_*()` for standardized output.

```r
predict_simple_lm_numeric <- function(object, predictors) {
  coefs <- object$coefs
  pred <- as.vector(predictors %*% coefs)
  spruce_numeric(pred)  # Returns tibble with .pred column
}
```
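
`predictors %*% coefs` is only correct if the prediction matrix has the same columns as the matrix used for fitting. This sketch uses made-up data and illustrative (not fitted) coefficients. It shows `forge()` restoring the training layout even when `new_data` lists its columns in a different order. The coefficients are matched by name here, so the example does not depend on a particular column order.

```r
library(hardhat)

train <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3), y = c(3.1, 2.9, 7.2, 6.8))
processed <- mold(y ~ x1 + x2, train,
                  blueprint = default_formula_blueprint(intercept = TRUE))

# New data with the columns swapped relative to training
new_data <- data.frame(x2 = c(5, 0), x1 = c(1, 2))
forged <- forge(new_data, processed$blueprint)
names(forged$predictors)  # same layout as processed$predictors

# Illustrative coefficients, not a real fit
coefs <- c("(Intercept)" = 0.5, x1 = 1, x2 = 2)
X <- as.matrix(forged$predictors)
pred <- as.vector(X %*% coefs[colnames(X)])
spruce_numeric(pred)      # tibble with a .pred column
```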

Prediction Bridge

Converts `forge()` output to a matrix and dispatches on the prediction type.

```r
predict_simple_lm_bridge <- function(type, object, predictors) {
  type <- rlang::arg_match(type, "numeric")
  predictors <- as.matrix(predictors)

  switch(
    type,
    numeric = predict_simple_lm_numeric(object, predictors)
  )
}
```
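
`rlang::arg_match()` is what makes an unsupported `type` fail fast. The helper `check_type()` below is hypothetical. It isolates that first line of the bridge.

```r
# Hypothetical helper: the type check from predict_simple_lm_bridge() on its own
check_type <- function(type) {
  rlang::arg_match(type, "numeric")
}

check_type("numeric")  # returns "numeric" unchanged

# Any other value errors before any prediction work is done
unsupported <- tryCatch(
  {
    check_type("prob")
    FALSE
  },
  error = function(e) TRUE
)
```

To support another type, add it to the `arg_match()` values and add a matching `switch()` branch.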

User-Facing Predict Method

Call `forge()` with the stored blueprint, pass the processed predictors to the bridge, then validate the output size.

```r
predict.simple_lm <- function(object, new_data, type = "numeric", ...) {
  # forge() enforces the training column names, types, and order
  processed <- forge(new_data, object$blueprint)
  out <- predict_simple_lm_bridge(type, object, processed$predictors)
  validate_prediction_size(out, new_data)
  out
}
```
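
`validate_prediction_size()` enforces the tidymodels convention of one prediction row per input row. This small sketch uses made-up predictions. The second case mimics a model that silently dropped a row.

```r
library(hardhat)

new_data <- data.frame(x = c(1, 2, 3))

# One prediction per row of new_data: passes silently
validate_prediction_size(spruce_numeric(c(10, 20, 30)), new_data)

# One row short: an error instead of a silently misaligned result
too_short <- tryCatch(
  {
    validate_prediction_size(spruce_numeric(c(10, 20)), new_data)
    FALSE
  },
  error = function(e) TRUE
)
```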

mold() Details

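Returns a list with `predictors` (tibble), `outcomes` (tibble), `blueprint` (pass this to `forge()` at prediction time), and `extras` (any additional, blueprint-specific information).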

Blueprint Options

Blueprint	Key Options
`default_xy_blueprint()`	`intercept`
`default_formula_blueprint()`	`intercept` , `indicators` ("traditional", "none", "one_hot")
`default_recipe_blueprint()`	`intercept`
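
This sketch shows how `indicators` changes the formula blueprint's output, using a made-up data frame with one factor. `"traditional"` expands `f` into dummy columns. `"none"` passes it through unchanged as a factor.

```r
library(hardhat)

df <- data.frame(
  y = c(1, 2, 3, 4),
  x = c(0.5, 1.5, 2.5, 3.5),
  f = factor(c("a", "b", "c", "a"))
)

traditional <- mold(y ~ x + f, df,
                    blueprint = default_formula_blueprint(indicators = "traditional"))
none <- mold(y ~ x + f, df,
             blueprint = default_formula_blueprint(indicators = "none"))

names(traditional$predictors)  # dummy columns derived from f
names(none$predictors)         # f kept as a single column
class(none$predictors$f)       # "factor"
```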

Formula Special Behaviors

No intercept by default (unlike base R)
`indicators = "none"` keeps factors unexpanded
Multivariate outcomes: `y1 + y2 ~ x1 + x2` (not `cbind()`)
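
This sketch shows both defaults on made-up data. The two outcomes are written with `+` on the left-hand side, and the predictors get no intercept column unless `intercept = TRUE` is set.

```r
library(hardhat)

df <- data.frame(
  y1 = c(1, 2, 3), y2 = c(4, 5, 6),
  x1 = c(0.1, 0.2, 0.3), x2 = c(1, 0, 1)
)

processed <- mold(y1 + y2 ~ x1 + x2, df)

names(processed$outcomes)    # "y1" "y2"
names(processed$predictors)  # "x1" "x2"; no "(Intercept)" column
```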

forge() Validation

Automatically validates that new data matches the training data:

Required columns must be present (extra columns are dropped)
Column types must be compatible
Factor levels should be a subset of the training levels
Lossy conversions emit warnings (novel levels → NA)

Typical outcomes:

Missing column → error
Wrong type (double for factor) → error
Character for factor → silent conversion
Novel factor level → warning + NA
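
The cases above can be reproduced with a blueprint trained on made-up data. This sketch uses the XY blueprint so that the factor stays a factor in the predictors. `fails()` is a small hypothetical helper.

```r
library(hardhat)

# Made-up training predictors: one numeric column, one factor with levels a, b
train_x <- data.frame(num = c(1, 2, 3), f = factor(c("a", "b", "a")))
bp <- mold(train_x, c(1, 2, 3), blueprint = default_xy_blueprint())$blueprint

# Hypothetical helper: TRUE if evaluating `expr` raises an error
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

missing_col <- fails(forge(data.frame(num = 1), bp))         # error
wrong_type  <- fails(forge(data.frame(num = 1, f = 1), bp))  # error

# Character where a factor is expected: converted to a factor
chr <- forge(data.frame(num = 1, f = "b"), bp)
class(chr$predictors$f)

# Novel level "z": forge() warns (silenced here) and the value becomes NA
novel <- suppressWarnings(forge(data.frame(num = 1, f = factor("z")), bp))
novel$predictors$f
```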

Spruce Functions

Standardize prediction output to tidymodels conventions:

Function	Output Column
`spruce_numeric(pred)`	`.pred`
`spruce_class(pred)`	`.pred_class`
`spruce_prob(pred_levels, prob_matrix)`	`.pred_{level}`
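
This quick look at the three helpers uses made-up predictions. `spruce_class()` expects a factor. `spruce_prob()` takes the class levels together with a probability matrix that has one column per level.

```r
library(hardhat)

num <- spruce_numeric(c(1.5, 2.5))
cls <- spruce_class(factor(c("yes", "no")))

# Class probabilities: level names plus a matrix with one column per level
probs <- matrix(c(0.2, 0.8,
                  0.7, 0.3), ncol = 2, byrow = TRUE)
prb <- spruce_prob(c("no", "yes"), probs)

names(num)  # ".pred"
names(cls)  # ".pred_class"
names(prb)  # ".pred_no" ".pred_yes"
```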

Validation Functions

Function	Checks
`validate_outcomes_are_univariate()`	Single outcome column
`validate_prediction_size()`	Output rows == input rows
`validate_outcomes_are_numeric()`	Numeric outcomes
`validate_predictors_are_numeric()`	Numeric predictors
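
The two numeric checks are useful in a bridge whose implementation, like `lm.fit()`, only accepts numeric input. For example, the XY blueprint does not expand factors, so a factor column would otherwise reach `as.matrix()` unconverted. This sketch uses made-up inputs, and `fails()` is a hypothetical helper.

```r
library(hardhat)

# Hypothetical helper: TRUE if evaluating `expr` raises an error
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

# Numeric inputs pass silently
validate_outcomes_are_numeric(data.frame(y = c(1.5, 2)))
validate_predictors_are_numeric(data.frame(x = c(1, 2)))

# Non-numeric columns are rejected
chr_outcome <- fails(validate_outcomes_are_numeric(data.frame(y = c("a", "b"))))
fct_predictor <- fails(validate_predictors_are_numeric(data.frame(f = factor(c("a", "b")))))
```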

See Also

designing-tidy-r-functions: Function API design
r-metaprogramming: Expression manipulation (if customizing blueprints)
testing-r-packages: Testing patterns

Vignettes

Access detailed documentation via R:

```r
# Open a vignette in the browser
RShowDoc("mold", package = "hardhat")     # Molding data for modeling
RShowDoc("forge", package = "hardhat")    # Forging data for predictions
RShowDoc("package", package = "hardhat")  # Creating modeling packages

# Or browse all vignettes
browseVignettes("hardhat")
```