hardhat
Creating Modeling Packages with hardhat
The hardhat package provides infrastructure for building modeling packages with consistent interfaces. It standardizes preprocessing via `mold()` (training) and `forge()` (prediction), handling formula, XY, and recipe inputs uniformly.
Quick Reference
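| Task | Function |
|---|---|
| Preprocess training data | `mold()` |
| Preprocess prediction data | `forge()` |
| Create model object | `new_model()` |
| XY blueprint | `default_xy_blueprint()` |
| Formula blueprint | `default_formula_blueprint()` |
| Recipe blueprint | `default_recipe_blueprint()` |
| Format numeric predictions | `spruce_numeric()` |
| Format class predictions | `spruce_class()` |
| Format probability predictions | `spruce_prob()` |
| Validate univariate outcome | `validate_outcomes_are_univariate()` |
| Validate prediction size | `validate_prediction_size()` |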
Package Architecture
Stage 1: Model Fitting
```
User → simple_lm() methods → bridge → implementation → constructor
       (formula/xy/recipe)     ↓            ↓               ↓
                             mold()      lm.fit()      new_model()
```
Stage 2: Model Prediction
```
User → predict.simple_lm() → bridge → implementation
                ↓              ↓            ↓
             forge()      switch()  predict_*_numeric()
```
Model Constructor
Create objects of your model class. Name: `new_<model_class>()`.
```r
library(hardhat)

# Constructor: validate the inputs, then build the classed model object
new_simple_lm <- function(coefs, coef_names, blueprint) {
  if (!is.numeric(coefs)) {
    stop("`coefs` should be a numeric vector.", call. = FALSE)
  }
  if (!is.character(coef_names)) {
    stop("`coef_names` should be a character vector.", call. = FALSE)
  }
  # The blueprint is stored so predict() can forge() new data later
  new_model(
    coefs = coefs,
    coef_names = coef_names,
    blueprint = blueprint,
    class = "simple_lm"
  )
}
```
Implementation Function
Core algorithm. Name: `<model_class>_impl()`. Returns a named list of model elements.
```r
# Implementation: fit on a predictor matrix and an outcome vector
simple_lm_impl <- function(predictors, outcomes) {
  lm_fit <- lm.fit(predictors, outcomes)
  coefs <- lm_fit$coefficients
  list(
    coefs = unname(coefs),
    coef_names = names(coefs)
  )
}
```
Bridge Function
Connects the user-facing methods to the implementation. Converts `mold()` output to the format the implementation expects.
```r
# Bridge: takes the list returned by mold()
simple_lm_bridge <- function(processed) {
  validate_outcomes_are_univariate(processed$outcomes)
  # lm.fit() needs a matrix of predictors and a plain outcome vector
  predictors <- as.matrix(processed$predictors)
  outcomes <- processed$outcomes[[1]]
  fit <- simple_lm_impl(predictors, outcomes)
  new_simple_lm(
    coefs = fit$coefs,
    coef_names = fit$coef_names,
    blueprint = processed$blueprint
  )
}
```
User-Facing Fitting Function
Generic with methods for each interface. Each method calls `mold()`, then the bridge.
```r
simple_lm <- function(x, ...) {
  UseMethod("simple_lm")
}

# Fallback for unsupported input classes
simple_lm.default <- function(x, ...) {
  stop("`simple_lm()` is not defined for a '", class(x)[1], "'.", call. = FALSE)
}

# XY interface: data frame
simple_lm.data.frame <- function(x, y, intercept = TRUE, ...) {
  blueprint <- default_xy_blueprint(intercept = intercept)
  processed <- mold(x, y, blueprint = blueprint)
  simple_lm_bridge(processed)
}

# XY interface: matrix
simple_lm.matrix <- function(x, y, intercept = TRUE, ...) {
  blueprint <- default_xy_blueprint(intercept = intercept)
  processed <- mold(x, y, blueprint = blueprint)
  simple_lm_bridge(processed)
}

# Formula interface
simple_lm.formula <- function(formula, data, intercept = TRUE, ...) {
  blueprint <- default_formula_blueprint(intercept = intercept)
  processed <- mold(formula, data, blueprint = blueprint)
  simple_lm_bridge(processed)
}

# Recipe interface
simple_lm.recipe <- function(x, data, intercept = TRUE, ...) {
  blueprint <- default_recipe_blueprint(intercept = intercept)
  processed <- mold(x, data, blueprint = blueprint)
  simple_lm_bridge(processed)
}
```
Prediction Implementation
One function per prediction type. Use the `spruce_*()` functions for standardized output.
```r
predict_simple_lm_numeric <- function(object, predictors) {
  coefs <- object$coefs
  # Linear predictor: predictor matrix times coefficient vector
  pred <- as.vector(predictors %*% coefs)
  spruce_numeric(pred) # Returns tibble with .pred column
}
```
Prediction Bridge
Converts `forge()` output and switches on the prediction type.
```r
predict_simple_lm_bridge <- function(type, object, predictors) {
  # Only "numeric" is supported; other values raise an error
  type <- rlang::arg_match(type, "numeric")
  predictors <- as.matrix(predictors)
  switch(
    type,
    numeric = predict_simple_lm_numeric(object, predictors)
  )
}
```
User-Facing Predict Method
Call `forge()` with the stored blueprint, then the bridge, then validate the result.
```r
predict.simple_lm <- function(object, new_data, type = "numeric", ...) {
  # Apply the training-time preprocessing to the new data
  processed <- forge(new_data, object$blueprint)
  out <- predict_simple_lm_bridge(type, object, processed$predictors)
  # One prediction row per row of new_data
  validate_prediction_size(out, new_data)
  out
}
```
mold() Details
Returns: `predictors` (tibble), `outcomes` (tibble), `extras`, `blueprint`.
Blueprint Options
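| Blueprint | Key Options |
|---|---|
| `default_xy_blueprint()` | `intercept`, `allow_novel_levels`, `composition` |
| `default_formula_blueprint()` | `intercept`, `allow_novel_levels`, `indicators`, `composition` |
| `default_recipe_blueprint()` | `intercept`, `allow_novel_levels`, `fresh`, `composition` |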
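As a minimal sketch of what `mold()` returns and how a blueprint option changes it, the snippet below molds the built-in `mtcars` data through the formula interface with `intercept = TRUE`. The dataset and the formula are illustrative choices, not part of the original text.

```r
library(hardhat)

# Opt in to an intercept column; the blueprint default is no intercept
bp <- default_formula_blueprint(intercept = TRUE)
processed <- mold(mpg ~ cyl + disp, mtcars, blueprint = bp)

# The four elements described above
names(processed)

# Predictors and outcomes come back as tibbles
class(processed$predictors)
colnames(processed$predictors)
colnames(processed$outcomes)
```

The returned `processed$blueprint` is the object to store in the model so that `forge()` can repeat the same preprocessing at prediction time.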
Formula Special Behaviors
- No intercept by default (unlike base R)
- `indicators = "none"` keeps factors unexpanded
- Multivariate outcomes: `y1 + y2 ~ x1 + x2` (not `cbind()`)
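The three behaviors above can be checked with a short sketch on the built-in `iris` data. The dataset and variable names are illustrative assumptions.

```r
library(hardhat)

# 1. No intercept column unless the blueprint asks for one
default_fit <- mold(Sepal.Length ~ Species, iris)
colnames(default_fit$predictors)

# 2. indicators = "none" passes factor predictors through unexpanded
bp_none <- default_formula_blueprint(indicators = "none")
kept <- mold(Sepal.Length ~ Species, iris, blueprint = bp_none)
class(kept$predictors$Species)

# 3. Multivariate outcomes go on the left-hand side joined with `+`
multi <- mold(Sepal.Length + Sepal.Width ~ Petal.Length, iris)
colnames(multi$outcomes)
```

Keeping factors unexpanded is useful when the underlying implementation handles categorical predictors itself.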
forge() Validation
Automatically validates that new data matches the training data:
- Column names must match
- Column types must be compatible
- Factor levels must be a subset of the training levels
- Lossy conversions emit warnings (novel levels → NA)
```r
# Missing column → error
# Wrong type (double for factor) → error
# Character for factor → silent conversion
# Novel factor level → warning + NA
```
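The four cases above can be exercised with a small sketch. The toy data and the `forge_result()` helper are hypothetical names introduced only for illustration; the helper reports whether a `forge()` call finishes cleanly, warns, or errors. If the behaviors listed above hold, the four results should be `"error"`, `"error"`, `"ok"`, and `"warning"`, in that order.

```r
library(hardhat)

# Toy training data (hypothetical): one numeric and one factor predictor
train <- data.frame(
  x = c(0.5, 1.5, 2.5, 3.5),
  g = factor(c("a", "b", "a", "b"))
)
y <- c(1, 2, 3, 4)

# mold() records the training column names, types, and factor levels
bp <- mold(train, y)$blueprint

# Hypothetical helper: report how a forge() call ends
forge_result <- function(new_data) {
  tryCatch(
    {
      forge(new_data, bp)
      "ok"
    },
    warning = function(w) "warning",
    error = function(e) "error"
  )
}

# Missing column `g`
missing_column <- forge_result(data.frame(x = 1))

# Double supplied where a factor is expected
wrong_type <- forge_result(data.frame(x = 1, g = 2))

# Character supplied where a factor is expected, value is a known level
char_for_factor <- forge_result(
  data.frame(x = 1, g = "a", stringsAsFactors = FALSE)
)

# Factor level "c" was never seen in training
novel_level <- forge_result(data.frame(x = 1, g = factor("c")))

c(missing_column, wrong_type, char_for_factor, novel_level)
```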
Spruce Functions
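Standardize prediction output to tidymodels conventions:
| Function | Output Column |
|---|---|
| `spruce_numeric()` | `.pred` |
| `spruce_class()` | `.pred_class` |
| `spruce_prob()` | `.pred_<level>` (one column per class level) |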
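A brief sketch of the three spruce functions on made-up inputs follows. The values and the class labels `"no"` and `"yes"` are illustrative assumptions.

```r
library(hardhat)

# Numeric predictions: a one-column tibble
num <- spruce_numeric(c(1.5, 2.5, 3.5))
names(num)

# Class predictions: input must be a factor
cls <- spruce_class(factor(c("yes", "no", "yes")))
names(cls)

# Class probabilities: one matrix column per level, in the same order
prob_matrix <- matrix(c(0.8, 0.2, 0.3, 0.7), ncol = 2, byrow = TRUE)
prob <- spruce_prob(pred_levels = c("no", "yes"), prob_matrix = prob_matrix)
names(prob)
```

Returning tibbles with these column names lets downstream tidymodels tooling consume the predictions without further reshaping.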
Validation Functions
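| Function | Checks |
|---|---|
| `validate_outcomes_are_univariate()` | Single outcome column |
| `validate_prediction_size()` | Output rows == input rows |
| `validate_outcomes_are_numeric()` | Numeric outcomes |
| `validate_predictors_are_numeric()` | Numeric predictors |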
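As a rough illustration of how these checks behave, the sketch below runs each validator on molded `mtcars` data, then triggers one failure with a two-column outcome. The dataset, the formulas, and the all-zero predictions are illustrative assumptions.

```r
library(hardhat)

processed <- mold(mpg ~ cyl + disp, mtcars)

# Each validator returns invisibly when the check passes
validate_outcomes_are_univariate(processed$outcomes)
validate_outcomes_are_numeric(processed$outcomes)
validate_predictors_are_numeric(processed$predictors)

# A two-column outcome fails the univariate check with an error
multi <- mold(mpg + hp ~ cyl, mtcars)
univariate_check <- tryCatch(
  {
    validate_outcomes_are_univariate(multi$outcomes)
    "ok"
  },
  error = function(e) "error"
)
univariate_check

# Predictions must have one row per row of the new data
pred <- spruce_numeric(rep(0, nrow(mtcars)))
validate_prediction_size(pred, mtcars)
```

Calling the validators in the bridge, as the fitting bridge above does, keeps the implementation function free of input checks.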
See Also
- designing-tidy-r-functions: Function API design
- r-metaprogramming: Expression manipulation (if customizing blueprints)
- testing-r-packages: Testing patterns
Vignettes
Access detailed documentation via R:
```r
# Open vignette in browser
RShowDoc("mold", package = "hardhat")    # Molding data for modeling
RShowDoc("forge", package = "hardhat")   # Forging data for predictions
RShowDoc("package", package = "hardhat") # Creating modeling packages

# Or browse all vignettes
browseVignettes("hardhat")
```