domain-ml

Machine Learning Domain

Layer 3: Domain Constraints

Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
| --- | --- | --- |
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Determinism | Seeded RNG, versioning |

Critical Constraints

Memory Efficiency

RULE: Avoid copying large tensors
WHY: Memory bandwidth is the bottleneck
RUST: References, views, in-place ops
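
The rule above can be sketched with plain slices, a std-only stand-in for ndarray views; `scale_in_place` and `row` are hypothetical helpers that illustrate borrowing and mutating in place instead of allocating copies.

```rust
// In-place scaling: borrows the buffer mutably, no allocation or copy.
fn scale_in_place(data: &mut [f32], factor: f32) {
    for x in data.iter_mut() {
        *x *= factor;
    }
}

// A zero-copy "view" of one row of a flattened 2-D tensor.
fn row(data: &[f32], width: usize, i: usize) -> &[f32] {
    &data[i * width..(i + 1) * width]
}

fn main() {
    let mut tensor = vec![1.0_f32; 6]; // a 2x3 tensor, flattened
    scale_in_place(&mut tensor, 2.0);
    assert_eq!(row(&tensor, 3, 1), &[2.0, 2.0, 2.0]);
}
```

ndarray's `ArrayView` and `slice()` follow the same principle at n dimensions: views borrow the underlying buffer rather than copying it.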

GPU Utilization

RULE: Batch operations for GPU efficiency
WHY: Each kernel launch carries fixed GPU overhead
RUST: Batch sizes, async data loading
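
A std-only sketch of overlapping data loading with compute, using a bounded channel and a loader thread in place of tokio tasks and a real GPU; `pipelined_sum` and its workloads are hypothetical stand-ins.

```rust
use std::sync::mpsc;
use std::thread;

// A loader thread fills a bounded channel while the consumer drains it,
// so "I/O" and "compute" run concurrently with backpressure.
fn pipelined_sum(n_batches: usize, batch_len: usize) -> f32 {
    let (tx, rx) = mpsc::sync_channel::<Vec<f32>>(2); // bounded => backpressure

    let loader = thread::spawn(move || {
        for i in 0..n_batches {
            // Stand-in for disk read / preprocessing.
            tx.send(vec![i as f32; batch_len]).unwrap();
        }
        // Dropping tx closes the channel and ends the consumer loop.
    });

    let mut total = 0.0;
    for batch in rx {
        // Stand-in for (GPU) compute on the batch.
        total += batch.iter().sum::<f32>();
    }
    loader.join().unwrap();
    total
}

fn main() {
    assert_eq!(pipelined_sum(4, 8), 48.0);
}
```

With a real accelerator, the consumer side would issue batched kernel launches while the loader keeps the next batch ready.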

Model Portability

RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle


Trace Down ↓

From constraints to design (Layer 2):
"Need efficient data pipelines"
    ↓ m10-performance: Streaming, batching
    ↓ polars: Lazy evaluation

"Need GPU inference"
    ↓ m07-concurrency: Async data loading
    ↓ candle/tch-rs: CUDA backend

"Need model loading"
    ↓ m12-lifecycle: Lazy init, caching
    ↓ tract: ONNX runtime

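
polars' lazy evaluation rests on the same principle as Rust's lazy iterators: nothing runs until a consumer drives the chain. A std-only sketch (`first_even_doubled_sum` is a hypothetical example, not the polars API):

```rust
// Iterators are lazy: no element is produced until `sum` drives the
// chain, and `take` stops the source early.
fn first_even_doubled_sum(limit: u64, k: usize) -> u64 {
    (1..=limit)
        .filter(|n| n % 2 == 0) // analogous to a pushed-down filter
        .map(|n| n * 2)
        .take(k) // only the first k matches are ever computed
        .sum()
}

fn main() {
    // Touches only the first 6 source values, not all 1_000_000.
    assert_eq!(first_even_doubled_sum(1_000_000, 3), 24);
}
```

In polars, `LazyFrame` builds an analogous query plan and optimizes it (e.g. predicate pushdown) before `collect()` executes anything.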

Use Case → Framework

| Use Case | Recommended | Why |
| --- | --- | --- |
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |

Key Crates

| Purpose | Crate |
| --- | --- |
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

Design Patterns

| Pattern | Purpose | Implementation |
| --- | --- | --- |
| Model loading | Once, reuse | `OnceLock<Model>` |
| Batching | Throughput | Collect, then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |
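
The Streaming row can be sketched with a line-by-line reader, so memory use stays constant in the input size; `sum_stream` is a hypothetical helper over any `BufRead` source.

```rust
use std::io::{BufRead, Cursor};

// Stream records one line at a time instead of loading the whole file.
fn sum_stream<R: BufRead>(reader: R) -> f32 {
    reader
        .lines()
        .filter_map(|line| line.ok()?.trim().parse::<f32>().ok())
        .sum()
}

fn main() {
    // Cursor stands in for any reader: a file, socket, decompressor, ...
    let data = Cursor::new("1.5\n2.5\n4.0\n");
    assert_eq!(sum_stream(data), 8.0);
}
```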

Code Pattern: Inference Server

```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Alias for tract's optimized, runnable model type.
type Model = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<Model> = OnceLock::new();

fn get_model() -> &'static Model {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .expect("failed to read model.onnx")
            .into_optimized()
            .expect("failed to optimize model")
            .into_runnable()
            .expect("failed to build runnable plan")
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Reshape the flat input into a (1, n) batch of one.
    let tensor = tract_ndarray::arr1(&input).into_shape((1, input.len()))?;
    let result = model.run(tvec!(tensor.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
```

Code Pattern: Batched Inference

```rust
// Sketch: `model`, `stack_inputs`, and `unstack_outputs` stand in for the
// framework-specific model handle and tensor (un)stacking helpers.
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> Vec<Vec<f32>> {
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(batch_size) {
        // Stack the individual inputs into one batch tensor.
        let batch_tensor = stack_inputs(batch);

        // One inference call per batch amortizes the kernel-launch overhead.
        let batch_output = model.run(batch_tensor).await;

        // Split the batch output back into per-input results.
        results.extend(unstack_outputs(batch_output));
    }

    results
}
```


Common Mistakes

| Mistake | Domain Violation | Fix |
| --- | --- | --- |
| Cloning tensors | Memory waste | Use views |
| Single-item inference | GPU underutilized | Batch processing |
| Loading the model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |


Trace to Layer 1

| Constraint | Layer 2 Pattern | Layer 1 Implementation |
| --- | --- | --- |
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | `OnceLock<Model>` |
| Batch processing | Chunked iteration | `chunks()` + parallel |
| GPU async | Concurrent loading | `tokio::spawn` + GPU |

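
The `chunks()` + parallel combination can be sketched with scoped threads, a std-only alternative to rayon or tokio for this purpose; `parallel_chunk_sums` is a hypothetical example.

```rust
use std::thread;

// Split the data into fixed-size chunks and process each on its own
// scoped thread; scoped threads may borrow `data` directly.
fn parallel_chunk_sums(data: &[f32], chunk: usize) -> Vec<f32> {
    thread::scope(|scope| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| scope.spawn(move || c.iter().sum::<f32>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    assert_eq!(parallel_chunk_sums(&data, 2), vec![3.0, 7.0, 11.0]);
}
```

In production, rayon's `par_chunks` or a pool of tokio tasks would replace the hand-rolled spawning, but the chunk-then-fan-out shape is the same.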

Related Skills

| When | See |
| --- | --- |
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |