daft-udf-tuning
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseDaft UDF Tuning
Daft UDF 调优
Optimize User-Defined Functions for performance.
优化用户自定义函数的运行性能。
UDF Types
UDF 类型
| Type | Decorator | Use Case |
|---|---|---|
| Stateless | | Simple transforms. Use |
| Stateful | | Expensive init (e.g., loading models). Supports |
| Batch | | Vectorized CPU/GPU ops (NumPy/PyTorch). Faster. |
| 类型 | 装饰器 | 适用场景 |
|---|---|---|
| 无状态 | | 简单转换场景。I/O密集型任务可使用 |
| 有状态 | | 初始化开销高的场景(例如加载模型)。支持配置 |
| 批处理 | | 矢量化CPU/GPU运算(NumPy/PyTorch),运行速度更快。 |
Quick Recipes
快速实践方案
1. Async I/O (Web APIs)
1. 异步I/O(Web API场景)
python
@daft.func
async def fetch(url: str):
async with aiohttp.ClientSession() as s:
return await s.get(url).text()python
@daft.func
async def fetch(url: str):
async with aiohttp.ClientSession() as s:
return await s.get(url).text()2. GPU Batch Inference (PyTorch/Models)
2. GPU批推理(PyTorch/模型场景)
python
@daft.cls(gpus=1)
class Classifier:
def __init__(self):
self.model = load_model().cuda() # Run once per worker
@daft.method.batch(batch_size=32)
def predict(self, images):
return self.model(images.to_pylist())python
@daft.cls(gpus=1)
class Classifier:
def __init__(self):
self.model = load_model().cuda() # 每个worker仅运行一次
@daft.method.batch(batch_size=32)
def predict(self, images):
return self.model(images.to_pylist())Run with concurrency
并发运行
df.with_column("preds", Classifier(max_concurrency=4).predict(df["img"]))
undefineddf.with_column("preds", Classifier(max_concurrency=4).predict(df["img"]))
undefinedTuning Keys
调优关键参数
- : Total parallel UDF instances.
max_concurrency - : GPU request per instance.
gpus=N - : Rows per call. Too small = overhead; too big = OOM.
batch_size - : Pre-slice partitions if memory is tight.
into_batches(N)
- : UDF实例总并行数。
max_concurrency - : 每个实例申请的GPU数量。
gpus=N - : 单次调用处理的行数。数值过小会产生额外开销,过大则会引发内存不足(OOM)问题。
batch_size - : 内存紧张时可预先对分区进行切片。
into_batches(N)