<!-- Adapted from: claude-scientific-skills/scientific-skills/modal -->
# Modal Serverless Cloud Platform

Serverless Python execution with GPUs, autoscaling, and pay-per-use compute.
## When to Use

- Deploy and serve ML models (LLMs, image generation)
- Run GPU-accelerated computation
- Batch-process large datasets in parallel
- Schedule compute-intensive jobs
- Build serverless APIs with autoscaling
## Quick Start

Install:

```bash
pip install modal
```

Authenticate:

```bash
modal token new
```

Define a minimal app:

```python
import modal

app = modal.App("my-app")

@app.function()
def hello():
    return "Hello from Modal!"
```

Run with: `modal run script.py`
## Container Images

Build an image with dependencies:

```python
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("torch", "transformers", "numpy")
)

app = modal.App("ml-app", image=image)
```
## GPU Functions

```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU code here
```

Available GPUs: T4, L4, A10, A100, L40S, H100, H200, B200

Multi-GPU: `gpu="H100:8"`
## Web Endpoints

```python
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # `model` is assumed to be loaded elsewhere (e.g. at import time)
    result = model.predict(data["input"])
    return {"prediction": result}
```

Deploy: `modal deploy script.py`
## Scheduled Jobs

```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    pass
```
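The five fields of a cron expression read, left to right, as minute, hour, day-of-month, month, and day-of-week, so `"0 2 * * *"` fires at minute 0 of hour 2 every day. A quick way to eyeball an expression (`cron_fields` is our own helper, not part of Modal):

```python
def cron_fields(expr: str) -> dict:
    """Label the five standard cron fields for readability."""
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, expr.split()))

print(cron_fields("0 2 * * *"))
# {'minute': '0', 'hour': '2', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```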
## Autoscaling
```python
@app.function()
def process_item(item_id: int):
    return analyze(item_id)

@app.local_entrypoint()
def main():
    items = range(1000)
    # Automatically parallelized across containers
    results = list(process_item.map(items))
```
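Semantically, `process_item.map(items)` is a distributed version of Python's built-in `map`: one call per item, with results returned in input order. A local stand-in (no Modal involved, and `analyze` replaced by a dummy) shows the shape of the fan-out:

```python
def analyze(item_id: int) -> int:
    # Dummy stand-in for whatever analyze() does remotely
    return item_id * item_id

def process_item(item_id: int) -> int:
    return analyze(item_id)

# Locally this runs sequentially; on Modal, .map() issues the same calls
# across many containers and still preserves input order.
results = list(map(process_item, range(5)))
print(results)  # [0, 1, 4, 9, 16]
```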
## Persistent Storage
```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```

## Secrets Management
```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
```

## ML Model Serving
```python
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
```
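The point of `@modal.enter()` is that it runs once per container, so the pipeline loads a single time and is reused across every `predict` call. A plain-Python sketch of that lifecycle (the class and dummy model here are ours, not Modal's API):

```python
class ContainerLifecycleSketch:
    """Mimics Modal's cls lifecycle: enter() once per container, methods per request."""

    def __init__(self):
        self.load_count = 0

    def enter(self):
        # Modal calls @modal.enter() once when a container starts;
        # the "model" is a dummy stand-in for the HF pipeline.
        self.load_count += 1
        self.model = lambda text: {"label": "POSITIVE", "length": len(text)}

    def predict(self, text):
        # Called for every request; reuses the already-loaded model.
        return self.model(text)

sketch = ContainerLifecycleSketch()
sketch.enter()
outputs = [sketch.predict(t) for t in ["a", "bb", "ccc"]]
assert sketch.load_count == 1  # the expensive load ran once, not per request
```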
## Resource Configuration
```python
@app.function(
    cpu=8.0,               # 8 CPU cores
    memory=32768,          # 32 GiB RAM
    ephemeral_disk=10240,  # 10 GiB disk
    timeout=3600,          # 1 hour timeout
)
def memory_intensive_task():
    pass
```
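`memory` and `ephemeral_disk` take integer MiB values, which is why 32 GiB appears as 32768. A tiny helper (ours, not part of Modal) keeps the units readable:

```python
def gib(n: int) -> int:
    """Convert GiB to the MiB integers Modal's memory/disk parameters expect."""
    return n * 1024

print(gib(32), gib(10))  # 32768 10240 -- the values used above
```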
## Best Practices
- Pin dependencies for reproducible builds
- Use appropriate GPU types: L40S for inference, H100 for training
- Cache model weights in Volumes to avoid re-downloading
- Use `.map()` for parallel processing
- Import packages inside functions if they are not installed locally
- Store secrets securely; never hardcode API keys
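Pinning in practice means exact versions in the image definition, so rebuilding the image weeks later yields the same environment. A sketch (the version numbers are placeholders for illustration, not recommendations):

```python
import modal

image = (
    modal.Image.debian_slim(python_version="3.12")
    # Exact pins make image rebuilds reproducible; versions are illustrative
    .pip_install("torch==2.4.0", "transformers==4.44.0", "numpy==1.26.4")
)
```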
## vs Alternatives

| Platform | Best For |
|---|---|
| Modal | Serverless GPUs, autoscaling, Python-native |
| RunPod | GPU rental, long-running jobs |
| AWS Lambda | CPU workloads, AWS ecosystem |
| Replicate | Model hosting, simple deployments |