<!-- Adapted from: claude-scientific-skills/scientific-skills/modal -->

Modal Serverless Cloud Platform

Serverless Python execution with GPUs, autoscaling, and pay-per-use compute.

When to Use

  • Deploy and serve ML models (LLMs, image generation)
  • Run GPU-accelerated computation
  • Batch process large datasets in parallel
  • Schedule compute-intensive jobs
  • Build serverless APIs with autoscaling

Quick Start

Install

```bash
pip install modal
```

Authenticate

```bash
modal token new
```

Define an app

```python
import modal

app = modal.App("my-app")

@app.function()
def hello():
    return "Hello from Modal!"
```

Run with: modal run script.py


Container Images

Build image with dependencies

```python
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
```

GPU Functions

```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU code here
```

Available GPUs: T4, L4, A10, A100, L40S, H100, H200, B200

Multi-GPU: gpu="H100:8"
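The `TYPE:COUNT` multi-GPU string can be read with a tiny helper (a hypothetical illustration of the format, not part of the Modal API):

```python
def parse_gpu_spec(spec: str) -> tuple[str, int]:
    """Split a Modal-style GPU string like "H100:8" into (type, count)."""
    gpu_type, _, count = spec.partition(":")
    # A bare type such as "T4" implies a single GPU
    return gpu_type, int(count) if count else 1

parse_gpu_spec("H100:8")  # -> ("H100", 8)
parse_gpu_spec("T4")      # -> ("T4", 1)
```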

Web Endpoints

```python
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # assumes a `model` object is loaded elsewhere (e.g. at container startup)
    result = model.predict(data["input"])
    return {"prediction": result}
```

Deploy: modal deploy script.py


Scheduled Jobs

```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    pass
```
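For reference, the schedule string above uses the standard five cron fields, so `"0 2 * * *"` means minute 0 of hour 2 on every day:

```python
# Standard five-field cron: minute hour day-of-month month day-of-week
fields = dict(zip(
    ["minute", "hour", "day_of_month", "month", "day_of_week"],
    "0 2 * * *".split(),
))
# minute "0", hour "2", wildcards elsewhere -> fires daily at 02:00
```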

Autoscaling

```python
@app.function()
def process_item(item_id: int):
    return analyze(item_id)

@app.local_entrypoint()
def main():
    items = range(1000)
    # Automatically parallelized across containers
    results = list(process_item.map(items))
```
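Conceptually, `.map()` behaves like a thread-pool map fanned out across containers instead of threads; a purely local analogy (plain Python, no Modal involved):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(item_id: int) -> int:
    # Stand-in for real per-item work; just squares the id here
    return item_id * item_id

# Modal fans process_item.map(items) out across containers;
# a thread pool does the same fan-out on one machine, preserving order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(analyze, range(10)))
# results == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```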

Persistent Storage

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```

Secrets Management

```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
```

ML Model Serving

```python
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
```

Resource Configuration

```python
@app.function(
    cpu=8.0,              # 8 CPU cores
    memory=32768,         # 32 GiB RAM
    ephemeral_disk=10240, # 10 GiB disk
    timeout=3600          # 1 hour timeout
)
def memory_intensive_task():
    pass
```
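As the comments above indicate, the `memory` and `ephemeral_disk` values are expressed in MiB, which is where 32768 and 10240 come from:

```python
MIB_PER_GIB = 1024
# 32 GiB of RAM and 10 GiB of disk, expressed in MiB
assert 32 * MIB_PER_GIB == 32768
assert 10 * MIB_PER_GIB == 10240
```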

Best Practices

  1. Pin dependencies for reproducible builds
  2. Use appropriate GPU types - L40S for inference, H100 for training
  3. Leverage caching via Volumes for model weights
  4. Use .map() for parallel processing
  5. Import packages inside functions if they are not available locally
  6. Store secrets securely - never hardcode API keys

vs Alternatives

| Platform   | Best For                                    |
| ---------- | ------------------------------------------- |
| Modal      | Serverless GPUs, autoscaling, Python-native |
| RunPod     | GPU rental, long-running jobs               |
| AWS Lambda | CPU workloads, AWS ecosystem                |
| Replicate  | Model hosting, simple deployments           |

Resources
