<!-- Adapted from: claude-scientific-skills/scientific-skills/modal -->

Modal Serverless Cloud Platform

Serverless Python execution with GPUs, autoscaling, and pay-per-use compute.

When to Use

  • Deploy and serve ML models (LLMs, image generation)
  • Run GPU-accelerated computation
  • Batch process large datasets in parallel
  • Schedule compute-intensive jobs
  • Build serverless APIs with autoscaling

Quick Start

Install

```bash
pip install modal
```

Authenticate

```bash
modal token new
```

Define an app

```python
import modal

app = modal.App("my-app")

@app.function()
def hello():
    return "Hello from Modal!"
```

Run with: modal run script.py


Container Images

Build image with dependencies

```python
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
```

GPU Functions

```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU code here
```

Available GPUs: T4, L4, A10, A100, L40S, H100, H200, B200

Multi-GPU: gpu="H100:8"
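The `TYPE:COUNT` multi-GPU string can be read with a tiny helper (a hypothetical illustration of the format, not part of the Modal API):

```python
def parse_gpu_spec(spec: str) -> tuple[str, int]:
    """Split a Modal-style GPU string like "H100:8" into (type, count)."""
    gpu_type, _, count = spec.partition(":")
    # A bare type such as "T4" implies a single GPU
    return gpu_type, int(count) if count else 1

parse_gpu_spec("H100:8")  # -> ("H100", 8)
parse_gpu_spec("T4")      # -> ("T4", 1)
```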

Web Endpoints

```python
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # assumes a `model` object is loaded elsewhere (e.g. at container startup)
    result = model.predict(data["input"])
    return {"prediction": result}
```

Deploy: modal deploy script.py


Scheduled Jobs

```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    pass
```
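For reference, the schedule string above uses the standard five cron fields, so `"0 2 * * *"` means minute 0 of hour 2 on every day:

```python
# Standard five-field cron: minute hour day-of-month month day-of-week
fields = dict(zip(
    ["minute", "hour", "day_of_month", "month", "day_of_week"],
    "0 2 * * *".split(),
))
# minute "0", hour "2", wildcards elsewhere -> fires daily at 02:00
```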

Autoscaling

```python
@app.function()
def process_item(item_id: int):
    return analyze(item_id)

@app.local_entrypoint()
def main():
    items = range(1000)
    # Automatically parallelized across containers
    results = list(process_item.map(items))
```
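Conceptually, `.map()` behaves like a thread-pool map fanned out across containers instead of threads; a purely local analogy (plain Python, no Modal involved):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(item_id: int) -> int:
    # Stand-in for real per-item work; just squares the id here
    return item_id * item_id

# Modal fans process_item.map(items) out across containers;
# a thread pool does the same fan-out on one machine, preserving order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(analyze, range(10)))
# results == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```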

Persistent Storage

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```

Secrets Management

```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
```

ML Model Serving

```python
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
```

Resource Configuration

```python
@app.function(
    cpu=8.0,              # 8 CPU cores
    memory=32768,         # 32 GiB RAM
    ephemeral_disk=10240, # 10 GiB disk
    timeout=3600          # 1 hour timeout
)
def memory_intensive_task():
    pass
```
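As the comments above indicate, the `memory` and `ephemeral_disk` values are expressed in MiB, which is where 32768 and 10240 come from:

```python
MIB_PER_GIB = 1024
# 32 GiB of RAM and 10 GiB of disk, expressed in MiB
assert 32 * MIB_PER_GIB == 32768
assert 10 * MIB_PER_GIB == 10240
```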

Best Practices

  1. Pin dependencies for reproducible builds
  2. Use appropriate GPU types - L40S for inference, H100 for training
  3. Leverage caching via Volumes for model weights
  4. Use .map() for parallel processing
  5. Import packages inside functions if they are not available locally
  6. Store secrets securely - never hardcode API keys

vs Alternatives

| Platform   | Best For                                    |
| ---------- | ------------------------------------------- |
| Modal      | Serverless GPUs, autoscaling, Python-native |
| RunPod     | GPU rental, long-running jobs               |
| AWS Lambda | CPU workloads, AWS ecosystem                |
| Replicate  | Model hosting, simple deployments           |

Resources
