# Modal

## Overview
Modal is a serverless platform for running Python code in the cloud with minimal configuration. Execute functions on powerful GPUs, scale automatically to thousands of containers, and pay only for compute used.
Modal is particularly suited for AI/ML workloads, high-performance batch processing, scheduled jobs, GPU inference, and serverless APIs. Sign up for free at https://modal.com and receive $30/month in credits.
## When to Use This Skill
Use Modal for:
- Deploying and serving ML models (LLMs, image generation, embedding models)
- Running GPU-accelerated computation (training, inference, rendering)
- Batch processing large datasets in parallel
- Scheduling compute-intensive jobs (daily data processing, model training)
- Building serverless APIs that need automatic scaling
- Scientific computing requiring distributed compute or specialized hardware
## Authentication and Setup
Modal requires authentication via an API token.
### Initial Setup
```bash
# Install Modal
uv pip install modal

# Authenticate (opens browser for login)
modal token new
```

This creates a token stored in `~/.modal.toml`. The token authenticates all Modal operations.

### Verify Setup
```python
import modal

app = modal.App("test-app")

@app.function()
def hello():
    print("Modal is working!")
```

Run with:

```bash
modal run script.py
```
## Core Capabilities
Modal provides serverless Python execution through Functions that run in containers. Define compute requirements, dependencies, and scaling behavior declaratively.
### 1. Define Container Images

Specify dependencies and environment for functions using Modal Images.

```python
import modal

# Basic image with Python packages
image = (
    modal.Image.debian_slim(python_version="3.12")
    .uv_pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
```

**Common patterns:**
- Install Python packages: `.uv_pip_install("pandas", "scikit-learn")`
- Install system packages: `.apt_install("ffmpeg", "git")`
- Use existing Docker images: `modal.Image.from_registry("nvidia/cuda:12.1.0-base")`
- Add local code: `.add_local_python_source("my_module")`

See `references/images.md` for comprehensive image building documentation.

### 2. Create Functions
Define functions that run in the cloud with the `@app.function()` decorator.

```python
@app.function()
def process_data(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    return df.describe()
```

Call functions:

```python
# From local entrypoint
@app.local_entrypoint()
def main():
    result = process_data.remote("data.csv")
    print(result)
```

Run with: `modal run script.py`

See `references/functions.md` for function patterns, deployment, and parameter handling.

### 3. Request GPUs
Attach GPUs to functions for accelerated computation.

```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU-accelerated code here
```

Available GPU types:
- `T4`, `L4` - Cost-effective inference
- `A10`, `A100`, `A100-80GB` - Standard training/inference
- `L40S` - Excellent cost/performance balance (48GB)
- `H100`, `H200` - High-performance training
- `B200` - Flagship performance (most powerful)

Request multiple GPUs:

```python
@app.function(gpu="H100:8")  # 8x H100 GPUs
def train_large_model():
    pass
```

See `references/gpu.md` for GPU selection guidance, CUDA setup, and multi-GPU configuration.

### 4. Configure Resources
Request CPU cores, memory, and disk for functions.

```python
@app.function(
    cpu=8.0,              # 8 physical cores
    memory=32768,         # 32 GiB RAM
    ephemeral_disk=10240  # 10 GiB disk
)
def memory_intensive_task():
    pass
```

Default allocation: 0.125 CPU cores, 128 MiB memory. Billing is based on reservation or actual usage, whichever is higher.

See `references/resources.md` for resource limits and billing details.
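The "reservation or actual usage, whichever is higher" rule reduces to simple arithmetic. A sketch of that logic only — the numbers below are illustrative workloads, not Modal's actual prices or metering:

```python
def billable_units(reserved: float, used: float) -> float:
    # Billed on reservation or actual usage, whichever is higher
    return max(reserved, used)

# Reserving 8 CPU cores but averaging 2 cores of use
# bills for the full reservation:
cpu_overprovisioned = billable_units(reserved=8.0, used=2.0)    # 8.0

# Reserving the 0.125-core default but bursting to 1.5 cores
# bills for actual usage:
cpu_burst = billable_units(reserved=0.125, used=1.5)            # 1.5
```

The practical takeaway: reserve close to what the function actually needs, since unused reservation is still billed.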
### 5. Scale Automatically
Modal autoscales functions from zero to thousands of containers based on demand.

Process inputs in parallel:

```python
@app.function()
def analyze_sample(sample_id: int):
    # Process single sample
    return result

@app.local_entrypoint()
def main():
    sample_ids = range(1000)
    # Automatically parallelized across containers
    results = list(analyze_sample.map(sample_ids))
```

Configure autoscaling:

```python
@app.function(
    max_containers=100,  # Upper limit
    min_containers=2,    # Keep warm
    buffer_containers=5  # Idle buffer for bursts
)
def inference():
    pass
```

See `references/scaling.md` for autoscaling configuration, concurrency, and scaling limits.
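To reason about how far a burst will scale, a back-of-the-envelope sketch is enough. This is not Modal's actual scheduler — it only illustrates how `max_containers` caps fan-out, assuming one input per container at a time:

```python
import math

def containers_needed(pending_inputs: int,
                      inputs_per_container: int = 1,
                      max_containers: int = 100) -> int:
    """Estimate the container count an autoscaler would target for a burst."""
    if pending_inputs <= 0:
        return 0
    want = math.ceil(pending_inputs / inputs_per_container)
    # The configured ceiling bounds how far fan-out can go
    return min(want, max_containers)

print(containers_needed(1000))  # 100 (capped at max_containers)
print(containers_needed(42))    # 42 (one container per pending input)
```

With 1000 pending inputs and `max_containers=100`, the remaining inputs queue until containers free up.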
### 6. Store Data Persistently
Use Volumes for persistent storage across function invocations.

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```

Volumes persist data between runs, store model weights, cache datasets, and share data between functions.

See `references/volumes.md` for volume management, commits, and caching patterns.

### 7. Manage Secrets
Store API keys and credentials securely using Modal Secrets.

```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
    # Use token for authentication
```

Create secrets in the Modal dashboard or via the CLI:

```bash
modal secret create my-secret KEY=value API_TOKEN=xyz
```

See `references/secrets.md` for secret management and authentication patterns.

### 8. Deploy Web Endpoints
Serve HTTP endpoints, APIs, and webhooks with `@modal.web_endpoint()`.

```python
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # Process request
    result = model.predict(data["input"])
    return {"prediction": result}
```

Deploy with:

```bash
modal deploy script.py
```

Modal provides an HTTPS URL for the endpoint.

See `references/web-endpoints.md` for FastAPI integration, streaming, authentication, and WebSocket support.
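Once deployed, the endpoint behaves like any HTTPS JSON API. A client-side sketch using only the standard library — the URL is a placeholder for whatever `modal deploy` prints for your app:

```python
import json
import urllib.request

# Hypothetical URL; substitute the one modal deploy prints
ENDPOINT = "https://example--predict.modal.run"

def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request matching a predict(data: dict) endpoint."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request({"input": "some text"})
# urllib.request.urlopen(req) would send it; omitted here
```

Any HTTP client works the same way; nothing Modal-specific is required on the caller's side.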
### 9. Schedule Jobs
Run functions on a schedule with cron expressions.

```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    # Backup data
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    # Update cache
    pass
```

Scheduled functions run automatically without manual invocation.

See `references/scheduled-jobs.md` for cron syntax, timezone configuration, and monitoring.
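A cron expression is five space-separated fields, so `"0 2 * * *"` reads as minute 0, hour 2, every day. A small helper, independent of Modal, that names the fields (`cron_fields` is an illustrative function, not part of any library):

```python
def cron_fields(expr: str) -> dict:
    """Split a standard 5-field cron expression into named parts."""
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    parts = expr.split()
    if len(parts) != len(names):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}")
    return dict(zip(names, parts))

print(cron_fields("0 2 * * *"))
# {'minute': '0', 'hour': '2', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```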
## Common Workflows
### Deploy ML Model for Inference
```python
import modal

# Define dependencies
image = modal.Image.debian_slim().uv_pip_install("torch", "transformers")
app = modal.App("llm-inference", image=image)

# Download model at build time
@app.function()
def download_model():
    from transformers import AutoModel
    AutoModel.from_pretrained("bert-base-uncased")

# Serve model
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
    print(result)
```

### Batch Process Large Dataset
```python
@app.function(cpu=2.0, memory=4096)
def process_file(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    # Process data
    return df.shape[0]

@app.local_entrypoint()
def main():
    files = ["file1.csv", "file2.csv", ...]  # 1000s of files
    # Automatically parallelized across containers
    for count in process_file.map(files):
        print(f"Processed {count} rows")
```
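The fan-out that `.map()` performs across containers is conceptually the same as local executor-based mapping. A stand-in using only the standard library, for intuition — Modal distributes across machines, while this runs threads in one process, and the row count is a fixed placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(file_path: str) -> int:
    # Stand-in for the remote function: pretend every file has 100 rows
    return 100

files = [f"file{i}.csv" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Like process_file.map(files): results come back in input order
    counts = list(pool.map(process_file, files))
print(sum(counts))  # 800
```

As with the local executor, ordering is preserved even though execution is concurrent.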
### Train Model on GPU
```python
@app.function(
    gpu="A100:2",  # 2x A100 GPUs
    timeout=3600   # 1 hour timeout
)
def train_model(config: dict):
    import torch
    # Multi-GPU training code
    model = create_model(config)
    train(model)
    return metrics
```

## Reference Documentation
Detailed documentation for specific features:

- `references/getting-started.md` - Authentication, setup, basic concepts
- `references/images.md` - Image building, dependencies, Dockerfiles
- `references/functions.md` - Function patterns, deployment, parameters
- `references/gpu.md` - GPU types, CUDA, multi-GPU configuration
- `references/resources.md` - CPU, memory, disk management
- `references/scaling.md` - Autoscaling, parallel execution, concurrency
- `references/volumes.md` - Persistent storage, data management
- `references/secrets.md` - Environment variables, authentication
- `references/web-endpoints.md` - APIs, webhooks, endpoints
- `references/scheduled-jobs.md` - Cron jobs, periodic tasks
- `references/examples.md` - Common patterns for scientific computing
## Best Practices
- Pin dependencies in `.uv_pip_install()` for reproducible builds
- Use appropriate GPU types - L40S for inference, H100/A100 for training
- Leverage caching - Use Volumes for model weights and datasets
- Configure autoscaling - Set `max_containers` and `min_containers` based on workload
- Import packages in the function body if they are not available locally
- Use `.map()` for parallel processing instead of sequential loops
- Store secrets securely - Never hardcode API keys
- Monitor costs - Check the Modal dashboard for usage and billing
## Troubleshooting
**"Module not found" errors:**
- Add packages to the image with `.uv_pip_install("package-name")`
- Import packages inside the function body if they are not available locally

**GPU not detected:**
- Verify the GPU specification: `@app.function(gpu="A100")`
- Check CUDA availability: `torch.cuda.is_available()`

**Function timeout:**
- Increase the timeout: `@app.function(timeout=3600)`
- Default timeout is 5 minutes

**Volume changes not persisting:**
- Call `volume.commit()` after writing files
- Verify the volume is mounted correctly in the function decorator

For additional help, see the Modal documentation at https://modal.com/docs or join the Modal Slack community.