Modal

Overview

Modal is a serverless platform for running Python code in the cloud with minimal configuration. Execute functions on powerful GPUs, scale automatically to thousands of containers, and pay only for compute used.
Modal is particularly suited for AI/ML workloads, high-performance batch processing, scheduled jobs, GPU inference, and serverless APIs. Sign up for free at https://modal.com and receive $30/month in credits.

When to Use This Skill

Use Modal for:
  • Deploying and serving ML models (LLMs, image generation, embedding models)
  • Running GPU-accelerated computation (training, inference, rendering)
  • Batch processing large datasets in parallel
  • Scheduling compute-intensive jobs (daily data processing, model training)
  • Building serverless APIs that need automatic scaling
  • Scientific computing requiring distributed compute or specialized hardware

Authentication and Setup

Modal requires authentication via API token.

Initial Setup

Install Modal

```bash
uv pip install modal
```

Authenticate (opens browser for login)

```bash
modal token new
```

This creates a token stored in `~/.modal.toml`. The token authenticates all Modal operations.

Verify Setup

```python
import modal

app = modal.App("test-app")

@app.function()
def hello():
    print("Modal is working!")
```

Run with:

```bash
modal run script.py
```

Core Capabilities

Modal provides serverless Python execution through Functions that run in containers. Define compute requirements, dependencies, and scaling behavior declaratively.

1. Define Container Images

Specify dependencies and environment for functions using Modal Images.

```python
import modal

# Basic image with Python packages
image = (
    modal.Image.debian_slim(python_version="3.12")
    .uv_pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
```

**Common patterns:**
- Install Python packages: `.uv_pip_install("pandas", "scikit-learn")`
- Install system packages: `.apt_install("ffmpeg", "git")`
- Use existing Docker images: `modal.Image.from_registry("nvidia/cuda:12.1.0-base")`
- Add local code: `.add_local_python_source("my_module")`

See `references/images.md` for comprehensive image building documentation.

2. Create Functions

Define functions that run in the cloud with the `@app.function()` decorator.

```python
@app.function()
def process_data(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    return df.describe()
```

Call functions from a local entrypoint:

```python
@app.local_entrypoint()
def main():
    result = process_data.remote("data.csv")
    print(result)
```

Run with: `modal run script.py`

See `references/functions.md` for function patterns, deployment, and parameter handling.

3. Request GPUs

Attach GPUs to functions for accelerated computation.

```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU-accelerated code here
```

Available GPU types:
  • T4, L4 - Cost-effective inference
  • A10, A100, A100-80GB - Standard training/inference
  • L40S - Excellent cost/performance balance (48GB)
  • H100, H200 - High-performance training
  • B200 - Flagship performance (most powerful)

Request multiple GPUs:

```python
@app.function(gpu="H100:8")  # 8x H100 GPUs
def train_large_model():
    pass
```

See `references/gpu.md` for GPU selection guidance, CUDA setup, and multi-GPU configuration.
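The `"H100:8"` argument packs the GPU type and count into one string. As a sketch of that convention only (this is not Modal's internal parser):

```python
def parse_gpu_spec(spec: str) -> tuple[str, int]:
    """Split a spec like "H100:8" into (gpu_type, count); count defaults to 1."""
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1

print(parse_gpu_spec("H100:8"))  # ('H100', 8)
print(parse_gpu_spec("T4"))      # ('T4', 1)
```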

4. Configure Resources

Request CPU cores, memory, and disk for functions.

```python
@app.function(
    cpu=8.0,              # 8 physical cores
    memory=32768,         # 32 GiB RAM
    ephemeral_disk=10240  # 10 GiB disk
)
def memory_intensive_task():
    pass
```

Default allocation: 0.125 CPU cores, 128 MiB memory. Billing is based on the reservation or actual usage, whichever is higher.

See `references/resources.md` for resource limits and billing details.
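The "whichever is higher" billing rule can be made concrete with a small sketch (the numbers are illustrative; consult the Modal dashboard for actual rates):

```python
def billable_memory_mib(reserved_mib: int, used_mib: int) -> int:
    # Billing is based on the reservation or actual usage, whichever is higher
    return max(reserved_mib, used_mib)

# Reserving 32 GiB (32768 MiB) but using only 8 GiB still bills the reservation
print(billable_memory_mib(32768, 8192))   # 32768
# Exceeding the reservation bills actual usage instead
print(billable_memory_mib(32768, 40960))  # 40960
```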

5. Scale Automatically

Modal autoscales functions from zero to thousands of containers based on demand.

Process inputs in parallel:

```python
@app.function()
def analyze_sample(sample_id: int):
    # Process a single sample (placeholder computation)
    return sample_id * 2

@app.local_entrypoint()
def main():
    sample_ids = range(1000)
    # Automatically parallelized across containers
    results = list(analyze_sample.map(sample_ids))
```

Configure autoscaling:

```python
@app.function(
    max_containers=100,   # Upper limit
    min_containers=2,     # Keep warm
    buffer_containers=5   # Idle buffer for bursts
)
def inference():
    pass
```

See `references/scaling.md` for autoscaling configuration, concurrency, and scaling limits.
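Locally, `.map()` behaves like an ordinary parallel map over the inputs. The analogy below uses `concurrent.futures` purely to illustrate the fan-out and input-order semantics (Modal itself distributes the calls across containers):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_sample(sample_id: int) -> int:
    # Stand-in for per-sample work
    return sample_id * 2

with ThreadPoolExecutor(max_workers=8) as pool:
    # Like Function.map(), results come back in input order
    results = list(pool.map(analyze_sample, range(10)))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```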

6. Store Data Persistently

Use Volumes for persistent storage across function invocations.

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```

Volumes persist data between runs, store model weights, cache datasets, and share data between functions.

See `references/volumes.md` for volume management, commits, and caching patterns.

7. Manage Secrets

Store API keys and credentials securely using Modal Secrets.

```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
    # Use token for authentication
```

Create secrets in the Modal dashboard or via the CLI:

```bash
modal secret create my-secret KEY=value API_TOKEN=xyz
```

See `references/secrets.md` for secret management and authentication patterns.
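Each `KEY=value` argument becomes an environment variable in functions that attach the secret. A sketch of how those pairs map to a dict (illustrative only, not Modal's CLI implementation):

```python
def parse_secret_args(args: list[str]) -> dict[str, str]:
    # Split each KEY=value pair on the first "=" only,
    # so values may themselves contain "=" characters
    return dict(arg.split("=", 1) for arg in args)

print(parse_secret_args(["KEY=value", "API_TOKEN=xyz"]))
# {'KEY': 'value', 'API_TOKEN': 'xyz'}
```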

8. Deploy Web Endpoints

Serve HTTP endpoints, APIs, and webhooks with `@modal.web_endpoint()`.

```python
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # Process the request (`model` is assumed to be loaded elsewhere)
    result = model.predict(data["input"])
    return {"prediction": result}
```

Deploy with:

```bash
modal deploy script.py
```

Modal provides an HTTPS URL for the endpoint.

See `references/web-endpoints.md` for FastAPI integration, streaming, authentication, and WebSocket support.

9. Schedule Jobs

Run functions on a schedule with cron expressions.

```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    # Back up data
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    # Update cache
    pass
```

Scheduled functions run automatically without manual invocation.

See `references/scheduled-jobs.md` for cron syntax, timezone configuration, and monitoring.
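The cron string `"0 2 * * *"` reads field by field: minute 0, hour 2, any day of month, any month, any day of week. A tiny checker for the literal-number and `*` cases illustrates the matching rule (Modal supports full cron syntax; this sketch deliberately does not):

```python
def matches_simple_cron(expr: str, minute: int, hour: int) -> bool:
    # Only handles literal numbers and "*" in the minute/hour fields
    m, h, *_ = expr.split()
    def field_ok(field: str, value: int) -> bool:
        return field == "*" or int(field) == value
    return field_ok(m, minute) and field_ok(h, hour)

print(matches_simple_cron("0 2 * * *", 0, 2))   # True  -> fires at 02:00
print(matches_simple_cron("0 2 * * *", 30, 2))  # False -> not at 02:30
```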

Common Workflows

Deploy ML Model for Inference

```python
import modal

# Define dependencies
image = modal.Image.debian_slim().uv_pip_install("torch", "transformers")
app = modal.App("llm-inference", image=image)

# Download model at build time
@app.function()
def download_model():
    from transformers import AutoModel
    AutoModel.from_pretrained("bert-base-uncased")

# Serve model
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
    print(result)
```

Batch Process Large Dataset

```python
@app.function(cpu=2.0, memory=4096)
def process_file(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    # Process data
    return df.shape[0]

@app.local_entrypoint()
def main():
    files = ["file1.csv", "file2.csv", ...]  # 1000s of files
    # Automatically parallelized across containers
    for count in process_file.map(files):
        print(f"Processed {count} rows")
```

Train Model on GPU

```python
@app.function(
    gpu="A100:2",    # 2x A100 GPUs
    timeout=3600     # 1 hour timeout
)
def train_model(config: dict):
    import torch
    # Multi-GPU training code
    model = create_model(config)
    train(model)
    return metrics
```

Reference Documentation

Detailed documentation for specific features:
  • references/getting-started.md - Authentication, setup, basic concepts
  • references/images.md - Image building, dependencies, Dockerfiles
  • references/functions.md - Function patterns, deployment, parameters
  • references/gpu.md - GPU types, CUDA, multi-GPU configuration
  • references/resources.md - CPU, memory, disk management
  • references/scaling.md - Autoscaling, parallel execution, concurrency
  • references/volumes.md - Persistent storage, data management
  • references/secrets.md - Environment variables, authentication
  • references/web-endpoints.md - APIs, webhooks, endpoints
  • references/scheduled-jobs.md - Cron jobs, periodic tasks
  • references/examples.md - Common patterns for scientific computing

Best Practices

  1. Pin dependencies in `.uv_pip_install()` for reproducible builds
  2. Use appropriate GPU types - L40S for inference, H100/A100 for training
  3. Leverage caching - Use Volumes for model weights and datasets
  4. Configure autoscaling - Set `max_containers` and `min_containers` based on workload
  5. Import packages in the function body if they are not available locally
  6. Use `.map()` for parallel processing instead of sequential loops
  7. Store secrets securely - Never hardcode API keys
  8. Monitor costs - Check the Modal dashboard for usage and billing

Troubleshooting

"Module not found" errors:
  • Add packages to image with
    .uv_pip_install("package-name")
  • Import packages inside function body if not available locally
GPU not detected:
  • Verify GPU specification:
    @app.function(gpu="A100")
  • Check CUDA availability:
    torch.cuda.is_available()
Function timeout:
  • Increase timeout:
    @app.function(timeout=3600)
  • Default timeout is 5 minutes
Volume changes not persisting:
  • Call
    volume.commit()
    after writing files
  • Verify volume mounted correctly in function decorator
For additional help, see Modal documentation at https://modal.com/docs or join Modal Slack community.
"Module not found"错误:
  • 使用
    .uv_pip_install("package-name")
    将包添加到镜像中
  • 如果本地没有该包,在函数体内导入
GPU未被检测到:
  • 验证GPU规格:
    @app.function(gpu="A100")
  • 检查CUDA可用性:
    torch.cuda.is_available()
函数超时:
  • 增加超时时间:
    @app.function(timeout=3600)
  • 默认超时时间为5分钟
Volume更改未持久化:
  • 写入文件后调用
    volume.commit()
  • 验证函数装饰器中Volume是否正确挂载
如需更多帮助,请查看Modal文档https://modal.com/docs或加入Modal Slack社区。