# Modal

## Overview
Modal is a serverless platform for running Python code in the cloud with minimal configuration. Execute functions on powerful GPUs, scale automatically to thousands of containers, and pay only for compute used.
Modal is particularly suited for AI/ML workloads, high-performance batch processing, scheduled jobs, GPU inference, and serverless APIs. Sign up for free at https://modal.com and receive $30/month in credits.
## When to Use This Skill
Use Modal for:
- Deploying and serving ML models (LLMs, image generation, embedding models)
- Running GPU-accelerated computation (training, inference, rendering)
- Batch processing large datasets in parallel
- Scheduling compute-intensive jobs (daily data processing, model training)
- Building serverless APIs that need automatic scaling
- Scientific computing requiring distributed compute or specialized hardware
## Authentication and Setup
Modal requires authentication via an API token.
### Initial Setup
```bash
# Install Modal
uv pip install modal

# Authenticate (opens browser for login)
modal token new
```

This creates a token stored in `~/.modal.toml`. The token authenticates all Modal operations.

### Verify Setup
```python
import modal

app = modal.App("test-app")

@app.function()
def hello():
    print("Modal is working!")
```

Run with:

```bash
modal run script.py
```
## Core Capabilities
Modal provides serverless Python execution through Functions that run in containers. Define compute requirements, dependencies, and scaling behavior declaratively.
### 1. Define Container Images

Specify dependencies and environment for functions using Modal Images.

```python
import modal

# Basic image with Python packages
image = (
    modal.Image.debian_slim(python_version="3.12")
    .uv_pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
```

**Common patterns:**
- Install Python packages: `.uv_pip_install("pandas", "scikit-learn")`
- Install system packages: `.apt_install("ffmpeg", "git")`
- Use existing Docker images: `modal.Image.from_registry("nvidia/cuda:12.1.0-base")`
- Add local code: `.add_local_python_source("my_module")`

See `references/images.md` for comprehensive image building documentation.

### 2. Create Functions
Define functions that run in the cloud with the `@app.function()` decorator.

```python
@app.function()
def process_data(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    return df.describe()
```

Call functions:

```python
# From local entrypoint
@app.local_entrypoint()
def main():
    result = process_data.remote("data.csv")
    print(result)
```

Run with: `modal run script.py`

See `references/functions.md` for function patterns, deployment, and parameter handling.

### 3. Request GPUs
Attach GPUs to functions for accelerated computation.

```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU-accelerated code here
```

Available GPU types:
- `T4`, `L4` - Cost-effective inference
- `A10`, `A100`, `A100-80GB` - Standard training/inference
- `L40S` - Excellent cost/performance balance (48GB)
- `H100`, `H200` - High-performance training
- `B200` - Flagship performance (most powerful)

Request multiple GPUs:

```python
@app.function(gpu="H100:8")  # 8x H100 GPUs
def train_large_model():
    pass
```

See `references/gpu.md` for GPU selection guidance, CUDA setup, and multi-GPU configuration.

### 4. Configure Resources
Request CPU cores, memory, and disk for functions.

```python
@app.function(
    cpu=8.0,              # 8 physical cores
    memory=32768,         # 32 GiB RAM
    ephemeral_disk=10240  # 10 GiB disk
)
def memory_intensive_task():
    pass
```

Default allocation: 0.125 CPU cores, 128 MiB memory. Billing is based on reservation or actual usage, whichever is higher.

See `references/resources.md` for resource limits and billing details.
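The "reservation or actual usage, whichever is higher" rule reduces to simple arithmetic. A sketch of that logic only — the numbers below are illustrative workloads, not Modal's actual prices or metering:

```python
def billable_units(reserved: float, used: float) -> float:
    # Billed on reservation or actual usage, whichever is higher
    return max(reserved, used)

# Reserving 8 CPU cores but averaging 2 cores of use
# bills for the full reservation:
cpu_overprovisioned = billable_units(reserved=8.0, used=2.0)    # 8.0

# Reserving the 0.125-core default but bursting to 1.5 cores
# bills for actual usage:
cpu_burst = billable_units(reserved=0.125, used=1.5)            # 1.5
```

The practical takeaway: reserve close to what the function actually needs, since unused reservation is still billed.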
### 5. Scale Automatically
Modal autoscales functions from zero to thousands of containers based on demand.

Process inputs in parallel:

```python
@app.function()
def analyze_sample(sample_id: int):
    # Process single sample
    return result

@app.local_entrypoint()
def main():
    sample_ids = range(1000)
    # Automatically parallelized across containers
    results = list(analyze_sample.map(sample_ids))
```

Configure autoscaling:

```python
@app.function(
    max_containers=100,  # Upper limit
    min_containers=2,    # Keep warm
    buffer_containers=5  # Idle buffer for bursts
)
def inference():
    pass
```

See `references/scaling.md` for autoscaling configuration, concurrency, and scaling limits.
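To reason about how far a burst will scale, a back-of-the-envelope sketch is enough. This is not Modal's actual scheduler — it only illustrates how `max_containers` caps fan-out, assuming one input per container at a time:

```python
import math

def containers_needed(pending_inputs: int,
                      inputs_per_container: int = 1,
                      max_containers: int = 100) -> int:
    """Estimate the container count an autoscaler would target for a burst."""
    if pending_inputs <= 0:
        return 0
    want = math.ceil(pending_inputs / inputs_per_container)
    # The configured ceiling bounds how far fan-out can go
    return min(want, max_containers)

print(containers_needed(1000))  # 100 (capped at max_containers)
print(containers_needed(42))    # 42 (one container per pending input)
```

With 1000 pending inputs and `max_containers=100`, the remaining inputs queue until containers free up.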
### 6. Store Data Persistently
Use Volumes for persistent storage across function invocations.

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```

Volumes persist data between runs, store model weights, cache datasets, and share data between functions.

See `references/volumes.md` for volume management, commits, and caching patterns.

### 7. Manage Secrets
Store API keys and credentials securely using Modal Secrets.

```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
    # Use token for authentication
```

Create secrets in the Modal dashboard or via the CLI:

```bash
modal secret create my-secret KEY=value API_TOKEN=xyz
```

See `references/secrets.md` for secret management and authentication patterns.

### 8. Deploy Web Endpoints
Serve HTTP endpoints, APIs, and webhooks with `@modal.web_endpoint()`.

```python
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # Process request
    result = model.predict(data["input"])
    return {"prediction": result}
```

Deploy with:

```bash
modal deploy script.py
```

Modal provides an HTTPS URL for the endpoint.

See `references/web-endpoints.md` for FastAPI integration, streaming, authentication, and WebSocket support.
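Once deployed, the endpoint behaves like any HTTPS JSON API. A client-side sketch using only the standard library — the URL is a placeholder for whatever `modal deploy` prints for your app:

```python
import json
import urllib.request

# Hypothetical URL; substitute the one modal deploy prints
ENDPOINT = "https://example--predict.modal.run"

def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request matching a predict(data: dict) endpoint."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request({"input": "some text"})
# urllib.request.urlopen(req) would send it; omitted here
```

Any HTTP client works the same way; nothing Modal-specific is required on the caller's side.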
### 9. Schedule Jobs
Run functions on a schedule with cron expressions.

```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    # Backup data
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    # Update cache
    pass
```

Scheduled functions run automatically without manual invocation.

See `references/scheduled-jobs.md` for cron syntax, timezone configuration, and monitoring.
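A cron expression is five space-separated fields, so `"0 2 * * *"` reads as minute 0, hour 2, every day. A small helper, independent of Modal, that names the fields (`cron_fields` is an illustrative function, not part of any library):

```python
def cron_fields(expr: str) -> dict:
    """Split a standard 5-field cron expression into named parts."""
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    parts = expr.split()
    if len(parts) != len(names):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}")
    return dict(zip(names, parts))

print(cron_fields("0 2 * * *"))
# {'minute': '0', 'hour': '2', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```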
## Common Workflows
### Deploy ML Model for Inference
```python
import modal

# Define dependencies
image = modal.Image.debian_slim().uv_pip_install("torch", "transformers")
app = modal.App("llm-inference", image=image)

# Download model at build time
@app.function()
def download_model():
    from transformers import AutoModel
    AutoModel.from_pretrained("bert-base-uncased")

# Serve model
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
    print(result)
```

### Batch Process Large Dataset
```python
@app.function(cpu=2.0, memory=4096)
def process_file(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    # Process data
    return df.shape[0]

@app.local_entrypoint()
def main():
    files = ["file1.csv", "file2.csv", ...]  # 1000s of files
    # Automatically parallelized across containers
    for count in process_file.map(files):
        print(f"Processed {count} rows")
```
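The fan-out that `.map()` performs across containers is conceptually the same as local executor-based mapping. A stand-in using only the standard library, for intuition — Modal distributes across machines, while this runs threads in one process, and the row count is a fixed placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(file_path: str) -> int:
    # Stand-in for the remote function: pretend every file has 100 rows
    return 100

files = [f"file{i}.csv" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Like process_file.map(files): results come back in input order
    counts = list(pool.map(process_file, files))
print(sum(counts))  # 800
```

As with the local executor, ordering is preserved even though execution is concurrent.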
### Train Model on GPU
```python
@app.function(
    gpu="A100:2",  # 2x A100 GPUs
    timeout=3600   # 1 hour timeout
)
def train_model(config: dict):
    import torch
    # Multi-GPU training code
    model = create_model(config)
    train(model)
    return metrics
```

## Reference Documentation
Detailed documentation for specific features:

- `references/getting-started.md` - Authentication, setup, basic concepts
- `references/images.md` - Image building, dependencies, Dockerfiles
- `references/functions.md` - Function patterns, deployment, parameters
- `references/gpu.md` - GPU types, CUDA, multi-GPU configuration
- `references/resources.md` - CPU, memory, disk management
- `references/scaling.md` - Autoscaling, parallel execution, concurrency
- `references/volumes.md` - Persistent storage, data management
- `references/secrets.md` - Environment variables, authentication
- `references/web-endpoints.md` - APIs, webhooks, endpoints
- `references/scheduled-jobs.md` - Cron jobs, periodic tasks
- `references/examples.md` - Common patterns for scientific computing
## Best Practices
- Pin dependencies in `.uv_pip_install()` for reproducible builds
- Use appropriate GPU types - L40S for inference, H100/A100 for training
- Leverage caching - Use Volumes for model weights and datasets
- Configure autoscaling - Set `max_containers` and `min_containers` based on workload
- Import packages in the function body if they are not available locally
- Use `.map()` for parallel processing instead of sequential loops
- Store secrets securely - Never hardcode API keys
- Monitor costs - Check the Modal dashboard for usage and billing
## Troubleshooting
**"Module not found" errors:**
- Add packages to the image with `.uv_pip_install("package-name")`
- Import packages inside the function body if they are not available locally

**GPU not detected:**
- Verify the GPU specification: `@app.function(gpu="A100")`
- Check CUDA availability: `torch.cuda.is_available()`

**Function timeout:**
- Increase the timeout: `@app.function(timeout=3600)`
- Default timeout is 5 minutes

**Volume changes not persisting:**
- Call `volume.commit()` after writing files
- Verify the volume is mounted correctly in the function decorator

For additional help, see the Modal documentation at https://modal.com/docs or join the Modal Slack community.