Runpod Flash
runpod-flash (v1.0.0) is a Python SDK for distributed execution of AI workloads on RunPod's serverless infrastructure. Write Python functions locally, decorate them with `@remote`, and Flash handles GPU/CPU provisioning, dependency management, and data transfer.
- Package: `pip install runpod-flash`
- Import: `from runpod_flash import remote, LiveServerless, GpuGroup, ...`
- CLI: `flash`
- Python: >=3.10, <3.15
Getting Started
1. Install Flash
```bash
pip install runpod-flash
```
2. Set your RunPod API key
Get a key from RunPod account settings, then either export it:
```bash
export RUNPOD_API_KEY=your_api_key_here
```
Or save it in a `.env` file in your project directory (Flash auto-loads it via `python-dotenv`):
```bash
echo "RUNPOD_API_KEY=your_api_key_here" > .env
```
3. Write and run a remote function
```python
import asyncio
from runpod_flash import remote, LiveServerless

gpu_config = LiveServerless(name="my-first-worker")

@remote(resource_config=gpu_config, dependencies=["torch"])
async def gpu_task(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"sum": tensor.sum().item(), "gpu": torch.cuda.get_device_name(0)}

async def main():
    result = await gpu_task([1, 2, 3, 4, 5])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
The first run takes ~1 minute (endpoint provisioning); subsequent runs take ~1 second.
4. Or create a Flash API project
```bash
flash init my_project
cd my_project
pip install -r requirements.txt
```
Edit `.env` and add your `RUNPOD_API_KEY`.
```bash
flash run                   # Start local FastAPI server at localhost:8888
flash run --auto-provision  # Pre-deploy all endpoints (faster testing)
```
API explorer available at `http://localhost:8888/docs`.
5. Build and deploy to production
```bash
flash build                              # Scan @remote functions, package artifact
flash build --exclude torch,torchvision  # Exclude packages in base image (500MB limit)
flash deploy new production              # Create deployment environment
flash deploy send production             # Upload and deploy
flash deploy list                        # List environments
flash deploy info production             # Show details
flash deploy delete production           # Tear down
```
Core Concept: The @remote Decorator
The `@remote` decorator marks functions for remote execution on RunPod infrastructure. Code inside the function runs remotely; code outside runs locally.
```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="my-worker")

@remote(resource_config=config, dependencies=["torch", "numpy"])
async def gpu_compute(data):
    import torch  # MUST import inside the function
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

result = await gpu_compute([1, 2, 3])
```
@remote Signature
```python
def remote(
    resource_config: ServerlessResource,   # Required: GPU/CPU config
    dependencies: list[str] = None,        # pip packages
    system_dependencies: list[str] = None, # apt-get packages
    accelerate_downloads: bool = True,     # CDN acceleration
    local: bool = False,                   # Execute locally (testing)
    method: str = None,                    # HTTP method (LoadBalancer only)
    path: str = None,                      # HTTP path (LoadBalancer only)
)
```
CRITICAL: Cloudpickle Scoping Rules
Functions decorated with `@remote` are serialized with cloudpickle. They can ONLY access:
- Function parameters
- Local variables defined inside the function
- Imports done inside the function
- Built-in Python functions
They CANNOT access: module-level imports, global variables, or external functions/classes.

WRONG - external references:
```python
import torch

@remote(resource_config=config)
async def bad(data):
    return torch.tensor(data)  # torch not accessible remotely
```
CORRECT - everything inside:
```python
@remote(resource_config=config, dependencies=["torch"])
async def good(data):
    import torch
    return torch.tensor(data)
```
Return Behavior
- The decorated function is always awaitable (`await my_func(...)`)
- Queue-based resources return a `JobOutput` with `.output`, `.error`, and `.status`
- Load-balanced resources return your dict directly
Resource Configuration Classes
Choose based on execution model and environment (class names confirmed elsewhere in this document are filled in; the remaining cells were lost in extraction):
| Class | Queue | HTTP | Environment | Use Case |
|---|---|---|---|---|
| `LiveServerless` | Yes | No | Dev | GPU with retries, remote code exec |
| `CpuLiveServerless` | Yes | No | Dev | CPU with retries, remote code exec |
| | Yes | No | Prod | GPU, custom Docker images |
| | Yes | No | Prod | CPU, custom Docker images |
| `LiveLoadBalancer` | No | Yes | Dev | GPU low-latency HTTP APIs |
| | No | Yes | Dev | CPU low-latency HTTP APIs |
| `LoadBalancerSlsResource` | No | Yes | Prod | GPU production HTTP |
| | No | Yes | Prod | CPU production HTTP |
Queue-based: best for batch and long-running tasks, with automatic retries.
Load-balanced: best for real-time, low-latency APIs with direct HTTP routing.
Live* classes: fixed optimized Docker image, full remote code execution.
Non-Live classes: custom Docker images, dictionary payload only.
Common Parameters
```python
LiveServerless(
    name="worker-name",            # Required, unique
    gpus=[GpuGroup.AMPERE_80],     # GPU type(s)
    workersMin=0,                  # Min workers
    workersMax=3,                  # Max workers
    idleTimeout=300,               # Seconds before scale-down
    networkVolumeId="vol_abc123",  # Persistent storage
    env={"KEY": "value"},          # Environment variables
    template=PodTemplate(containerDiskInGb=100),
)
```
GPU Groups (GpuGroup enum)
- `GpuGroup.ANY` - Any available (not for production)
- `GpuGroup.AMPERE_16` - RTX A4000, 16GB
- `GpuGroup.AMPERE_24` - RTX A5000, 24GB
- `GpuGroup.AMPERE_48` - A40/RTX A6000, 48GB
- `GpuGroup.AMPERE_80` - A100, 80GB
- `GpuGroup.ADA_24` - RTX 4090, 24GB
- `GpuGroup.ADA_32_PRO` - RTX 5090, 32GB
- `GpuGroup.ADA_48_PRO` - RTX 6000 Ada, 48GB
- `GpuGroup.ADA_80_PRO` - H100, 80GB
- `GpuGroup.HOPPER_141` - H200, 141GB
CPU Instance Types (CpuInstanceType enum)
Format: `CPU{generation}{type}_{vcpu}_{memory_gb}` (instance-type names below are reconstructed from this scheme and the `CPU5C_4_8` example)
| Instance Type | Gen | Type | vCPU | RAM |
|---|---|---|---|---|
| `CPU3G_1_4` | 3rd | General | 1 | 4GB |
| `CPU3G_2_8` | 3rd | General | 2 | 8GB |
| `CPU3G_4_16` | 3rd | General | 4 | 16GB |
| `CPU3G_8_32` | 3rd | General | 8 | 32GB |
| `CPU3C_1_2` | 3rd | Compute | 1 | 2GB |
| `CPU3C_2_4` | 3rd | Compute | 2 | 4GB |
| `CPU3C_4_8` | 3rd | Compute | 4 | 8GB |
| `CPU3C_8_16` | 3rd | Compute | 8 | 16GB |
| `CPU5C_1_2` | 5th | Compute | 1 | 2GB |
| `CPU5C_2_4` | 5th | Compute | 2 | 4GB |
| `CPU5C_4_8` | 5th | Compute | 4 | 8GB |
| `CPU5C_8_16` | 5th | Compute | 8 | 16GB |
Use with the `instanceIds` parameter:
```python
config = LiveServerless(
    name="cpu-worker",
    instanceIds=[CpuInstanceType.CPU5C_4_8],
    workersMax=5,
)
```
Or use explicit CPU classes:
```python
from runpod_flash import CpuLiveServerless

config = CpuLiveServerless(name="cpu-worker", workersMax=5)
```
PodTemplate
Override pod-level settings:
```python
from runpod_flash import PodTemplate

template = PodTemplate(
    containerDiskInGb=100,
    env=[{"key": "PYTHONPATH", "value": "/workspace"}],
)
config = LiveServerless(name="worker", template=template)
```
NetworkVolume
```python
from runpod_flash import NetworkVolume, DataCenter

volume = NetworkVolume(
    name="model-storage",
    size=100,  # GB
    dataCenterId=DataCenter.EU_RO_1,
)
```
LoadBalancer Resources
When using `LoadBalancerSlsResource` or `LiveLoadBalancer`:
- `method` and `path` are required on `@remote`
- `path` must start with "/"
- `method` must be one of: GET, POST, PUT, DELETE, PATCH
```python
from runpod_flash import remote, LiveLoadBalancer

api = LiveLoadBalancer(name="api-service")

@remote(api, method="POST", path="/api/process")
async def process(x: int, y: int):
    return {"result": x + y}

@remote(api, method="GET", path="/api/health")
def health():
    return {"status": "ok"}
```
Key differences from queue-based resources:
- Direct HTTP routing (no queue), lower latency
- Returns your dict directly (no JobOutput wrapper)
- No automatic retries
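These constraints are easy to check before deploying. A hypothetical validator (not part of the SDK) mirrors the two rules:

```python
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}

def validate_route(method: str, path: str) -> None:
    # Mirrors the @remote requirements for load-balanced resources:
    # path starts with "/", method is one of the five allowed verbs.
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path!r}")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")

validate_route("POST", "/api/process")  # passes silently
```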
Error Handling
Queue-Based Resources
```python
job_output = await my_function(data)
if job_output.error:
    print(f"Failed: {job_output.error}")
else:
    result = job_output.output
```
`JobOutput` fields: `id`, `status`, `output`, `error`, `started_at`, `ended_at`.
Load-Balanced Resources
```python
try:
    result = await my_function(data)  # Returns dict directly
except Exception as e:
    print(f"Error: {e}")
```
Runtime Exceptions
```
FlashRuntimeError (base)
RemoteExecutionError             # Remote function failed
SerializationError               # cloudpickle serialization failed
GraphQLError                     # GraphQL base error
GraphQLMutationError             # Mutation failed
GraphQLQueryError                # Query failed
ManifestError                    # Invalid/missing manifest
ManifestServiceUnavailableError  # State Manager unreachable
```
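Catching `FlashRuntimeError` covers the whole family. A sketch with stand-in classes mirroring the names above (illustration only; in real code, catch the SDK's own exceptions):

```python
# Stand-in exception classes mirroring the hierarchy above (illustration
# only; real code would use the exceptions exposed by runpod_flash).
class FlashRuntimeError(Exception): ...
class RemoteExecutionError(FlashRuntimeError): ...
class SerializationError(FlashRuntimeError): ...

def call_remote():
    raise RemoteExecutionError("worker crashed")

def safe_call():
    try:
        return call_remote()
    except FlashRuntimeError as e:  # catches any Flash error subtype
        return {"error": type(e).__name__, "detail": str(e)}

print(safe_call())  # {'error': 'RemoteExecutionError', 'detail': 'worker crashed'}
```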
Common Patterns
Hybrid GPU/CPU Pipeline
```python
from runpod_flash import remote, LiveServerless, CpuInstanceType, GpuGroup

cpu_config = LiveServerless(name="preprocessor", instanceIds=[CpuInstanceType.CPU5C_4_8])
gpu_config = LiveServerless(name="inference", gpus=[GpuGroup.AMPERE_80])

@remote(resource_config=cpu_config, dependencies=["pandas"])
async def preprocess(data):
    import pandas as pd
    return pd.DataFrame(data).to_dict("records")

@remote(resource_config=gpu_config, dependencies=["torch"])
async def inference(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

async def pipeline(raw_data):
    clean = await preprocess(raw_data)
    return await inference(clean)
```
Parallel Execution
```python
results = await asyncio.gather(
    process_item(item1),
    process_item(item2),
    process_item(item3),
)
```
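With many items, an unbounded `gather` can queue more calls than `workersMax` can absorb at once. A common plain-asyncio pattern (nothing Flash-specific; `process_item` here is a stand-in for a `@remote` call) bounds concurrency with a semaphore:

```python
import asyncio

async def process_item(item):
    # Stand-in for a @remote call; sketch only.
    await asyncio.sleep(0)
    return item * 2

async def bounded_map(items, limit=4):
    # Cap the number of in-flight calls at `limit`.
    sem = asyncio.Semaphore(limit)

    async def run_one(item):
        async with sem:
            return await process_item(item)

    return await asyncio.gather(*(run_one(i) for i in items))

print(asyncio.run(bounded_map(range(5))))  # [0, 2, 4, 6, 8]
```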
Local Testing
```python
@remote(resource_config=config, local=True)
async def my_function(data):
    return {"status": "ok"}  # Runs locally, skips remote execution
```
Cost Optimization
- Use `workersMin=0` to scale from zero
- Use `idleTimeout=600` to reduce churn
- Use smaller GPUs if they fit your model
- Use `Live*` classes for spot pricing in dev
- Pass URLs/paths instead of large data objects
CLI Commands
flash init
```bash
flash init [project_name]
```
Creates a project template:
```
project_name/
├── main.py                 # FastAPI entry point
├── workers/
│   ├── gpu/
│   │   ├── __init__.py     # GPU router
│   │   └── endpoint.py     # GPU @remote function
│   └── cpu/
│       ├── __init__.py     # CPU router
│       └── endpoint.py     # CPU @remote function
├── .env                    # API key template
├── .gitignore
├── .flashignore            # Deployment ignore patterns
├── requirements.txt
└── README.md
```
flash run
```bash
flash run [--auto-provision] [--host HOST] [--port PORT]
```
| Option | Default | Description |
|---|---|---|
| `--auto-provision` | off | Pre-deploy all endpoints before serving |
| `--host` | | Server host (or `FLASH_HOST` env var) |
| `--port` | 8888 | Server port (or `FLASH_PORT` env var) |
flash build
```bash
flash build [--exclude PACKAGES] [--keep-build] [--preview]
```
| Option | Description |
|---|---|
| `--exclude` | Skip packages already in the base Docker image |
| `--keep-build` | Don't delete the `.flash/` build directory |
| `--preview` | Build, then run in local Docker containers (at `localhost:8000`) |
Build steps: scan `@remote` decorators, group by resource config, create `flash_manifest.json`, install dependencies for Linux x86_64, package into `.flash/artifact.tar.gz`.
500MB deployment limit - use `--exclude` for packages already in the base image:
```bash
flash build --exclude torch,torchvision,torchaudio
```
flash deploy
```bash
flash deploy new <env_name> [--app-name NAME]     # Create environment
flash deploy send <env_name> [--app-name NAME]    # Deploy archive
flash deploy list [--app-name NAME]               # List environments
flash deploy info <env_name> [--app-name NAME]    # Show details
flash deploy delete <env_name> [--app-name NAME]  # Delete (double confirmation)
```
Run `flash build` before `flash deploy send`.
flash undeploy
```bash
flash undeploy list    # List all deployed resources
flash undeploy <name>  # Undeploy a specific resource
```
flash env / flash app
```bash
flash env list|create|info|delete <name>  # Environment management
flash app list|get <name>                 # App management
```
Architecture Overview
Deployment Architecture
Mothership Pattern: a coordinator endpoint plus distributed child endpoints.
- `flash build` scans code, creates the manifest + archive
- `flash deploy send` uploads the archive, provisions resources
- The Mothership boots and reconciles desired vs. current state
- Child endpoints query the State Manager GraphQL API for service discovery (peer-to-peer)
- Functions route locally or remotely based on the manifest
Cross-Endpoint Routing
Functions on different endpoints can call each other transparently:
- `ProductionWrapper` intercepts calls
- `ServiceRegistry` looks up the function in the manifest
- Local function? Execute directly
- Remote function? Serialize args (cloudpickle), POST to the remote endpoint
Serialization: cloudpickle + base64, max 10MB payload.
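Since payloads are capped at 10MB after cloudpickle + base64, it can be worth estimating sizes before a cross-endpoint call. A sketch using stdlib `pickle` as a stand-in for cloudpickle:

```python
import base64
import pickle

MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # the 10MB serialized-payload cap

def serialized_size(obj) -> int:
    # Approximate the wire size: pickle (stand-in for cloudpickle) + base64.
    return len(base64.b64encode(pickle.dumps(obj)))

def fits_payload(obj) -> bool:
    return serialized_size(obj) <= MAX_PAYLOAD_BYTES

small = list(range(1000))
print(fits_payload(small))  # True
```

If an argument fails this check, pass a URL or network-volume path instead and load the data inside the remote function.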
Common Gotchas
- External scope in @remote functions - Most common error. Everything must be inside.
- Forgetting `await` - All remote functions must be awaited.
- Undeclared dependencies - Must be in the `dependencies=[]` parameter.
- Queue vs LB confusion - Queue returns `JobOutput`, LB returns your dict directly.
- Large serialization - Pass URLs/paths, not large data objects.
- Imports at module level - Import inside `@remote` functions, not at the top of the file.
- LoadBalancer requires method+path - `@remote(config, method="POST", path="/api/x")`
- Bundle too large (>500MB) - Use `--exclude` for packages in the base Docker image.
- Endpoints accumulate - Clean up with `flash undeploy list` / `flash undeploy <name>`.