Runpod Flash

runpod-flash (v1.0.0) is a Python SDK for distributed execution of AI workloads on RunPod's serverless infrastructure. Write Python functions locally, decorate them with `@remote`, and Flash handles GPU/CPU provisioning, dependency management, and data transfer.
  • Package: `pip install runpod-flash`
  • Import: `from runpod_flash import remote, LiveServerless, GpuGroup, ...`
  • CLI: `flash`
  • Python: >=3.10, <3.15

Getting Started

1. Install Flash

```bash
pip install runpod-flash
```

2. Set your RunPod API key

Get a key from RunPod account settings, then either export it:

```bash
export RUNPOD_API_KEY=your_api_key_here
```

Or save it in a `.env` file in your project directory (Flash auto-loads it via `python-dotenv`):

```bash
echo "RUNPOD_API_KEY=your_api_key_here" > .env
```

3. Write and run a remote function

```python
import asyncio
from runpod_flash import remote, LiveServerless

gpu_config = LiveServerless(name="my-first-worker")

@remote(resource_config=gpu_config, dependencies=["torch"])
async def gpu_task(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"sum": tensor.sum().item(), "gpu": torch.cuda.get_device_name(0)}

async def main():
    result = await gpu_task([1, 2, 3, 4, 5])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```

The first run takes about a minute (endpoint provisioning); subsequent runs take about a second.

4. Or create a Flash API project

```bash
flash init my_project
cd my_project
pip install -r requirements.txt
```

Edit `.env` and add your `RUNPOD_API_KEY`, then start the server:

```bash
flash run                   # Start local FastAPI server at localhost:8888
flash run --auto-provision  # Pre-deploy all endpoints (faster testing)
```

An API explorer is available at `http://localhost:8888/docs`.

5. Build and deploy to production

```bash
flash build                              # Scan @remote functions, package artifact
flash build --exclude torch,torchvision  # Exclude packages in base image (500MB limit)
flash deploy new production              # Create deployment environment
flash deploy send production             # Upload and deploy
flash deploy list                        # List environments
flash deploy info production             # Show details
flash deploy delete production           # Tear down
```

Core Concept: The @remote Decorator

The `@remote` decorator marks functions for remote execution on RunPod infrastructure. Code inside the function runs remotely; code outside runs locally.

```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="my-worker")

@remote(resource_config=config, dependencies=["torch", "numpy"])
async def gpu_compute(data):
    import torch  # MUST import inside the function
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

result = await gpu_compute([1, 2, 3])
```

@remote Signature

```python
def remote(
    resource_config: ServerlessResource,   # Required: GPU/CPU config
    dependencies: list[str] = None,        # pip packages
    system_dependencies: list[str] = None, # apt-get packages
    accelerate_downloads: bool = True,     # CDN acceleration
    local: bool = False,                   # Execute locally (testing)
    method: str = None,                    # HTTP method (LoadBalancer only)
    path: str = None,                      # HTTP path (LoadBalancer only)
)
```
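For intuition about how these parameters fit together, here is a purely illustrative sketch of a decorator with the same shape. The name `remote_sketch` and its body are ours, not the SDK's; the real decorator serializes the function and ships it to RunPod, which this stand-in does not do:

```python
import asyncio
import functools

def remote_sketch(resource_config, dependencies=None, local=False, **_):
    """Illustrative stand-in with @remote's shape -- NOT Flash's implementation."""
    def wrap(fn):
        @functools.wraps(fn)
        async def runner(*args, **kwargs):
            # local=True mirrors the documented behavior: run in-process.
            # The real SDK would otherwise serialize fn + args and submit
            # them to the endpoint described by resource_config.
            return await fn(*args, **kwargs)
        return runner
    return wrap

@remote_sketch(resource_config=None, dependencies=["torch"], local=True)
async def double(x):
    return x * 2

assert asyncio.run(double(21)) == 42  # the wrapped function stays awaitable
```

Whatever the configuration, the decorated function remains a coroutine function, which is why every call site uses `await`.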

CRITICAL: Cloudpickle Scoping Rules

Functions decorated with `@remote` are serialized with cloudpickle. They can ONLY access:

  • Function parameters
  • Local variables defined inside the function
  • Imports done inside the function
  • Built-in Python functions

They CANNOT access module-level imports, global variables, or external functions/classes.

WRONG - external references:

```python
import torch

@remote(resource_config=config)
async def bad(data):
    return torch.tensor(data)  # torch not accessible remotely
```

CORRECT - everything inside:

```python
@remote(resource_config=config, dependencies=["torch"])
async def good(data):
    import torch
    return torch.tensor(data)
```
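One way to catch this scoping bug before deploying is to scan a function's bytecode for global lookups, since a cloudpickled function resolves those against the worker's module globals. This stdlib-only helper is our own sketch; `external_names` is not part of Flash:

```python
import builtins
import dis

def external_names(fn):
    """Names a function loads from module globals (excluding builtins).

    A non-empty result for an @remote function signals a likely
    cloudpickle scoping bug: those names won't resolve on the worker.
    """
    return sorted({
        ins.argval
        for ins in dis.get_instructions(fn)
        if ins.opname == "LOAD_GLOBAL" and not hasattr(builtins, ins.argval)
    })

import math  # stands in for a module-level `import torch`

def bad(data):
    return math.fsum(data)  # `math` is resolved from module globals

def good(data):
    import math             # local import: stored and loaded as a local
    return math.fsum(data)

assert external_names(bad) == ["math"]
assert external_names(good) == []
```

The local import compiles to a local-variable store and load, so it produces no `LOAD_GLOBAL` instruction, which is exactly why it survives cloudpickling.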

Return Behavior

  • The decorated function is always awaitable (`await my_func(...)`)
  • Queue-based resources return a `JobOutput` with `.output`, `.error`, `.status`
  • Load-balanced resources return your dict directly

Resource Configuration Classes

Choose based on execution model and environment:

| Class | Queue | HTTP | Environment | Use Case |
|---|---|---|---|---|
| `LiveServerless` | Yes | No | Dev | GPU with retries, remote code exec |
| `CpuLiveServerless` | Yes | No | Dev | CPU with retries, remote code exec |
| `ServerlessEndpoint` | Yes | No | Prod | GPU, custom Docker images |
| `CpuServerlessEndpoint` | Yes | No | Prod | CPU, custom Docker images |
| `LiveLoadBalancer` | No | Yes | Dev | GPU low-latency HTTP APIs |
| `CpuLiveLoadBalancer` | No | Yes | Dev | CPU low-latency HTTP APIs |
| `LoadBalancerSlsResource` | No | Yes | Prod | GPU production HTTP |
| `CpuLoadBalancerSlsResource` | No | Yes | Prod | CPU production HTTP |

Queue-based: best for batch and long-running tasks, with automatic retries. Load-balanced: best for real-time, low-latency APIs with direct HTTP routing.

Live* classes use a fixed, optimized Docker image and support full remote code execution. Non-Live classes support custom Docker images but accept dictionary payloads only.

Common Parameters

```python
LiveServerless(
    name="worker-name",               # Required, unique
    gpus=[GpuGroup.AMPERE_80],        # GPU type(s)
    workersMin=0,                     # Min workers
    workersMax=3,                     # Max workers
    idleTimeout=300,                  # Seconds before scale-down
    networkVolumeId="vol_abc123",     # Persistent storage
    env={"KEY": "value"},             # Environment variables
    template=PodTemplate(containerDiskInGb=100),
)
```

GPU Groups (GpuGroup enum)

  • `GpuGroup.ANY` - Any available (not for production)
  • `GpuGroup.AMPERE_16` - RTX A4000, 16GB
  • `GpuGroup.AMPERE_24` - RTX A5000, 24GB
  • `GpuGroup.AMPERE_48` - A40/RTX A6000, 48GB
  • `GpuGroup.AMPERE_80` - A100, 80GB
  • `GpuGroup.ADA_24` - RTX 4090, 24GB
  • `GpuGroup.ADA_32_PRO` - RTX 5090, 32GB
  • `GpuGroup.ADA_48_PRO` - RTX 6000 Ada, 48GB
  • `GpuGroup.ADA_80_PRO` - H100, 80GB
  • `GpuGroup.HOPPER_141` - H200, 141GB

CPU Instance Types (CpuInstanceType enum)

Format: `CPU{generation}{type}_{vcpu}_{memory_gb}`

| Instance Type | Gen | Type | vCPU | RAM |
|---|---|---|---|---|
| `CPU3G_1_4` | 3rd | General | 1 | 4GB |
| `CPU3G_2_8` | 3rd | General | 2 | 8GB |
| `CPU3G_4_16` | 3rd | General | 4 | 16GB |
| `CPU3G_8_32` | 3rd | General | 8 | 32GB |
| `CPU3C_1_2` | 3rd | Compute | 1 | 2GB |
| `CPU3C_2_4` | 3rd | Compute | 2 | 4GB |
| `CPU3C_4_8` | 3rd | Compute | 4 | 8GB |
| `CPU3C_8_16` | 3rd | Compute | 8 | 16GB |
| `CPU5C_1_2` | 5th | Compute | 1 | 2GB |
| `CPU5C_2_4` | 5th | Compute | 2 | 4GB |
| `CPU5C_4_8` | 5th | Compute | 4 | 8GB |
| `CPU5C_8_16` | 5th | Compute | 8 | 16GB |

Use with the `instanceIds` parameter:

```python
config = LiveServerless(
    name="cpu-worker",
    instanceIds=[CpuInstanceType.CPU5C_4_8],
    workersMax=5,
)
```

Or use the explicit CPU classes:

```python
from runpod_flash import CpuLiveServerless

config = CpuLiveServerless(name="cpu-worker", workersMax=5)
```
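The naming scheme can be decoded mechanically. This small helper is illustrative only (it is not part of the SDK):

```python
def parse_cpu_instance(instance_id: str) -> dict:
    """Decode the CPU{generation}{type}_{vcpu}_{memory_gb} naming scheme."""
    head, vcpu, mem = instance_id.split("_")   # e.g. "CPU5C", "4", "8"
    kind = {"G": "General", "C": "Compute"}[head[-1]]
    return {
        "generation": int(head[3:-1]),
        "type": kind,
        "vcpu": int(vcpu),
        "memory_gb": int(mem),
    }

assert parse_cpu_instance("CPU5C_4_8") == {
    "generation": 5, "type": "Compute", "vcpu": 4, "memory_gb": 8,
}
```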

PodTemplate

Override pod-level settings:

```python
from runpod_flash import PodTemplate

template = PodTemplate(
    containerDiskInGb=100,
    env=[{"key": "PYTHONPATH", "value": "/workspace"}],
)

config = LiveServerless(name="worker", template=template)
```

NetworkVolume

```python
from runpod_flash import NetworkVolume, DataCenter

volume = NetworkVolume(
    name="model-storage",
    size=100,  # GB
    dataCenterId=DataCenter.EU_RO_1,
)
```

LoadBalancer Resources

When using `LoadBalancerSlsResource` or `LiveLoadBalancer`:

  • `method` and `path` are required on `@remote`
  • `path` must start with "/"
  • `method` must be one of: GET, POST, PUT, DELETE, PATCH

```python
from runpod_flash import remote, LiveLoadBalancer

api = LiveLoadBalancer(name="api-service")

@remote(api, method="POST", path="/api/process")
async def process(x: int, y: int):
    return {"result": x + y}

@remote(api, method="GET", path="/api/health")
def health():
    return {"status": "ok"}
```

Key differences from queue-based resources:

  • Direct HTTP routing (no queue), lower latency
  • Returns your dict directly (no JobOutput wrapper)
  • No automatic retries

Error Handling

Queue-Based Resources

```python
job_output = await my_function(data)
if job_output.error:
    print(f"Failed: {job_output.error}")
else:
    result = job_output.output
```

`JobOutput` fields: `id`, `status`, `output`, `error`, `started_at`, `ended_at`

Load-Balanced Resources

```python
try:
    result = await my_function(data)  # Returns dict directly
except Exception as e:
    print(f"Error: {e}")
```

Runtime Exceptions

```
FlashRuntimeError (base)
  RemoteExecutionError      # Remote function failed
  SerializationError        # cloudpickle serialization failed
  GraphQLError              # GraphQL base error
    GraphQLMutationError    # Mutation failed
    GraphQLQueryError       # Query failed
  ManifestError             # Invalid/missing manifest
  ManifestServiceUnavailableError  # State Manager unreachable
```
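The point of the shared base class is that a single `except FlashRuntimeError` catches the whole family. A stand-in mirror of the hierarchy (the real classes ship with runpod-flash) makes this concrete:

```python
class FlashRuntimeError(Exception): pass
class RemoteExecutionError(FlashRuntimeError): pass
class SerializationError(FlashRuntimeError): pass
class GraphQLError(FlashRuntimeError): pass
class GraphQLMutationError(GraphQLError): pass
class GraphQLQueryError(GraphQLError): pass
class ManifestError(FlashRuntimeError): pass
class ManifestServiceUnavailableError(FlashRuntimeError): pass

# One except clause handles any Flash failure:
try:
    raise GraphQLMutationError("deploy failed")
except FlashRuntimeError as e:
    caught = str(e)

assert caught == "deploy failed"
assert issubclass(GraphQLQueryError, FlashRuntimeError)
```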

Common Patterns

Hybrid GPU/CPU Pipeline

```python
from runpod_flash import remote, LiveServerless, CpuInstanceType, GpuGroup

cpu_config = LiveServerless(name="preprocessor", instanceIds=[CpuInstanceType.CPU5C_4_8])
gpu_config = LiveServerless(name="inference", gpus=[GpuGroup.AMPERE_80])

@remote(resource_config=cpu_config, dependencies=["pandas"])
async def preprocess(data):
    import pandas as pd
    return pd.DataFrame(data).to_dict('records')

@remote(resource_config=gpu_config, dependencies=["torch"])
async def inference(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

async def pipeline(raw_data):
    clean = await preprocess(raw_data)
    return await inference(clean)
```

Parallel Execution

```python
results = await asyncio.gather(
    process_item(item1),
    process_item(item2),
    process_item(item3),
)
```
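`asyncio.gather` submits everything at once; when fanning out over many items you may want to cap in-flight calls (for example, near your `workersMax`). A stdlib sketch, where `process_item` is a placeholder for your awaited `@remote` function:

```python
import asyncio

async def process_item(item):
    """Placeholder for an awaited @remote function."""
    await asyncio.sleep(0)
    return item * 2

async def gather_limited(items, limit=3):
    """Run at most `limit` calls concurrently."""
    sem = asyncio.Semaphore(limit)

    async def run(item):
        async with sem:
            return await process_item(item)

    return await asyncio.gather(*(run(i) for i in items))

results = asyncio.run(gather_limited([1, 2, 3, 4, 5]))
assert results == [2, 4, 6, 8, 10]
```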

Local Testing

```python
@remote(resource_config=config, local=True)
async def my_function(data):
    return {"status": "ok"}  # Runs locally, skips remote execution
```

Cost Optimization

  • Use `workersMin=0` to scale from zero
  • Use `idleTimeout=600` to reduce churn
  • Use smaller GPUs if they fit your model
  • Use `Live*` classes for spot pricing in dev
  • Pass URLs/paths instead of large data objects

CLI Commands

flash init

```bash
flash init [project_name]
```

Creates a project template:

```
project_name/
├── main.py                # FastAPI entry point
├── workers/
│   ├── gpu/__init__.py    # GPU router
│   │   └── endpoint.py    # GPU @remote function
│   └── cpu/__init__.py    # CPU router
│       └── endpoint.py    # CPU @remote function
├── .env                   # API key template
├── .gitignore
├── .flashignore           # Deployment ignore patterns
├── requirements.txt
└── README.md
```

flash run

```bash
flash run [--auto-provision] [--host HOST] [--port PORT]
```

| Option | Default | Description |
|---|---|---|
| `--auto-provision` | off | Pre-deploy all endpoints before serving |
| `--host` | `localhost` | Server host (or `FLASH_HOST` env var) |
| `--port` | `8888` | Server port (or `FLASH_PORT` env var) |

flash build

```bash
flash build [--exclude PACKAGES] [--keep-build] [--preview]
```

| Option | Description |
|---|---|
| `--exclude pkg1,pkg2` | Skip packages already in the base Docker image |
| `--keep-build` | Don't delete `.flash/.build/` after packaging |
| `--preview` | Build, then run in local Docker containers |

Build steps: scan `@remote` decorators, group functions by resource config, create `flash_manifest.json`, install dependencies for Linux x86_64, and package everything into `.flash/artifact.tar.gz`.

There is a 500MB deployment limit - use `--exclude` for packages already in the base image:

```bash
flash build --exclude torch,torchvision,torchaudio
```

`--preview` mode creates a Docker container per resource config, starts the mothership on `localhost:8000`, and enables end-to-end local testing.

flash deploy

```bash
flash deploy new <env_name> [--app-name NAME]     # Create environment
flash deploy send <env_name> [--app-name NAME]    # Deploy archive
flash deploy list [--app-name NAME]               # List environments
flash deploy info <env_name> [--app-name NAME]    # Show details
flash deploy delete <env_name> [--app-name NAME]  # Delete (double confirmation)
```

`flash deploy send` requires `flash build` to have been run first.

flash undeploy

```bash
flash undeploy list          # List all deployed resources
flash undeploy <name>        # Undeploy a specific resource
```

flash env / flash app

```bash
flash env list|create|info|delete <name>   # Environment management
flash app list|get <name>                  # App management
```

Architecture Overview

Deployment Architecture

Mothership pattern: a coordinator endpoint plus distributed child endpoints.

  1. `flash build` scans the code and creates the manifest and archive
  2. `flash deploy send` uploads the archive and provisions resources
  3. The mothership boots and reconciles desired vs. current state
  4. Child endpoints query the State Manager GraphQL API for service discovery (peer-to-peer)
  5. Functions route locally or remotely based on the manifest

Cross-Endpoint Routing

Functions on different endpoints can call each other transparently:

  1. `ProductionWrapper` intercepts the call
  2. `ServiceRegistry` looks up the function in the manifest
  3. Local function? Execute directly
  4. Remote function? Serialize args (cloudpickle) and POST to the remote endpoint

Serialization: cloudpickle + base64, max 10MB payload.
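Since the wire format is pickled-then-base64 data with a 10MB cap, you can estimate a payload's encoded size before calling. This sketch uses stdlib `pickle` as a stand-in for cloudpickle (which Flash actually uses); `MAX_PAYLOAD` and `check_payload` are our own names:

```python
import base64
import pickle  # stand-in for cloudpickle, which Flash actually uses

MAX_PAYLOAD = 10 * 1024 * 1024  # the 10MB cap noted above

def check_payload(args) -> int:
    """Return the base64-encoded pickled size, or raise if over the cap."""
    size = len(base64.b64encode(pickle.dumps(args)))
    if size > MAX_PAYLOAD:
        raise ValueError(f"payload is {size} bytes; pass a URL or path instead")
    return size

assert check_payload([1, 2, 3]) < 1024  # small args encode to a tiny payload
```

Oversized arguments are exactly the case where the cost-optimization advice applies: upload the data somewhere and pass a URL or path instead.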

Common Gotchas

  1. External scope in `@remote` functions - the most common error. Everything must be inside the function.
  2. Forgetting `await` - all remote functions must be awaited.
  3. Undeclared dependencies - packages must be listed in the `dependencies=[]` parameter.
  4. Queue vs. LB confusion - queue-based resources return `JobOutput`; load-balanced resources return your dict directly.
  5. Large serialization - pass URLs/paths, not large data objects.
  6. Imports at module level - import inside `@remote` functions, not at the top of the file.
  7. LoadBalancer requires method+path - `@remote(config, method="POST", path="/api/x")`
  8. Bundle too large (>500MB) - use `--exclude` for packages in the base Docker image.
  9. Endpoints accumulate - clean up with `flash undeploy list` / `flash undeploy <name>`.