Runpod Flash

runpod-flash (v1.0.0) is a Python SDK for distributed execution of AI workloads on RunPod's serverless infrastructure. Write Python functions locally, decorate them with `@remote`, and Flash handles GPU/CPU provisioning, dependency management, and data transfer.
  • Package: `pip install runpod-flash`
  • Import: `from runpod_flash import remote, LiveServerless, GpuGroup, ...`
  • CLI: `flash`
  • Python: >=3.10, <3.15

Getting Started

1. Install Flash

```bash
pip install runpod-flash
```

2. Set your RunPod API key

Get a key from RunPod account settings, then either export it:

```bash
export RUNPOD_API_KEY=your_api_key_here
```

Or save it in a `.env` file in your project directory (Flash auto-loads it via `python-dotenv`):

```bash
echo "RUNPOD_API_KEY=your_api_key_here" > .env
```

3. Write and run a remote function

```python
import asyncio
from runpod_flash import remote, LiveServerless

gpu_config = LiveServerless(name="my-first-worker")

@remote(resource_config=gpu_config, dependencies=["torch"])
async def gpu_task(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"sum": tensor.sum().item(), "gpu": torch.cuda.get_device_name(0)}

async def main():
    result = await gpu_task([1, 2, 3, 4, 5])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```

The first run takes about a minute (endpoint provisioning); subsequent runs take about a second.

4. Or create a Flash API project

```bash
flash init my_project
cd my_project
pip install -r requirements.txt
```

Edit `.env` and add your `RUNPOD_API_KEY`, then start the server:

```bash
flash run                   # Start local FastAPI server at localhost:8888
flash run --auto-provision  # Pre-deploy all endpoints (faster testing)
```

An API explorer is available at `http://localhost:8888/docs`.

5. Build and deploy to production

```bash
flash build                              # Scan @remote functions, package artifact
flash build --exclude torch,torchvision  # Exclude packages in base image (500MB limit)
flash deploy new production              # Create deployment environment
flash deploy send production             # Upload and deploy
flash deploy list                        # List environments
flash deploy info production             # Show details
flash deploy delete production           # Tear down
```

Core Concept: The @remote Decorator

The `@remote` decorator marks functions for remote execution on RunPod infrastructure. Code inside the function runs remotely; code outside runs locally.

```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="my-worker")

@remote(resource_config=config, dependencies=["torch", "numpy"])
async def gpu_compute(data):
    import torch  # MUST import inside the function
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

result = await gpu_compute([1, 2, 3])
```

@remote Signature

```python
def remote(
    resource_config: ServerlessResource,   # Required: GPU/CPU config
    dependencies: list[str] = None,        # pip packages
    system_dependencies: list[str] = None, # apt-get packages
    accelerate_downloads: bool = True,     # CDN acceleration
    local: bool = False,                   # Execute locally (testing)
    method: str = None,                    # HTTP method (LoadBalancer only)
    path: str = None,                      # HTTP path (LoadBalancer only)
)
```
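For intuition about how these parameters fit together, here is a purely illustrative sketch of a decorator with the same shape. The name `remote_sketch` and its body are ours, not the SDK's; the real decorator serializes the function and ships it to RunPod, which this stand-in does not do:

```python
import asyncio
import functools

def remote_sketch(resource_config, dependencies=None, local=False, **_):
    """Illustrative stand-in with @remote's shape -- NOT Flash's implementation."""
    def wrap(fn):
        @functools.wraps(fn)
        async def runner(*args, **kwargs):
            # local=True mirrors the documented behavior: run in-process.
            # The real SDK would otherwise serialize fn + args and submit
            # them to the endpoint described by resource_config.
            return await fn(*args, **kwargs)
        return runner
    return wrap

@remote_sketch(resource_config=None, dependencies=["torch"], local=True)
async def double(x):
    return x * 2

assert asyncio.run(double(21)) == 42  # the wrapped function stays awaitable
```

Whatever the configuration, the decorated function remains a coroutine function, which is why every call site uses `await`.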

CRITICAL: Cloudpickle Scoping Rules

Functions decorated with `@remote` are serialized with cloudpickle. They can ONLY access:

  • Function parameters
  • Local variables defined inside the function
  • Imports done inside the function
  • Built-in Python functions

They CANNOT access module-level imports, global variables, or external functions/classes.

WRONG - external references:

```python
import torch

@remote(resource_config=config)
async def bad(data):
    return torch.tensor(data)  # torch not accessible remotely
```

CORRECT - everything inside:

```python
@remote(resource_config=config, dependencies=["torch"])
async def good(data):
    import torch
    return torch.tensor(data)
```
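One way to catch this scoping bug before deploying is to scan a function's bytecode for global lookups, since a cloudpickled function resolves those against the worker's module globals. This stdlib-only helper is our own sketch; `external_names` is not part of Flash:

```python
import builtins
import dis

def external_names(fn):
    """Names a function loads from module globals (excluding builtins).

    A non-empty result for an @remote function signals a likely
    cloudpickle scoping bug: those names won't resolve on the worker.
    """
    return sorted({
        ins.argval
        for ins in dis.get_instructions(fn)
        if ins.opname == "LOAD_GLOBAL" and not hasattr(builtins, ins.argval)
    })

import math  # stands in for a module-level `import torch`

def bad(data):
    return math.fsum(data)  # `math` is resolved from module globals

def good(data):
    import math             # local import: stored and loaded as a local
    return math.fsum(data)

assert external_names(bad) == ["math"]
assert external_names(good) == []
```

The local import compiles to a local-variable store and load, so it produces no `LOAD_GLOBAL` instruction, which is exactly why it survives cloudpickling.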

Return Behavior

  • The decorated function is always awaitable (`await my_func(...)`)
  • Queue-based resources return a `JobOutput` with `.output`, `.error`, `.status`
  • Load-balanced resources return your dict directly

Resource Configuration Classes

Choose based on execution model and environment:

| Class | Queue | HTTP | Environment | Use Case |
|---|---|---|---|---|
| `LiveServerless` | Yes | No | Dev | GPU with retries, remote code exec |
| `CpuLiveServerless` | Yes | No | Dev | CPU with retries, remote code exec |
| `ServerlessEndpoint` | Yes | No | Prod | GPU, custom Docker images |
| `CpuServerlessEndpoint` | Yes | No | Prod | CPU, custom Docker images |
| `LiveLoadBalancer` | No | Yes | Dev | GPU low-latency HTTP APIs |
| `CpuLiveLoadBalancer` | No | Yes | Dev | CPU low-latency HTTP APIs |
| `LoadBalancerSlsResource` | No | Yes | Prod | GPU production HTTP |
| `CpuLoadBalancerSlsResource` | No | Yes | Prod | CPU production HTTP |

Queue-based: best for batch and long-running tasks, with automatic retries. Load-balanced: best for real-time, low-latency APIs with direct HTTP routing.

Live* classes use a fixed, optimized Docker image and support full remote code execution. Non-Live classes support custom Docker images but accept dictionary payloads only.

Common Parameters

```python
LiveServerless(
    name="worker-name",               # Required, unique
    gpus=[GpuGroup.AMPERE_80],        # GPU type(s)
    workersMin=0,                     # Min workers
    workersMax=3,                     # Max workers
    idleTimeout=300,                  # Seconds before scale-down
    networkVolumeId="vol_abc123",     # Persistent storage
    env={"KEY": "value"},             # Environment variables
    template=PodTemplate(containerDiskInGb=100),
)
```

GPU Groups (GpuGroup enum)

  • `GpuGroup.ANY` - Any available (not for production)
  • `GpuGroup.AMPERE_16` - RTX A4000, 16GB
  • `GpuGroup.AMPERE_24` - RTX A5000, 24GB
  • `GpuGroup.AMPERE_48` - A40/RTX A6000, 48GB
  • `GpuGroup.AMPERE_80` - A100, 80GB
  • `GpuGroup.ADA_24` - RTX 4090, 24GB
  • `GpuGroup.ADA_32_PRO` - RTX 5090, 32GB
  • `GpuGroup.ADA_48_PRO` - RTX 6000 Ada, 48GB
  • `GpuGroup.ADA_80_PRO` - H100, 80GB
  • `GpuGroup.HOPPER_141` - H200, 141GB

CPU Instance Types (CpuInstanceType enum)

Format: `CPU{generation}{type}_{vcpu}_{memory_gb}`

| Instance Type | Gen | Type | vCPU | RAM |
|---|---|---|---|---|
| `CPU3G_1_4` | 3rd | General | 1 | 4GB |
| `CPU3G_2_8` | 3rd | General | 2 | 8GB |
| `CPU3G_4_16` | 3rd | General | 4 | 16GB |
| `CPU3G_8_32` | 3rd | General | 8 | 32GB |
| `CPU3C_1_2` | 3rd | Compute | 1 | 2GB |
| `CPU3C_2_4` | 3rd | Compute | 2 | 4GB |
| `CPU3C_4_8` | 3rd | Compute | 4 | 8GB |
| `CPU3C_8_16` | 3rd | Compute | 8 | 16GB |
| `CPU5C_1_2` | 5th | Compute | 1 | 2GB |
| `CPU5C_2_4` | 5th | Compute | 2 | 4GB |
| `CPU5C_4_8` | 5th | Compute | 4 | 8GB |
| `CPU5C_8_16` | 5th | Compute | 8 | 16GB |

Use with the `instanceIds` parameter:

```python
config = LiveServerless(
    name="cpu-worker",
    instanceIds=[CpuInstanceType.CPU5C_4_8],
    workersMax=5,
)
```

Or use the explicit CPU classes:

```python
from runpod_flash import CpuLiveServerless

config = CpuLiveServerless(name="cpu-worker", workersMax=5)
```
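The naming scheme can be decoded mechanically. This small helper is illustrative only (it is not part of the SDK):

```python
def parse_cpu_instance(instance_id: str) -> dict:
    """Decode the CPU{generation}{type}_{vcpu}_{memory_gb} naming scheme."""
    head, vcpu, mem = instance_id.split("_")   # e.g. "CPU5C", "4", "8"
    kind = {"G": "General", "C": "Compute"}[head[-1]]
    return {
        "generation": int(head[3:-1]),
        "type": kind,
        "vcpu": int(vcpu),
        "memory_gb": int(mem),
    }

assert parse_cpu_instance("CPU5C_4_8") == {
    "generation": 5, "type": "Compute", "vcpu": 4, "memory_gb": 8,
}
```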

PodTemplate

Override pod-level settings:

```python
from runpod_flash import PodTemplate

template = PodTemplate(
    containerDiskInGb=100,
    env=[{"key": "PYTHONPATH", "value": "/workspace"}],
)

config = LiveServerless(name="worker", template=template)
```

NetworkVolume

```python
from runpod_flash import NetworkVolume, DataCenter

volume = NetworkVolume(
    name="model-storage",
    size=100,  # GB
    dataCenterId=DataCenter.EU_RO_1,
)
```

LoadBalancer Resources

When using `LoadBalancerSlsResource` or `LiveLoadBalancer`:

  • `method` and `path` are required on `@remote`
  • `path` must start with "/"
  • `method` must be one of: GET, POST, PUT, DELETE, PATCH

```python
from runpod_flash import remote, LiveLoadBalancer

api = LiveLoadBalancer(name="api-service")

@remote(api, method="POST", path="/api/process")
async def process(x: int, y: int):
    return {"result": x + y}

@remote(api, method="GET", path="/api/health")
def health():
    return {"status": "ok"}
```

Key differences from queue-based resources:

  • Direct HTTP routing (no queue), lower latency
  • Returns your dict directly (no JobOutput wrapper)
  • No automatic retries

Error Handling

Queue-Based Resources

```python
job_output = await my_function(data)
if job_output.error:
    print(f"Failed: {job_output.error}")
else:
    result = job_output.output
```

`JobOutput` fields: `id`, `status`, `output`, `error`, `started_at`, `ended_at`

Load-Balanced Resources

```python
try:
    result = await my_function(data)  # Returns dict directly
except Exception as e:
    print(f"Error: {e}")
```

Runtime Exceptions

```
FlashRuntimeError (base)
  RemoteExecutionError      # Remote function failed
  SerializationError        # cloudpickle serialization failed
  GraphQLError              # GraphQL base error
    GraphQLMutationError    # Mutation failed
    GraphQLQueryError       # Query failed
  ManifestError             # Invalid/missing manifest
  ManifestServiceUnavailableError  # State Manager unreachable
```
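The point of the shared base class is that a single `except FlashRuntimeError` catches the whole family. A stand-in mirror of the hierarchy (the real classes ship with runpod-flash) makes this concrete:

```python
class FlashRuntimeError(Exception): pass
class RemoteExecutionError(FlashRuntimeError): pass
class SerializationError(FlashRuntimeError): pass
class GraphQLError(FlashRuntimeError): pass
class GraphQLMutationError(GraphQLError): pass
class GraphQLQueryError(GraphQLError): pass
class ManifestError(FlashRuntimeError): pass
class ManifestServiceUnavailableError(FlashRuntimeError): pass

# One except clause handles any Flash failure:
try:
    raise GraphQLMutationError("deploy failed")
except FlashRuntimeError as e:
    caught = str(e)

assert caught == "deploy failed"
assert issubclass(GraphQLQueryError, FlashRuntimeError)
```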

Common Patterns

Hybrid GPU/CPU Pipeline

```python
from runpod_flash import remote, LiveServerless, CpuInstanceType, GpuGroup

cpu_config = LiveServerless(name="preprocessor", instanceIds=[CpuInstanceType.CPU5C_4_8])
gpu_config = LiveServerless(name="inference", gpus=[GpuGroup.AMPERE_80])

@remote(resource_config=cpu_config, dependencies=["pandas"])
async def preprocess(data):
    import pandas as pd
    return pd.DataFrame(data).to_dict('records')

@remote(resource_config=gpu_config, dependencies=["torch"])
async def inference(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

async def pipeline(raw_data):
    clean = await preprocess(raw_data)
    return await inference(clean)
```

Parallel Execution

```python
results = await asyncio.gather(
    process_item(item1),
    process_item(item2),
    process_item(item3),
)
```
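`asyncio.gather` submits everything at once; when fanning out over many items you may want to cap in-flight calls (for example, near your `workersMax`). A stdlib sketch, where `process_item` is a placeholder for your awaited `@remote` function:

```python
import asyncio

async def process_item(item):
    """Placeholder for an awaited @remote function."""
    await asyncio.sleep(0)
    return item * 2

async def gather_limited(items, limit=3):
    """Run at most `limit` calls concurrently."""
    sem = asyncio.Semaphore(limit)

    async def run(item):
        async with sem:
            return await process_item(item)

    return await asyncio.gather(*(run(i) for i in items))

results = asyncio.run(gather_limited([1, 2, 3, 4, 5]))
assert results == [2, 4, 6, 8, 10]
```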

Local Testing

```python
@remote(resource_config=config, local=True)
async def my_function(data):
    return {"status": "ok"}  # Runs locally, skips remote execution
```

Cost Optimization

  • Use `workersMin=0` to scale from zero
  • Use `idleTimeout=600` to reduce churn
  • Use smaller GPUs if they fit your model
  • Use `Live*` classes for spot pricing in dev
  • Pass URLs/paths instead of large data objects

CLI Commands

flash init

```bash
flash init [project_name]
```

Creates a project template:

```
project_name/
├── main.py                # FastAPI entry point
├── workers/
│   ├── gpu/__init__.py    # GPU router
│   │   └── endpoint.py    # GPU @remote function
│   └── cpu/__init__.py    # CPU router
│       └── endpoint.py    # CPU @remote function
├── .env                   # API key template
├── .gitignore
├── .flashignore           # Deployment ignore patterns
├── requirements.txt
└── README.md
```

flash run

```bash
flash run [--auto-provision] [--host HOST] [--port PORT]
```

| Option | Default | Description |
|---|---|---|
| `--auto-provision` | off | Pre-deploy all endpoints before serving |
| `--host` | `localhost` | Server host (or `FLASH_HOST` env var) |
| `--port` | `8888` | Server port (or `FLASH_PORT` env var) |

flash build

```bash
flash build [--exclude PACKAGES] [--keep-build] [--preview]
```

| Option | Description |
|---|---|
| `--exclude pkg1,pkg2` | Skip packages already in the base Docker image |
| `--keep-build` | Don't delete `.flash/.build/` after packaging |
| `--preview` | Build, then run in local Docker containers |

Build steps: scan `@remote` decorators, group functions by resource config, create `flash_manifest.json`, install dependencies for Linux x86_64, and package everything into `.flash/artifact.tar.gz`.

There is a 500MB deployment limit - use `--exclude` for packages already in the base image:

```bash
flash build --exclude torch,torchvision,torchaudio
```

`--preview` mode creates a Docker container per resource config, starts the mothership on `localhost:8000`, and enables end-to-end local testing.

flash deploy

```bash
flash deploy new <env_name> [--app-name NAME]     # Create environment
flash deploy send <env_name> [--app-name NAME]    # Deploy archive
flash deploy list [--app-name NAME]               # List environments
flash deploy info <env_name> [--app-name NAME]    # Show details
flash deploy delete <env_name> [--app-name NAME]  # Delete (double confirmation)
```

`flash deploy send` requires `flash build` to have been run first.

flash undeploy

```bash
flash undeploy list          # List all deployed resources
flash undeploy <name>        # Undeploy a specific resource
```

flash env / flash app

```bash
flash env list|create|info|delete <name>   # Environment management
flash app list|get <name>                  # App management
```

Architecture Overview

Deployment Architecture

Mothership pattern: a coordinator endpoint plus distributed child endpoints.

  1. `flash build` scans the code and creates the manifest and archive
  2. `flash deploy send` uploads the archive and provisions resources
  3. The mothership boots and reconciles desired vs. current state
  4. Child endpoints query the State Manager GraphQL API for service discovery (peer-to-peer)
  5. Functions route locally or remotely based on the manifest

Cross-Endpoint Routing

Functions on different endpoints can call each other transparently:

  1. `ProductionWrapper` intercepts the call
  2. `ServiceRegistry` looks up the function in the manifest
  3. Local function? Execute directly
  4. Remote function? Serialize args (cloudpickle) and POST to the remote endpoint

Serialization: cloudpickle + base64, max 10MB payload.
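Since the wire format is pickled-then-base64 data with a 10MB cap, you can estimate a payload's encoded size before calling. This sketch uses stdlib `pickle` as a stand-in for cloudpickle (which Flash actually uses); `MAX_PAYLOAD` and `check_payload` are our own names:

```python
import base64
import pickle  # stand-in for cloudpickle, which Flash actually uses

MAX_PAYLOAD = 10 * 1024 * 1024  # the 10MB cap noted above

def check_payload(args) -> int:
    """Return the base64-encoded pickled size, or raise if over the cap."""
    size = len(base64.b64encode(pickle.dumps(args)))
    if size > MAX_PAYLOAD:
        raise ValueError(f"payload is {size} bytes; pass a URL or path instead")
    return size

assert check_payload([1, 2, 3]) < 1024  # small args encode to a tiny payload
```

Oversized arguments are exactly the case where the cost-optimization advice applies: upload the data somewhere and pass a URL or path instead.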

Common Gotchas

  1. External scope in `@remote` functions - the most common error. Everything must be inside the function.
  2. Forgetting `await` - all remote functions must be awaited.
  3. Undeclared dependencies - packages must be listed in the `dependencies=[]` parameter.
  4. Queue vs. LB confusion - queue-based resources return `JobOutput`; load-balanced resources return your dict directly.
  5. Large serialization - pass URLs/paths, not large data objects.
  6. Imports at module level - import inside `@remote` functions, not at the top of the file.
  7. LoadBalancer requires method+path - `@remote(config, method="POST", path="/api/x")`
  8. Bundle too large (>500MB) - use `--exclude` for packages in the base Docker image.
  9. Endpoints accumulate - clean up with `flash undeploy list` / `flash undeploy <name>`.